  Exploring and Breaking the Google Captcha

    The Storm worm is now using YouTube comments to spread itself. That sucks. However, it does teach us something (that we already kind of knew): the Google CAPTCHA HAS BEEN BROKEN. That opens up Gmail, Blogspot, and a gang of other services that make people in various marketing fields salivate.

    So we’re going to crack open the Google captcha, and show why it’s so tricky to defeat.

    Analyzing the Captcha
    The second step of captcha breaking is where Google makes its first attempt to block people from decoding it. For those of you who haven’t tried, the second step is splitting the letters apart so that each one can be analyzed independently.
    Take a close look at these examples. In each one, at least a few of the letters are joined to their neighbors. Ordinarily, this can be easily dodged: you scan through the image and eliminate thin connecting lines. In this case, however, it’s exceptionally hard, because there is often only a 1 pixel, or even 0 pixel, difference between the size of a connection point and the size of the parts the letters actually need.

    [Images: Google Connected Captcha 1, 2, and 3. Captions: "(r and s)", "Almost all of them"]

    Now, the second method of blocking the bots is a little less obvious, and doesn’t appear on ALL of the captchas.

    [Image: Google Tails Captcha]
    As you can see with this image (ignoring the connections it has), there are odd and very pronounced tails coming off of nearly every letter (look at the t’s, h’s, and u’s). Those throw off many kinds of analysis software. Analysis software functions by looking for curves, evaluating how many there are, and checking which directions a line could “escape” in without hitting the captcha letters. A standard U, for example, would just be a single bottom curve, extending upwards, escapable from above the curve. This one, however, has 3 curves: 2 escapable up, 1 escapable down. That puts it beyond classification. If anything, it would show as a W (with one curve presumed to be lost to errors).
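
    To make the “escape” idea concrete, here is a minimal sketch of one way such a check could work. It is not taken from any particular analysis package; the function name, the 0/1 image layout, and the four-direction ray test are all assumptions made for illustration.

```python
def escape_directions(img, row, col):
    """Return the directions in which a straight ray from (row, col) reaches
    the image edge without crossing ink (1 = ink, 0 = background)."""
    height, width = len(img), len(img[0])
    steps = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    free = set()
    for name, (dr, dc) in steps.items():
        r, c = row + dr, col + dc
        # Walk outwards until we hit ink or leave the image.
        while 0 <= r < height and 0 <= c < width and img[r][c] == 0:
            r, c = r + dr, c + dc
        if not (0 <= r < height and 0 <= c < width):
            free.add(name)
    return free


# A crude 5x5 "U": ink down both sides and along the bottom (made-up data).
U_SHAPE = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]

print(escape_directions(U_SHAPE, 2, 2))  # -> {'up'}
```

    A clean U only lets you out through the top. Add a tail that curls under the letter and you get extra pockets that open in other directions, which is roughly why the tails confuse this kind of classifier.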

    Now, How to Break it.
    I myself have never broken the captcha. I get a headache eventually, cuss at the computer, and go do something that will boost my confidence back up.
    First, and most obviously, you drop all the dark colors to black and all the light colors to white. With Google’s captcha, that leaves you with a standard black and white image. Next, you look for extreme angles in the image: angles that do not generally appear in real letters. Since Google’s images normally connect off of rounded lines, the connection almost always makes an intense V shape. By highlighting these points, we can start to get a good idea of where the image splits should be.
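
    Here is a rough sketch of those two steps, not my actual software. It assumes the image is a plain list of rows of 0-255 grayscale values, uses an arbitrary threshold of 128, and swaps real angle detection for a much simpler stand-in: it only looks at the upper outline of the ink and flags columns where that outline dips sharply. The names binarize, top_profile, v_notches, reach, and min_drop are made up for this example.

```python
def binarize(gray, threshold=128):
    """Dark pixels become 1 (ink), light pixels become 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def top_profile(img):
    """For each column, the row of the first ink pixel from the top
    (or the image height if the column has no ink)."""
    height = len(img)
    return [
        next((r for r in range(height) if img[r][c] == 1), height)
        for c in range(len(img[0]))
    ]


def v_notches(profile, reach=2, min_drop=2):
    """Columns where the upper outline sits at least min_drop rows lower
    than the outline `reach` columns to the left AND to the right."""
    hits = []
    for c in range(reach, len(profile) - reach):
        left_drop = profile[c] - profile[c - reach]
        right_drop = profile[c] - profile[c + reach]
        if left_drop >= min_drop and right_drop >= min_drop:
            hits.append(c)
    return hits


# Made-up test image: two rounded humps that meet in a V at column 4.
outline = [0, 0, 1, 2, 3, 2, 1, 0, 0]
gray = [[30 if r >= top else 220 for top in outline] for r in range(5)]

img = binarize(gray)
print(v_notches(top_profile(img)))  # -> [4]
```

    Running the same check on the image flipped upside down gives you the candidate points along the bottom outline.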

    [Image: Google Hack Image]
    Here you can see some of the spots drawn out with some software I wrote (ignore the green outline). The angle it tries to detect needs to be a bit more severe (you can see the issue with the “b”), but this captcha image doesn’t have the same intensity of angles as most of them do. Anyway, as a general rule, you start with the dots on the top half of the image. You then scan downwards, looking for another dot within 3 pixels horizontally, and any number of pixels vertically, from your start point. If one is found, draw a white line through the two points, separating the letters. One thing this software doesn’t do (yet) that also needs to be done is scan the inside rim of the letters for the same points.
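
    The pairing rule can be sketched like this. Again, this is an illustration rather than my real code: points are assumed to be (row, col) tuples, the image is a 0/1 grid with 1 as ink, and split_at_point_pairs and max_dx are names invented for the example.

```python
def split_at_point_pairs(img, points, max_dx=3):
    """For every point in the top half, find the nearest lower point within
    max_dx columns and erase the ink on a line between the two."""
    height = len(img)
    out = [row[:] for row in img]  # work on a copy
    top_points = [p for p in points if p[0] < height // 2]
    for r0, c0 in top_points:
        below = [(r, c) for r, c in points if r > r0 and abs(c - c0) <= max_dx]
        if not below:
            continue
        r1, c1 = min(below)  # the closest candidate going downwards
        for r in range(r0, r1 + 1):
            # Interpolate the column so the cut runs through both points.
            c = c0 + round((c1 - c0) * (r - r0) / (r1 - r0))
            out[r][c] = 0  # "draw a white line"
    return out


# Made-up example: a solid bar of ink with one dot on top and one below.
blob = [[1 if 1 <= r <= 4 else 0 for _ in range(8)] for r in range(6)]
cut = split_at_point_pairs(blob, [(1, 3), (4, 4)])

for row in cut:
    print("".join("#" if px else "." for px in row))
```

    This clears one pixel per row, so a big sideways jump over only a few rows can leave gaps in the cut. A thicker line would be the safer choice there.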

    Now on to handling the tails. Initially, I tried to just eliminate thin lines. That doesn’t work though, since a lot of the letters have necessary lines that are just as thin as the tails. What you CAN do however, is evaluate the area above or below the curve, and determine if it’s substantial or not.
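
    One way to read “evaluate the area” is to count the background pixels that a curve cups. The sketch below is my guess at a workable version, not a tested captcha breaker. It counts pixels that are walled in by ink on every side except the one the curve opens towards, and the cutoff of 6 pixels is an arbitrary number picked for the toy data. pocket_area and is_substantial are made-up names.

```python
def pocket_area(img, opening="up"):
    """Count background pixels that can only reach the image edge by going
    straight in the `opening` direction (1 = ink, 0 = background)."""
    height, width = len(img), len(img[0])
    steps = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def reaches_edge(r, c, dr, dc):
        r, c = r + dr, c + dc
        while 0 <= r < height and 0 <= c < width:
            if img[r][c] == 1:
                return False
            r, c = r + dr, c + dc
        return True

    area = 0
    for r in range(height):
        for c in range(width):
            if img[r][c] == 1:
                continue
            free = {n for n, (dr, dc) in steps.items() if reaches_edge(r, c, dr, dc)}
            if free == {opening}:
                area += 1
    return area


def is_substantial(img, opening="up", min_area=6):
    """A curve counts as a real letter part if it cups enough empty space."""
    return pocket_area(img, opening) >= min_area


# Made-up data: a full-size U versus a small hook like a tail.
BIG_U = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
TAIL = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 0, 0],
]

print(pocket_area(BIG_U), is_substantial(BIG_U))  # -> 12 True
print(pocket_area(TAIL), is_substantial(TAIL))    # -> 1 False
```

    The lines in both shapes are equally thin, but the area they enclose is very different, which is exactly what thin-line removal can’t see.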

    That’s all I’ve got for today. Happy Hacking!

