    How To: De-Rotate Captcha Images

    Ok, so being the brilliant man that I am, I have failed to put away enough money for taxes. No problem though. I have a moderately high-risk cash making method stored away for just such an occasion. The “get rich, or get sued trying” method. Haha. The issue is that I need to break a lot of different captchas, very quickly: 1 complex one every other day, plus a lot of coding in between. Now, ordinarily I break captchas using the JAI (Java Advanced Imaging) library, and no pre-made OCR libraries. For this kind of speed though, that is no longer possible. So I began using pre-made OCR libraries. They work ok, but absolutely suck at identifying any image that has been rotated more than 25 degrees. So I must de-rotate them.

    Prerequisites:
    If you have not read much about breaking captchas, I have a couple of articles I’ve written in the past about some captchas I have broken. They serve as a good prerequisite. While some of those captchas were rotated, it was either not enough to be a problem, or I could afford to miss a few (not true in this case).
    Article 1: Basic Captcha Cracking Techniques
    Article 2: Cracking the Google Captcha
    Note: I know there are many ways to break these, I’m just detailing the ones I like best. I’m quite good at it. To date, the ones I have been unable to break are the original YouTube captcha (before they adopted the Google Captcha), the Google captcha (I’ve gotten close), and the Ticketmaster Captcha.

    Introduction
    We’re going to act like we’re trying to break this captcha by dividing it up into zones: one in each corner, and a tiny one in the middle. In reality, many more calculations would take place, but this is just concentrating on rotation. Our overall goal will be to find a way to return a given letter to the point where these zones are similar to those of its unrotated version.
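
    As a rough illustration of the zone idea, here is a minimal Python sketch. It assumes the letter has already been cleaned and cropped into rows of text where `#` is ink, treats the four corner zones as plain quadrants, and uses ink-pixel counts as the only statistic. The names `zone_counts` and `zone_difference` are made up for this example and do not come from any library.

```python
def zone_counts(rows, centre_size=2):
    """Count ink pixels ('#') in four corner zones and a small centre zone.

    Assumption: the corner zones are simply the four quadrants of the
    cropped letter, and the centre zone is a centre_size x centre_size box.
    """
    height, width = len(rows), len(rows[0])
    mid_y, mid_x = height // 2, width // 2
    top, left = mid_y - centre_size // 2, mid_x - centre_size // 2
    # Each zone is (first_row, end_row, first_col, end_col), end exclusive.
    zones = {
        "upper_left": (0, mid_y, 0, mid_x),
        "upper_right": (0, mid_y, mid_x, width),
        "lower_left": (mid_y, height, 0, mid_x),
        "lower_right": (mid_y, height, mid_x, width),
        "centre": (top, top + centre_size, left, left + centre_size),
    }
    return {
        name: sum(row[x0:x1].count("#") for row in rows[y0:y1])
        for name, (y0, y1, x0, x1) in zones.items()
    }


def zone_difference(counts_a, counts_b):
    """Total absolute difference in ink count; 0 means the zones match."""
    return sum(abs(counts_a[name] - counts_b[name]) for name in counts_a)


letter = [
    "##....",
    "##....",
    "..##..",
    "..##..",
    "....##",
    "....##",
]
# A mirrored copy stands in for a badly rotated version of the same shape.
mirrored = [row[::-1] for row in letter]
print(zone_counts(letter))
print(zone_difference(zone_counts(letter), zone_counts(mirrored)))
```

    A large difference score between a letter and its reference is the kind of mismatch that de-rotation is meant to shrink. A real comparison would use more statistics than a raw pixel count.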

    The Captchas
    (The Targeted Zones are highlighted in Blue)

    The Unrotated Letter                    The Rotated Letter
    [Image: Unrotated Image With Zones]     [Image: Rotated B]

    Ok, so examining the two pictures, it’s obvious that the “zones” don’t match well enough for a very high OCR success rate. At most 40%, and that’s provided we cleaned and separated the letters perfectly. In particular, the upper left hand zone is completely different. But none of the zones have the same angles, arcs, pixel counts, or total black areas. So something needs to be done here.

    What I’ve Considered
    I’ve considered many options: rotating it so the longest side is on the bottom (but problems arise with letters like “n” and “u”), allowing the zones to shift (i.e. comparing the upper left hand corner to ALL zones rather than just the other upper left hand corner), splitting it down the middle (horizontally and vertically), splitting it up diagonally, testing all rotations of the image, and about a dozen other techniques. But they all have drawbacks.

    How We De-Rotate It!
    This is actually a really painless process! It just took a bit of thinking to come to.

    1. Divide it in Half, and Select the Highest, Lowest, and Furthest Left/Right Pixels in Each Section
      This works because skewing an image over 45 degrees makes it much harder for a user to identify, so captchas rarely rotate a letter further than that.
      Captcha With Dots
    2. Draw Lines Connecting the Pixels On Each Side, Then Figure Out At Which Points These Lines Intersect
      The lines connecting the left/right points are in blue, and the lines figuring out where those intersect are in dark green.
      Captcha With connecting Lines
    3. Calculate the Angle of the Bottom Green Line from The Bottom
      When drawing the triangle to get the angle of, make sure that your new line (to create the triangle) goes straight down from the highest intersection point that involves the bottom green line.

      Captcha With Triangle Drawn

      The yellow line is the one we drew to complete our bottom triangle. Having done that, the red arrow points to the angle that shows how many degrees the letter is rotated!

    4. You Have The Angle, Just De-Rotate it That Many Degrees!
      I’m not going to do this part for you, since it varies language by language, and hey, I don’t want to saturate my own market ;-) . However, if the angle you’re calculating is on the right, rotate counterclockwise. If it’s on the left, rotate clockwise.
      I are teh 1337
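
    The steps above can be sketched roughly as follows. This is one simplified reading of them, not the exact construction from the pictures: instead of drawing every connecting line and intersection, it takes the lowest ink pixel in the left half and the lowest ink pixel in the right half, and uses the line through those two points as a stand-in for the bottom green line. The function names are made up for this example, the image is assumed to be a cleaned single letter stored as rows of text with `#` for ink, and the rotation is a plain nearest-neighbour loop standing in for whatever your imaging library provides.

```python
import math


def ink_pixels(rows):
    """List (x, y) for every ink pixel ('#'); y grows downward."""
    return [(x, y) for y, row in enumerate(rows)
            for x, ch in enumerate(row) if ch == "#"]


def extreme_points(pixels):
    """Step 1: highest, lowest, leftmost and rightmost pixel of one half."""
    return {
        "top": min(pixels, key=lambda p: p[1]),
        "bottom": max(pixels, key=lambda p: p[1]),
        "left": min(pixels, key=lambda p: p[0]),
        "right": max(pixels, key=lambda p: p[0]),
    }


def estimate_rotation(rows):
    """Angle in degrees of the line through the lowest pixel of each half.

    Simplification (an assumption of this sketch): that line stands in for
    the bottom green line. Positive means the right end sits lower on
    screen, i.e. the letter is tilted clockwise. Needs ink in both halves.
    """
    pixels = ink_pixels(rows)
    xs = [x for x, _ in pixels]
    mid_x = (min(xs) + max(xs)) / 2
    left = extreme_points([p for p in pixels if p[0] <= mid_x])["bottom"]
    right = extreme_points([p for p in pixels if p[0] > mid_x])["bottom"]
    return math.degrees(math.atan2(right[1] - left[1], right[0] - left[0]))


def rotate(rows, degrees):
    """Rotate about the image centre; positive = clockwise on screen."""
    height, width = len(rows), len(rows[0])
    cx, cy = (width - 1) / 2, (height - 1) / 2
    cos_a = math.cos(math.radians(degrees))
    sin_a = math.sin(math.radians(degrees))
    out = []
    for y in range(height):
        line = []
        for x in range(width):
            # Inverse mapping: look up where this output pixel came from.
            dx, dy = x - cx, y - cy
            sx = round(cx + dx * cos_a + dy * sin_a)
            sy = round(cy - dx * sin_a + dy * cos_a)
            inside = 0 <= sx < width and 0 <= sy < height
            line.append("#" if inside and rows[sy][sx] == "#" else ".")
        out.append("".join(line))
    return out


# A solid block centred on a 61x61 canvas, standing in for a letter.
upright = ["".join("#" if 15 <= x <= 45 and 22 <= y <= 38 else "."
                   for x in range(61)) for y in range(61)]
tilted = rotate(upright, 20)            # tilt it 20 degrees clockwise
angle = estimate_rotation(tilted)
straightened = rotate(tilted, -angle)   # undo the tilt: counterclockwise
print(round(angle, 1))
```

    Because only two pixels decide the angle, this shortcut is sensitive to leftover noise and jagged edges, and it is weakest on letters with tiny bases. Clean the image first, and treat the result as an estimate.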

    That’s All There is To It!

    Of course, you still have to clean the captcha, de-warp the captcha, and split up the letters, but I’ve talked about that before. And once again, I’ll get you started on the right path, but I’m not going to hold your hand.

    Currently Out of Tylenol,
    -XMCP

    18 Responses to “How To: De-Rotate Captcha Images”

    1. GerBot says:

      that seems like a lot of work to me

    2. Gyutae Park says:

      Thanks for the geometry lesson. Pretty creative thinking here.

    3. admin says:

      @Gerbot: Eh, it’s not so bad. It can be re-applied for pretty much any captcha out there as well, which is why I consider it worth it. Very few scripts can be used over and over again.

      @Gyutae: You’re welcome! :)

    4. W-Shadow says:

      Interesting… though I think you’d need to modify the technique slightly for certain characters?

    5. admin says:

      @W-Shadow: There’s normally an exception or two, but with this I didn’t see many. It’s a bit rough on letters like “T”, and other such with tiny bases. But if the captcha is decent sized, all should be well near as I can figure. The only other issue that even crosses my mind is “o”, but in reality it’d work fine with that. Doesn’t matter how many times you rotate a circle haha.

    6. Melanie Phung says:

      I’ve been noticing a lot of serif fonts in CAPTCHAs lately – I can barely make them out myself by looking at them if the letters overlap or touch at all. Seems like that makes the whole thing that much harder.

    7. admin says:

      Yes, overlapping letters are one of the trickiest things to deal with from an automatic OR manual perspective. I have a few methods to deal with them, but at far below a 100% success rate.

    8. SEMSpot says:

      I do not know of any method out yet that is able to de-captcha overlapping letters. Breaking them up seems to be a pain in the ass because you end up breaking up parts of each letter that is overlapped. I do not know that much about breaking captcha, but I will be reading your other articles over and doing my homework to learn how to do it. Reading this gives me something new to learn that sure will be beneficial. Thank you!

      Steve

    9. admin says:

      @Steve: No Problem! If you want one method for breaking apart the letters, take a look at the google captcha cracking article I linked to at the top. It’s not 100%, but it’s decent.

    10. Dbyt3r says:

      Interesting way to say the least :)

      Usually, I just rotate it by 45 and 360-45 degrees and label both. Obviously, that leaves some room for errors :) .. I’ll try to implement this in one of my decoders and let you know how it goes.

      PS: Do you use neural nets for the recognition?

    11. admin says:

      It’s all about the training.
      2 modes: recognition, and training mode.
      In training mode, it just loads captchas over and over again. Each time, it cleans the captcha up, splits up the letters, tries to solve it, and shows me its solution. If it’s right, we continue. If it’s wrong, then I enter the real answer, and it records a bunch of statistics about it and goes on to the next one.
      Rinse, lather, repeat until I’m ok with the OCR success %

    12. Dbyt3r says:

      I mean the guessing the letter part, do you use an algo you developed or do you use some sort of machine learning? I usually use my own algos but I’ve been considering machine learning for tougher captchas lately.

    13. Nate` says:

      STOP SPAMMING MY FORUMz…

      Breaking captchas? why the crap would you want to do that? TO SPAM…

    14. admin says:

      @Nate: You’re obviously not from around here. I’m a blackhat. So yes. Although forums are old news.

    15. knoopx says:

      your technique fails for F’s in clockwise rotation, for P’s in counterclockwise rotation, and probably for a lot more characters like J’s etc…

    16. admin says:

      knoopx: If it’s written in one pixel wide cursive, yes. Otherwise, not really. It has a slightly higher rate of failure, but that’s always true for captchas.

    17. Tim Stassen says:

      Hey,

      Nice strategy, but are you using JAI to de-rotate the letters? When I do the rotation, even with zero degrees, some letter parts simply disappear; there seems to be a bug:
      http://www.slajerek.demon.pl/others/jai_rotate_bug/

    18. website design service says:

      Very interesting. But I have to say one thing: captchas sometimes irritate me very much. By the way, nice geometry work. Thanks for the nice post. I liked it very much.

    © Slightly Shady SEO, All Rights Reserved. Scrape me, and I will eat your soul.