Google acquires reCAPTCHA in two-for-one deal

Google ceremoniously announced today that it is acquiring a small academic company called reCAPTCHA, which builds software that tries to tell humans from automated programs on web form submissions.


CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) are those distorted letters you have to enter when submitting a form on a webpage, often a comment (Computerworld uses reCAPTCHA for our commenting system). Interestingly, reCAPTCHA's founder Luis von Ahn is one of the people who came up with the term in 2000. The attempt to trademark it was abandoned in 2008.

Wikipedia defines CAPTCHAs as: a type of challenge-response test used in computing to ensure that the response is not generated by a computer. The process usually involves one computer (a server) asking a user to complete a simple test which the computer is able to generate and grade. Because other computers are unable to solve the CAPTCHA, any user entering a correct solution is presumed to be human. Thus, it is sometimes described as a reverse Turing test, because it is administered by a machine and targeted to a human, in contrast to the standard Turing test that is typically administered by a human and targeted to a machine. A common type of CAPTCHA requires that the user type letters or digits from a distorted image that appears on the screen.

There are now many types of CAPTCHAs on the Internet. What's interesting about reCAPTCHA is how it works (and why it is doubly valuable to Google). reCAPTCHA takes words from news clippings, articles and old books that can't be read by OCR machines - the same OCR software that hackers are using to try to get through CAPTCHAs. It then shows these words to humans one at a time, each paired with a word that it already knows. The user enters both words. Only the word that reCAPTCHA knows is actually tested - if the user gets it right, reCAPTCHA assumes the other answer is right as well, and it has now learned an additional word to use in other challenges.
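The paired-word check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not reCAPTCHA's actual code: the class name WordPool, the image IDs, and the votes_needed setting are all made up for the example. With votes_needed=1 it behaves as described above (one correct control answer is enough to learn a word); a higher value is a hypothetical safeguard against a single bad answer.

```python
from collections import Counter


def normalize(text):
    """Compare answers case-insensitively and ignore surrounding spaces."""
    return text.strip().lower()


class WordPool:
    """Hypothetical sketch of a paired-word challenge (not reCAPTCHA's real API)."""

    def __init__(self, known_words, votes_needed=1):
        # image_id -> text the system already trusts
        self.known = dict(known_words)
        # image_id -> Counter of answers humans have typed so far
        self.unknown = {}
        # Assumption: how many matching answers promote a word to "known"
        self.votes_needed = votes_needed

    def add_unknown(self, image_id):
        """Register a scanned word that OCR could not read."""
        self.unknown[image_id] = Counter()

    def submit(self, control_id, control_answer, unknown_id, unknown_answer):
        """Grade one challenge; return True if the user passes."""
        # Only the control word is actually tested.
        if normalize(control_answer) != normalize(self.known[control_id]):
            return False  # failed the test, so the other answer is discarded

        # The user passed, so their reading of the unknown word counts as a vote.
        guess = normalize(unknown_answer)
        votes = self.unknown[unknown_id]
        votes[guess] += 1

        # Enough matching votes: the word is learned and can serve as a control.
        if votes[guess] >= self.votes_needed:
            self.known[unknown_id] = guess
            del self.unknown[unknown_id]
        return True
```

In use, a site would call submit() with the IDs of the two images it displayed and the two words the visitor typed. Note that the visitor is never told which word is the control, which is what makes the second answer worth trusting.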

This accomplishes two things, both of which would be useful to Google. 

One, it helps Google keep automated programs from signing up for its many services. It also keeps comment spam on Blogger and Google's other content management systems to a minimum.

Two, and perhaps most importantly, it gives Google a way to harness the power of its users to help recognize passages in old or damaged works.

The software isn't perfect. Notorious/hilarious hacker group Anonymous (part of 4chan) got around reCAPTCHA to rig Time's poll of the 100 most influential people of the year - the Marblecake incident. They did this with a combined brute-force and guessing approach. Google will undoubtedly try to prevent this type of exploitation in the future.

