Building a better spam-blocking CAPTCHA

New approaches may give the CAPTCHA antispam technology a second chance

Page 2 of 3

Breaking into CAPTCHA-protected systems isn't just something that individual crackers do for fun and financial gain. CAPTCHA cracking, believe it or not, has become a business in its own right. For example, India-based company DeCaptcher.com will solve CAPTCHAs for your spamming needs at a rate of $2 per 1,000 successfully cracked CAPTCHAs. The site explains:

"Using the advertisement in blogs, social networks, etc. significantly increases the efficiency of the business. Many services use pictures called CAPTCHAs in order to prevent automated use of these services. Solve CAPTCHAs with the help of this portal, increase your business efficiency now!"

Is it any wonder that CAPTCHA, while still popular, is becoming almost as useful a security technique as locking the barn door after the horse has been stolen?

A second chance for CAPTCHA?

So with all that, can CAPTCHA be saved? According to Carnegie Mellon computer scientists, the answer is yes. The first of their redesigns of CAPTCHA, according to Luis von Ahn, a professor of computer science at the university, is the aptly named reCAPTCHA.

This system, von Ahn said, works in conjunction with the Google Books Project and the Internet Archive, two projects that are converting paper books to digital format using OCR software. As explained above, OCR software often doesn't read words accurately. When the projects' OCR programs flag a word as unreadable, it's saved as an image and used on the Web as a CAPTCHA test.

This has two positive results. First, these CAPTCHAs are already known to be resistant to OCR attacks, making Web sites that use reCAPTCHA less vulnerable to CAPTCHA crackers. Second, human users are decoding the words that the book projects' OCR software can't read, and thus helping to complete the two projects' accurate conversion of older books to digital formats.

How does reCAPTCHA know that the human got a word right? By pairing the unknown word with a control word whose correct spelling the system already knows. Von Ahn explains, "If a user enters the correct answer to the control word, the user's other answer is recorded as a plausible guess for the unknown word. If the first three human guesses match each other, but differ from the OCRs' guesses, the word is marked as correct and becomes a potential control word."
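
The control-word logic von Ahn describes can be sketched in a few lines of Python. This is only an illustration of the quoted rule, not reCAPTCHA's actual code: the names `UnknownWord`, `submit` and `normalize` are hypothetical, and comparing answers case-insensitively is an assumption the quote doesn't address.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# From the quote: the first three human guesses must agree.
REQUIRED_MATCHES = 3


@dataclass
class UnknownWord:
    """A scanned word that the OCR programs flagged as unreadable."""
    ocr_guesses: Set[str]
    human_guesses: List[str] = field(default_factory=list)
    resolved: Optional[str] = None  # set once the word is marked as correct


def normalize(text: str) -> str:
    # Assumption: ignore case and surrounding whitespace when comparing.
    return text.strip().lower()


def submit(control_answer: str, user_control: str,
           user_unknown: str, word: UnknownWord) -> bool:
    """Return True if the user passes the test (control word typed correctly).

    A passing user's other answer is recorded as a plausible guess for
    the unknown word.
    """
    if normalize(user_control) != normalize(control_answer):
        return False  # failed the control word; ignore the other answer

    if word.resolved is None:
        word.human_guesses.append(normalize(user_unknown))
        first = word.human_guesses[:REQUIRED_MATCHES]
        ocr = {normalize(g) for g in word.ocr_guesses}
        if (len(first) == REQUIRED_MATCHES
                and len(set(first)) == 1
                and first[0] not in ocr):
            # Marked as correct; it could now serve as a control word.
            word.resolved = first[0]
    return True
```

What happens when the first three guesses disagree isn't covered by the quote, so the sketch simply leaves such a word unresolved.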

Image-based CAPTCHA

The Carnegie Mellon crew is also looking at image-based CAPTCHA. The first of these, ESP-PIX, requires users to pick a word that describes all four objects in an image. The newest of them, SQ-PIX, requires users to first pick out the right image from three and then trace the outline of the object within the image. For example, you might see an image of a cat, one of a flower and one of a balloon, with the instruction "Trace all balloons."

The SQ-PIX image-based CAPTCHA

These tests do have their shortcomings. For starters, what is clear to the designers may not be clear to users. In the ESP-PIX test, for example, the answer "girl" for three images of adult women and one of a young girl doesn't make much sense. And the SQ-PIX test may require a degree of manual dexterity that not all users have. My editor, who is right-handed but uses a trackball with her left hand, found that the test failed her more often than it passed her. However, these are works in progress; Carnegie Mellon doesn't have a scheduled completion date.

Step 1 of Imagination CAPTCHA

Carnegie Mellon isn't the only group looking at image-based CAPTCHA. Penn State developers are working on Imagination CAPTCHA. In this system, a user must first pick out the geometric center of a distorted image from a page that's filled with similar overlapping pictures.

If you get that right, you're presented with another carefully distorted image and asked to pick a word to describe what you're seeing. The Imagination system is based on ALIPR (Automatic Linguistic Indexing of Pictures), an automated image-tagging and searching technology.
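
The first step of the Imagination test, as described above, comes down to checking whether a click lands close enough to the geometric center of one of the images. Here is a minimal Python sketch of that check. The bounding-box representation, the function names and the 15-pixel tolerance are all assumptions made for illustration; the article doesn't say how Penn State's system actually scores a click.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, width, height), hypothetical layout
Point = Tuple[float, float]


def geometric_center(box: Box) -> Point:
    """Center of an image's bounding box."""
    left, top, width, height = box
    return (left + width / 2, top + height / 2)


def click_is_near_center(click: Point, box: Box,
                         tolerance_px: float = 15.0) -> bool:
    """Accept the click if it falls within tolerance_px of the center.

    tolerance_px is an assumed value, not one taken from the real system.
    """
    cx, cy = geometric_center(box)
    return math.hypot(click[0] - cx, click[1] - cy) <= tolerance_px
```

A looser tolerance would be more forgiving of shaky mouse work but would also make random clicking by a bot more likely to succeed, which is the trade-off any click-based test has to settle.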
