
Crowds teach computers to read the scanned text

Wednesday, September 16th, 2009

From the Google Acquires reCAPTCHA article at Mashable.com:

Why exactly does Google want to own this technology?

… many of the CAPTCHAs provided by reCAPTCHA come from scanned archival newspapers and old books. Computers find it hard to recognize these words because the ink and paper have degraded over time, but by typing them in as a CAPTCHA, crowds teach computers to read the scanned text.

… those 100,000+ captcha forms are now Google-powered, with the data being used to improve Google’s ability to digitize old books and newspapers to make them Web searchable. It makes a lot of sense, and gives Google yet another strategic advantage over would-be competitors.
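How do crowds "teach" the computer? reCAPTCHA's designers have described pairing each word the OCR software could not read with a control word whose answer is already known. A user who types the control word correctly is trusted, so their answer for the unknown word counts as a vote, and the word is accepted once enough people agree. Here is a minimal sketch of that voting idea. The class name, the case-insensitive matching, and the agreement threshold of 3 are hypothetical choices for illustration, not reCAPTCHA's actual code or parameters:

```python
from collections import Counter, defaultdict

# Hypothetical threshold: how many agreeing human answers are needed
# before a transcription of an unknown scanned word is accepted.
AGREEMENT_THRESHOLD = 3


class TranscriptionPool:
    """Hypothetical tally of crowd answers for words the OCR could not read."""

    def __init__(self, known_words, threshold=AGREEMENT_THRESHOLD):
        # control word id -> correct text (already known)
        self.known = dict(known_words)
        # unknown word id -> Counter of normalized answers
        self.votes = defaultdict(Counter)
        self.threshold = threshold

    def submit(self, control_id, control_answer, unknown_id, unknown_answer):
        """Record one solved challenge (a control word plus an unknown word).

        Returns True if the user passed the control word, in which case
        the unknown-word answer was counted as a vote; False otherwise.
        """
        expected = self.known[control_id]
        if control_answer.strip().lower() != expected.lower():
            return False  # failed the control word: ignore the other answer
        self.votes[unknown_id][unknown_answer.strip().lower()] += 1
        return True

    def resolved(self, unknown_id):
        """Return the accepted transcription, or None if there is no consensus yet."""
        counts = self.votes.get(unknown_id)  # .get() does not create an entry
        if not counts:
            return None
        answer, n = counts.most_common(1)[0]
        return answer if n >= self.threshold else None


if __name__ == "__main__":
    pool = TranscriptionPool({"c1": "morning"})
    pool.submit("c1", "Morning", "u7", "harbour")   # passes control, counted
    pool.submit("c1", "mourning", "u7", "harber")   # fails control, ignored
    pool.submit("c1", "morning", "u7", "harbour")
    pool.submit("c1", "morning", "u7", "Harbour ")
    print(pool.resolved("u7"))
```

Answers from users who miss the control word are discarded, so bots and careless typists contribute little. Requiring several independent agreements filters out individual typos before a word is fed back into the digitized text.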