not demonic, creative

A bunch of the regular commentators here have noted the weird word combinations thrown up by the Recaptcha anti-spam filter. I remember on one occasion it made me type ‘vodka hag’. The general consensus was that Recaptcha was possessed by Satan. The truth, however, could not be more different. It turns out that every time you copy the Recaptcha phrases you are helping digitise out-of-copyright texts.

Walrus magazine explains:

Now a growing number of websites, from e-commerce (Ticketmaster) to social networking (Facebook) to blogging (WordPress), have implemented the precocious professor’s new tool, dubbed recaptcha. If you’ve visited those sites, your squiggly-letter-reading ability has been harnessed for a massive project that aims to scan and make freely available every out-of-copyright book in the world, by deciphering words from old texts that have stumped scanning software. [snip]

Once the text is scanned, the file is sent to a server in California, where it’s run through optical character recognition software to produce a digital full-text version. For the newer books, OCR is about 90 percent accurate. But that success rate drops to as low as 60 percent for older texts, which often contain fonts that are blurry and less uniform. These troublesome scans are sent on to the reCAPTCHA servers at Carnegie Mellon University in Pittsburgh. [snip]

The program distorts a known word so that it will have a way to check that the user is human, and then pairs it up with a word OCR has failed to decipher. Each mystery word is served up in multiple reCAPTCHAs, until a consensus about the correct answer emerges. Sometimes a single user confirms the computer’s best guess, but the average is about four users per word. The system is now correcting over 10 million words a day, with 99.1 percent accuracy, von Ahn says.
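For the curious, the pairing-and-consensus idea described above can be sketched in a few lines of Python. This is only an illustration, not reCAPTCHA's actual code: the names `MysteryWord`, `submit` and `normalise`, the two-vote threshold, and the choice to count the OCR guess as one vote are all assumptions made for the example.

```python
from collections import Counter


def normalise(text):
    """Compare answers case-insensitively and ignore surrounding whitespace."""
    return text.strip().lower()


class MysteryWord:
    """A word OCR could not read, waiting for human agreement (hypothetical model)."""

    def __init__(self, ocr_guess=None, votes_needed=2):
        self.votes = Counter()
        self.votes_needed = votes_needed  # assumed threshold, not from the article
        if ocr_guess:
            # Assumption: the OCR's best guess counts as one vote.
            self.votes[normalise(ocr_guess)] += 1

    def consensus(self):
        """Return the agreed reading, or None if no answer has enough votes yet."""
        if not self.votes:
            return None
        answer, count = self.votes.most_common(1)[0]
        return answer if count >= self.votes_needed else None


def submit(control_word, mystery, typed_control, typed_mystery):
    """Check the known word; only count the mystery answer if the user passed."""
    if normalise(typed_control) != normalise(control_word):
        return False  # failed the human check, so the second answer is discarded
    mystery.votes[normalise(typed_mystery)] += 1
    return True


# Usage: one user confirms the computer's best guess.
word = MysteryWord(ocr_guess="hag")
submit("vodka", word, "vodka", "hag")
print(word.consensus())  # hag
```

With these assumed settings, a single user who agrees with the OCR guess settles the word, while a word with no usable guess keeps being served until two users agree.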

Coolest. Anti. Spam. Device. Ever.

Overland is a not-for-profit magazine with a proud history of supporting writers, and publishing ideas and voices often excluded from other places.

Jeff Sparrow is the former editor of Overland. He is the co-author (with Jill Sparrow) of Radical Melbourne: A Secret History and Radical Melbourne 2: The Enemy Within, the editor (with Antony Loewenstein) of Left Turn: Essays for the New Left and the author of Communism: a love story, Killing: Misadventures in violence, and Money Shot: A Journey into Censorship and Porn. On Twitter, he's @Jeff_Sparrow.

Comments

  1. When I posted something about this on the OL facebook site, the reaction was quite negative, which initially surprised me cos my first reaction was simply about what a great idea it represented. But since then I’ve been thinking. I mean, should they have to tell us about this? OK, they’re digitalising books, which is a good thing, but what if they were doing something evil — or simply making money out of us? Is there something exploitative about the whole idea?

    (I just contributed the words ’10 compressed’ to the world’s body of knowledge.)

  2. I suppose the issue is consent. Maybe it is just free labour and I can go back to being a hater, which I do infinitely better.
