not demonic, creative

A bunch of the regular commenters here have noted the weird word combinations thrown up by the reCAPTCHA anti-spam filter; I remember on one occasion it made me type ‘vodka hag’. The general consensus was that reCAPTCHA was possessed by Satan. The truth, however, could not be more different. It turns out that every time you copy the reCAPTCHA phrases you are helping digitise out-of-copyright texts.

Walrus magazine explains:

Now a growing number of websites, from e-commerce (Ticketmaster) to social networking (Facebook) to blogging (WordPress), have implemented the precocious professor’s new tool, dubbed recaptcha. If you’ve visited those sites, your squiggly-letter-reading ability has been harnessed for a massive project that aims to scan and make freely available every out-of-copyright book in the world, by deciphering words from old texts that have stumped scanning software. [snip]

Once the text is scanned, the file is sent to a server in California, where it’s run through optical character recognition software to produce a digital full-text version. For the newer books, OCR is about 90 percent accurate. But that success rate drops to as low as 60 percent for older texts, which often contain fonts that are blurry and less uniform. These troublesome scans are sent on to the reCAPTCHA servers at Carnegie Mellon University in Pittsburgh. [snip]

The program distorts a known word so that it will have a way to check that the user is human, and then pairs it up with a word OCR has failed to decipher. Each mystery word is served up in multiple reCAPTCHAs, until a consensus about the correct answer emerges. Sometimes a single user confirms the computer’s best guess, but the average is about four users per word. The system is now correcting over 10 million words a day, with 99.1 percent accuracy, von Ahn says.
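For the curious, the pairing-and-consensus scheme described in the excerpt can be sketched as a toy model in Python. This is not reCAPTCHA's actual code. The class name `UnknownWord`, the two-vote agreement threshold, and counting the OCR software's best guess as one vote are all hypothetical assumptions, chosen so that a single user can confirm the computer's guess, as the excerpt says sometimes happens.

```python
from collections import Counter

# Hypothetical threshold: the article does not say how many matching
# answers are needed before a consensus counts as reached.
AGREEMENT_THRESHOLD = 2


class UnknownWord:
    """Toy tracker for human readings of one word that OCR failed to decipher."""

    def __init__(self, ocr_guess):
        self.votes = Counter()
        # Assumption: OCR's best guess counts as one vote, so one human
        # agreeing with it is enough to reach the threshold.
        if ocr_guess:
            self.votes[ocr_guess.strip().lower()] += 1

    def record(self, control_answer, control_truth, reading):
        """Count a reading only if the user typed the known (control) word correctly."""
        if control_answer.strip().lower() != control_truth.strip().lower():
            return False  # failed the human check, so the reading is discarded
        self.votes[reading.strip().lower()] += 1
        return True

    def consensus(self):
        """Return the agreed reading, or None if no consensus has emerged yet."""
        if not self.votes:
            return None
        reading, count = self.votes.most_common(1)[0]
        return reading if count >= AGREEMENT_THRESHOLD else None


if __name__ == "__main__":
    word = UnknownWord(ocr_guess="morning")
    word.record("upon", "upon", "moming")   # control passed; reading disagrees with OCR
    print(word.consensus())                 # None: one vote each for two readings
    word.record("tree", "free", "morning")  # control failed; reading ignored
    word.record("hat", "hat", "morning")    # control passed; agrees with OCR
    print(word.consensus())                 # morning
```

The known word is what proves you are human; the unknown word is where your reading actually does the digitising work, and disagreements simply mean the word gets served to more people.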

Coolest. Anti. Spam. Device. Ever.

Overland literary journal

Jeff Sparrow