Literature is becoming data.

In 2002, on a Friday, Larry Page began to end the book as we know it. Using the 20 percent of his time that Google then allotted to its engineers for personal projects, Page and Vice-President Marissa Mayer developed a machine for turning books into data. The original was a crude plywood affair with simple clamps, a metronome, a scanner, and a blade for cutting the books into sheets. The process took 40 minutes. The first refinement Page developed was a means of digitizing books without cutting off their spines — a gesture of tender-hearted sentimentality towards print. The great disbinding was to be metaphorical rather than literal. A team of Page-supervised engineers developed an infrared camera that took into account the curvature of pages around the spine. They resurrected a long dormant piece of Optical Character Recognition software from Hewlett-Packard and released it to the open-source community for improvements. They then crowd-sourced textual correction at a minimal cost through a brilliant program called reCAPTCHA, which employs an anti-bot service to get users to read and type in words the Optical Character Recognition software can’t recognize. (A miracle of cleverness: everyone who has entered a security identification has also, without knowing it, aided the perfection of the world’s texts.) Soon after, the world’s five largest libraries signed on as partners. And, more or less just like that, literature became data.

Source: Los Angeles Review of Books.
