It’s a problem that has absorbed technologists, librarians and archivists for the best part of three decades, but it rarely pops up in the mainstream conversation: what will become of our digital information? How long can we expect it to last? Will it be available to social historians or our children 50 years from now, let alone 100? For a culture obsessed with measuring the future, we are strangely reluctant to consider these questions.
But from time to time, someone with a high enough profile raises the issue and the media pay attention. The first such luminary may have been Stewart Brand, who declared in 1998 that ‘historians will consider this a dark age’. We can read Galileo’s correspondence from the 1590s but not Marvin Minsky’s from the 1960s, he went on to explain.
Now it’s the turn of Vint Cerf, Google vice-president and resident ‘Chief Internet Evangelist’ (his actual title), who is generally regarded as one of the ‘co-founders’ of the internet.
In a widely reported presentation to the American Association for the Advancement of Science (AAAS), Cerf last week expressed the belief that we live ‘in a digital Dark Age’ in which ‘an information black hole’ threatens to create ‘a forgotten generation, or even a forgotten century’. You couldn’t frame the problem in more alarmist, apocalyptic terms. But is it true? Yes and no.
Consider a written English text that exists in two formats: as an analogue document printed on paper and as a digital document written in Word Perfect 2.0 and stored on a 3½ inch floppy disc formatted by a 1989 Apple Macintosh. The paper document will be readable for as long as people can comprehend the English language as it is spoken and written today, and the paper and ink hold together. By contrast, interpreting the digital document requires – besides knowledge of English – a 3½ inch floppy disc drive and the means to emulate the operation of a 1989 Apple Macintosh and of Word Perfect 2.0.
Because of this dependence on hardware and software, digital information needs to be periodically copied and transferred in order to survive, which places it in a unique position with respect to older analogue technologies. If you think that this is quite elementary and obvious, consider the grandiose title of the book with which Microsoft Press hailed the mass marketing of the CD-ROM in 1986: The New Papyrus. The old papyrus was the dominant encoding and storage medium for written texts for approximately five thousand years. Meanwhile, the life expectancy of individual CD-ROMs is estimated to vary between 20 and 100 years, and the format is already all but obsolete.
Serious as the issues of data migration and device obsolescence are, the dependence of data on software has emerged over time as a more intractable problem. One of the first to articulate it was Jeff Rothenberg, author of the quip ‘digital information lasts forever – or five years, whatever comes first.’ In 1995, just as the World Wide Web was exploding, Rothenberg wrote:
As documents become more complex than simple streams of alphabetic characters, it becomes increasingly meaningless to think of them as existing at all except when they are interpreted by the software that created them. The bits in each document file are meaningful only to the program that created that file. In effect, document files are programs, consisting of instructions and data that can be interpreted only by the appropriate software. That is, a document file is not a document in its own right: it merely describes a document that comes into existence only when the file is ‘run’ by the program that created it. Without this authoring program – or some equivalent viewing software – the document is held cryptic hostage to its own encoding.
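Rothenberg’s point holds even for a ‘simple stream of alphabetic characters’. As a minimal illustrative sketch (in Python, and not drawn from his paper), the same four-letter word is stored once as bytes and then read back by two different ‘viewers’. The choice of word and of the two encodings is an assumption made purely for the example.

```python
# One stored "document": the word 'café' written out as UTF-8 bytes.
raw = "café".encode("utf-8")

# Viewer 1 knows the encoding the author used.
as_utf8 = raw.decode("utf-8")

# Viewer 2 assumes an older single-byte encoding (Latin-1).
# It does not fail; it simply shows a different text.
as_latin1 = raw.decode("latin-1")

print(repr(as_utf8))
print(repr(as_latin1))
print(as_utf8 == as_latin1)
```

The bits are identical in both cases; only the interpreting software differs. A word processor file adds many more layers of this kind on top of the character encoding.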
Cerf used his AAAS speech to advocate a possible solution, in the form of a protocol under development at Carnegie Mellon for encoding data along with its software. It’s not the first time this idea has been mooted, and although I’m not a technical expert, I suspect it would run into the issues highlighted in 2013 by David Rosenthal. Namely:
No matter how faithful the emulator may be, almost all programs these days execute in a context of network services, which themselves execute in a context of network services, and so on ad infinitum. Simply preserving the bits and the ability to re-execute them is not enough if the network services the bits call on are no longer available, or return different results. (…) Every execution of an instance of the kind of digital artifact that readers commonly encounter, such as web pages, is now a unique experience.
In other words, to fully preserve documents produced on networks such as the internet requires nothing less than the dynamic, real-time storage of the network itself – a technical impossibility.
This is the bad news. The slightly better news is that this is a greater problem for institutions like national libraries and repositories – which are used to taking a comprehensive approach to preservation – than for individuals. Duplicating and migrating a personal archive of documents and photographs is easier than preserving a website, let alone the internet at large. As Samuel Gibbs has noted in response to Cerf’s ominous prophecy, the formats for the majority of common file types such as videos, images and text are at this time relatively open and stable. Most of us can get by.
However, there is a tension here between the personal and the collective dimensions of the problem that needs to be accounted for. Ours is the first generation since the internet came of age, which makes this moment uniquely important. Further: isn’t the promise of the internet precisely to make the world’s information infinitely accessible – acting at once as its storage and delivery mechanism – and therefore to become the culture? If this is true, how do we measure the loss of this collective archive, and is the sum of each of our personal archives enough to make up for it?
I wrote recently for Overland about the right to oblivion and the search for an internet that forgets. Cerf presents us with the ironic inversion of that proposition, in the form of an internet that cannot remember and is ultimately doomed to forget itself. It won’t come to that. But in between apocalyptic pronouncements we have yet to discover how to safeguard and carry forward the things that are truly important to us, which is the work of cultural memory.