It’s a problem that has absorbed technologists, librarians and archivists for the best part of three decades, but it rarely pops up in the mainstream conversation: what will become of our digital information? How long can we expect it to last? Will it be available to social historians or our children 50 years from now, let alone 100? For a culture obsessed with measuring the future, we are strangely reluctant to consider these questions.

But then, from time to time, a person of sufficiently high profile for the media to pay attention raises the issue. The first such luminary may have been Stewart Brand, who declared in 1998 that ‘historians will consider this a dark age’. We can read Galileo’s correspondence from the 1590s but not Marvin Minsky’s from the 1960s, he went on to explain.

Now it’s the turn of Vint Cerf, Google vice-president and resident ‘Chief Internet Evangelist’ (his actual title), and generally regarded as one of the ‘co-founders’ of the Internet.

In a widely reported presentation to the American Association for the Advancement of Science (AAAS) last week, Cerf expressed the belief that we live ‘in a digital Dark Age’ in which ‘an information black hole’ threatens to create ‘a forgotten generation, or even a forgotten century’. You couldn’t frame the problem in more alarmist, apocalyptic terms. But is it true? Yes and no.

Consider a written English text that exists in two formats: as an analogue document printed on paper and as a digital document written in Word Perfect 2.0 and stored on a 3½ inch floppy disc formatted by a 1989 Apple Macintosh. The paper document will be readable for as long as people can comprehend the English language as it is spoken and written today, and the paper and ink hold together. By contrast, interpreting the digital document requires – besides knowledge of English – a 3½ inch floppy disc drive and the means to emulate the operation of a 1989 Apple Macintosh and of Word Perfect 2.0.

Because of the above, digital information needs to be periodically copied and transferred in order to survive, which places it in a unique position with respect to older analogue technologies. If you think that this is quite elementary and obvious, consider the grandiose title of the book with which Microsoft Press hailed in 1986 the mass marketing of the CD-ROM: The New Papyrus. The old papyrus was the dominant encoding and storage medium for written texts for approximately five thousand years. Meanwhile, the life expectancy of individual CD-ROMs is estimated to vary between 20 and 100 years, and the format is already all but obsolete.
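The copy-and-transfer routine described above can be sketched in a few lines. This is a minimal illustration in Python, not a description of any archive’s actual workflow: the function names `sha256_of` and `migrate` are hypothetical, and the choice of SHA-256 as the integrity check is an assumption. The idea is simply to copy a file to new storage and confirm that the bits arrived unchanged.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source: Path, destination: Path) -> str:
    """Copy source to destination and verify the copy bit for bit.

    Hypothetical helper: raises IOError if the checksums differ,
    otherwise returns the shared checksum so it can be recorded.
    """
    before = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)  # copy2 also tries to keep timestamps
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"copy of {source} is corrupt")
    return before
```

Note what this does and does not achieve: a matching checksum shows that the bits survived the move, but it says nothing about whether any software will still be able to interpret them.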

Serious as the issues of data migration and device obsolescence are, the dependence of data on software has emerged over time as a more intractable problem. One of the first to raise it was Jeff Rothenberg, author of the quip ‘digital information lasts forever – or five years, whichever comes first.’ In 1995, just as the World Wide Web was exploding, Rothenberg wrote:

As documents become more complex than simple streams of alphabetic characters, it becomes increasingly meaningless to think of them as existing at all except when they are interpreted by the software that created them. The bits in each document file are meaningful only to the program that created that file. In effect, document files are programs, consisting of instructions and data that can be interpreted only by the appropriate software. That is, a document file is not a document in its own right: it merely describes a document that comes into existence only when the file is ‘run’ by the program that created it. Without this authoring program – or some equivalent viewing software – the document is held cryptic hostage to its own encoding.
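Rothenberg’s point can be seen even with the simplest of documents. The following Python sketch is an illustration rather than anything from his article: the sample word and the choice of encodings are assumptions made for the example. It stores one word as bytes and then reads those same bytes back under three different interpretations.

```python
# The same bytes, read by three different "programs".
# The sample text and the encodings are arbitrary choices for illustration.
original = "café"
stored = original.encode("utf-8")  # what actually sits on the disk

# Decode the identical bytes under each interpretation.
readings = {name: stored.decode(name) for name in ("utf-8", "latin-1", "cp437")}

# Keep only the interpretations that give back the word we started with.
faithful = [name for name, text in readings.items() if text == original]
```

Only the reading that uses the original encoding recovers the word; under latin-1, for instance, the same five bytes come out as ‘cafÃ©’. Nothing in the bytes themselves says which reading is the right one.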

Cerf used his AAAS speech to advocate a possible solution, in the form of a protocol under development at Carnegie Mellon for encoding data along with its software. It’s not the first time this idea has been mooted, and although I’m not a technical expert, I suspect it would run into the issues highlighted in 2013 by David Rosenthal. Namely:

No matter how faithful the emulator may be, almost all programs these days execute in a context of network services, which themselves execute in a context of network services, and so on ad infinitum. Simply preserving the bits and the ability to re-execute them is not enough if the network services the bits call on are no longer available, or return different results. (…) Every execution of an instance of the kind of digital artifact that readers commonly encounter, such as web pages, is now a unique experience.

In other words, to fully preserve documents produced on networks such as the internet requires nothing less than the dynamic, real-time storage of the network itself – a technical impossibility.

This is the bad news. The slightly better news is that this is a greater problem for institutions like national libraries and repositories – which are used to taking a comprehensive approach to preservation – than for individuals. Duplicating and migrating a personal archive of documents and photographs is easier than preserving a website, let alone the internet at large. As Samuel Gibbs has noted in response to Cerf’s ominous prophecy, the formats for the majority of common file types such as videos, images and text are at this time relatively open and stable. Most of us can get by.

However, there is a tension here that needs to be accounted for, between the personal and the collective dimension of the problem. Ours is the first generation since the internet came of age, which makes this moment uniquely important. Further: isn’t the promise of the internet precisely to make the world’s information infinitely accessible – acting at once as its storage and delivery mechanism – and therefore to become the culture? If this is true, how do we measure the loss of this collective archive, and is the sum of each of our personal archives enough to make up for it?

I wrote recently for Overland about the right to oblivion and the search for an internet that forgets. Cerf presents us with the ironic inversion of that proposition, in the form of an internet that cannot remember and is ultimately doomed to forget itself. It won’t come to that. But in between apocalyptic pronouncements we have yet to discover how to safeguard and carry forward the things that are truly important to us, which is the work of cultural memory.

Giovanni Tiso

Giovanni Tiso is an Italian writer and translator based in Aotearoa/New Zealand and the editor of Overland’s online magazine. He tweets as @gtiso.

Overland is a not-for-profit magazine with a proud history of supporting writers, and publishing ideas and voices often excluded from other places.


Contribute to the conversation

  1. It begs the question: how much of the content of what was written on a 1989 Macintosh is relevant? Was it a rambling mission statement of a long-forgotten company, the yearly budget projections of a medium-sized franchise, or a treatise on Pepsi Cola versus Coke? Was it a groundbreaking alternative theory of Heim’s Hyperdrive, lost forever? Or videos of their pet dog barking?

    There is so much data captured in electronic format that only about 1 to 5% (I believe) is processed in any meaningful way. I’d go further and say “whose job will it be to sift and sort through all of this mind-boggling array of information?”

    Let’s start hoarding old 386 PCs and don’t throw away that old dot matrix printer! That Pascal code you did in 1991 could also come in handy, perhaps to show programmers of today how it was done. However, will it run on the latest compiler? Can we reverse engineer the binary code, and if so, is it worth the effort?

    I agree, strongly, that old technology is not necessarily bad technology – I always travel with pad and pen – I don’t need a wall socket to recharge it and I can write and read when and what I want. It only costs a few silver coins, too. I wrap it in a plastic waterproof envelope that I found on the street. Simplicity is bliss.

    I am pitching a similar essay to Overland about the staggering amount of data captured and what it means with regard to our privacy, freedoms and choices. Is the next step to take away our cash and make us all buy Android phones or iPhones that you “swipe” for every transaction? WHO does this benefit – certainly not me. There is already a system where you use your smart phone to enter your house with a complicated array of keys, smart phone and your own fingerprint. Handy if you live in Fort Knox.

    I use an old fashioned metal key.

    Do we really need video cameras on every street corner? Do I need multinationals tracking my Google searches and having some overzealous idiot call me on a Saturday morning trying to sell me health insurance?

    I say no.

    Keep it simple.

    1. “There is so much data captured in electronic format that only about 1 to 5% (I believe) is processed in any meaningful way. I’d go further and say “whose job will it be to sift and sort through all of this mind-boggling array of information?””

      That’s the thing, though. There is a theory that says we must keep everything because it is not for us to say what will be regarded as important by future generations of scholars (or just people); further to that, proponents of the sociology of text like Don McKenzie would argue that we need to concern ourselves with the material circumstances of textual production, and preserve those as well – even though digital technology conspires to erase them. Almost every side of the problem is wicked. And saying but oh, most information we produce is meaningless anyway is not quite the answer, nor is suggesting that whatever survives will have survived because of some inherent merit (that’s not how culture works). Removing oneself from the equation and producing one’s texts by means of older technologies (Cerf says: print your pictures) is a solution only at an individual level, whereas once entire libraries and archives are migrated, and most textual production occurs online, it is clear that digital preservation is a problem that concerns us collectively.

  2. One other thing: a friend and I are full supporters of self storage.

    The Cloud vs. Self Storage debate is another can of worms.
    Pros and Cons, I’ll be pitching it as an essay.

  3. An interesting post. It’s almost a case of producing more than ever so we’ll lose more than ever. That being said, while we have projects like the Internet Archive, which retains as best it can, as someone who writes code (utilities and new media works) I am painfully aware of the interdependency and ephemerality of working with software. Even with version control, all sorts of configurations are lost, even in the simplest of programs. Software is written against moving targets of hardware and various libraries, let alone the changing web.

    However, my current project, which relies on quite a lot of nineteenth-century documentation, arguably also has the problem that “to fully preserve documents produced on networks such as the internet requires nothing less than the dynamic, real-time storage of the network itself – a technical impossibility”.

    I have letters without context, photos without descriptions and basically a lot of documentation of events without the “dynamic, real-time storage” of the society at the time. So, I can only read competing histories and other external sources as a sort of emulation of that period (though certainly worse than a C64 emulation of an ’80s game).

    I’m also not sure that “duplicating and migrating a personal archive of documents and photographs is easier than preserving a website”. While we’ll certainly lose dynamic aspects of the website, it’s not that much easier. Worse still, these physical archives, or at least the ones I use, have missing documents, degrade and are laborious to digitise. And once digitised, they face the same problems.

    1. This is all true, but a key problem is that digital and analogue artifacts degrade in a completely different fashion. And they also become separated from their context in very different ways.

      Lose a CD-ROM for longer than its life span, you’ll be left with a plastic coaster. Lose a roll of film in a damp environment for one hundred years, you’ll still be able to read some of the information on it. Develop it, probably. Date it, almost certainly. You’ll be able to tell many things based on the stock, and so forth. Being immaterial, digital information relies on the context being supplied as digital information, in the form of metadata, which is likely to vanish at the same time as the rest of the information.

      Two years ago I had the unlikely experience of finding undeveloped photos from my parents’ wedding, in 1961, hidden in a piece of furniture, which nobody knew existed. I was able to develop the film and get perfect pictures. I don’t think you could hope to find and recover lost digital pictures in that way. (I wrote about the experience here).

  4. What if we were to flip back into orality and oracy through oral interfacing, just as we flipped from orality and oracy into literality and literacy through writing and reading? Powers controlling the written medium didn’t seem to care too much about the impending loss of oracy and oral cultures at the time, more a case of good riddance seemingly. Would we care again, or simply say good riddance to reading a load of written shite, and with the new medium, would it become a case of too much to conserve and can’t be bothered retrieving and listening? Whither culture and memory?

    1. That was Walter Ong’s hypothesis: that with electronic media we have reached a stage of secondary orality. So maybe we should tackle the problem of cultural transmission as oral cultures do. Personally I think it’s our best recourse, and that we have much exciting work and thinking to do.

      1. either way the new literacy will be (is) the capacity to control machines (electronic objects etc.), whether in the sense of an electronic revolution akin to the industrial revolution, or something other, who can say

        1. and to spell it out, the other, of course, being the stuff of sci-fantasy: battles with robots, computer technology surpassing human attributes

  5. Good piece, thanks! What you haven’t addressed is that some file formats are more ‘closed’ than others. Part of the problem of reading a Word Perfect file is that the specification for the file format was probably never opened up to public view – it was all locked up in proprietary software.
    However, if you use an open format (e.g. .odt files for text) and open source software, the way in which the file is encoded is open for all to view, and hence the file has a much better chance of being able to be opened in many years’ time. (Moreover, the source code for the original software will be available…)

  6. Fascinating discussion. But bearing in mind that despite literacy and oracy, and therefore despite advancing and changing technologies to keep us aware of the past, we still manage to make the same mistakes politically and socially, it seems to me we might be better off learning how to learn.
