Throughout human history, the documentation of events and thoughts usually required a good deal of time and effort. Somebody had to sit down with a stylus or a pen or, later, a typewriter or a tape recorder, and make a deliberate recording. That happened only rarely. Most events and thoughts vanished from memory, individual and collective, soon after they occurred. If they were described or discussed at all, it was usually in conversation, face to face or over a phone line, and the words evaporated as they were spoken.
That’s all changed now. Thanks to digital tools, media, and networks, recording is easy, cheap, and often automatic. Hard drives, flash drives, CDs, DVDs, and other storage devices brim with audio, video, photographic, and textual recordings. Evidence of even the most trivial of events and thoughts, communicated through texts, posts, status updates, and tweets, is retained in the data centers of the companies that operate popular Internet sites and services.
We live, it seems, in a golden age of documentation. But that’s not quite true. The problem with making a task cheap and effortless is that the results of that task come to be taken for granted. You care about work that’s difficult and expensive, and you want to preserve its product; you don’t pay much attention to the things that happen automatically and at little or no cost. In “Avoiding a Digital Dark Age,” an article appearing in the new edition of American Scientist, Kurt Bollacker, of the Long Now Foundation, expertly describes the conundrum of digital recording: everything’s documented, but the documents don’t last. The problem stems from the fact that, with digital recordings, we don’t only have to preserve the data itself; we have to preserve the devices and techniques used to read the data and output it in a form we can understand. As Bollacker writes:
With most analog technologies such as photographic prints and paper text documents, one can look directly at the medium to access the information. With all digital media, a machine and software are required to read and translate the data into a human-observable and comprehensible form. If the machine or software is lost, the data are likely to be unavailable or, effectively, lost as well.
The problem is magnified by the speed with which old digital media and recording techniques, including devices and software, are replaced by new ones. It’s further magnified by the fact that even modest damage to a digital recording can render that recording useless (as anyone who has scratched a CD or DVD knows). In contrast, damage to an analog recording – a scratch in a vinyl record, a torn page in a book – may be troublesome and annoying, but it rarely renders the recording useless. You can still listen to a scratched record, and you can still read a book with a missing page. Analog recordings are generally more robust than digital ones. As Bollacker explains, history reveals a clear and continuing trend: “new media types tend to have shorter lifespans than older ones, and digital types have shorter lifespans than analog ones.” The lifespan of a stone tablet was measured in centuries or millennia; the lifespan of a magnetic tape or a hard drive is measured in years or, if you’re very lucky, decades.
After describing the problem, Bollacker goes on to provide a series of suggestions for how digital recordings could be made more robust. The suggestions include applying better error correction algorithms when recording data and being more thoughtful about the digital formats and recording techniques we use. None of the recommendations would be particularly difficult to carry out. What’s required more than anything else is that people come to care about the problem. Apathy remains the biggest challenge in combating digital decay.
But there’s a new wrinkle to this story, and it’s one that Bollacker doesn’t address in his article: the cloud. Up to now, there has been one characteristic of digital recordings that has provided an important counterweight to the fragility of digital media – it’s what Bollacker refers to as “data promiscuity.” Because it’s easy to make copies of digital files, we’ve tended to make a lot of them. The proliferation of perfect digital copies has provided an important safeguard against the loss of data. An MP3 of even a moderately popular song will, for instance, exist on many thousands of computer hard drives as well as on many thousands of iPods, CDs, and other media. The more copies that are made of a recording, and the more widely the copies are dispersed, the more durable that recording becomes.
By centralizing the storage of digital information, cloud computing promises to dramatically reduce data promiscuity. When all of us are able to, in effect, share a copy of a digital file, whether a song or a video or a book, then we don’t need to make our own copies of that file. Cloud computing replaces the download with the stream, and that means that, as people come to use the cloud as their default data store, we’ll have fewer copies of files and hence less of the protection that multiple copies provide. Indeed, in the ultimate form of cloud computing, you’d need only a single copy of any digital recording.
Apple’s new iPad, which arrived with much fanfare over the weekend, provides a good example of where computing is heading. The iPad is much more of a player than a recorder. It has a much smaller storage capacity than traditional desktops and laptops, because it’s designed on the assumption that more and more of what we do with computers will involve streaming data over the Net rather than storing it on our devices. The iPad manifests a large and rapidly accelerating trend away from local, redundant storage and toward central storage. In fact, I’d bet that if you charted the average disk size of personal computers, including smartphones, netbooks and tablets as well as laptops and desktops, you would discover that in recent years it has shrunk, marking a sea change in the history of personal computing. An enormous amount of digital copying and local storage still goes on, of course, but the trend is clear. Streaming will continue to replace downloading, and the number of copies of digital recordings will decline.
The big cloud computing companies take the safeguarding of data very seriously, of course. For them, loss of data means loss of business, and catastrophic data loss means catastrophic business loss. A company like Google stores copies of its files in many locations, and it takes elaborate steps to protect its data centers and systems. Nevertheless, one can envision a future scenario (unlikely but not impossible) involving a catastrophe – natural, technological, political, or even commercial – that obliterates a cloud operator’s store of data. More prosaically, companies go out of business, change hands, and alter their strategies and priorities. They may not always care that much about data that once seemed very important, particularly data that has lost its commercial value. A business exists to make money, not to run an archive in perpetuity. Seen in this light, our embrace of the cloud may have the unintended effect of making digital recordings even more fragile, especially over the long run.
As digital recordings displace physical ones, the risks expand. Think about books. Google’s effort to scan every physical book ever published into its database has been compared to the creation of the great library of Alexandria. Should Google (or another organization) succeed in creating an easy-to-use, universally available store of digital books, we might well become dependent on that store – and take it for granted. We would stream books as we today stream videos. In time, we would find fewer and fewer reasons to maintain our own digital copies of books inside our devices; we would keep our e-books in the cloud. We would also find it increasingly hard to justify the cost of keeping physical copies of books, particularly old ones, on shelves, either in our homes or in libraries.
At that point, if we hadn’t been very, very careful in how we developed and maintained our great cloud library, we would be left with few safeguards in the event that, for whatever foreseeable or unforeseeable reason, that library was compromised or ceased to function. We all know what happened to the library of Alexandria.