Digital decay and the archival cloud

Throughout human history, the documentation of events and thoughts usually required a good deal of time and effort. Somebody had to sit down with a stylus or a pen or, later, a typewriter or a tape recorder, and make a deliberate recording. That happened only rarely. Most events and thoughts vanished from memory, individual and collective, soon after they occurred. If they were described or discussed at all, it was usually in conversation, face to face or over a phone line, and the words evaporated as they were spoken.

That’s all changed now. Thanks to digital tools, media, and networks, recording is easy, cheap, and often automatic. Hard drives, flash drives, CDs, DVDs, and other storage devices brim with audio, video, photographic, and textual recordings. Evidence of even the most trivial of events and thoughts, communicated through texts, posts, status updates, and tweets, is retained in the data centers of the companies that operate popular Internet sites and services.

We live, it seems, in a golden age of documentation. But that’s not quite true. The problem with making a task cheap and effortless is that the results of that task come to be taken for granted. You care about work that’s difficult and expensive, and you want to preserve its product; you don’t pay much attention to the things that happen automatically and at little or no cost. In “Avoiding a Digital Dark Age,” an article appearing in the new edition of American Scientist, Kurt Bollacker, of the Long Now Foundation, expertly describes the conundrum of digital recording: everything’s documented, but the documents don’t last. The problem stems from the fact that, with digital recordings, we don’t only have to preserve the data itself; we have to preserve the devices and techniques used to read the data and output it in a form we can understand. As Bollacker writes:

With most analog technologies such as photographic prints and paper text documents, one can look directly at the medium to access the information. With all digital media, a machine and software are required to read and translate the data into a human-observable and comprehensible form. If the machine or software is lost, the data are likely to be unavailable or, effectively, lost as well.

The problem is magnified by the speed with which old digital media and recording techniques, including devices and software, are replaced by new ones. It’s further magnified by the fact that even modest damage to a digital recording can render that recording useless (as anyone who has scratched a CD or DVD knows). In contrast, damage to an analog recording – a scratch in a vinyl record, a torn page in a book – may be troublesome and annoying, but it rarely renders the recording useless. You can still listen to a scratched record, and you can still read a book with a missing page. Analog recordings are generally more robust than digital ones. As Bollacker explains, history reveals a clear and continuing trend: “new media types tend to have shorter lifespans than older ones, and digital types have shorter lifespans than analog ones.” The lifespan of a stone tablet was measured in centuries or millennia; the lifespan of a magnetic tape or a hard drive is measured in years or, if you’re very lucky, decades.

After describing the problem, Bollacker goes on to provide a series of suggestions for how digital recordings could be made more robust. The suggestions include applying better error correction algorithms when recording data and being more thoughtful about the digital formats and recording techniques we use. None of the recommendations would be particularly difficult to carry out. What’s required more than anything else is that people come to care about the problem. Apathy remains the biggest challenge in combating digital decay.

But there’s a new wrinkle to this story, and it’s one that Bollacker doesn’t address in his article: the cloud. Up to now, there has been one characteristic of digital recordings that has provided an important counterweight to the fragility of digital media – it’s what Bollacker refers to as “data promiscuity.” Because it’s easy to make copies of digital files, we’ve tended to make a lot of them. The proliferation of perfect digital copies has provided an important safeguard against the loss of data. An MP3 of even a moderately popular song will, for instance, exist on many thousands of computer hard drives as well as on many thousands of iPods, CDs, and other media. The more copies that are made of a recording, and the more widely the copies are dispersed, the more durable that recording becomes.
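
A rough way to see why dispersal matters is a back-of-the-envelope sketch. It assumes, purely for illustration, that each copy is lost independently and with the same probability over some period. Neither that assumption nor the function name `survival_probability` comes from Bollacker’s article, and real losses are often correlated, so treat this as a toy model rather than a measurement.

```python
def survival_probability(copies: int, loss_probability: float) -> float:
    """Chance that at least one copy survives.

    Assumes (hypothetically) that every copy is lost independently
    with the same probability `loss_probability`.
    """
    if copies < 0:
        raise ValueError("copies must be non-negative")
    if not 0.0 <= loss_probability <= 1.0:
        raise ValueError("loss_probability must be between 0 and 1")
    # All copies are lost with probability p ** n, so at least one
    # survives with probability 1 - p ** n.
    return 1.0 - loss_probability ** copies


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(n, survival_probability(n, 0.5))
```

Under those assumptions, with a 50 percent chance of losing any one copy, a single copy survives half the time, while ten copies leave at least one survivor more than 99.9 percent of the time.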

By centralizing the storage of digital information, cloud computing promises to dramatically reduce data promiscuity. When all of us are able to, in effect, share a copy of a digital file, whether a song or a video or a book, then we don’t need to make our own copies of that file. Cloud computing replaces the download with the stream, and that means that, as people come to use the cloud as their default data store, we’ll have fewer copies of files and hence less of the protection that multiple copies provide. Indeed, in the ultimate form of cloud computing, you’d need only a single copy of any digital recording.

Apple’s new iPad, which arrived with much fanfare over the weekend, provides a good example of where computing is heading. The iPad is much more of a player than a recorder. It has a much smaller storage capacity than traditional desktops and laptops, because it’s designed on the assumption that more and more of what we do with computers will involve streaming data over the Net rather than storing it on our devices. The iPad manifests a large and rapidly accelerating trend away from local, redundant storage and toward central storage. In fact, I’d bet that if you charted the average disk size of personal computers, including smartphones, netbooks and tablets as well as laptops and desktops, you would discover that in recent years it has shrunk, marking a sea change in the history of personal computing. An enormous amount of digital copying and local storage still goes on, of course, but the trend is clear. Streaming will continue to replace downloading, and the number of copies of digital recordings will decline.

The big cloud computing companies take the safeguarding of data very seriously, of course. For them, loss of data means loss of business, and catastrophic data loss means catastrophic business loss. A company like Google stores copies of its files in many locations, and it takes elaborate steps to protect its data centers and systems. Nevertheless, one can envision a future scenario (unlikely but not impossible) involving a catastrophe – natural, technological, political, or even commercial – that obliterates a cloud operator’s store of data. More prosaically, companies go out of business, change hands, and alter their strategies and priorities. They may not always care that much about data that once seemed very important, particularly data that has lost its commercial value. A business exists to make money, not to run an archive in perpetuity. Seen in this light, our embrace of the cloud may have the unintended effect of making digital recordings even more fragile, especially over the long run.

As digital recordings displace physical ones, the risks expand. Think about books. Google’s effort to scan every physical book ever published into its database has been compared to the creation of the great library of Alexandria. Should Google (or another organization) succeed in creating an easy-to-use, universally available store of digital books, we might well become dependent on that store – and take it for granted. We would stream books as we today stream videos. In time, we would find fewer and fewer reasons to maintain our own digital copies of books inside our devices; we would keep our e-books in the cloud. We would also find it increasingly hard to justify the cost of keeping physical copies of books, particularly old ones, on shelves, either in our homes or in libraries.

At that point, if we hadn’t been very, very careful in how we developed and maintained our great cloud library, we would be left with few safeguards in the event that, for whatever foreseeable or unforeseeable reason, that library was compromised or ceased to function. We all know what happened to the library of Alexandria.

The post-book book

The iPad’s iBooks application may or may not become our e-reader of choice – even uber-fanboy David Pogue seems a mite skeptical this morning – but the model of book reading (and hence book writing) the iPad promotes seems fated, in time, to become the dominant one. The book itself, in this model, becomes an app, a multihypermediated experience to click through rather than a simple sequence of pages to read through. To compete with the iPad, the current top-selling e-reader, Amazon’s Kindle, will no doubt be adding more bells and whistles to its suddenly tired-seeming interface. Already, Amazon has announced it will be opening an app store for the Kindle later this year. “People don’t read anymore,” Steve Jobs famously said, and the iPad emanates from that assumption.

John Makinson, the CEO of publishing giant Penguin Books, is thrilled about the iPad’s potential to refresh his company’s product line. “The definition of the book itself seems up for grabs,” he said at a recent media industry powwow. Unlike traditional e-book readers, which had a rather old-fashioned attachment to linear text, the iPad opens the doors to incorporating all sorts of “cool stuff,” Makinson continued. “We will be embedding audio, video and streaming into everything we do.” He foresees sprinkling movie clips among Jane Austen’s paragraphs in future editions of “Pride and Prejudice.” No need to conjure up a picture of Lizzie Bennet in your own mind; there’s Keira Knightley stomping through the grounds of Netherfield, cute as a mouse button.

Makinson gave a preview of the post-book book, which seems unsurprisingly toylike:

A sentence from The Shallows may be pertinent here: “When a printed book is transferred to an electronic device connected to the Internet, it turns into something very like a Web site.” Makinson’s presentation leads Peta Jinnath Andersen, of PopMatters, to ask, “What makes a book a book?” A book, she concludes, is just “a delivery system” for text, and one delivery system is as good as another: “How the words are delivered doesn’t matter.” A stone tablet is a scroll is a wax tablet is a scribal codex is a printed book is a Kindle is an iPad. And yet history shows us that each change in the physical form of the written word was accompanied by a change – often a profound one – in reading and writing habits. If the delivery system mattered so much in the past, are we really to believe that it won’t matter in the future?

Jobs is no dummy. As a text delivery system, the iPad is perfectly suited to readers who don’t read anymore.

Greenpeace raids the cloud

In late 2006, I wrote a post about the energy consumption of modern computing plants, in which I made a prediction:

As soon as activists, and the public in general, begin to understand how much electricity is wasted by computing and communication systems – and the consequences of that waste for the environment and in particular global warming – they’ll begin demanding that the makers and users of information technology improve efficiency dramatically. Greenpeace and its rainbow warriors will soon storm the data center – your data center.

Soon is now. Today, Greenpeace issued a report on “cloud computing and its contribution to climate change,” in which it specifically targets big cloud operators like Google, Amazon, Apple, Facebook, Salesforce.com, and Microsoft. The report is timed to coincide with the launch of Apple’s iPad, an event that underscores just how dramatically personal computing has changed, and expanded, over the last few years. Many of us now own a slew of computers in various forms – desktops, laptops, smartphones, iPods, tablets, e-readers, gaming consoles – that don’t just suck up electricity themselves but are connected to the vast cloud grid that also consumes enormous amounts of energy. Drawing mainly on a 2008 analysis by the Climate Group and the Global e-Sustainability Initiative, Greenpeace predicts that the electricity consumed by the cloud – defined as both Internet data centers and the communications network that connects all of us to those centers – will rise from 623 billion kWh in 2007 to 1,964 billion kWh in 2020.
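
The size of that projected increase is easier to judge as a growth rate. The sketch below uses only the two figures quoted above and assumes smooth compound growth between 2007 and 2020, which is a simplification and not a claim made in the report. The function names are illustrative.

```python
def growth_multiple(start: float, end: float) -> float:
    """How many times larger the end value is than the start value."""
    return end / start


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate, assuming steady growth each year."""
    if years <= 0:
        raise ValueError("years must be positive")
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    start_kwh = 623.0   # billion kWh in 2007, as quoted in the post
    end_kwh = 1964.0    # billion kWh projected for 2020, as quoted in the post
    years = 2020 - 2007

    print(f"multiple: {growth_multiple(start_kwh, end_kwh):.2f}")
    print(f"annual rate: {annual_growth_rate(start_kwh, end_kwh, years):.1%}")
```

On those two figures, consumption roughly triples (a multiple of about 3.15), which works out to about 9 percent a year if the growth is spread evenly over the thirteen years.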

The rise of cloud computing is a two-edged sword when it comes to energy consumption and related carbon emissions. On the one hand, since electricity is a critical component of the cost of running a cloud operation, major cloud computing providers like Google and Microsoft have a big economic incentive to become more energy efficient, and they have been admirably aggressive in pioneering technologies that reduce energy use. The energy-conserving equipment, designs, and processes that the cloud giants invent should in time spread throughout the information technology industry, making computing in general much more energy efficient. At the same time, however, the free data and services supplied through the cloud are rapidly expanding the scope of computing and its attractiveness: people use computers, particularly Internet-connected computers, much more than in the past. So even as computing becomes more efficient per unit of output, the dramatic expansion in its use means that it is, in absolute terms, sucking up much more electricity than it has in the past, a trend that promises to accelerate pretty much indefinitely.

What that means is that, as the Greenpeace report makes clear, both the economic and the political stakes involved in mitigating the environmental impact of the cloud will increase. Greenpeace argues that what’s important is not only the efficiency of data centers but the sources of the power they use. The heavenly cloud, it turns out, runs largely on earthbound coal. In this regard, it singles out Facebook for criticism:

Facebook’s decision to build its own highly-efficient data centre in Oregon that will be substantially powered by coal-fired electricity clearly underscores the relative priority for many cloud companies. Increasing the energy efficiency of its servers and reducing the energy footprint of the infrastructure of data centres are clearly to be commended, but efficiency by itself is not green if you are simply working to maximise output from the cheapest and dirtiest energy source available.

Greenpeace also links Apple’s decision to locate a huge cloud data center in North Carolina to that state’s cheap electricity supplies, which come mainly from coal-fired plants. Other companies, including Google, also run big data center operations in the Carolinas. Noting that the IT industry “holds many of the keys to reaching our climate goals,” Greenpeace says that it is pursuing a “Cool IT Campaign” that is intended to pressure the industry to “put forward solutions to achieve economy-wide greenhouse gas emissions reductions and to be strong advocates for policies that combat climate change and increase the use of renewable energy.”

The Greenpeace action promises to intensify the public’s focus on the cloud’s environmental shadow. But while Greenpeace’s main target appears to be the big cloud providers, its report also suggests, if only in passing, that the devices that all of us use to connect to the cloud actually consume more energy than the cloud itself. Those of us who spend a large proportion of our waking hours peering into multiple computer screens can’t offload responsibility for the environmental consequences of our habits to companies like Google and Facebook. The cloud, after all, exists for us.

The Shallows at SXSW

I will be reading from my forthcoming book, The Shallows, a week from today at the South by Southwest conference in Austin. The reading is scheduled to take place on March 16 at 11:30 am on the Day Stage. If you are in the neighborhood, and are properly badged, please stop by.

A typology of crowds

Over the last few days, I’ve been involved in an email discussion on “The Crowd,” which will be excerpted on PBS’s Digital Nation site. One thing that has long bothered me about discussions of online crowds is that they tend to yoke lots of different sorts of groups together under a single rubric. Important differences end up being glossed over.

With that in mind, I’ve been trying to think through the various forms that online crowds take. As a rough starting point, I came up with four:

“Social production crowd”: consists of a large group of individuals who lend their distinct talents to the creation of some product like Wikipedia or Linux.

“Averaging crowd”: acts essentially as a survey group, providing an average judgment about some complex matter that, in some cases, is more accurate than the judgment of any one individual (the crowd behind prediction markets like the Iowa Electronic Markets, not to mention the stock market and other financial exchanges).

“Data mine crowd”: a large group that, through its actions but usually without the explicit knowledge of its members, produces a set of behavioral data that can be collected and analyzed in order to gain insight into behavioral or market patterns (the crowd that, for instance, feeds Google’s search algorithm and Amazon’s recommendation system).

“Networking crowd”: a group that trades information through a shared communication system such as the phone network or Facebook or Twitter.

Clay Shirky, who is also participating in the discussion, suggested a fifth crowd type for this list:

“Transactional crowd”: a group used to instigate and coordinate what are mainly or solely point-to-point transactions, such as the type of crowd gathered by Match.com, eBay, Innocentive, LinkedIn and similar services. (I would think that contests like the Netflix Prize also fall into this category.)

Each of these “crowds” (and there are surely others) has its own unique characteristics and its own unique strengths and weaknesses. Some crowds, for instance, gain their usefulness from the individual talents of their members. Others (notably the “averaging” sort) gain their usefulness by essentially filtering out those individual talents. Some crowds might be called “hives,” which implies some degree of individual unconsciousness about how one’s work or behavior fits into the larger whole, while others aren’t anything like mindless hives. Some crowds become more useful as they get bigger; others work best when kept to a small scale. “Crowdsourcing” and its cousin “digital sharecropping” may draw on any or all of the different types of crowds, to various effects and with various ethical implications.

As this nascent typology indicates, there’s not really any such thing as “The Crowd.”

UPDATE: Tom Lord, in a comment, suggests a sixth category:

“Event crowd”: A group organized through online communication for a particular event, which can take place either online or in the real world and may have a political, social, aesthetic, or other purpose.

The end of corporate computing, revisited

Five years ago, in early 2005, I wrote an article for the MIT Sloan Management Review called “The End of Corporate Computing.” The article, which predicted an imminent shift to “utility computing,” was the seed for my book The Big Switch. Usually, the article lies behind the Review’s paywall, but for the moment it is freely available to read. Here’s a bit from the beginning of the piece:

[Information technology] is beginning an inexorable shift from being an asset that companies own in the form of computers, software and myriad related components to being a service that they purchase from utility providers. Few in the business world have contemplated the full magnitude of this change or its far-reaching consequences. To date, popular discussions of utility computing have rarely progressed beyond a recitation of IT vendors’ marketing slogans …

The prevailing rhetoric is, moreover, too conservative. It assumes that the existing model of IT supply and use will endure, as will the corporate data center that lies at its core. But that view is perilously shortsighted. The traditional model’s economic foundation already is crumbling and is unlikely to survive in the long run. As the earlier transformation of electricity supply suggests, IT’s shift from a fragmented capital asset to a centralized utility service will be momentous. It will overturn strategic and operating assumptions, alter industrial economics, upset markets and pose daunting challenges to every user and vendor. The history of the commercial application of information technology has been characterized by astounding leaps, but nothing that has come before — not even the introduction of the personal computer or the opening of the Internet — will match the upheaval that lies just over the horizon.

A little breathless, maybe, but I was looking in the right direction. Here’s the rest of it.

Also out from behind the Review’s paywall is Andrew McAfee’s influential 2006 article “Enterprise 2.0: The Dawn of Emergent Collaboration.”