The shrinking web

In a column in today’s Guardian, I look at the consolidation of online traffic and content at a small number of “information plantations” – the megasites like Google, MySpace, Facebook, and Wikipedia that increasingly dominate the new medium.

Snippet:

On the internet, the big get bigger.

It wasn’t supposed to be like that. When the web arrived in the early 1990s, it was heralded as a liberating force that would free us from the confines of gated communities like AOL and CompuServe. The web was supposed to be an open, democratic medium, an information bazaar putting individuals on the same footing as big companies. In the end, though, the internet seems to be following the same pattern that has always characterised popular media. A few huge outlets come to dominate readership and viewership, and smaller, more specialised ones are consigned to the periphery.

Big Switch: contents and preorders

Amazon has begun taking advance orders for my next book, The Big Switch: Our New Digital Destiny. The book’s official publication date is January 7, 2008, though I’ve heard there’s a good chance it will be available in time for holiday gift-giving. (I personally cannot imagine a finer gift.) Amazon is discounting pre-orders by 34%, and should the price go down further between now and when the book ships, you’ll get the lower price.

Here is the book’s table of contents:

Prologue: A Doorway in Boston

Part 1: One Machine

1. Burden’s Wheel

2. The Inventor and His Clerk

3. Digital Millwork

4. Goodbye, Mr. Gates

5. The White City

Part 2: Living in the Cloud

6. World Wide Computer

7. From the Many to the Few

8. The Great Unbundling

9. Fighting the Net

10. A Spider’s Web

11. iGod

Epilogue: Flame and Filament

In due course, I’ll be providing more information about The Big Switch at the book site.

Facebookipedia

I was at a college graduation ceremony yesterday, and when one of the student speakers mentioned Wikipedia, the graduates broke into applause. “Now we can finally admit that we use Wikipedia for research,” the speaker continued. That brought another round of cheers from the kids, as well as some futile boos and hisses from parents and faculty.

Also yesterday, Facebook let it be known that it would launch a free classified-advertising service, which would compete with Craigslist. That’s a smart move. Facebook’s core users – college and high-school kids – are also big users of Craigslist. When Facebookers go off-network, Craigslist is probably one of their most likely destinations. So creating an in-network version of Craigslist will significantly expand Facebook’s control over its members’ online time. “We don’t try to lock people up or take more of their time,” Facebook founder Mark Zuckerberg fibs to the New York Times today. Then he tells the truth: “If we can provide people with efficient tools, they will use the site more.” Every page view Zuckerberg steals from Craigslist is money in his pocket.

But if Craigslist is a big draw for Facebook members, my guess is that Wikipedia is an even bigger draw. I’m too lazy to look for the stats, but Wikipedia must be at or near the top of the list of sites that Facebookers go to when they leave Facebook. To generalize: Facebook is the dorm; Wikipedia is the library; and Craigslist is the mall. One’s for socializing; one’s for studying; one’s for trading.

Which brings me to my suggestion for Zuckerberg: He should capitalize on Wikipedia’s open license and create an in-network edition of the encyclopedia. It would be a cinch: Suck in Wikipedia’s contents, incorporate a Wikipedia search engine into Facebook (Wikipedia’s own search engine stinks, so it should be easy to build a better one), serve up Wikipedia’s pages in a new, better-designed Facebook format, and, yes, incorporate some advertising. Some social-networking tools could also be added to blend Wikipedia content with Facebook content.
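
To be clear about how little work the “suck in the contents” step would be: the MediaWiki API hands over any article’s rendered HTML on request. Here’s a minimal sketch in Python; the fetch is ordinary MediaWiki API usage, while render_in_network is just a hypothetical stand-in for whatever Facebook-styled template (and ads) Zuckerberg’s designers might wrap around it.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def fetch_article_html(title):
    """Fetch the rendered HTML of a Wikipedia article via the MediaWiki API."""
    params = urllib.parse.urlencode({
        "action": "parse",
        "page": title,
        "prop": "text",
        "format": "json",
        "formatversion": "2",
    })
    req = urllib.request.Request(
        f"{API}?{params}",
        headers={"User-Agent": "facebookipedia-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    return data["parse"]["text"]

def render_in_network(title):
    # Purely illustrative: wrap the article body in an imagined in-network
    # template. Attribution and license notices would have to be preserved.
    body = fetch_article_html(title)
    return f'<div class="fb-wiki-article">{body}</div>'

if __name__ == "__main__":
    print(render_in_network("Craigslist")[:500])
```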

Suddenly, all those Wikipedia page views become Facebook page views – and additional ad revenues. And, of course, all the content is free for the taking. I continue to be amazed that more sites aren’t using Wikipedia content in creative ways. Of all the sites that could capitalize on that opportunity, Facebook probably has the most to gain.

Womb-based SEO

Here’s a sign of the times. Expectant parents are beginning to google prospective baby names to ensure that their kids won’t face too much competition in securing a high search rank. The Wall Street Journal reports on one example of a couple using search engine optimization in picking a name:

When [Abigail] Wilson, now 32, was pregnant with her first child, she ran every baby name she and her husband, Justin, considered through Google to make sure her baby wouldn’t be born unsearchable. Her top choice: Kohler, an old family name that had the key, rare distinction of being uncommon on the Web when paired with Wilson. “Justin and I wanted our son’s name to be as special as he is,” she explains.

Hmm. If SEOing babies tempts parents to name a child after a toilet manufacturer, I’m not sure it’s such a great idea.
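
For parents determined to automate the vetting, the whole exercise is a dozen lines of code. Here’s a rough sketch against Google’s Custom Search JSON API; the API key and search-engine ID are placeholders you’d have to supply, the candidate names are just examples, and the estimated hit count is only a loose proxy for how “searchable” a child will turn out to be.

```python
import json
import urllib.parse
import urllib.request

API_KEY = "YOUR_API_KEY"          # placeholder
SEARCH_ENGINE_ID = "YOUR_CX_ID"   # placeholder

def estimated_hits(query):
    """Return Google's estimated result count for an exact-phrase query."""
    params = urllib.parse.urlencode({
        "key": API_KEY,
        "cx": SEARCH_ENGINE_ID,
        "q": f'"{query}"',  # quote the name for an exact-phrase match
    })
    url = f"https://www.googleapis.com/customsearch/v1?{params}"
    with urllib.request.urlopen(url) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    return int(data["searchInformation"]["totalResults"])

# Illustrative candidates: the fewer existing hits, the "freer" the name.
for name in ["Kohler Wilson", "Jacob Wilson", "Emma Wilson"]:
    print(name, estimated_hits(name))
```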

Google preparing to police web

Increasingly worried by the use of conventional web sites to distribute the viruses that turn innocent PCs into botnet “zombies,” Google appears to be readying a plan to police the web. If the plan goes forward, Google will use new software to automatically identify compromised web pages in its database and label them as “potentially harmful” in its search results. Because being labeled as suspicious by Google could devastate a site’s traffic, the move would raise the security stakes for site owners dramatically.

Google security specialist Niels Provos tells New Scientist, “The firewall is dead.” He’s referring to a shift in the way botnet infections are spread – and it’s this shift that’s making Google particularly nervous. Botnet viruses used to be distributed mainly through email attachments or computer worms, both of which could be blocked by firewalls or sniffed out by antivirus software. Over the past year, however, the operators of botnets have shifted to using regular web sites to distribute their malware. Reports New Scientist:

As users have grown wary of email attachments and installed firewalls and anti-virus software, however, the bad guys have shifted their attentions to websites in a bid to find more victims … Even an ordinary website can be risky. At a meeting on botnets held last month in Cambridge, Massachusetts, Provos warned that many web users are becoming the victims of “drive-by” downloads of bots from innocent websites corrupted to exploit browser vulnerabilities. As firewalls allow free passage to code or programs downloaded through the browser, the bot is able to install itself on the PC. Anti-virus software kicks in at this point, but some bots avoid detection by immediately disabling it.

A recent Google study, led by Provos, discovered “around 450,000 web pages that launched drive-by downloads of malicious programs. Another 700,000 pages launched downloads of suspicious software. More than two-thirds of the malicious programs identified were those that infected computers with bot software or programs that collected data on banking transactions and emailed it to a temporary email account.”

Anything that makes people wary of visiting web sites or clicking on links stands as a big threat to Google’s business. It’s not surprising, then, that the company has a unit investigating the dissemination of malware through the web. The paper that Provos and four of his Google colleagues have written on the subject, The Ghost in the Browser, explains how Google is preparing to respond to the threat by incorporating an automated security analysis into its routine spidering and indexing of sites:

To address this problem and to protect users from being infected while browsing the web, we have started an effort to identify all web pages on the Internet that could potentially be malicious. Google already crawls billions of web pages on the Internet. We apply simple heuristics to the crawled pages repository to determine which pages attempt to exploit web browsers. The heuristics reduce the number of URLs we subject to further processing significantly. The pages classified as potentially malicious are used as input to instrumented browser instances running under virtual machines. Our goal is to observe the malware behavior when visiting malicious URLs and discover if malware binaries are being downloaded as a result of visiting a URL. Web sites that have been identified as malicious, using our verification procedure, are labeled as potentially harmful when returned as a search result. Marking pages with a label allows users to avoid exposure to such sites and results in fewer users being infected.
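
The paper doesn’t publish the heuristics themselves, but the pre-filtering stage it describes might look something like the sketch below: a cheap scan of each crawled page for markers commonly associated with drive-by downloads, with anything that trips the filter queued for the far more expensive instrumented-browser analysis. The patterns and threshold here are illustrative guesses, not Google’s.

```python
import re

# Cheap textual markers often associated with drive-by-download pages.
# These patterns and the threshold are illustrative, not Google's heuristics.
SUSPICIOUS_PATTERNS = [
    re.compile(r'<iframe[^>]+(width|height)\s*=\s*["\']?0', re.I),   # invisible iframe
    re.compile(r'unescape\s*\(\s*["\']%u', re.I),                    # encoded shellcode
    re.compile(r'document\.write\s*\(\s*unescape', re.I),            # obfuscated injection
    re.compile(r'eval\s*\(\s*(unescape|String\.fromCharCode)', re.I),
]

def looks_suspicious(html, threshold=1):
    """Return True if a crawled page matches enough cheap heuristics
    to justify the expensive instrumented-browser check."""
    hits = sum(1 for pat in SUSPICIOUS_PATTERNS if pat.search(html))
    return hits >= threshold

def prefilter(pages):
    """Yield only the (url, html) pairs worth sending on for VM-based analysis."""
    for url, html in pages:
        if looks_suspicious(html):
            yield url, html

# Usage: candidates = list(prefilter(crawled_repository))
# Each candidate URL would then be loaded in an instrumented browser inside a
# virtual machine to see whether a malware binary actually gets downloaded.
```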

The authors note that Web 2.0 trends, including the incorporation of user-generated content and third-party widgets into sites, raise the risk of innocent sites being exploited by botnet masters. For example, they write:

Many websites feature web applications that allow visitors to contribute their own content. This is often in the form of blogs, profiles, comments, or reviews. Web applications usually support only a limited subset of the hypertext markup language, but in some cases poor sanitization or checking allows users to post or insert arbitrary HTML into web pages. If the inserted HTML contains an exploit, all visitors of the posts or profile pages are exposed to the attack. Taking advantage of poor sanitization becomes even easier if the site permits anonymous posts, since all visitors are allowed to insert arbitrary HTML.
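
The defense on the site owner’s side is strict sanitization: rebuild user-submitted markup, keep only a short allowlist of harmless tags, and throw away everything else, attributes included. Here’s a bare-bones sketch using Python’s standard html.parser; a production site would reach for a maintained sanitization library rather than rolling its own.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "br", "blockquote"}
DROP_CONTENT_TAGS = {"script", "style"}  # discard these tags and their contents

class AllowlistSanitizer(HTMLParser):
    """Rebuild submitted HTML, keeping only an allowlist of tags and
    discarding all attributes (where scripts and exploits tend to hide)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip = 0  # depth inside script/style blocks

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self._skip += 1
        elif tag in ALLOWED_TAGS and not self._skip:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in ALLOWED_TAGS and not self._skip:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip:
            self.out.append(escape(data))

def sanitize(html):
    s = AllowlistSanitizer()
    s.feed(html)
    return "".join(s.out)

print(sanitize('<p onclick="steal()">Nice post!</p><script src="http://evil.example/bot.js"></script>'))
# -> <p>Nice post!</p>
```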

The paper goes into considerable detail about the system Google is building for identifying suspicious pages. Given the stakes involved, site owners and designers may want to give it a careful read.
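
Site owners who want to check programmatically whether Google has flagged one of their URLs can query the Safe Browsing service. Here’s a minimal sketch against Google’s current Safe Browsing v4 lookup endpoint; the API key is a placeholder and the client fields are arbitrary identifiers.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder
ENDPOINT = f"https://safebrowsing.googleapis.com/v4/threatMatches:find?key={API_KEY}"

def check_url(url):
    """Ask the Safe Browsing v4 lookup API whether a URL is on Google's threat lists."""
    body = {
        "client": {"clientId": "roughtype-example", "clientVersion": "0.1"},
        "threatInfo": {
            "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING"],
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": url}],
        },
    }
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read().decode("utf-8"))
    # An empty response means no match; otherwise "matches" lists the threats found.
    return result.get("matches", [])

print(check_url("http://example.com/"))
```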

UPDATE: As noted in a comment to this post by Google’s Matt Cutts, the company’s anti-malware program is actually already under way.

The tweet-filled void

James Governor has posted two love notes to Twitter over the last couple of days. In the latest, he argues that Twitter’s 140-character limit promotes brevity. He says that the suggestion that you can’t be “either deep or meaningful” in 140 characters or fewer is nonsense – it’s “evidence of the verbosity of our culture.”

I think he’s right that there are far too many words in circulation today, and I also think he’s right that meaning and even profundity can come in tweet-sized packages. But I think he’s wrong to suggest that Twitter is the friend of brevity. For that to be true, we’d have to assume that the messages streaming through Twitter are briefer than they would have been otherwise – that they’ve been pared down to their essence, like telegrams. I don’t think that’s what’s happening. I don’t think that most tweets are substitutes for longer messages. Rather, they’re additional verbiage layered atop all the existing verbiage. Twitter adds to the great landfill of words; it doesn’t subtract from it.

Twitter, in other words, is the real “evidence of the verbosity of our culture.” But it’s more than that. It speaks to what seems to be a growing fear of silence, of being alone with one’s thoughts. It’s as if there’s some great emptiness that we have to keep throwing words into. To hold one’s tongue is to risk – what, exactly?