Wilf this

Having recently been called “a bit of a tosser” in a comment on this very blog (unfortunately, the comment appeared before the Code of Conduct was issued; if it had been posted after the CoC, I could have deleted it as uncivil with a clear conscience), I have taken a new interest in the slang used by our coalition partners across the pond. I was therefore very excited to note today that, as reported in the Guardian, the Brits have coined a great new term for an old affliction: wasting time on the web. They call it “wilfing,” which, apparently, is derived from the universal question “What was I looking for?”

Wilfing may be the single most common activity of internet patrons. A big new survey of web users reveals that two-thirds confess to engaging in wilfing (the other third are tossers) and one-quarter spend at least 30% of their online time in wilfing mode, which represents, the Guardian notes, “the equivalent of spending an entire working day every fortnight pointlessly jumping between random pages.” That’s a lot of wilfing.

Anyway, go check out the article. You have nothing better to do.

Salesforce vs. Google

Up to now, Salesforce.com and Google have been defined, competitively, by their mutual disdain for the Horrible Monster of Redmond. (“I don’t think it makes sense for me to comment on the words and actions of Steve Ballmer,” sniffs Chief Googler Eric Schmidt at the start of a new Wired interview.) But with Salesforce’s announcement today of a broad push into content management, the two new-age enterprise IT companies are now on a collision course. They’re not just bedfellows anymore. They’re competitors.

One of Google’s central strategic thrusts is to store “100%” of users’ data. That, it seems clear now, doesn’t just apply to consumer users. It applies to business users as well. Google is in the early stages of a major, multi-year push into the corporate market. In that Wired interview, Schmidt calls the company’s fledgling package of business applications, Google Apps, its “most interesting” opportunity to find new sales growth beyond advertising. Google, he says, is “already beginning to get some significant enterprise deals … Corporations are tired of dealing with the complexity of the old model, and our products are now strong enough to serve business needs reliably.”

In announcing Salesforce’s expansion into content management, through the launch of Salesforce Content and its ContentExchange service and the related acquisition of Koral, CEO Marc Benioff called the move “a decisive step towards our vision of managing all information on demand. With Salesforce Content, we not only manage a company’s traditional structured information, but their unstructured information as well.” Salesforce, too, wants ultimately to be the repository of 100% of a company’s data. To underscore the point, Salesforce exec Bruce Francis told Richard MacManus that the company aims “to help our customers manage and share all their business information on-demand.”

It’s illuminating to think of Salesforce and Google as competitors because their distinctive strengths point to some of the future terms of rivalry in the new IT market. Salesforce’s strength lies in the customer interface – not just the friendly interface of its software services but its Benioff-fueled prowess as a marketer to businesses. Google, to put it generously, is relatively weak in those areas. Where Google’s strength lies is in the back-end infrastructure. Its network of data centers, and the software that connects them, presents a forbidding technological (and financial) challenge to other vendors seeking to “own” all of a client’s data.

Both Google and Salesforce are still small fish in the ocean of enterprise data. But as they swim ahead of the fat bottom-feeders of the old IT market, they may at the moment be the most interesting fish in the sea.

Thanks, Tim and Jimbo!

According to Technorati, there are now 70 million blogs in existence. That can make it very difficult to figure out which blogs you should pay attention to and which aren’t worth your time. But fortunately for us all, Tim O’Reilly and Jimbo Wales have teamed up to introduce a nifty system that will make our lives much easier. In the future, blogs that can safely be ignored will be marked with a cute little badge that looks like this:

[Image: bcclogo.gif – the badge]

Teaching computers to see

The “human operated nodes” of Amazon’s Mechanical Turk may soon have competition, at least when it comes to identifying objects in photographs. Researchers at the University of California at San Diego are making progress in developing a machine-learning approach that enables computers to automatically interpret photographs and other images, reports Technology Review.

As described in a paper that appears in the latest issue of IEEE Transactions on Pattern Analysis and Machine Intelligence, the system, called Supervised Multiclass Labeling (SML), combines semantic, or text, labels that describe an image’s contents with a statistical analysis of the image. A computer is first trained to recognize an object – a tree, say – by being shown many images containing the object that have been labeled, or tagged, with the description “tree” by people. The computer learns to make an association between the tag and a statistical, pixel-level analysis of the image. It learns, in effect, to spot a tree, regardless of where the tree happens to appear in a given image.

Having been seeded with intelligence, the computer can then begin to interpret images on its own, applying probabilities to what it “sees” (e.g., “there is an 80% probability that this picture contains a tree”). As it interprets more and more images, the computer becomes smarter and the tags it applies to images become more accurate. The computer-generated tags can then be used as the basis for an automated image-search service.
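To make the train-then-tag loop concrete, here is a minimal sketch in Python. It is not the researchers’ SML implementation (which fits class-conditional statistical models over localized image features); it swaps in crude color-histogram features and an off-the-shelf scikit-learn classifier, and the file names are hypothetical.

```python
# A minimal sketch of the train-then-tag idea described above. NOT the
# researchers' SML method: crude color-histogram features and a stock
# classifier stand in for their statistical models. File names are made up.
import numpy as np
from PIL import Image
from sklearn.linear_model import LogisticRegression

def histogram_features(path, bins=8):
    """Reduce an image to a coarse RGB color histogram, a stand-in for the
    pixel-level statistical analysis a real system would perform."""
    img = np.asarray(Image.open(path).convert("RGB").resize((64, 64)))
    hist = [np.histogram(img[..., c], bins=bins, range=(0, 255))[0] for c in range(3)]
    feats = np.concatenate(hist).astype(float)
    return feats / feats.sum()

# Training: images that people have already tagged ("tree", "sky", "beach").
labeled = [("tree1.jpg", "tree"), ("tree2.jpg", "tree"),
           ("sky1.jpg", "sky"), ("beach1.jpg", "beach")]
X = np.vstack([histogram_features(path) for path, _ in labeled])
y = [tag for _, tag in labeled]
model = LogisticRegression(max_iter=1000).fit(X, y)

# Inference: the trained model tags a new image, attaching a probability to
# each candidate label, e.g. "80% probability this picture contains a tree."
probs = model.predict_proba(histogram_features("unknown.jpg").reshape(1, -1))[0]
for label, p in sorted(zip(model.classes_, probs), key=lambda pair: -pair[1]):
    print(f"{label}: {p:.0%}")
```

The probability-ranked labels that fall out of the last loop are exactly the kind of computer-generated tags that could feed an image-search index.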

As shown in the example below, the labels a trained computer applies to images bear a disconcertingly strong resemblance to the tags that people give:

[Image: 03-07Vasconcelos-compare.jpg – side-by-side comparison of computer-generated and human-assigned image tags]

In fact, according to the researchers – Nuno Vasconcelos, Gustavo Carneiro, and Antoni Chan of UCSD and Pedro Moreno of Google – the tags generated by the machines can be more precise than those assigned by people because people tend to be less rigorous and more subjective than computers. People’s tags contain a lot of noise, as do the searches that are based on them. The authors write:

When compared with previous approaches, SML has the advantage of combining classification and retrieval optimality with 1) scalability in database and vocabulary sizes, 2) ability to produce a natural ordering for semantic labels at annotation time, and 3) implementation with algorithms that are conceptually simple and do not require prior semantic image segmentation. We have also presented the results of an extensive experimental evaluation, under various previously proposed experimental protocols, which demonstrated superior performance with respect to a sizable number of state-of-the-art methods, for both semantic labeling and retrieval.

Tests of the SML system at Google “indicate that the system can be used on large image collections,” according to Chan. In a brief video, Vasconcelos explains the system’s workings and says that the technique can be applied to other machine-learning challenges, such as teaching computers to understand sounds or read text. Give computers a little intelligence, and there’s just no stopping them.

The real Web 2.0

While the Techmeme crowd oohs and aahs over the latest social networking knockoff from Silicon Valley – do you think, perhaps, we’ve reached the point of diminishing returns? – the real second generation of the internet continues to take shape quietly in places like Quincy, Washington; Lenoir, North Carolina; and San Antonio, Texas. Web 2.0 isn’t about applications. It’s about bricks and mortar. It’s about capital assets. It’s about infrastructure.

Yesterday, Google formally announced that, in addition to building a big utility computing plant in Lenoir, it will also build one a little to the south, at a 520-acre site in Mt. Holly, South Carolina, near Charleston. The company will be reimbursed by the state for some of its building expenses, and, the governor reports, legislators have “updated the state tax code to exempt the electricity and the capital investment in equipment necessary for this kind of a facility … from sales tax,” an exemption similar to one granted manufacturers. Google expects to invest $600 million in the facility and hire a modest 200 workers to man the largely automated plant. Google may also build yet another data center in Columbia, South Carolina.

At a pork barbecue celebrating the announcement of the data center deal, Google held a question and answer session with local dignitaries, but it was characteristically closed-mouthed about the details of its operation. Asked how it uses water and electricity at its sites, Google executive Rhett Weiss said, “We’re in a highly competitive industry and, frankly, one or two little pieces of information like that in the hands of our competitors can do us considerable damage. So we can’t discuss it.”

Meanwhile, one of those competitors, Microsoft, has just put into operation the first phase of a new data center in Quincy, Washington, not far from a recently built Google center in The Dalles, Oregon. A reporter from a newspaper in San Antonio, where Microsoft plans to build its next big center, got a sneak peek inside the tightly guarded Quincy plant:

The Microsoft building looks like a massive manufacturing plant, with a banklike security system. A perimeter fence and a security guard regulate who gets in and out. Inside the main lobby, employees need badges with radio frequency identification smart chips to enter. Even with a badge, they still have to go through telephone-booth-sized revolving tubes in which they insert their hand into a biometric scanner to gain entry …

It’s easy to get lost inside Microsoft’s main building, which contains long halls with a tile floor and a maze of rooms centering around five 12,000-square-foot brain centers that contain tens of thousands of computer servers. Each server room has two adjoining rooms lined with refrigerator-sized air-conditioning units to keep the temperature between 60 and 68 degrees Fahrenheit. Another room contains row after row of batteries to kick in for 18 seconds if a power failure should occur before the truck-sized backup generators fire up.

The 470,000-square-foot data center is the first of an expected six centers that Microsoft will build on the Quincy site, a former bean field. The facility will ultimately encompass 1.5 million square feet of server-packed space. Nearby, Yahoo, Intuit, and Ask.com are also building big computing centers to power their online services. The attraction of the area is cheap power:

High-tech companies come here for the nation’s cheapest hydroelectric power rates, thanks to Grant County’s two enormous dams, which pump out power as cheap as 1.5 cents per kilowatt-hour, said Tim Snead, the city’s administrator. That compares with a national industrial rate of 9 cents. The data centers gobble up 40-plus megawatts of electricity each.
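For a rough sense of what that rate gap means in dollars, here is a back-of-envelope calculation. The 40-megawatt draw and the two rates come from the article; the assumption that a center runs at that load around the clock is mine.

```python
# Back-of-envelope electricity costs for one data center. Rates and the 40 MW
# figure are from the article; the constant round-the-clock load is assumed.
megawatts = 40
hours_per_year = 24 * 365                              # 8,760 hours
kwh_per_year = megawatts * 1_000 * hours_per_year      # roughly 350 million kWh

cost_quincy = kwh_per_year * 0.015      # Grant County hydro: 1.5 cents/kWh
cost_national = kwh_per_year * 0.09     # national industrial rate: 9 cents/kWh

print(f"Quincy:   ${cost_quincy / 1e6:.1f} million a year")    # about $5.3M
print(f"National: ${cost_national / 1e6:.1f} million a year")  # about $31.5M
print(f"Savings:  ${(cost_national - cost_quincy) / 1e6:.1f} million a year")
```

On those assumptions, the cheap hydro is worth something on the order of $26 million a year per data center, which goes a long way toward explaining why the big online companies are converging on Grant County.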

Now, you have to admit: this is a lot more interesting than Flackr or Knackr or Wankr or whatever the newest new thing is called.

Ozzie walks the line

Knowledge@Wharton has published an excellent (once you get past the interminable introduction) interview with Ray Ozzie, Microsoft’s official software visionary. Ozzie doesn’t say anything unexpected, but he provides a thorough and often subtle explanation of his view of the future of software and of Microsoft.

He lays out what he sees as the five great transformations in the computer business – “mainframes to minis, minis to PCs, PCs to LANs, LANs to the web, the web to where we’re going, which is services” – and argues that “we only have one shared future as a software industry. And that is centrally deployed code that has a different lifetime associated with it on the device it’s deployed to.” The lifetime may be as brief as the length of time a browser window remains open or it may be as long as a person owns a device. “All apps – whether Win32 code, Flash code, managed WPF [Windows Presentation Foundation] code – are going to have those lifetime choices and will all be centrally deployed, whether that central deployment is from an enterprise or from a service provider on the web. The concept of CD-based installs, floppy-based installs or USB stick installs are artifacts of a time when we were not fully connected.”

In that context, Ozzie emphasizes his belief that software for the foreseeable future will be a hybrid of code running centrally, somewhere in the Internet “cloud,” and code running locally, on a PC or other device.

When we, as an industry, communicate the meaning of an architectural shift to customers, sometimes it’s great to take an extreme position because it helps people to understand the benefit of this new era. In first generation “software as a service,” people tried to push the browser as far as it could go. But the most important mission of vendors is to figure out what value they are delivering, not how they are delivering it. If you look at people like Salesforce.com, they may talk “software as a service” being [exclusively] through a browser, but they have an offline edition … What we as an industry need to deliver are seamless experiences – however those things are accomplished – to do the appropriate thing in the browser and the appropriate thing on a laptop or on a device to solve that problem. So the way I view it is, first generation “software as a service” really just meant browser. Second generation means weave together hardware, software and services to accomplish a specific solution.

Ozzie divides the challenges facing Microsoft into “little i” innovation and “big I” innovation. “Little i” innovation involves the ongoing improvement of its traditional products – Office and Windows, in particular – in their traditional form. “Big I” innovation means adapting to the shift toward providing software as a set of online services and achieving the right hybrid solutions for customers.

So each group within Microsoft – and in our industry – is at a point where we should be saying, “If we’re aspiring to deliver productivity to a customer, how should we best weave that into services that are deployed through a browser? What aspects do you want mobile? What kind of synchronization should automatically be built in? Should I use the camera in that mobile device to snap a picture of the white board and have it automatically go up to the service and integrate it with the other documents related to this meeting that I’m working on?”

Ozzie is asked why he, and Microsoft in general, has been slow to unveil concrete details of how the company will transform its software and its business in response to the “services disruption.” He responds:

I have been trying to work internally on some fairly interesting things and we will talk about them as they become more real. We’re out there talking about what the most important things are to deliver for the company today, which are Office and Vista. Those are the primary things we are talking about right now. Are we ready right now to talk about how to change the game in search or how Microsoft might weave services into our various offerings? No, we’re not. But we will.

Here we see the very real dilemma that Microsoft finds itself in today. To maintain its growth and profits, it has to focus for the time being on promoting the new versions of Office and Windows, even though they are, by Ozzie’s own implication, relics of the past – “artifacts of a time when we were not fully connected.” If the company were to excitedly “talk about how to change the game,” through offering software more as a set of services deployed centrally, it would trample on the Office and Vista sales pitches. Who wants to buy relics of the past?

What Microsoft is counting on is that the transformation of the software business will proceed at a measured pace, that the company will be able to continue to reap large profits from its traditional products even as it slowly and steadily changes their nature. “Whenever someone has a very successful business,” says Ozzie, “there is absolutely a risk of innovator’s dilemma. I believe it’s too soon to tell whether there is a significant risk of that kind of disruption in [Microsoft’s] core businesses – simply because we’re in the early days of understanding the role of web-based productivity versus PC-based productivity. I am not one to believe that suddenly you snap fingers and everything that you do on the PC is doable on the web.”

Counting on a measured and manageable transition – and in particular on the sustainability of existing pricing and profit models – is a risk, but it would seem to be a risk that Microsoft has little choice but to take.