CNET’s Tom Krazit has posted a brief but very interesting interview with the Berkeley economist Hal Varian, who now serves as one of Google’s big thinkers. Krazit asks Varian whether search scale offers a quality advantage – in other words, whether the ability to collect and analyze data on more searches translates into better search results and better search-linked ads. Here’s the exchange:
Krazit: One thing we’ve been talking about over the last two weeks is scale in search and search advertising. Is there a point at which it doesn’t matter whether you have more market share in looking to make your product better?
Varian: Absolutely. We’re very skeptical about the scale argument, as you might expect. There’s a lot of aspects to this subject that are not very well understood.
On this data issue, people keep talking about how more data gives you a bigger advantage. But when you look at data, there’s a small statistical point that the accuracy with which you can measure things as they go up is the square root of the sample size. So there’s a kind of natural diminishing returns to scale just because of statistics: you have to have four times as big a sample to get twice as good an estimate.
Another point that I think is very important to remember … query traffic is growing at over 40 percent a year. If you have something that is growing at 40 percent a year, that means it doubles in two years.
So the amount of traffic that Yahoo, say, has now is about what Google had two years ago. So where’s this scale business? I mean, this is kind of crazy.
The other thing is, when we do improvements at Google, everything we do essentially is tested on a 1 percent or 0.5 percent experiment to see whether it’s really offering an improvement. So, if you’re half the size, well, you run a 2 percent experiment.
So in all of this stuff, the scale arguments are pretty bogus in our view…
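Varian’s arithmetic is easy to check. Here’s a minimal sketch (the numbers are mine for illustration – the 2 percent click-through rate is made up, not a figure from the interview) showing both the square-root law and the doubling time of traffic growing at 40 percent a year:

```python
import math

# Standard error of an estimated proportion p from a sample of size n:
# se = sqrt(p * (1 - p) / n). The 2% click-through rate is illustrative.
p = 0.02

for n in (1_000_000, 4_000_000, 16_000_000):
    se = math.sqrt(p * (1 - p) / n)
    print(f"n = {n:>10,}  standard error = {se:.6f}")
# Each fourfold increase in n merely halves the error -- Varian's
# "four times as big a sample to get twice as good an estimate."

# Doubling time at 40% annual growth: solve 1.4^t = 2.
t = math.log(2) / math.log(1.4)
print(f"doubling time at 40%/yr growth: {t:.2f} years")  # ~2.06
```

The same arithmetic underlies the experiment point: a search engine with half of Google’s traffic that samples 2 percent of its queries sees the same absolute number of test queries as Google sampling 1 percent.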
This surprised me because there’s a fairly widespread assumption out there that Google’s search scale is an important source of its competitive advantage. Varian seems to be talking only about the effects of data scale on the quality of results and ads (there are other possible scale advantages, such as the efficiency of the underlying computing infrastructure), but if he’s right that Google long ago hit the point of diminishing returns on data, that’s going to require some rethinking of a few basic orthodoxies about competition on the web.
I was reminded, in particular, of one of Tim O’Reilly’s fundamental beliefs about the business implications of Web 2.0: that a company’s scale of data aggregation is crucial to its competitive success. As he recently wrote: “Understanding the dynamics of increasing returns on the web is the essence of what I called Web 2.0. Ultimately, on the network, applications win if they get better the more people use them. As I pointed out back in 2005, Google, Amazon, ebay, craigslist, wikipedia, and all other Web 2.0 superstar applications have this in common.” (The italics are O’Reilly’s.)
I have previously taken issue with O’Reilly’s argument that Google’s search business is characterized by a strong network effect – an argument I think is wrong. But Varian’s argument goes much further than that. He’s saying that the assumption of an increasing-returns dynamic in data collection – what O’Reilly calls “the essence” of Web 2.0 – is “pretty bogus.” The benefit from aggregating data is actually subject to decreasing returns, thanks to the laws of statistics.
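To spell out the step from square roots to decreasing returns (my gloss, not Varian’s or O’Reilly’s): if the standard error of an estimate built from n data points is

```latex
\mathrm{SE}(n) = \frac{\sigma}{\sqrt{n}},
\qquad
\frac{d}{dn}\,\mathrm{SE}(n) = -\frac{\sigma}{2\,n^{3/2}},
```

then each additional data point buys less accuracy than the one before it – a decreasing-returns curve, where an increasing-returns story would require the opposite.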
That doesn’t mean data scale wasn’t once crucial to the quality of Google’s search results. The company certainly needed a critical mass of data – on links, on user behavior, etc. – to run the analyses necessary to deliver relevant results. It does mean that the advantages of data scale seem to go away pretty quickly – and past that point what determines competitive advantage is smarter algorithms (i.e., better ideas), not more data.