The shape of the tail
August 14, 2006
When it comes to evaluating a tail, which matters more: its length or its shape? The answer, of course, is largely a matter of personal taste. But Douglas Galbi argues, compellingly, that we have been so focused on long tails and short tails that we have forgotten that tails of all lengths can come in an infinite variety of shapes. And those shapes - the slopes the tails form - are not fixed. They can change, and change dramatically, over time - even if (and this is important) the number of items in the tail remains the same. Here's Galbi:
For a concrete example, consider the popularity of the ten-most-popular given names. The set of possible given names (given names on offer) is huge, and probably hasn't changed much in the past two-hundred years. However, the popularity of the ten-most-popular given names for males in England has fallen from about 85% in 1800 to about 28% in 1994. If you want to understand changes in the popularity of the most popular items in a collection of symbols instantiated and used in a similar way, try to understand this change.
The shape of demand in a market, in other words, depends on many factors beyond just the number of items on offer, and it can vary independently of that number. Galbi believes "that size, which tail authorities have categorized as long or short, matters less than shape." So could it be that, when it comes to the pattern of demand in a market, even a market of purely informational goods, the effect of the Internet may be considerably less important than we currently assume?
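To make the point about shape versus length concrete, here is a rough sketch in Python. Everything in it is an assumption made for illustration: the rank-based popularity weights, the two exponents, and the catalog size are invented, and none of it is Galbi's data or method. It holds the number of items fixed and changes only the steepness of the curve, then reports the share captured by the ten most popular items.

```python
def zipf_weights(n_items, exponent):
    """Hypothetical popularity weights: the item at rank r gets r ** -exponent."""
    return [rank ** -exponent for rank in range(1, n_items + 1)]


def top_k_share(weights, k=10):
    """Fraction of total demand captured by the k most popular items."""
    ranked = sorted(weights, reverse=True)
    return sum(ranked[:k]) / sum(ranked)


# Assumption: the same number of items is on offer in both scenarios.
N_ITEMS = 10_000

# Assumption: the two exponents are arbitrary, chosen only to contrast
# a steep curve with a shallow one.
steep_share = top_k_share(zipf_weights(N_ITEMS, 2.0))
shallow_share = top_k_share(zipf_weights(N_ITEMS, 0.8))

print(f"top-10 share, steep curve:   {steep_share:.0%}")
print(f"top-10 share, shallow curve: {shallow_share:.0%}")
```

The tail is equally long in both runs, yet the top ten items go from taking most of the demand to taking a small fraction of it.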
UPDATE: On a related note, Chris Edwards takes a close look at Jakob Nielsen's drooping tail. Plotting a demand curve on a linear scale, it appears, may mask important variations in tail shape that become clear when the curve is plotted on a logarithmic scale. Edwards and Nielsen come to different conclusions about what one particular (and probably common) tail shape may mean.
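The linear-versus-logarithmic point can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reconstruction of Edwards's or Nielsen's analysis: both curves are synthetic, the lognormal one is built from evenly spaced quantiles of a standard normal, and the parameter values are arbitrary. On a log-log plot a power law is a straight line, so its slope is the same at the head and far out in the tail; a lognormal curve bends downward, which is the "droop."

```python
import math
from statistics import NormalDist


def power_law_curve(n_items, alpha=1.0):
    """Synthetic rank-size curve: the value at rank r is r ** -alpha."""
    return [rank ** -alpha for rank in range(1, n_items + 1)]


def lognormal_curve(n_items, sigma=1.0):
    """Synthetic rank-size curve built from evenly spaced lognormal quantiles."""
    std_normal = NormalDist()
    return [
        math.exp(sigma * std_normal.inv_cdf(1 - (rank - 0.5) / n_items))
        for rank in range(1, n_items + 1)
    ]


def loglog_slope(curve, rank_a, rank_b):
    """Slope of the line between two ranks when both axes are logarithmic."""
    rise = math.log(curve[rank_b - 1]) - math.log(curve[rank_a - 1])
    run = math.log(rank_b) - math.log(rank_a)
    return rise / run


N_ITEMS = 1000  # assumption: arbitrary catalog size
power = power_law_curve(N_ITEMS)
lognormal = lognormal_curve(N_ITEMS)

for name, curve in (("power law", power), ("lognormal", lognormal)):
    head = loglog_slope(curve, 100, 200)
    tail = loglog_slope(curve, 800, 900)
    print(f"{name}: slope near the head {head:.2f}, slope in the tail {tail:.2f}")
```

On a linear plot both curves hug the axis after the first few dozen ranks and look much alike; the difference only shows up once the slopes are compared on logarithmic axes.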
UPDATE 2: Chief Long Tailer Chris Anderson has also been thinking about tail shape and in particular the differences between the classic long tail (powerlaw) and the drooping tail (lognormal). (Anderson has also posted a presentation on the subject that he gave at Google last weekend.) "The difference between those two curves," he writes, "is the subject of a lot of research at the cutting edge of complexity theory, and the simple answer seems to be that it comes down to the nature of the network effects that create unequal ('rich get richer') distributions such as the powerlaw and lognormal in the first place." Anderson's focus on network effects as the ultimate determinant of demand patterns seems in line with Edwards's focus on the nature and dynamics of linking in a market. Galbi's view seems very different; he seems to be warning against trying to explain a market's demand pattern solely in terms of network effects or any other universal "laws." Markets are messier than that, is what (I think) he's implying. My own sense - and I speak here as an ignorant bystander - is that the long tail and the drooping tail are overarching themes that provide a great deal of insight into the workings of markets in general but that don't necessarily provide all that much help in understanding variations on those themes - and all real markets will, to one extent or another, be variations on the themes.
The set of "given names on offer" has changed dramatically in the past 30 years.
In the United States, it used to be common for immigrant and non-white communities to choose "mainstream" American names (Mike, Joe, Roy, etc).
This is no longer the case, resulting in a (welcome) explosion of new names - some related to a country of origin, some to an ethnic identity, others essentially made up. For example, the 70th most popular girl's name in 2005 was "Neaveh" ("Heaven" spelled backwards) as described in the NYTimes.
I'm not sure if this invalidates Albi's entire point, but certainly the "name" tail has increased in length quite a bit. That Albi starts by positing it "hasn't changed in much in 200 years" suggests a blinkered view.
oops, misspelled "Nevaeh"!
Finn, You also misspelled Galbi, so clearly the analysis of names is not your strong suit. :-)
But you raise a good point, and I wonder how Galbi would respond. His analysis, by the way, is limited to England and ends in 1994. Also, I don't think your point undermines his so much as underscores it. There was no hard limit to the number of potential names 200 years ago any more than there is today, yet the shape of the tail changed dramatically, indicating that there were other factors at work (including the demographic and social factors you allude to).
Posted by: Nick Carr at August 14, 2006 04:45 PM
I want to build on finn's point, and note the comparison is of VERY different societies. I believe England's population in 1800 was a lot less ethnically diverse than it was in 1994.
If you change the market, you change the results. This isn't wrong - but it's not as meaningful as it may appear at first.
That is, his statement appears to be something of a statistical artifact, from comparing Caucasians to Caucasians + Indian immigrants + African immigrants.
Posted by: Seth Finkelstein at August 14, 2006 05:17 PM
The declining popularity of the most popular names in England occurred steadily from 1800 onward. Immigration to England changed little from 1800 through to WWII, but the popularity of the ten-most-popular male names fell from 85% in 1800 to 38% in 1925. See the name popularity table.
A similar change in name popularity occurred in the U.S., despite a rather different immigration record.
The number of given names on offer is difficult to specify. If you consider this number to be the number of names already produced (in use), it may be roughly proportional to population (constant per capita innovation rate). That would make the number of names on offer in England about six times larger in 1994 than in 1800. Empirically, the number is not well-defined without a specified approach to spelling mistakes, non-standardized spellings, second names, homophonic names, and nicknames.
Analyzing the total number of given names on offer isn't an insightful approach to considering the popularity of the most popular names.
Posted by: Douglas Galbi at August 14, 2006 09:10 PM
Mr. Carr wrote, "So could it be that, when it comes to the pattern of demand in a market, even a market of purely informational goods, the effect of the Internet may be considerably less important than we currently assume?"
The effect of the Internet in lowering transaction costs (search & discovery) is probably hard to quantify but intuitively, it would seem to exert a large effect.
What the Internet has done is to make it easier to find an outlet for one's esoteric tastes. Multiply that by a hundred thousand users, and demand will increase no matter how obscure the product may be.
Posted by: Allen Tan at August 15, 2006 03:36 AM
Galbi's argument is good, but it doesn't go quite far enough. The question for me is whether there's any reason to use the 'power law' model at all - and whether there might be ways of analysing 'Long Tail' data with more explanatory power than lining columns up from highest to lowest and looking at the shape of the curve.
A while ago, I crunched some numbers from NZ Bear's long-running blog popularity contest and found it impossible to make them fit any consistent value of 1/x: the curve was either too steep to begin with or too shallow further out. I don't pretend to understand Cosma Shalizi's maths, but this post
appears to suggest that there are lots of distributions which can look like a power law, and most of them can be better explained by other means. (As Adam pointed out at
even a vanilla normal distribution can look a bit like a power-law curve, if you order the values from high to low - but it's not exactly a gain in information.)
We are being distracted by a false dichotomy here: "Which matters more: its length or its shape?"
Clearly, what's far more important is its colour.
It seems that shape and length are only relevant for historical analysis, since it is impossible to know with certainty at the time what the natural market curve is. And, even if you had this information, it is somewhat doubtful that anyone outside the search analytics business could employ it usefully.
What is certain is that the internet provides increased accessibility to products and information about products which allows us to find and select items that would be otherwise unavailable to us. In this regard, the internet would be beneficial to long tail sales even without its attendant benefit of reducing or eliminating distribution and inventory costs.
It is also noteworthy that long tail sales and marketing are not dependent on the internet, only accelerated by it. Many categories of goods that Chris does not discuss in his book, such as art, wine, antiques and collectibles, real estate, and expertise, follow long tail models. The truly interesting insights will come from understanding the commonalities and differences of these markets and how the long tail observations that Chris has documented can be used by producers rather than retailers to improve the way they build customer relationships and market their products.
Posted by: Paul at August 16, 2006 12:03 AM