The wisdom of a crowd is often in the eye of the beholder, but most of us understand that, at its most basic level, “crowd wisdom” refers to a fairly simple phenomenon: when you ask a whole bunch of random people a question that can be answered with a number (eg, what’s the population of Swaziland?) and then you add up all the answers and divide the sum by the number of people providing those answers – ie, calculate the average – you’ll frequently get a close approximation of the actual answer. Indeed, it’s often suggested, the crowd’s average answer tends to be more accurate than an estimate from an actual expert. As the science writer Jonah Lehrer put it in a column in the Wall Street Journal on Saturday:
The good news is that the wisdom of crowds exists. When groups of people are asked a difficult question – say, to estimate the number of marbles in a jar, or the murder rate of New York City – their mistakes tend to cancel each other out. As a result, the average answer is often surprisingly accurate.
To back this up, Lehrer points to a new study by a group of Swiss researchers:
The researchers gathered 144 Swiss college students, sat them in isolated cubicles, and then asked them to answer [six] questions, such as the number of new immigrants living in Zurich. In many instances, the crowd proved correct. When asked about those immigrants, for instance, the median guess of the students was 10,000. The answer was 10,067.
Except, well, it’s not quite that clear-cut. In fact, it’s not clear-cut at all. If you read the paper, you’ll find that the crowd did not “prove correct” in many instances. The only time the crowd proved even close to correct was in the particular instance cited by Lehrer – and that was only because Lehrer used the median answer rather than the mean. In most cases, the average answer provided by the crowd was wildly wrong.
Peter Freed, a neuroscience researcher at Columbia, let loose on Lehrer in a long, amusing blog post, arguing that he (Lehrer) had misread the evidence in the study. Freed pointed out that if you look at the crowd’s average answers – “average” as in “mean” – to the six questions the researchers posed, you’ll find that they are, as Freed says, “horrrrrrrrrrrrrendous”:
… the crowd was hundreds of percents – yes, hundreds of percents – off the mark. They were less than 100% off in response to only one out of the six questions! At their worst – to take a single value, as Lehrer wrongly did with the 0.7% [median] – the 144 Swiss students, as a true crowd (unlike the 0.7%), guessed that there had been 135,051 assaults in 2006 in Switzerland – in fact there had been 9,272 – an error of 1,356%.
Or, as the researchers themselves report:
In our case, the arithmetic mean performs poorly, as we have validated by comparing its distance to the truth with the individual distances to the truth. In only 21.3% of the cases is the arithmetic mean closer to the truth than the individual first estimates.
So, far from providing evidence that supports the existence of the wisdom-of-crowds effect, the study actually suggests that the effect may not be real at all, or at least may be a much rarer phenomenon than we assume.
But since this is statistics, that’s by no means (no pun intended) the end of the story. As the researchers go on to explain, it’s quite natural for a crowd’s average answer, calculated as the mean, to be way too high – and hence ridiculously unwise. That’s because, while individuals’ underestimates for these kinds of questions are bounded at zero, there’s no upper bound to their overestimates. “In other words,” as the researchers write, “a minority of estimates are scattered in a fat right tail,” which ends up skewing the mean far beyond any semblance of “wisdom.”
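To see how a fat right tail drags the arithmetic mean upward, here is a minimal Python sketch. The guesses and the "true" value are invented for illustration and are not the study's data; the names `TRUTH`, `guesses`, and `closer` are likewise my own.

```python
from statistics import mean

# Hypothetical guesses, invented for illustration; NOT the study's data.
# Underestimates cannot go below zero, but overestimates have no ceiling.
TRUTH = 10_000
guesses = [2_000, 5_000, 8_000, 9_000, 10_000,
           12_000, 15_000, 20_000, 100_000, 1_000_000]

crowd_mean = mean(guesses)
mean_error = abs(crowd_mean - TRUTH)

# How far off the crowd's mean is, as a percentage of the true value.
pct_error = 100 * mean_error / TRUTH

# How many individuals came closer to the truth than the crowd's mean did.
closer = sum(abs(g - TRUTH) < mean_error for g in guesses)

print(f"arithmetic mean: {crowd_mean:,.0f} ({pct_error:,.0f}% off)")
print(f"{closer} of {len(guesses)} individuals were closer than the mean")
```

In this made-up crowd, two wild overestimates pull the mean to more than ten times the true value, and nine of the ten guessers do better on their own than the crowd does as a whole.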
Fortunately (or not), the arcane art of statistics allows you to correct for the crowd’s errors. By massaging the results – “tuning” them, as the researchers put it – you can effectively weed out the overestimates and (presto-chango) manufacture a wisdom-of-crowds effect. In this case, the researchers performed this magic by calculating the “geometric mean” of the group’s answers rather than the simple “arithmetic mean”:
As a large number of our subjects had problems choosing the right order of magnitude of their responses, they faced a problem of logarithmic nature. When using logarithms of estimates, the arithmetic mean is closer to the logarithm of the truth than the individuals’ estimates in 77.1% of the cases. This confirms that the geometric mean (i.e., exponential of the mean of the logarithmized data) is an accurate measure of the wisdom of crowds for our data.
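The researchers' definition of the geometric mean (the exponential of the mean of the logged data) can be sketched in a few lines of Python. As before, the guesses are hypothetical numbers chosen for illustration, not the study's data, and this is only a sketch of the calculation, not the researchers' own code.

```python
import math
from statistics import geometric_mean, mean

# Hypothetical guesses, invented for illustration; NOT the study's data.
TRUTH = 10_000
guesses = [2_000, 5_000, 8_000, 9_000, 10_000,
           12_000, 15_000, 20_000, 100_000, 1_000_000]

# Step 1: take the logarithm of every guess.
logs = [math.log(g) for g in guesses]

# Step 2: average the logarithms, then undo the logarithm.
geo_mean = math.exp(mean(logs))

# The standard library (Python 3.8+) offers the same calculation directly.
stdlib_geo_mean = geometric_mean(guesses)

print(f"arithmetic mean: {mean(guesses):,.0f}")
print(f"geometric mean:  {geo_mean:,.0f}")
print(f"true value:      {TRUTH:,}")
```

Because a guess of 1,000,000 counts in log terms as only two orders of magnitude above 10,000, rather than 990,000 units above it, the geometric mean for this invented crowd lands far closer to the true value than the arithmetic mean does.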
Well, it further turns out that the median answer – the centermost individual answer – in a big set of answers often replicates, roughly, the geometric mean. Again, that’s no big surprise. The median, like the geometric mean, serves to neutralize wildly wrong guesses. It hides the magnitude of people’s errors. The researchers point this fact out in their paper, but Freed, having criticized Lehrer for a sloppy reading of the study, seems to have overlooked that point. Which earns Freed a righteous tongue-lashing from another blogger, the physics professor Chad Orzel:
Freed’s proud ignorance of the underlying statistics completely undermines everything else. His core argument is that the “wisdom of crowds” effect is bunk because the arithmetic mean of the guesses is a lousy estimate of the real value. Which is not surprising, given the nature of the distribution – that’s why the authors prefer the geometric mean. He blasts Lehrer for using a median value as his example, without noting that the median values are generally pretty close to the geometric means – all but one are within 20% of the geometric mean – making the median a not-too-bad (and much easier to explain) characterization of the distribution.
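The claim that the median, like the geometric mean, neutralizes wildly wrong guesses can be checked with a small Python experiment. The data is again hypothetical (invented for illustration, not taken from the study): the same made-up crowd is measured twice, the second time with its worst guess made ten times worse.

```python
from statistics import geometric_mean, mean, median

# Hypothetical guesses, invented for illustration; NOT the study's data.
guesses = [2_000, 5_000, 8_000, 9_000, 10_000,
           12_000, 15_000, 20_000, 100_000, 1_000_000]

# Same crowd, but the wildest guess is now ten times wilder.
wilder = guesses[:-1] + [10_000_000]

for label, data in (("original", guesses), ("wilder tail", wilder)):
    print(f"{label:>12}: "
          f"mean={mean(data):,.0f}  "
          f"geometric mean={geometric_mean(data):,.0f}  "
          f"median={median(data):,.0f}")
```

In this sketch the median does not move at all when the outlier grows, the geometric mean shifts only modestly, and the arithmetic mean follows the outlier almost one for one. That is the sense in which the median hides the magnitude of people's errors.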
You get the sense that this could go on forever. And I sort of hope it does, because I enjoyed Lehrer’s original column (the main point of which, by the way, was that the more a crowd socializes the less “wise” it becomes), and I enjoyed Freed’s vigorous debunking of Lehrer’s reading of (one part of) the study, and I also enjoyed Orzel’s equally vigorous debunking of (one part of) Freed’s debunking.
But beyond the points and counterpoints, there is a big picture here, and it can be described this way: Even in its most basic expression, the wisdom-of-crowds effect seems to be exaggerated. In many cases, including the ones covered by the Swiss researchers, it’s only by using a statistical trick that you can nudge a crowd’s responses toward accuracy. By looking at the geometric mean rather than the simple arithmetic mean, the researchers performed the statistical equivalent of cosmetic surgery on the crowd: they snipped away those responses that didn’t fit the theoretical wisdom-of-crowds effect that they wanted to display. As soon as you start massaging the answers of a crowd in a way that gives more weight to some answers and less weight to other answers, you’re no longer dealing with a true crowd, a real writhing mass of humanity. You’re dealing with a statistical fiction. You’re dealing, in other words, not with the wisdom of crowds, but with the wisdom of statisticians. There’s absolutely nothing wrong with that – from a purely statistical perspective, it’s the right thing to do – but you shouldn’t then pretend that you’re documenting a real-world phenomenon.
Freed gets at this point in a comment he makes on Orzel’s post:
Statistics’ dislike of long right tails is *not a scientific position.* It is an aesthetic position that, at least personally, I find robs us of a great deal of psychological richness … [T]o understand the behavior of a crowd – a real world crowd, not a group of prisoners in segregation – or of society in general, right tails matter, and extreme opinions are over-weighted.
The next time somebody tells you about a wisdom-of-crowds effect, make sure you ask them whether they’re talking about a real crowd or a statistically enhanced crowd.