Nature’s Flawed Study of Wikipedia’s Quality

While most of the discussion at this week’s Open Source Business Conference was refreshingly pragmatic, focused on the commercial role and prospects of open source software, there were a few more cosmic moments. Notably, Mitch Kapor brought a bit of Wikimania to the proceedings, offering a Zen-like “meditation” on Wikipedia as a harbinger of a much broader open-source movement in the future. (Wikipreneur Ross Mayfield summarizes the talk.) Kapor believes that the community-run online encyclopedia explodes the myth “that someone has to be in charge” as well as the assumption “that experts count.” He argues that Wikipedia shows you can create high-quality products through the contributions of a broad, democratic community of amateurs, a self-governing collective operating on the internet without any hierarchy. That, in Kapor’s view, is “the next big thing.”

Kapor’s argument hinges on the contention that Wikipedia is actually good. In recent months, Wikipedia’s content has come under considerable criticism, accused of everything from libel to infantilism. Like many of the encyclopedia’s defenders, Kapor counters those criticisms by citing a recent article in the journal Nature that ostensibly proves that the quality of Wikipedia is “roughly equivalent” to that of the venerable Encyclopedia Britannica. The Nature article has become something of a get-out-of-jail-free card for Wikipedia and its fans. Today, whenever someone raises questions about the encyclopedia’s quality, the readymade retort is: “Nature says it’s as good as Britannica.”

Kapor’s remarks inspired me to take a look at that much-cited Nature article. I found that it was something less than I had expected. It is not one of the peer-reviewed, expert-written research articles for which the journal is renowned. (UPDATE: I confirmed this with the article’s author, Jim Giles. In an e-mail to me, he wrote, “The article appeared in the news section and is a piece of journalism, so it did not go through the normal peer review process that we use when considering academic papers.”) Rather, it’s a fairly short, staff-written piece based on an informal survey carried out by a group of Nature reporters. The reporters chose 50 scientific topics that are covered by both Wikipedia and Britannica, selecting entries that were of relatively similar length in both publications. For each topic, they also chose an academic expert. They then sent copies of both entries to the respective experts, asking them to list any “errors or critical omissions” appearing in the writeups. They received 42 responses.

The article itself doesn’t actually go into much detail about the survey’s findings. It says that the “expert-led investigation” revealed that “the difference in accuracy [between the encyclopedias] was not particularly great: the average science entry in Wikipedia contained around four inaccuracies; Britannica, about three.” But Nature subsequently released “supplementary information” about the survey, including more details on the methodology and a full list of the errors cited by the experts. (In total, Wikipedia had 162 errors while Britannica had 123.) Read together, the article and the supplementary information indicate that the survey probably exaggerated Wikipedia’s overall quality considerably.

First and most important, the survey looked only at scientific subjects. As has often been noted, Wikipedia’s quality tends to be highest in esoteric scientific and technological topics. That’s not surprising. Because such topics tend to be unfamiliar to most people, they will tend to attract a narrower and more knowledgeable group of contributors than will more general-interest subjects. Who, after all, would contribute to an entry on “kinetic isotope effect” or “Meliaceae” (both of which were in the Nature survey) other than those who have some specialized understanding of the topic? The Nature survey, in other words, played to Wikipedia’s strength.

That’s fine. Nature is, after all, a scientific journal. But, unfortunately, the narrowness of the survey has tended to get lost in media coverage of it. CNET, for instance, ran a story on the survey under the headline “Study: Wikipedia as Accurate as Britannica.” The story reported that “Nature chose articles from both sites in a wide range of topics” and that it found that “Wikipedia is about as good a source of accurate information as Britannica.” Such incomplete, if not misleading, descriptions have informed subsequent coverage. For example, one prominent technology blogger covering Kapor’s speech this week wrote simply that “a recent study showed that Wikipedia is just as accurate as the Encyclopedia Britannica.”

Second, the Nature reporters filtered out some of the criticisms offered by the experts. They note, in the supplementary information, that the experts’ reviews were “examined by Nature’s news team and the total number of errors estimated for each article. In doing so, we sometimes disregarded items that our reviewers had identified as errors or critical omissions. In particular, as we were interested in testing the entries from the point of view of ‘typical encyclopaedia users’, we felt that experts in the field might sometimes cite omissions as critical when in fact they probably weren’t – at least for a general understanding of the topic. Likewise, the ‘errors’ identified sometimes strayed into merely being badly phrased – so we ignored these unless they significantly hindered understanding.” Since the reporters don’t document the “errors or critical omissions” that they subjectively filtered out, it’s impossible to judge whether they applied more to one publication than the other. But the Nature article implies that, beyond the errors and omissions tallied by the survey, the expert reviewers offered considerable criticism of the quality of the writing in Wikipedia: “Several Nature reviewers [commented] that the Wikipedia article they reviewed was poorly structured and confusing.” The article notes further that such criticism of readability “is common among information scientists, who also point to other problems with article quality, such as undue prominence given to controversial scientific theories.” The findings of the Nature survey, in other words, appear to filter out criticisms of Wikipedia’s quality that the Nature reporters decided went beyond their definition of “accuracy.”

Third, in reporting the results, the Nature reporters view all inaccuracies as being equal. In reality, of course, there are considerable variations in the degree and importance of the inaccuracies. Fortunately, in the supplementary information, Nature documents all the errors and omissions cited by the expert reviewers. I am no expert in the subjects covered by the survey – and my judgement may be mistaken – but my sense in reading through the lists was that the inaccuracies in Wikipedia tended to be more substantial than those in Britannica. Here, for example, is a comparison of the inaccuracies noted in the first three entries (they’re arranged alphabetically):

Acheulean Industry

Britannica:

1. I would not use the term ‘early Homo sapiens’. Instead, use Homo heidelbergensis.

Wikipedia:

1. Cro-Magnons (early Homo sapiens) did not use the Acheulean!!

2. Date range is off, its about 1.5 my to 200 ka

3. The following statement is inaccurate and poorly written: ‘The period during which these these tools were innovated is usually thought to be the early Paleolithic era or the beginning of the middle Paleolithic era.’

4. I have no idea what this following statement means: ‘However, the Acheulean industry continued to be used by some primitive hominid cultures up until 100,000 years ago.’ It’s not correct.

5. This is an awful set of sentences: ‘by efficient scavengers, who were still preyed upon frequently by larger animals and often bewildered by their environment. Adversely, Acheulean tools gave their masters the ability to hunt and defend themselves successfully and gave them the distinction of being equally as deadly as the greatest predators of the prehistoric Earth.’ Early hominins were probably hunting and scavenging. Acheulean hominins also likely scavenged and hunted. Acheuelean tools are often associated with large carcasses, suggesting that they had access to large quantities of meat. The sentence about Acheulean hominins abilities is overstated.

6. Regarding Asia, I would say West and Southern Asia. Acheulean hominins did not spread to Eastern Asia.

7. The statement ‘It flourished roughly 400,000 to 100,000 years ago in Eastern Europe and Northern Asia.’ has nothing to do with the Acheulean, I am not sure what it means.

Agent Orange

Britannica:

1. A very minor error is that Agent Orange is considered by the Vietnamese to be the cause of the diseases listed in the second paragraph from the 1970s to the present, not just from the 1970s to the ’90s.

2. The entry should include the statement that other mixtures containing dioxin were also sprayed, including Agents Purple, Pink and Green, albeit in lesser amounts.

Wikipedia:

1. This entry implies that it was the herbicides that are problematic, which is not the case. It was dioxin, a byproduct of manufacture of 2,4,5-T that is of concern. Dioxin is persistent in the environment and in the human body, whereas the herbicides are not. In addition, there was a significant amount of dioxin in Agents Purple, Pink and Green, all of which contained 2, 4, 5 – T as well. However, we have less information on these compounds and they were used in lesser quantities.

2. The entry is on the verge of bias, at least. By use of the word “disputedly” in the second sentence there is at least an implication that the evidence of harm to exposed persons is in question. That is not the case, and the World Health Organization has identified dioxin as a “known human carcinogen”, and other organizations such as the US National Academy of Sciences has documented harmful effects to US Air Force personnel.

Aldol

Britannica:

1. The aldol REACTION is not the same as the aldol CONDENSATION.

2. Sodium hydroxide is by no means the only base to be used in the aldol and acid catalysed aldol reactions also occur (usually with concomitant loss of water).

3. The reaction steps in the second reaction sequence should be equilibria up to the dehydration step.

4. In particular, there is no mention of the acid catalysed process and scant mention of related reactions

Wikipedia:

1. The mechanisms of base and acid catalysed aldol reactions should have every step as an equilibrium process

2. The acid catalysed process should include the dehydration step, which occurs spontaneously under acid conditions and, being effectively irreversible, pulls the equilibrium through to product.

3. The statement that LDA is avoided [if] at all possible as it is difficult to handle is rubbish. Organic chemists routinely use this reagent – which they either make as required or use commercially available material.

If you were to state the conclusion of the Nature survey accurately, then, the most you could say is something like this: “If you only look at scientific topics, if you ignore the structure and clarity of the writing, and if you treat all inaccuracies as equivalent, then you would still find that Wikipedia has about 32% more errors and omissions than Encyclopedia Britannica.” That’s hardly a ringing endorsement.
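For anyone who wants to check that 32% figure, here is a minimal sketch of the arithmetic in Python. The error totals (162 and 123) and the count of 42 expert responses come from the Nature material cited above; the variable and function names are just illustrative, and the per-entry averages assume each of the 42 responses covered both encyclopedias’ entries.

```python
# Totals reported in Nature's supplementary information
wikipedia_errors = 162
britannica_errors = 123
reviews_returned = 42  # expert responses received (assumed to cover both entries)


def relative_excess(a: int, b: int) -> float:
    """Return how much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100


excess_pct = relative_excess(wikipedia_errors, britannica_errors)

# Average errors per reviewed entry, under the assumption stated above
wikipedia_avg = wikipedia_errors / reviews_returned
britannica_avg = britannica_errors / reviews_returned

print(f"Wikipedia has about {excess_pct:.0f}% more errors than Britannica")
print(f"Average per entry: Wikipedia {wikipedia_avg:.1f}, Britannica {britannica_avg:.1f}")
```

The per-entry averages come out close to the “around four” and “about three” that the Nature article reports.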

The problem with those who would like to use “open source” as a metaphor, stretching it to cover the production of encyclopedias, media, and other sorts of information, is that they tend to focus solely on the “community” aspect of the open-source-software model. They ignore the fact that above the programmer community is a carefully structured hierarchy, a group of talented individuals who play a critical oversight role in filtering the contributions of the community and ensuring the quality of the resulting code. Someone is in charge, and experts do count.

The open source model is not a democratic model. It is the combination of community and hierarchy that makes it work. Community without hierarchy means mediocrity.

20 Responses to Nature’s Flawed Study of Wikipedia’s Quality

  1. Makio Yamazaki

    Wiki is just like the logic of Web 2.0.

    In other words, reliable information would be born from the opinions and the wealth of information of the people. However, at present, the Internet brings us a flood of information.

    Now we seek to discover a jewel from the inside of the information flood.

    One stop or the professional would be more and more important for us.

  2. Hi nick,

    I actually disagree with you regarding the quality of Wikipedia’s scientific articles. It might be that I have much higher standards for them. I posted my first blog entry ever there. Feel free to visit.

    Tony

  3. SJ

    Hi Nick,

    Nice to read you, as always. I agree with you about many of the problems with the Nature survey, which was unacceptably biased in favor of Wikipedia. Much better metrics, and more easily-evaluated metrics, are needed for reference works of all kinds. I hope that groups like the one which published this sketchy study address their energies towards this end.

    On the flipside, you might want to note (as it may not be obvious to all of your readers) that shortly after the full list of errors was published, Wikipedia was able to report that all the errors had been fixed.

    Cheers, SJ

  4. Good points on the flaws of the _Nature_ survey.

    I was looking into this myself, but never followed-up on some of it.

    More importantly:

    “They ignore the fact that above the programmer community is a carefully structured hierarchy, a group of talented individuals who play a critical oversight role in filtering the contributions of the community and ensuring the quality of the resulting code. Someone is in charge, and experts do count.”

    All true, plus, THERE ARE OBJECTIVE METRICS!

    The patch compiles, or it doesn’t. The bug is fixed, or it isn’t. Code is tested against external reality, rather than being a popularity contest as to whether most people polled think it should work. That’s the difference between being appealing, and being right.

  5. Sandra

    Hello Nick,

    The fact of the matter is that the Wikipedia community is not some grand experiment in Democracy. In fact, Jimmy Wales has stated in many public fora that the bulk of the work that gets done is performed by a closely knit core of about 1000 Wikipedians. Something else you don’t cover in your piece is that Wikipedia is very swift in dealing with accuracy problems. All the errors in the Nature survey are long gone. In any case, Jimmy Wales has also stated that he does not feel that anyone should be citing Wikipedia, as it is merely a tertiary source. All information should be taken with a grain of salt, and this is no different for Britannica or Wikipedia.

    The concerns you raise about libel are of course very relevant, and this is something that the Wikipedia community has put a big focus on since the Seigenthaler incident.

  6. len

    Yes. The ‘wisdom of crowds’ is an averaging phenomenon. The reply to that should always be the question “crowds of what?”. The answer is typically, “self-selected contributors” and some say “self-selected experts”. The fact of self-selection does not cover the chasm.

    It is narcissism but a great deal of the web experience has been exactly that with the usual rewriting of history, ignoring inconvenient facts, and so forth. Yet at the end of the day, WikiPedia and the answers.com widget are very convenient and very productive resources. It leaves it to the reader just as to the viewer of CNN or Fox News to react slowly, check other resources, and bet only as much as one is willing to lose on a source.

    Web 2.0? Pushing big chunks of markup rather than a viewset to a client is a really old idea and yes it works marvelously. Load balancing has value in a distributed network, so wouldn’t one expect that? Is it pundit-established hype? Most certainly. That’s how O’Reilly makes money. So does Wired. All marketing works that way. See the articles on the marketing firm taking out ads for ring tones that act like pheromones. Do you believe it? Check around or buy a hula hoop. If the joy is in the having, spend. If the joy is in the certainty, validate and verify. No news here.

  7. Gianni

    I am always flabbergasted whenever people (including venerable Mitch Kapor, of course) use the EITHER/OR toggle when comparing e.g. WP and Britannica.

    Why on Earth should a randomly chosen cross section of web surfers assume they can do a better job than a carefully, purposely selected group of experts?

    The things that I find interesting about the whole WP project and similar others are rather the continuity of its improvement process or the speed at which errors can be fixed – you could almost say that the mere fact of publicly pointing out errors fixes them.

    So while the collective quality of the brainpower powering WP is not as good as Britannica’s, by attracting attention and scrutiny to its endeavor, it attracts also some of that superior brainpower, thereby improving its own pool.

    THAT is amazing!

  8. Rob

    I like to see the metrics taken even one step further. It’s of some use to me to know that wikipedia is 32% less accurate than britannica only if I use britannica as THE source. But if I question britannica, then I need an absolute measure and not a relative measure. The ideal study would count up the total number of facts in an article and then give the percentage that are correct.

    My other thought on reading this is how britannica really epitomizes an opportunity lost. Imagine if britannica had gone quicker to the web, allowing some type of user commentary. That would have been a powerful play.

  9. Expertise matters of course. But in having a dig at Wikipedia you leave unexamined the contention that expertise produces excellence. In peer review for scientific journals for example, it does not do so with any degree of reliability. (http://www.ama-assn.org/public/peer/prc_program2001.htm) Personally I find Wikipedia good enough on topics such as the history of statistical theory. But no one expects an encyclopaedia – produced in a meritocratic or a democratic way – to say anything really novel.

  10. ordaj

    Since there is a dearth of absolute truth in many of the topics, contributor bias will always creep in. It’s a matter of degree. And if there is no referee to arbitrate, there will be a constant editing or “correcting” of material.

    Apart from this, the real value or appeal is in being able to contribute and to “talk back” at the usual rigid and closed media. It comes down to motivation.

  11. Nick-

    I think you’re right about the transference of open source aspirations from programming to Wikipedia producing a Quixotic view of the quality of W.

    However, how often does Britannica get edited, or errors corrected? I may be wrong, but it is not continuously.

    I believe this is a losing jag for you, Nick — this marking of expectations to market on Wikipedia. You’re forgetting context. I think it’s right to moderate the irrational exuberance anywhere, even stamp it out; but the position that Wikipedia is not revolutionary because it’s not as good at some things as traditional publishing forms (by the standards of traditional publishing forms), or that readers cannot fend for themselves in the InfoStorm, is Elitist.

    Good enough here is what works. And as I’m an educated user, I can consult alternative sources to confirm information when accuracy is important. Rarely is the accuracy of what’s on Wikipedia strictly my first concern (since I’m smart and a good judge of quality in itself), nor is it ever the only concern.

    If the gas company is setting rates incorrectly because of false information on Wikipedia, we have other problems.

  12. Ian Blyth

    I will let these wise people quote on the fallacies of using statistics.

    “There are three kinds of lies – lies, damned lies and statistics.”

    Benjamin Disraeli

    “Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital.”

    Aaron Levenstein

    “Do not put your faith in what statistics say until you have carefully considered what they do not say.”

    William W. Watt

  13. Much Ado About Wikipedia

    Much ink has been spilled of late by the digerati praising or declaiming Wikipedia. Arrayed on one side are the likes of Mitch Kapor who assert with revolutionary ardor that Wikipedia is not only the next big thing but will

  14. Much Ado About Wikipedia

    Much ink has been spilled of late by the digerati praising or declaiming Wikipedia. Arrayed on one side are the likes of Mitch Kapor who assert with revolutionary ardor that Wikipedia is not only the next big thing but will

  15. Ross fact checks Carr fact checking Wikipedia

  16. Infidel

    As always the same: “All the errors have now been fixed.” “Wikipedia should not be used as a primary source.”

    To the first I reply, as always, “there are thousands of errors no one bothered to point out to you.” And to the second, “then don’t call yourself an encyclopedia.”

    In five years we will wonder how much time we actually spent on discussing this large scale “blog.”

  17. We had tremendous difficulty at the digital think tank building a definition of “digital culture” on Wikipedia due to the reasons you cited in your insightful critique of “open source” without hierarchy.

    It strikes me that the Wiki model is too reflexively defensive when it comes to developing entries, a result of the totally unfiltered model of contribution, which tends to create a pathogenic psychology towards creative contributions.

  18. SallyF

    I would like to point out that, after the easy fixes for the Nature study were done before Christmas, the progress on fixing the rest of the errors slowed down, so I assigned the task to myself of fixing the harder ones. I have a degree in Chemical Engineering and understand most science articles fairly well. I did about half of all the work in January 2006 to fix the errors. I was User:Pinktulip at the time and you can see the progress of my work through January 2006 in the history of this page. I decided to also go and write up an essay on the correction progress. In particular, I was annoyed to later hear David Weinberger suggest that “all but one” of the corrections were finished in the 24 hours after the report was made available. Such is the hype of Web 2.0.

    Having good science at Wikipedia would be great, but you also have to fight those who would take a controversial ethical issue and just flood it with NPOV but unimportant “scientific” information. I also wrote the Elizabeth Morgan article and I tried to move on to the Terri Schiavo article in February 2006. I ran into some pro-life resisters there. Even today, that article is treated as a medical science rather than as a matter of ethics and law. The pro-lifers just want to condemn the judge as evil, flood the article with dreary facts about the drawn-out delays Schiavo’s parents caused for five years. This just flatters the egos of these pro-life English majors and musicians who imagine that they are qualified and well-informed enough to second-guess the on-site professionals who assisted Terri and her relatives. They just want to demonize the judges who ruled against the parents. The kind of objectivity required to recognize that Terri Schiavo is about law, not science, will be destroyed if the pro-lifers like Musical_Linguist and FloNight and plagiarizers like David Gerard and the hate-list-makers like Jimmy Wales of the Wikipedia world are allowed to prevail.

  19. SallyF’s comment above encourages me to relay the story of an award-winning physicist who was shown the door at Wikipedia by an admin who is an undergraduate student. These are the words of Dr. John Harnad:

    ***

    Absurdity upon absurdity. Self appointed pundits who have no scientific competence whatsoever casting aspersions upon precise and pertinent remarks by experts in the field; then insulting them with their derisory remarks and even imperiously commanding them to desist from expressing themselves! “Administrators” with no other visible qualifications than the fact that they have made thousands of edits to Wikipedia, and have attained to certain special powers through a questionable process of scrutiny within this self-referential setting. The latter, or at least some of them, apparently feel entitled to register totally unfounded, intimidating and derisory remarks like “…a new account. Possibly suspicious.” that would be worthy of thought police, to redefine the English language so as to comply with their notions of “Wikipedia usage” and “good practice”, and to overtly express their hostility to anything that might be viewed as “expert knowledge”. Users hiding behind anonymous pseudonyms casting aspersions on the integrity of highly respected, well-known scientists, who have no other motive than to set the record straight regarding scientific content. The same users reorganizing the material in arbitrary tendentious ways, to suit their tastes, deleting legitimate contributions, hiding them in boxes, transferring them to other pages, and reordering so as to lose all logic or sense in the sequence of contributions and edits; in short, creating an anarchic circus, all within view of these “Administrators”, who do nothing to intervene.

    Is this science fiction, fantasy, an “other-world” nightmare or reality? What is Wikipedia all about? The tyranny of the ignorant? I am very curious what all the threatening remarks, gratuitous insults and assaults by the uneducated upon the integrity of the knowledgeable leads up to. Is this a serious process, or one in which a small number of Wikipedia “insiders” act out fantasies of power and importance, while those who, in the real world, are highly qualified scientists and professionals devoted to advancing our actual state of knowledge, are silenced by threats, intimidation, and manipulative tactics, while administrators who believe that “expertise” is irrelevant, do nothing to intervene? Is it that only Wikipedia experience and status has any importance in this environment?

    I have a feeling the outcome of this debate will have more significance for Wikipedia than merely whether this poor article is kept or deleted. If the questionably empowered class of “Administrators” turns out to be the only real decision makers, wielding the power to overrule all others, then all depends on them. If they choose to ignore the advice of those who are best placed to provide expert opinion on the substance of the article in question, and decide simply according to their own notions, even though they have no knowledge, but prefer to heed the “all-inclusive” principle, or the views of other users who are equally ignorant of the subject, the outcome is meaningless, and the implication for the reliability of Wikipedia as a source of knowledge is clear.

    Having said this, I expect to receive a barrage of attacks, threats, intimidating remarks, citations for violations of rules, aspersions cast on my character, integrity, competence, etc. from those seasoned “insiders” who feel insulted or threatened by these self-evident remarks. But are there also those who believe in the value of Wikipedia and hold another view? Are there enough of those who do have an adequate respect for knowledge, qualifications, real-world competence and, simply, the truth, who have a say in how Wikipedia is run and decisions are made to tilt the balance? I am curious to see who actually holds sway in this strange setting, that claims to represent “the masses” and knowledge simultaneously. R_Physicist (talk) 08:14, 23 March 2008 (UTC)

  20. I work at Britannica, and just for the record, it’s published online and revised continuously. Errors can be fixed and changes made in minutes. When notable people die, for example, their dates of death are uploaded right away.

    I am coming here, of course, when this thread is very old and this discussion has long ago run its course, but if anyone thinks Wikipedia fixes errors and Britannica doesn’t, that’s mistaken.