December 23, 2006

In one corner: Larry, Sergey and the Robot Algorithm. In the other corner: Wales, Bezos and the Human Masses. The prize: Control of the Internet.

Am I dreaming? Is this just one more eggnog hallucination? No, it's the truth.

Jimbo is already rolling out the trash talk: "Google is very good at many types of search, but in many instances it produces nothing but spam and useless crap."

Hot diggity dog. I cannot wait for 2007.

OK, I'll shut up now. Really.

Update: Jimmy Wales writes: "Amazon has nothing to do with this project. They are a valued investor in Wikia, but people are realllllly speculating beyond the facts. This has nothing to do with A9, Amazon, etc." The proposed new engine, according to Wales, will be based on the open source search technologies Lucene and Nutch, not on Amazon technologies.


That is really interesting, for a lot of reasons.

Of course, he's just created even more of an incentive to spam Wikipedia.

It'd be nice if somebody could figure out where's-the-money in these new projects (we know where there's money-in, but what's the model for the exit?)

Posted by: Seth Finkelstein [TypeKey Profile Page] at December 23, 2006 04:50 PM

As the Wikia Search project homepage explains, Amazon has nothing to do with this.

And, to respond to Seth, what does this have to do with the evil of spamming Wikipedia exactly?

Posted by: Jimbo Wales [TypeKey Profile Page] at December 23, 2006 07:05 PM

No link in the above.

I got the impression from the article that the proposed search engine was going to use Wikipedia pages as the initial basis for link-quality, from these paragraphs:

"Google searches are conducted using an algorithm that calculates how many other websites are linked to a certain site, which in turn gives the material found by the search a ranking. Therefore, the first result in any Google search is the website that has the most links pointing to it.

Wikipedia is an encyclopaedia written by thousands of contributors from around the world, known as “Wikipedians”, using free open-source software.

Mr Wales aims to exploit the same network of followers and the same type of free software to create his search engine."

Posted by: Seth Finkelstein [TypeKey Profile Page] at December 23, 2006 07:38 PM


In general, I think this is a great idea!

Trash-talk or not, I have to agree with Jimbo that Google results can sometimes be heavily spam-oriented, especially for domains saturated with SEO-savvy web marketers. Using inbound links as a proxy for Relevance and Authority is no longer as effective as it was five years ago. Humans can be far more effective at judging the meaning and "commercial-ness" of web pages.

The model of leveraging the collective wisdom of a large group of people for solving very difficult problems has been proven to be effective in numerous cases - GoldCorp, Innocentive, the Netflix prize (currently ongoing), even Wikipedia itself.

Having said that, it seems to me that Wikipedia suffers from two issues (similar to Digg):
a. Attempts to manipulate content for marketing leverage
b. The opinion of the many is really expressed by the actions of a few
[See Eric Goldman's posts here and here for reference.]

It seems to me these two issues will only worsen with an "open content/open source" search engine - the incentive for marketers to try to game the system will only deepen, and the amount of effort needed from citizen volunteers will increase. How does Jimmy Wales plan to address these issues? Or am I way off-base here?

Posted by: NitinK [TypeKey Profile Page] at December 23, 2006 08:30 PM

I agree that dealing with spamming, gaming, etc., will be difficult. But if you set that challenge aside for the moment, it seems to me that something like this is in theory do-able (though I don't see why or how it would be organized like a wiki). Search has three basic components, so far as I can see:

1. Scan web content (crawling).

2. Relate web pages to search terms (indexing).

3. Evaluate web pages according to some definition of quality (ranking).

1 and 2 are fairly straightforward tasks at this point (I would think). That leaves 3, which would be where Wales would need to amass a wikipedian horde to assign a quality score (simple 1 - 10 scale would probably do) to pages. So you'd get a PageRank-like score by averaging ratings assigned by people instead of analyzing links. Rating pages is a perfect task for crowds to perform, as NitinK notes. So it comes down to getting a huge, huge number of people to participate (and to keep participating over the long haul). That's very hard but not necessarily impossible, though the decision to organize this as a for-profit rather than a non-profit operation may kill it. Will the wikipedian horde contribute to an effort aimed at lining Wales's (or somebody's) pockets? That's a tough sell.
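To make the averaging idea concrete, here is a rough sketch of how human ratings could stand in for link analysis in step 3. It is only an illustration under assumptions: the function names, the example URLs, and the choice to sort unrated pages last are all made up for this sketch, and nothing here reflects how Wikia or Google actually rank pages.

```python
from collections import defaultdict
from statistics import mean


def average_scores(ratings):
    """Average the 1-10 ratings submitted for each page.

    `ratings` is an iterable of (url, score) pairs (a hypothetical input format).
    """
    by_page = defaultdict(list)
    for url, score in ratings:
        if not 1 <= score <= 10:
            raise ValueError(f"score out of range: {score}")
        by_page[url].append(score)
    return {url: mean(scores) for url, scores in by_page.items()}


def rank(candidates, scores):
    """Order candidate URLs by average rating, highest first.

    Pages nobody has rated get a score of 0 and so sort last (an assumption).
    """
    return sorted(candidates, key=lambda url: scores.get(url, 0), reverse=True)


# Hypothetical ratings from volunteers.
ratings = [
    ("a.example", 9),
    ("a.example", 7),
    ("b.example", 3),
    ("c.example", 10),
]
scores = average_scores(ratings)
results = rank(["a.example", "b.example", "c.example", "d.example"], scores)
print(results)
```

A plain average like this is trivially easy to game with a handful of fake votes, which is exactly the spam problem raised above; a real system would need some way to weight or vet raters.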

Anyway, Wales is right that Google's results are getting worse rather than better, so there is vulnerability there.

By the way, Seth, didn't I predict something like this?

Posted by: Nick Carr [TypeKey Profile Page] at December 23, 2006 10:10 PM

Funny. Google’s results for many (most?) terms already include links to Wikipedia — hardly the ‘useless crap’ he claims. And Googling tampa hotels (as he challenges us to do) yields a link to Google Local — a very helpful result.

Posted by: Sid Steward [TypeKey Profile Page] at December 23, 2006 10:34 PM

Nick, I don't take the view that it can't work, but rather that there are some tough problems (as you've noted) that have to be addressed for it to work. I believe Google already gathers some feedback on its results by occasionally sampling which results people click through.

I find Wikipedia fascinating in part for the hodgepodge of ways it has managed to solve the problem of getting material (dream-selling, intellectual "extortion", plagiarism, and more), combined with the really elaborate ideological defenses it's evolved to deflect criticism of its flaws. It's not a combination anyone could have foreseen working in advance. People often mystify this, but it's not that the elements are unknown, it's that making a going concern out of them all is very hard.

But I could see some approaches it would be interesting to at least try for a search system, especially if someone else is paying for it all with venture capital money. Heck, if I didn't have such baggage as a sometimes-critic of Wikipedia, and similarly vis-a-vis the Harvard Berkman Center, I'd make a proposal to Wales.

Posted by: Seth Finkelstein [TypeKey Profile Page] at December 24, 2006 06:10 AM

