Teaching computers to see

April 09, 2007

The "human operated nodes" of Amazon's Mechanical Turk may soon have competition, at least when it comes to identifying objects in photographs. Researchers at the University of California at San Diego are making progress in developing a machine-learning approach that enables computers to automatically interpret photographs and other images, reports Technology Review.

As described in a paper that appears in the latest issue of IEEE Transactions on Pattern Analysis and Machine Intelligence, the system, called Supervised Multiclass Labeling (SML), combines semantic, or text, labels that describe an image's contents with a statistical analysis of the image. A computer is first trained to recognize an object - a tree, say - by being shown many images containing the object that have been labeled, or tagged, with the description "tree" by people. The computer learns to make an association between the tag and a statistical, pixel-level analysis of the image. It learns, in effect, to spot a tree, regardless of where the tree happens to appear in a given image.
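
To make the idea of tying a tag to "a statistical, pixel-level analysis" a little more concrete, here is a toy sketch in Python. It is not the SML algorithm from the paper. It assumes, purely for illustration, that each image has already been reduced to a handful of patch color vectors, and it fits one simple Gaussian per tag. Every name and number in it (`fit_label_models`, `label_probabilities`, the sample colors) is hypothetical.

```python
import math
from collections import defaultdict

def fit_label_models(training_images):
    """Fit one diagonal Gaussian per tag from human-tagged images.

    training_images: list of (patches, tags), where patches is a list of
    (r, g, b) tuples. Patch positions are discarded, so the model does not
    care where in the picture an object appears.
    """
    pooled = defaultdict(list)
    for patches, tags in training_images:
        for tag in tags:
            pooled[tag].extend(patches)
    models = {}
    for tag, vectors in pooled.items():
        dims = len(vectors[0])
        means = [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]
        # Floor the variance so a tiny training set cannot produce zero.
        variances = [
            max(sum((v[d] - means[d]) ** 2 for v in vectors) / len(vectors), 1e-3)
            for d in range(dims)
        ]
        models[tag] = (means, variances)
    return models

def log_likelihood(patches, model):
    """Average per-patch Gaussian log-likelihood of an image under one tag."""
    means, variances = model
    total = 0.0
    for v in patches:
        for x, m, var in zip(v, means, variances):
            total += -0.5 * (math.log(2 * math.pi * var) + (x - m) ** 2 / var)
    return total / len(patches)

def label_probabilities(patches, models):
    """Turn per-tag likelihoods into probabilities that sum to 1."""
    scores = {tag: log_likelihood(patches, m) for tag, m in models.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {tag: math.exp(s - top) for tag, s in scores.items()}
    norm = sum(weights.values())
    return {tag: w / norm for tag, w in weights.items()}

# Made-up training data: greenish patches tagged "tree", bluish ones "sky".
training = [
    ([(0.1, 0.6, 0.1), (0.2, 0.7, 0.2), (0.15, 0.65, 0.1)], ["tree"]),
    ([(0.2, 0.6, 0.15), (0.1, 0.7, 0.2)], ["tree"]),
    ([(0.4, 0.6, 0.9), (0.5, 0.7, 1.0)], ["sky"]),
    ([(0.45, 0.65, 0.95), (0.5, 0.6, 0.9)], ["sky"]),
]
models = fit_label_models(training)
new_image = [(0.15, 0.65, 0.15), (0.1, 0.6, 0.2)]
probs = label_probabilities(new_image, models)
print(max(probs, key=probs.get), probs)
```

Because the patches are pooled without regard to position, shuffling them leaves the result unchanged, which is a crude stand-in for spotting a tree wherever it sits in the frame.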

Having been seeded with intelligence, the computer can then begin to interpret images on its own, applying probabilities to what it "sees" (e.g., "there is an 80% probability that this picture contains a tree"). As it interprets more and more images, the computer becomes smarter and the tags it applies to images more accurate. The computer-generated tags can then be used as the basis for an automated image-search service.
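
As a rough picture of how probabilistic tags could feed a search service, the Python sketch below builds a simple tag index from per-image label probabilities and returns matches in order of confidence. The file names, probabilities, the 0.5 cutoff, and the functions `build_tag_index` and `search` are all invented for illustration; nothing here comes from the researchers' system.

```python
from collections import defaultdict

def build_tag_index(image_label_probs, threshold=0.5):
    """Map each tag to the images likely to contain it, most confident first.

    image_label_probs: {image_id: {tag: probability}}. A tag is kept for an
    image only if its probability meets the (assumed) threshold.
    """
    index = defaultdict(list)
    for image_id, probs in image_label_probs.items():
        for tag, p in probs.items():
            if p >= threshold:
                index[tag].append((p, image_id))
    for tag in index:
        index[tag].sort(reverse=True)  # highest probability first
    return dict(index)

def search(index, tag):
    """Return image ids for a tag, ranked by the machine's confidence."""
    return [image_id for _, image_id in index.get(tag, [])]

# Hypothetical machine-generated probabilities for three photos.
example_probs = {
    "beach.jpg": {"tree": 0.10, "sky": 0.90, "water": 0.85},
    "forest.jpg": {"tree": 0.95, "sky": 0.40},
    "park.jpg": {"tree": 0.80, "sky": 0.70},
}
index = build_tag_index(example_probs)
print(search(index, "tree"))
print(search(index, "sky"))
```

Ranking by probability gives the search results a natural order, so the photos the machine is most sure about come first.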

As shown in the example below, the labels a trained computer applies to images bear a disconcertingly strong resemblance to the tags that people give:

[Image: 03-07Vasconcelos-compare.jpg, comparing labels assigned by people with labels generated by the trained computer]

In fact, according to the researchers - Nuno Vasconcelos, Gustavo Carneiro, and Antoni Chan of UCSD and Pedro Moreno of Google - the tags generated by the machines can be more precise than those assigned by people because people tend to be less rigorous and more subjective than computers. People's tags contain a lot of noise, as do the searches that are based on them. The authors write:

When compared with previous approaches, SML has the advantage of combining classification and retrieval optimality with 1) scalability in database and vocabulary sizes, 2) ability to produce a natural ordering for semantic labels at annotation time, and 3) implementation with algorithms that are conceptually simple and do not require prior semantic image segmentation. We have also presented the results of an extensive experimental evaluation, under various previously proposed experimental protocols, which demonstrated superior performance with respect to a sizable number of state-of-the-art methods, for both semantic labeling and retrieval.

Tests of the SML system at Google "indicate that the system can be used on large image collections," according to Chan. In a brief video, Vasconcelos explains the system's workings and says that the technique can be applied to other machine-learning challenges, such as teaching computers to understand sounds or read text. Give computers a little intelligence, and there's just no stopping them.

Comments

Ahem!

Care to revisit this post from last June now? To what end, indeed?

Posted by: Scott Wilson at April 9, 2007 12:42 PM
