Google preparing to police web

Increasingly worried by the use of conventional web sites to distribute the viruses that turn innocent PCs into botnet “zombies,” Google appears to be readying a plan to police the web. If the plan goes forward, Google will use new software to automatically identify compromised web pages in its database and label them as “potentially harmful” in its search results. Because being labeled as suspicious by Google could devastate a site’s traffic, the move would dramatically raise the security stakes for site owners.

Google security specialist Niels Provos tells New Scientist, “The firewall is dead.” He’s referring to a shift in the way botnet infections are spread – and it’s this shift that’s making Google particularly nervous. Botnet viruses used to be distributed mainly through email attachments or computer worms, both of which could be blocked by firewalls or sniffed out by antivirus software. Over the past year, however, the operators of botnets have shifted to using regular web sites to distribute their malware. Reports New Scientist:

As users have grown wary of email attachments and installed firewalls and anti-virus software, however, the bad guys have shifted their attentions to websites in a bid to find more victims … Even an ordinary website can be risky. At a meeting on botnets held last month in Cambridge, Massachusetts, Provos warned that many web users are becoming the victims of “drive-by” downloads of bots from innocent websites corrupted to exploit browser vulnerabilities. As firewalls allow free passage to code or programs downloaded through the browser, the bot is able to install itself on the PC. Anti-virus software kicks in at this point, but some bots avoid detection by immediately disabling it.

A recent Google study, led by Provos, discovered “around 450,000 web pages that launched drive-by downloads of malicious programs. Another 700,000 pages launched downloads of suspicious software. More than two-thirds of the malicious programs identified were those that infected computers with bot software or programs that collected data on banking transactions and emailed it to a temporary email account.”

Anything that makes people wary of visiting web sites or clicking on links stands as a big threat to Google’s business. It’s not surprising, then, that the company has a unit investigating the dissemination of malware through the web. The paper that Provos and four of his Google colleagues have written on the subject, The Ghost in the Browser, explains how Google is preparing to respond to the threat by incorporating an automated security analysis into its routine spidering and indexing of sites:

To address this problem and to protect users from being infected while browsing the web, we have started an effort to identify all web pages on the Internet that could potentially be malicious. Google already crawls billions of web pages on the Internet. We apply simple heuristics to the crawled pages repository to determine which pages attempt to exploit web browsers. The heuristics reduce the number of URLs we subject to further processing significantly. The pages classified as potentially malicious are used as input to instrumented browser instances running under virtual machines. Our goal is to observe the malware behavior when visiting malicious URLs and discover if malware binaries are being downloaded as a result of visiting a URL. Web sites that have been identified as malicious, using our verification procedure, are labeled as potentially harmful when returned as a search result. Marking pages with a label allows users to avoid exposure to such sites and results in fewer users being infected.
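The two-stage approach the paper describes (a cheap filter over every crawled page, followed by expensive verification of only the flagged ones) can be sketched roughly as follows. This is an illustration, not Google's system: the patterns, the `Page` type, and the function names are all hypothetical, and `verify_in_sandbox` is a stand-in for the instrumented browsers in virtual machines mentioned in the excerpt.

```python
import re
from dataclasses import dataclass

# Hypothetical heuristics for illustration only; the excerpt above does not
# say which rules Google actually applies.
SUSPICIOUS_PATTERNS = [
    # An iframe with a zero width or height, often used to hide an exploit page.
    re.compile(r"""<iframe[^>]*(?:width|height)\s*=\s*["']?0""", re.I),
    # Calls commonly seen in obfuscated JavaScript.
    re.compile(r"unescape\s*\(|eval\s*\(", re.I),
]


@dataclass
class Page:
    url: str
    html: str


def looks_suspicious(page: Page) -> bool:
    """Stage 1: cheap pattern check that can run over every crawled page."""
    return any(p.search(page.html) for p in SUSPICIOUS_PATTERNS)


def verify_in_sandbox(page: Page) -> bool:
    """Stage 2 stand-in. A real verifier would load the URL in an instrumented
    browser inside a virtual machine and watch for downloaded binaries.
    This placeholder only checks for a reference to an .exe file."""
    return ".exe" in page.html.lower()


def label_pages(pages):
    """Label a page 'potentially harmful' only if it passes both stages."""
    labels = {}
    for page in pages:
        if looks_suspicious(page) and verify_in_sandbox(page):
            labels[page.url] = "potentially harmful"
        else:
            labels[page.url] = "ok"
    return labels


if __name__ == "__main__":
    crawl = [
        Page("http://a.example", "<p>hello</p>"),
        Page("http://b.example", "<script>eval('1+1')</script>"),
        Page("http://c.example",
             '<iframe src="http://x.example/a.exe" width="0" height="0"></iframe>'),
    ]
    for url, label in label_pages(crawl).items():
        print(url, label)
```

The point of the split is cost. Stage 1 is cheap enough to run across billions of pages, while Stage 2 is slow but far less likely to flag an innocent site, which matters given what a warning label could do to a site's traffic.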

The authors note that Web 2.0 trends, including the incorporation of user-generated content and third-party widgets into sites, raise the risk of innocent sites being exploited by botnet masters. For example, they write:

Many websites feature web applications that allow visitors to contribute their own content. This is often in the form of blogs, profiles, comments, or reviews. Web applications usually support only a limited subset of the hypertext markup language, but in some cases poor sanitization or checking allows users to post or insert arbitrary HTML into web pages. If the inserted HTML contains an exploit, all visitors of the posts or profile pages are exposed to the attack. Taking advantage of poor sanitization becomes even easier if the site permits anonymous posts, since all visitors are allowed to insert arbitrary HTML.
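For site owners wondering what adequate sanitization looks like, here is a minimal sketch of the "limited subset" idea using only Python's standard library. The allowlist and the names `AllowlistSanitizer` and `sanitize` are my own assumptions for illustration. A production site would be better served by a maintained sanitization library than by hand-rolled code like this.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: the only tags visitors may use in comments.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}


class AllowlistSanitizer(HTMLParser):
    """Rebuilds user-submitted HTML, keeping only allowlisted tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Emit the bare tag and drop every attribute (onclick, style, src...).
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-escaped so it can never be interpreted as markup.
        self.out.append(escape(data))


def sanitize(user_html: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(user_html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


if __name__ == "__main__":
    comment = ('<b onclick="steal()">Nice post</b>'
               '<iframe src="http://evil.example" width="0"></iframe>')
    print(sanitize(comment))
```

The design choice worth noting is that the sanitizer names the tags it permits and drops everything else, rather than trying to list every dangerous tag. An allowlist fails safe when attackers find a tag or attribute nobody thought to block.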

The paper goes into considerable detail about the system Google is building for identifying suspicious pages. Given the stakes involved, site owners and designers may want to give it a careful read.

UPDATE: As noted in a comment to this post by Google’s Matt Cutts, the company’s anti-malware program is actually already under way.