Our algorithms, ourselves


An earlier version of this essay appeared last year, under the headline “The Manipulators,” in the Los Angeles Review of Books.

Since the launch of Netscape and Yahoo twenty years ago, the story of the internet has been one of new companies and new products, a story shaped largely by the interests of entrepreneurs and venture capitalists. The plot has been linear; the pace, relentless. In 1995 came Amazon and Craigslist; in 1997, Google and Netflix; in 1999, Napster and Blogger; in 2001, iTunes; in 2003, MySpace; in 2004, Facebook; in 2005, YouTube; in 2006, Twitter; in 2007, the iPhone and the Kindle; in 2008, Airbnb; in 2010, Instagram and Uber; in 2011, Snapchat; in 2012, Coursera; in 2013, Tinder. It has been a carnival ride, and we, the public, have been the giddy passengers.

The story may be changing now. Though the current remains swift, eddies are appearing in the stream. Last year, the big news about the net came not in the form of buzzy startups or cool gadgets, but in the shape of two dry, arcane documents. One was a scientific paper describing an experiment in which researchers attempted to alter the moods of Facebook users by secretly manipulating the messages they saw. The other was a ruling by the European Union’s highest court granting citizens the right to have outdated or inaccurate information about them erased from Google and other search engines. Both documents provoked consternation, anger, and argument. Both raised important, complicated issues without resolving them. Arriving in the wake of Edward Snowden’s revelations about the NSA’s online spying operation, both seemed to herald, in very different ways, a new stage in the net’s history — one in which the public will be called upon to guide the technology, rather than the other way around. We may look back on 2014 as the year the internet began to grow up.

* * *

The Facebook study seemed fated to stir up controversy. Its title read like a bulletin from a dystopian future: Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks. But when, on June 2, 2014, the article first appeared on the website of the Proceedings of the National Academy of Sciences (PNAS), it drew little notice or comment. It sank quietly into the vast swamp of academic publishing. That changed abruptly three weeks later, on June 26, when technology reporter Aviva Rutkin posted a brief account of the study on the website of New Scientist magazine. She noted that the research had been run by a Facebook employee, a social psychologist named Adam Kramer who worked in the firm’s large Data Science unit, and that it had involved more than half a million members of the social network. Smelling a scandal, other journalists rushed to the PNAS site to give the paper a read. They discovered that Facebook had not bothered to inform its members about their participation in the experiment, much less ask their consent.

Outrage ensued, as the story pinballed through the media. “If you were still unsure how much contempt Facebook has for its users,” declared the technology news site PandoDaily, “this will make everything hideously clear.” A New York Times writer accused Facebook of treating people like “lab rats,” while The Washington Post, in an editorial, criticized the study for “cross[ing] an ethical line.” US Senator Mark Warner called on the Federal Trade Commission to investigate the matter, and at least two European governments opened probes. The response from social media was furious. “Get off Facebook,” tweeted Erin Kissane, an editor at a software site. “If you work there, quit. They’re fucking awful.” Writing on Google Plus, the privacy activist Lauren Weinstein wondered whether Facebook “KILLED anyone with their emotion manipulation stunt.”

The ethical concerns were justified. Although Facebook, as a private company, is not bound by the informed-consent guidelines of universities and government agencies, its decision to carry out psychological research on people without telling them was at best rash and at worst reprehensible. It violated the US Department of Health & Human Services’ policy for the protection of human research subjects (known as the “Common Rule”) as well as the ethics code of the American Psychological Association. Making the transgression all the more inexcusable was the company’s failure to exclude minors from the test group. The fact that the manipulation of information was carried out by the researchers’ computers rather than by the researchers themselves — a detail that Facebook offered in its defense — was beside the point. As University of Maryland law professor James Grimmelmann observed, psychological manipulation remains psychological manipulation “even when it’s carried out automatically.”

Still, the intensity of the reaction seemed incommensurate with its object. Once you got past the dubious ethics and the alarming title, the study turned out to be a meager piece of work. Earlier psychological research had suggested that moods, like sneezes, could be contagious. If you hang out with sad people, you’ll probably end up feeling a little blue yourself. Kramer and his collaborators (the paper was coauthored by two Cornell scientists) wanted to see if such emotional contagion might also be spread through online social networks. During a week in January 2012, they programmed Facebook’s News Feed algorithm — the program that selects which messages to funnel onto a member’s home page and which to omit — to make slight adjustments in the “emotional content” of the feeds delivered to a random sample of members. One group of test subjects saw a slightly higher number of “positive” messages than normal, while another group saw slightly more “negative” messages. To categorize messages as positive or negative, the researchers used a standard text-analysis program, called Linguistic Inquiry and Word Count, that spots words expressing emotions in written works. They then evaluated each subject’s subsequent Facebook posts to see whether the emotional content of the messages had been influenced by the alterations in the News Feed.

The researchers did discover an influence. People exposed to more negative words went on to use more negative words than would have been expected, while people exposed to more positive words used more of the same — but the effect was vanishingly small, measurable only in a tiny fraction of a percentage point. If the effect had been any more trifling, it would have been undetectable. As Kramer later explained, in a contrite Facebook post, “the actual impact on people in the experiment was the minimal amount to statistically detect it — the result was that people produced an average of one fewer emotional word, per thousand words, over the following week.” As contagions go, that’s a pretty feeble one. It seems unlikely that any participant in the study suffered the slightest bit of harm. As Kramer admitted, “the research benefits of the paper may not have justified all of this anxiety.”

* * *

What was most worrisome about the study lay not in its design or its findings, but in its ordinariness. As Facebook made clear in its official responses to the controversy, Kramer’s experiment was just the visible tip of an enormous and otherwise well-concealed iceberg. In an email to the press, a company spokesperson said the PNAS study was part of the continuing research Facebook does to understand “how people respond to different types of content, whether it’s positive or negative in tone, news from friends, or information from pages they follow.” Sheryl Sandberg, the company’s chief operating officer, reinforced that message in a press conference: “This was part of ongoing research companies do to test different products, and that was what it was.” The only problem with the study, she went on, was that it “was poorly communicated.” A former member of Facebook’s Data Science unit, Andrew Ledvina, told The Wall Street Journal that the in-house lab operates with few restrictions. “Anyone on that team could run a test,” he said. “They’re always trying to alter people’s behavior.”

Businesses have been trying to alter people’s behavior for as long as businesses have been around. Marketing departments and advertising agencies are experts at formulating, testing, and disseminating images and words that provoke emotional responses, shape attitudes, and trigger purchases. From the apple-cheeked Ivory Snow baby to the chiseled Marlboro man to the moon-eyed Cialis couple, we have for decades been bombarded by messages intended to influence our feelings. The Facebook study is part of that venerable tradition, a fact that the few intrepid folks who came forward to defend the experiment often emphasized. “We are being manipulated without our knowledge or consent all the time — by advertisers, marketers, politicians — and we all just accept that as a part of life,” argued Duncan Watts, a researcher who studies online behavior for Microsoft. “Marketing as a whole is designed to manipulate emotions,” said Nicholas Christakis, a Yale sociologist who has used Facebook data in his own research.

The “everybody does it” excuse is rarely convincing, and in this case it’s specious. Thanks to the reach of the internet, the kind of psychological and behavioral testing that Facebook does is different in both scale and kind from the market research of the past. Never before have companies been able to gather such intimate data on people’s thoughts and lives, and never before have they been able to so broadly and minutely shape the information that people see. If the Post Office had ever disclosed that it was reading everyone’s mail and choosing which letters to deliver and which not to, people would have been apoplectic, yet that is essentially what Facebook has been doing. In formulating the algorithms that run its News Feed and other media services, it molds what its billion-plus members see and then tracks their responses. It uses the resulting data to further adjust its algorithms, and the cycle of experiments begins anew. Because the algorithms are secret, people have no idea which of their buttons are being pushed — or when, or why.

Facebook is hardly unique. Pretty much every internet company performs extensive experiments on its users, trying to figure out, among other things, how to increase the time they spend using an app or a site, or how to increase the likelihood they will click on an advertisement or a link. Much of this research is innocuous. Google once tested 41 different shades of blue on a web-page toolbar to determine which color would produce the most clicks. But not all of it is innocuous. You don’t have to be paranoid to conclude that the PNAS test was far from the most manipulative of the experiments going on behind the scenes at internet companies. You only have to be sensible.

That became clear, in the midst of the Facebook controversy, when another popular web operation, the matchmaking site OKCupid, disclosed that it routinely conducts psychological research in which it doctors the information it provides to its love-seeking clientele. It has, for instance, done experiments in which it altered people’s profile pictures and descriptions. It has even circulated false “compatibility ratings” to see what happens when ill-matched strangers believe they’ll be well-matched couples. OKCupid was not exactly contrite about abusing its customers’ trust. “Guess what, everybody,” blogged the company’s cofounder, Christian Rudder: “if you use the internet, you’re the subject of hundreds of experiments at any given time, on every site. That’s how websites work.”

The problem with manipulation is that it hems us in. It weakens our volition and circumscribes our will, substituting the intentions of others for our own. When efforts to manipulate us are hidden from us, the likelihood that we’ll fall victim to them grows. Other than the dim or gullible, most people in the past understood that corporate marketing tactics, from advertisements to celebrity endorsements to package designs, were intended to be manipulative. As long as those tactics were visible, we could evaluate them and resist them — maybe even make jokes about them. That’s no longer the case, at least not when it comes to online services. When companies wield moment-by-moment control over the flow of personal correspondence and other intimate or sensitive information, tweaking it in ways that are concealed from us, we’re unable to discern, much less evaluate, the manipulative acts. We find ourselves inside a black box.

* * *

Put yourself in the shoes of Mario Costeja González. In 1998, the Spaniard ran into a little financial difficulty. He had defaulted on a debt, and to pay it off he was forced to put some real estate up for auction. The sale was duly noted in the venerable Barcelona newspaper La Vanguardia. The matter settled, Costeja González went on with his life as a graphologist, an interpreter of handwriting. The debt and the auction, as well as the 36-word press notice about them, faded from public memory. The bruise healed.

But then, in 2009, nearly a dozen years later, the episode sprang back to life. La Vanguardia put its archives online, Google’s web-crawling “bot” sniffed out the old article about the auction, the article was automatically added to the search engine’s database, and a link to it began popping into prominent view whenever someone in Spain did a search on Costeja’s name. Costeja was dismayed. It seemed unfair to have his reputation sullied by an out-of-context report on an old personal problem that had long ago been resolved. Presented without explanation in search results, the article made him look like a deadbeat. He felt, as he would later explain, that his dignity was at stake.

Costeja lodged a formal complaint with the Spanish government’s data-protection agency. He asked the regulators to order La Vanguardia to remove the article from its website and to order Google to stop linking to the notice in its search results. The agency refused to act on the request concerning the newspaper, citing the legality of the article’s original publication, but it agreed with Costeja about the unfairness of the Google listing. It told the company to remove the auction story from its results. Appalled, Google appealed the decision, arguing that in listing the story it was merely highlighting information published elsewhere. The dispute quickly made its way to the Court of Justice of the European Union in Luxembourg, where it became known as the “right to be forgotten” case. On May 13, 2014, the high court issued its decision. Siding with Costeja and the Spanish data-protection agency, the justices ruled that Google was obligated to obey the order and remove the La Vanguardia piece from its search results. The upshot: European citizens suddenly had the right to get certain unflattering information about them deleted from search engines.

Most Americans, and quite a few Europeans, were flabbergasted by the decision. They saw it not only as unworkable (how can a global search engine processing some six billion searches a day be expected to evaluate the personal grouses of individuals?), but also as a threat to the free flow of information online. Many accused the court of licensing censorship or even of creating “memory holes” in history.

But the heated reactions, however understandable, were off the mark. They reflected a misinterpretation of the decision. The court had not established a “right to be forgotten.” That essentially metaphorical phrase is mentioned only in passing in the ruling, and its attachment to the case has proven a distraction. In an open society, where freedom of thought and speech are protected, where people’s thoughts and words are their own, a right to be forgotten is as untenable as a right to be remembered. What the case was really about was an individual’s right not to be systematically misrepresented. But even putting the decision into those more modest terms is misleading. It implies that the court’s ruling was broader than it actually was.

The essential issue the justices were called upon to address was how, if at all, a 1995 European Union policy on the processing of personal data, the so-called Data Protection Directive, applied to companies that, like Google, engage in the large-scale aggregation of information online. The directive had been enacted to ease the cross-border exchange of data, while also establishing privacy and other protections for citizens. “Whereas data-processing systems are designed to serve man,” the policy reads, “they must, whatever the nationality or residence of natural persons, respect their fundamental rights and freedoms, notably the right to privacy, and contribute to economic and social progress, trade expansion and the well-being of individuals.” To shield people from abusive or unjust treatment, the directive imposed strict regulations on businesses and other organizations that act as “controllers” of the processing of personal information. It required, among other things, that any data disseminated by such controllers be not only accurate and up-to-date, but fair, relevant, and “not excessive in relation to the purposes for which they are collected and/or further processed.” What the directive left unclear was whether companies that aggregated information produced by others — companies like Google and Facebook — fell into the category of controllers. That was what the court had to decide.

Search engines, social networks, and other online aggregators have always presented themselves as playing a neutral and essentially passive role when it comes to the processing of information. They’re not creating the content they distribute — that’s done by publishers in the case of search engines, or by individual members in the case of social networks. Rather, they’re simply gathering the information and arranging it in a useful form. This view, tirelessly promoted by Google — and used by the company as a defense in the Costeja case — has been embraced by much of the public. It has become the default view. When Wikipedia cofounder Jimmy Wales, in criticizing the European court’s decision, said, “Google just helps us to find the things that are online,” he was not only mouthing the company line; he was expressing the popular conception of information aggregators.

The court took a different view. Online aggregation is not a neutral act, it ruled, but a transformative one. In collecting, organizing, and ranking information, a search engine is creating something new: a distinctive and influential product that reflects the company’s own editorial intentions and judgments, as expressed through its information-processing algorithms. “The processing of personal data carried out in the context of the activity of a search engine can be distinguished from and is additional to that carried out by publishers of websites,” the justices wrote. “Inasmuch as the activity of a search engine is therefore liable to affect significantly […] the fundamental rights to privacy and to the protection of personal data, the operator of the search engine as the person determining the purposes and means of that activity must ensure, within the framework of its responsibilities, powers and capabilities, that the activity meets the requirements of [the Data Protection Directive] in order that the guarantees laid down by the directive may have full effect.”

The European court did not pass judgment on the guarantees established by the Data Protection Directive, nor on any other existing or prospective laws or policies pertaining to the processing of personal information. It did not tell society how to assess or regulate the activities of aggregators like Google or Facebook. It did not even offer an opinion as to the process companies or lawmakers should use in deciding which personal information warranted exclusion from search results — an undertaking every bit as thorny as it’s been made out to be. What the justices did, with perspicuity and prudence, was provide us with a way to think rationally about the algorithmic manipulation of digital information and the social responsibilities it entails. The interests of a powerful international company like Google, a company that provides an indispensable service to many people, do not automatically trump the interests of a lone individual. When it comes to the operation of search engines and other information aggregators, fairness is at least as important as expedience.

Ten months have passed since the court’s ruling, and we now know that the judgment is not going to “break the internet,” as was widely predicted when it was issued. The web still works. Google has a process in place for adjudicating requests for the removal of personal information (it accepts about forty percent of them), just as it has a process in place for adjudicating requests to remove copyrighted information. Last month, Google’s Advisory Council on the Right to Be Forgotten issued a report that put the ruling and the company’s response into context. “In fact,” the council wrote, “the Ruling does not establish a general Right to Be Forgotten. Implementation of the Ruling does not have the effect of ‘forgetting’ information about a data subject. Instead, it requires Google to remove links returned in search results based on an individual’s name when those results are ‘inadequate, irrelevant or no longer relevant, or excessive.’ Google is not required to remove those results if there is an overriding public interest in them ‘for particular reasons, such as the role played by the data subject in public life.’” It is possible, in other words, to strike a reasonable balance between an individual’s interests, the interests of the public in finding information quickly, and the commercial interests of internet companies.

* * *

We have had a hard time thinking clearly about companies like Google and Facebook because we have never before had to deal with companies like Google and Facebook. They are something new in the world, and they don’t fit neatly into our existing legal and cultural templates. Because they operate at such unimaginable magnitude, carrying out millions of informational transactions every second, we’ve tended to think of them as vast, faceless, dispassionate computers — as information-processing machines that exist outside the realm of human intention and control. That’s a misperception, and a dangerous one.

Modern computers and computer networks enable human judgment to be automated, to be exercised on a vast scale and at a breathtaking pace. But it’s still human judgment. Algorithms are constructed by people, and they reflect the interests, biases, and flaws of their makers. As Google’s founders themselves pointed out many years ago, an information aggregator operated for commercial gain will inevitably be compromised and should always be treated with suspicion. That is certainly true of a search engine that mediates our intellectual explorations; it is even more true of a social network that mediates our personal associations and conversations.

Because algorithms impose on us the interests and biases of others, we have a right and an obligation to carefully examine and, when appropriate, judiciously regulate those algorithms. We have a right and an obligation to understand how we, and our information, are being manipulated. To ignore that responsibility, or to shirk it because it raises hard problems, is to grant a small group of people — the kind of people who carried out the Facebook and OKCupid experiments — the power to play with us at their whim.
