Monthly Archives: February 2013

SEO for scholarship


The way the creation of the Google search engine was inspired by the traditional method for measuring the value of scholarly works, with links becoming an analogue to citations, has become one of the web’s great origin myths. And the way the new search engine set off a rush to game the system, weakening the usefulness of links as markers of value, has become a lesson in the drawbacks of what might be called the automation of judgment. Every online currency inspires its own debasement, to one degree or another.

Now, in a perverse twist, the circle is completing itself, as Google provides web tools — Google Scholar Citations and Google Scholar Metrics — for tracking and measuring the value of academic articles and other scholarly works. The new tools offer a lot of benefits, but they also provide both the temptation and the means to game the scholarly citation system. Attempts to manipulate citations aren’t new, but now it’s possible to take the shenanigans to web scale, to bring black-hat techniques of search engine optimization to the ivory tower. Nat Torkington points to a 2012 paper, “Manipulating Google Scholar Citations and Google Scholar Metrics: Simple, Easy and Tempting,” in which three Spanish scholars describe how they used fake documents from a fake researcher to skew Google Scholar rankings and measures.

Over the course of a few hours, the researchers cobbled together six documents by cutting and pasting text and figures from other works. All the fake documents were attributed to the same fake author. In each document they included citations to 129 other papers authored or coauthored by at least one member of the “EC3” research group to which they belong. They translated the documents into English using Google Translate. Then they created, within the University of Granada’s domain, a web page citing each of the six fake papers and including links to the full texts. At that point, they sat back and let Google take over:

Google indexed these documents nearly a month after they were uploaded, on 12 May, 2012. At that time the members of the research group [cited in the fake documents] along with the three co-authors of this paper, received an alert from GS Citations pointing out that [the fake scholar] had cited their Works. The citation explosion was thrilling, especially in the case of the youngest researchers where their citation rates were multiplied by six, notoriously increasing in size their profiles. …

The results of our experiment show how easy and simple it is to modify the citation profiles offered by Google. This exposes the dangers it may lead to in the hands of editors and researchers tempted to do “citations engineering.”

When the experiment was over, the researchers removed all trace of their work from the web, though the fake papers, and the fake author, lived on in the Google Scholar database. They conclude:

Even if we have previously argued in favour of Google Scholar as a research evaluation tool minimizing its biases and technical and methodological issues, in this paper we alert the research community over how easy it is to manipulate data and bibliometric indicators. Switching from a controlled environment where the production, dissemination and evaluation of scientific knowledge is monitored (even accepting all the shortcomings of peer review) to a environment that lacks any kind of control rather than researchers’ consciousness is a radical novelty that encounters many dangers. … [The Google tools] do not only awaken the Narcissus within researchers, but can unleash malpractices aiming at manipulating the orientation and meaning of numbers as a consequence of the ever growing pressure for publishing fuelled by the research evaluation exercises of each country.

Google, of course, only provides the temptation. It doesn’t force anyone to give in to it. Maybe, in the end, we’ll come to discover that Google was put on this earth to test our ethical mettle. That would give a deeper resonance to the origin myth.

Photo by Carlos Castillo.

Hot hands, cold data


David Brooks, in his Times column today, looks at the rise of “data-ism” — the rapidly spreading belief “that everything that can be measured should be measured; that data is a transparent and reliable lens that allows us to filter out emotionalism and ideology; that data will help us do remarkable things.” Brooks is wary of the worship of number-crunching. He worries, wisely, that as our stores of digital data swell we’ll “get carried away in our desire to reduce everything to the quantifiable.” But he grants that there are some obvious benefits to statistical analysis. For one thing, “it’s really good at exposing when our intuitive view of reality is wrong.”

As his prime example, he points to a perception shared by pretty much every sports fan and certainly every basketball fan: that sometimes players get in the zone and can do no wrong. They’re on fire. They’re on a tear. They’re HOT. Not true, says Brooks. The hot streak is a fiction, a figment born of a flaw in our mental makeup. Its existence was disproven, he says, in a famous paper from the 1980s:

Every person who plays basketball and nearly every person who watches it believes that players go through hot streaks, when they are in the groove, and cold streaks, when they are just not feeling it. But Thomas Gilovich, Amos Tversky and Robert Vallone found that a player who has made six consecutive foul shots has the same chance of making his seventh as if he had missed the previous six foul shots.

When a player has hit six shots in a row, we imagine that he has tapped into some elevated performance groove. In fact, it’s just random statistical noise, like having a coin flip come up tails repeatedly. Each individual shot’s success rate will still devolve back to the player’s career shooting percentage.
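The coin-flip claim is easy to check with a quick simulation (my sketch, not anything from the Gilovich paper; the function name and parameters are invented for illustration): give a simulated shooter a fixed, independent make probability, then measure the make rate on shots that immediately follow six straight makes. Streaks show up constantly, but the conditional rate stays at the base rate.

```python
import random

def simulate_shooter(p_make=0.75, n_shots=100_000, seed=42):
    """Simulate an independent shooter and return the make rate
    on attempts that follow six consecutive makes."""
    rng = random.Random(seed)
    shots = [rng.random() < p_make for _ in range(n_shots)]
    # Collect every shot preceded by a six-make streak.
    after_streak = [shots[i] for i in range(6, n_shots)
                    if all(shots[i - 6:i])]
    return sum(after_streak) / len(after_streak)

print(simulate_shooter())  # comes out within a couple of points of 0.75
```

With a 75 percent shooter, six-make streaks occur on roughly 18 percent of attempts, so the sample after streaks is large, and the “hot” conditional rate lands right back at 75 percent. That is the null hypothesis the streak debate is argued against.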

My own intuition howled in agony as I read this. I have, after all, watched Ray Allen go on clutch three-pointer sprees in the fourth quarter, effortlessly draining one bomb after another. Unconscious. And, sadly, I have seen the opposite: Ray Allen throwing bricks from the same spots in the same situations. But I’m no fool. I’m willing to accept the hard, spoil-sport facts. My intuition has to bow down to the stats.

Or does it? It turns out that this hot streak issue is not as clear-cut as Brooks makes it out to be. The data’s slippery.

The existence or nonexistence of the hot hand in basketball, and elsewhere, continues to be debated in statistical and economic circles, and there’s evidence to support both sides of the debate. Several studies have questioned the reliability of the Gilovich paper’s conclusions. This one, for instance, suggests that the sample size was too small, that the original researchers’ “statistical tests were of such low power that they could not have been expected to find a Hot Hand even if it were present.” And a series of recently published studies — this one, this one, this one — has found at least some evidence of a hot hand effect among basketball players. The most recent paper to call into question the Gilovich conclusions was published last year in The American Statistician. Written by Daniel Stone, a Bowdoin College economics professor, it presents evidence that “the widespread belief among players and fans in the hot hand is not necessarily a cognitive fallacy.”
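The low-power critique can itself be illustrated with a small simulation (again my own sketch, not the method of any paper cited here; the model, function name, and parameters are assumptions): build a shooter with a genuine hot hand, say a 10-point boost in make probability after a made shot, and see how often a standard two-proportion z-test detects the effect in a single 100-shot sample.

```python
import math
import random

def hot_hand_power(p_base=0.5, boost=0.1, n_shots=100,
                   n_trials=2000, seed=7):
    """Estimate how often a one-sided two-proportion z-test
    (alpha = 0.05) detects a genuine hot hand in a short sample."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_trials):
        shots, prev = [], False
        for _ in range(n_shots):
            p = p_base + (boost if prev else 0.0)  # hotter after a make
            prev = rng.random() < p
            shots.append(prev)
        after_make = [shots[i] for i in range(1, n_shots) if shots[i - 1]]
        after_miss = [shots[i] for i in range(1, n_shots) if not shots[i - 1]]
        if not after_make or not after_miss:
            continue
        p1 = sum(after_make) / len(after_make)
        p2 = sum(after_miss) / len(after_miss)
        pooled = (sum(after_make) + sum(after_miss)) / (len(after_make) + len(after_miss))
        se = math.sqrt(pooled * (1 - pooled) *
                       (1 / len(after_make) + 1 / len(after_miss)))
        if se > 0 and (p1 - p2) / se > 1.645:
            detected += 1
    return detected / n_trials
```

Under these assumptions the test fires only a minority of the time, even though the hot hand is real and sizeable; a researcher running it on game-length samples would usually conclude there is no effect. That, roughly, is the point of the critique.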

After reading Brooks’s column, I sent an email to Professor Stone asking if he had any reaction to it and also asking whether the hot-hand question is in fact considered settled, as Brooks suggests. He soon wrote back. “I saw the Brooks article and cringed,” he said, “as the answer to your question is no, it’s not settled. There is recent research showing there is a hot hand in basketball (Arkes 2010), and mine shows analysis may greatly underestimate the effect. Put those together and there could be major hot hand.” Stone did emphasize that fans often see a hot hand where none exists — “people are too quick to infer a player is hot based on limited data” — but that doesn’t mean that players don’t sometimes go on real streaks.

Stone also pointed me to a recent article he wrote with another researcher, Jeremy Arkes, that, in addition to showing how the hot hand remains a bone of contention, provides a quick explanation of why we should be cautious about accepting the received statistical wisdom. They conclude: “Our overall conclusion – based on the intuition, experience and judgment of millions of bball fans/players (that, of course, we only have a sense of), what’s been found and not found in the data (from bball and other sports), and our recent theoretical analysis—is that behavioral scientists have been too quick to conclude that there is no hot hand in bball, and in fact it’s likely that players do occasionally get hot, to varying degrees.”

After nearly 30 years of intensive analysis, the hot hand remains mysterious. Our flawed intuition may be seeing something—something real—that the data is missing. This ends up, I think, underscoring Brooks’s sense that we have to be wary about data-ism and its promises. A transparent lens can also be a warped lens.

AFTERTHOUGHT: By the way, isn’t it kind of asinine to look at the free throw line for evidence of a hot hand? Free throws are the hothouse flowers of basketball. You have to look at field goals.

Photo by Keith Allison.

A new dent in the universe


The Rough Type headline of the day comes courtesy of TechCrunch:

Pet Boarding and ‘Dogbnb’ Startup Rover Raises $7M to Take On DogVacay

This bodes well for my new startup, RodentRyde, which will allow people to sell spare cycles on their hamster wheels. Here’s the pitch: “It’s Lyft meets DogVacay for micropets.”

Photo from Wikipedia.

Dancing to the same drum


The Edge question this year was “What should we be worried about?” I was befuddled by that, as it implies that there may be something we shouldn’t be worried about. But I managed to write, anxiously, a short piece on a theme that comes up every so often on this blog: technology’s effect on our time sense. Here’s a bit from the beginning, slightly edited:

Human beings, like other animals, seem to have remarkably accurate internal clocks. Take away our wristwatches and our cell phones, take away all those glowing digital tickers that gaze out at us from the faces of our appliances, and we can still make pretty decent estimates about the length of passing minutes and hours. That faculty is easily warped, though. Our perception of time changes with our circumstances. “Our sense of time,” observed William James in his 1890 masterwork The Principles of Psychology, “seems subject to the law of contrast.”

In a 2009 article in the Philosophical Transactions of the Royal Society, the French psychologists Sylvie Droit-Volet and Sandrine Gil described what they call the paradox of time: “although humans are able to accurately estimate time as if they possess a specific mechanism that allows them to measure time,” they wrote, “their representations of time are easily distorted by the context.” Indeed, they continued, “our studies also suggest that these contextual variations of subjective time do not result from the incorrect functioning of the internal clock but, on the contrary, from the excellent ability of the internal clock to adapt to events in the environment.” Our immediate social milieu, in particular, influences the way we experience time. There’s evidence, Droit-Volet and Gil wrote, “that individuals match their time with that of others.” The “activity rhythm” of those around us alters our own perception of the passing of time.

I’m intrigued by this idea that our sense of time adapts to the “activity rhythm” of our social circumstances. The activity rhythm of an online social network seems very different from what people traditionally experienced in their lives. It’s not just that it’s a faster rhythm; it’s also a more insistent rhythm. There’s less variation — fewer slow passages — than you would have previously found in a person’s everyday experience, when conversation and other social interaction ebbed and flowed.

Of course, changes in society’s activity rhythm are nothing new. When people moved from the country to the city, they had to adapt to a new pace. Still, having that rhythm mediated so intensively by a communication technology does seem pretty different. Is there a psychological cost to this “unnatural” rhythm, this new and contagious setting for our internal clocks? For some, I expect there is. For others, maybe not.

Image from the Chris Marker film A Grin Without a Cat.

Worldstream of consciousness

Yale computer scientist David Gelernter sketches, on a napkin, the future of everything:

[Gelernter’s napkin sketch of lifestreams]

I sketched almost the exact same thing on a napkin one Saturday night 35 years ago while listening to a Country Joe and the Fish album.

Gelernter also verbalizes the concept in a Wired piece:

By adding together every timestream on the net — including the private lifestreams that are just beginning to emerge — into a single flood of data, we get the worldstream: a way to picture the cybersphere as a whole. … Instead of today’s static web, information will flow constantly and steadily through the worldstream into the past. … What people really want is to tune in to information. Since many millions of separate lifestreams will exist in the cybersphere soon, our basic software will be the stream-browser: like today’s browsers, but designed to add, subtract, and navigate streams. … Stream-browsers will help us tune in to the information we want by implementing a type of custom-coffee blender: We’re offered thousands of different stream “flavors,” we choose the flavors we want, and the blender mixes our streams to order.

Executive summary:

Jamba Juice + Starbucks + SiriusXM = Future of Culture

Once you get past the mumbo-jumbo, this all sounds like old news. “Today’s static web”? The stream replaced the page as the web’s dominant metaphor a few years ago. Gelernter’s vision is the Zuckerbergian personal-timeline view of the web, in which every person sits at the center of his or her own little cyber-universe as swirls of custom-fit information stream in and then turn into “the past.” And it’s the Google Now “search without searching” vision of continuous, preemptive delivery of relevant info. “Finally, the web — soon to become the cybersphere — will no longer resemble a chaotic cobweb,” concludes Gelernter. “Instead, billions of users will spin their own tales, which will merge seamlessly into an ongoing, endless narrative” — all funneled through “the same interface.” It’s not so much post-web as anti-web. Imagine Whitman’s “Song of Myself” as a media production, with tracking and ads.

This post is an installment in Rough Type’s ongoing series “The Realtime Chronicles,” which began here.

What, no smartboards?


At a time when public discussions of education seem dominated by technological considerations — Should we give kindergartners iPads, or should we wait until they enter first grade? Should we ban printed books from public schools by 2017 or by 2019? When will Tom Friedman write about MOOCs again? — it seems only fair, purely in the interest of balance, to allow a different voice to be heard. So here is Helen Vendler, the gifted poetry critic, describing the perfect grammar school:

I would propose, for the ultimate maintenance of the humanities and all other higher learning, an elementary-school curriculum that would make every ordinary child a proficient reader by the end of the fourth grade — not to pass a test, but rather to ensure progressive expansion of awareness. Other than mathematics, the curriculum of my ideal elementary school would be wholly occupied, all day, every day, with “reading” in its very fullest sense. Let us imagine the day divided into short 20-minute “periods.” Here are 14 daily such periods of “reading,” each divisible into two 10-minute periods, or extended to a half-hour, as seems most practical to teachers in different grades. Many such periods can be spent outside, to break up the tedium of long sitting for young children. The pupils would:

  1. engage in choral singing of traditional melodic song (folk songs, country songs, rounds);
  2. be read to from poems and stories beyond their own current ability to read;
  3. mount short plays — learning roles, rehearsing, and eventually performing;
  4. march or dance to counting rhymes, poems, or music, “reading” rhythms and sentences with their bodies;
  5. read aloud, chorally, to the teacher;
  6. read aloud singly to the teacher, and recite memorized poems either chorally or singly;
  7. notice, and describe aloud, the reproduced images of powerful works of art, with the accompanying story told by the teacher (Orpheus, the three kings at Bethlehem, etc.);
  8. read silently, and retell in their own words, for discussion, the story they have read;
  9. expand their vocabulary to specialized registers through walks where they would learn the names of trees, plants, flowers, and fruits;
  10. visit museums of art and natural history to learn to name exotic or extinct things, or visit an orchestra to discover the names and sounds of orchestral instruments;
  11. learn conjoined prefixes, suffixes, and roots as they learn new words;
  12. tell stories of their own devising;
  13. compose words to be sung to tunes they already know; and
  14. if they are studying a foreign language, carry out these practices for it as well.

The only homework, in addition to mathematics, would be additional reading practices over the weekends (to be checked by a brief Monday discussion by students).

Because Vendler’s plan doesn’t fit our current frame for thinking about primary education, it won’t — indeed can’t — be taken into account. Our reaction to it is that of the mythical robot: does not compute.

UPDATE: As for dread middle-schoolers, Susan Sontag had a plan.

Illustration from a 1951 edition of A Child’s Garden of Verses.

The utopia of global warming


In 1955, Life magazine looked into its crystal ball to imagine “what life may be like in A.D. 1980.” Here’s the first prediction:

Unhappy about the weather? Everybody talking but nobody doing anything about it? Well, just get in touch with the Atomic Weather Commission. A flick of the nuclear switch, and presto! — the North Pole melts, the vast continent of Antarctica thaws into productive use, Greenland grows bananas, Vermont grows oranges, and everybody’s heating bill vanishes. Not fantastic at all, according to mathematician John von Neumann, who also predicts that energy may be just about as “free as the unmetered air.” So, no light bills.

The future was sunny back then, though Von Neumann, one of the architects of both the atomic bomb and the digital computer, did temper his enthusiasm with a dash of the apocalyptic:

Weather control carries with it the possibility of climatic warfare (e.g., freezing your enemy with another Ice Age). “All this will merge each nation’s affairs with those of every other,” concludes Von Neumann, “more thoroughly than the threat of a nuclear or any other war already have done.” Political forms will have to change, in ways now unforeseeable, to accommodate these realities. (Von Neumann’s implication is that there will either be world government or no government — and no world.)

Photo by R.F. Katzenberger.