Managing the friend portfolio

The Economist reports that lenders are beginning to scour social networks for data to refine the credit ratings of would-be borrowers:

Professional contacts on LinkedIn are especially revealing of an applicant’s “character and capacity” to repay, says Navin Bathija, the founder of Neo, a start-up that assesses the creditworthiness of car-loan applicants. … As statistics accumulate, algorithms get better at spotting correlations in the data. Applicants who type only in lower-case letters, or entirely in upper case, are less likely to repay loans, other factors being equal, says Douglas Merrill, founder of ZestFinance … Neo’s efforts to improve accuracy include recording borrowers’ Facebook data: Mr Bathija reckons that within a year there will be enough evidence to determine if making racist comments on Facebook is correlated with a lack of creditworthiness.

The social graph, too, provides a rich store of information for gleaning risk-worthiness. Your friends say a lot about you:

Facebook data already inform lending decisions at Kreditech, [where] applicants are asked to provide access for a limited time to their account on Facebook or another social network. Much is revealed by your friends, says Alexander Graubner-Müller, one of the firm’s founders. An applicant whose friends appear to have well-paid jobs and live in nice neighbourhoods is more likely to secure a loan. An applicant with a friend who has defaulted on a Kreditech loan is more likely to be rejected.

More than that, though, your friends provide leverage should you fall behind on a payment:

[To borrow from Hong Kong-based Lenddo,] loan-seekers ask Facebook friends to vouch for them. To determine if those who say “yes” are real friends rather than mere Facebook contacts, Lenddo’s software checks messages for shared slang or wording that suggests affinity. What’s more, the credit scores of those who have vouched for a borrower are damaged if he or she fails to repay. Put the word out about this “social-enforcement mechanism” and “boom, the money shows up,” says Jeff Stewart, Lenddo’s boss.

It may give a new meaning to getting poked.

Rob Horning points out that people may need to start redlining their friends:

Better purge all those high school friends from your Facebook who aren’t likely to be successful; get rid of all those college friends who seem weird or who update about unsavory low-class, low-status things. … It is dismaying to see how readily social media can be used not as a tool of connectivity but as a sorting mechanism that helps rationalize social inequality. It doesn’t merely map the social territory, but starts to dictate it, along the segregated lines it reveals and then reinforces.

I see a new revenue stream for Facebook here — some kind of automated friend-portfolio management app that optimizes your mix of friends and alerts you whenever a buddy spends too much time in a bad neighborhood or starts hanging out with low-lifes. Maybe Facebook could even set up an exchange for trading friend-portfolio derivatives. You could have everything from Aaa-rated friend portfolios (stable marriages, high-net-worth zip codes, regular statin intake) to speculative junk-rated friend portfolios (druggies, socialists, poets).

Boogie men

I have to share a tiny bit from Will Sheff’s long, fine essay on Dr. Hook and the Medicine Show: Live 1974. (Yes, you read that correctly.) Sheff’s piece is what you want to read this weekend.

By this point, the camera has pulled in so close to George’s face that it takes up the entire screen. George’s mouth is hidden behind the red handkerchief, so when his voice comes out it sounds weirdly disembodied, like it was piped in from somewhere else. In spite of the macro close-up, his face barely seems to move. He stands there, stone-still, filling the screen, a frozen giant, so massive you can see every pore in his nose. His eyes, though, are hidden in deep shadow. The camera lingers on this close-up as the disembodied words flow out, holding the shot for so long that for a while it becomes abstracted and you almost forget you’re looking at a face. You get the illusion instead that you’re peering into two deep caves burrowed into the pale side of an ancient cliff, with overgrown black vines shrouding the cave on either side, and with a booming voice off in the distance, or maybe it’s thunder, breaking against itself, or maybe the voice is coming from the miles and miles of endlessness deep inside, a voice of someone thousands of feet below the earth’s surface, a damp, earthy voice, a voice like mud or like dirt or like black grease, intoning “Mmmmmboooooogie….

Thanks to The Browser.

Used e-book, slightly foxed

There are continuing signs that the e-book market is cooling. Sales growth remains strong, but the rate of expansion, which fell sharply over the last year, still seems to be heading down. But whether that’s a blip or a trend, e-books are already a substantial segment of the overall book market, and there’s little reason to believe that they won’t be an even bigger segment in the future.

Which means that the question of the used e-book — what it is, what can be done with it, who controls what can be done with it — is not going to go away. As was widely reported, Amazon was recently granted a very interesting patent on a method that allows e-books (and other “digital objects”) to be resold or given away through a secondary market. The method essentially allows the rights to the file (ie, the ability to open it) to be traded a certain number of times. When the maximum number of transfers is reached, the rights remain with the last “owner” in perpetuity. No further trades are possible.
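To make the mechanics concrete, here is a minimal sketch of a transfer-limited right of the kind described above. It is not drawn from the patent text: the class name, the fields, and the limit of two transfers are all assumptions invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalLicense:
    """Hypothetical model of a resale-limited right to open an e-book file."""
    title: str
    owner: str
    max_transfers: int  # assumed cap; the real number would be set by the seller
    history: list = field(default_factory=list)

    @property
    def transfers_left(self) -> int:
        # Each completed transfer uses up one of the allowed moves.
        return self.max_transfers - len(self.history)

    def transfer(self, new_owner: str) -> None:
        # Once the cap is hit, the right stays with the last owner for good.
        if self.transfers_left <= 0:
            raise PermissionError(
                "transfer limit reached; rights stay with " + self.owner
            )
        self.history.append((self.owner, new_owner))
        self.owner = new_owner


# Illustrative run with made-up names and a made-up cap of two transfers.
book = DigitalLicense("Example Title", owner="alice", max_transfers=2)
book.transfer("bob")
book.transfer("carol")
print(book.owner, book.transfers_left)
```

A third call to `book.transfer(...)` would raise `PermissionError`, which is the sketch's stand-in for "no further trades are possible."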

Marcus Wohlsen provides a lucid explanation of the patent. He points out a crucial difference between a used e-book and a used print book: the former is a perfect copy (in theory), whereas the latter is a degraded copy. Just as a used car is a different product from a new car, a used physical book is a different product from a new physical book. They’re not perfect substitutes. A used e-book, on the other hand, is the same product as a new e-book. They are perfect substitutes.

“There are no dog-eared pages or scratches or nicks or cuts or highlighter marks or whatever,” says Bill Rosenblatt, a consultant and expert witness in digital content patent cases. “It’s the same exact product.” In other words, a customer given the choice between a “new” e-book and a less expensive “used” e-book will buy the used copy every time. The extra expense of “new” won’t get you anything better.

Not only that, but since e-books don’t suffer decay, as physical books do, they can essentially be resold, as perfect substitutes, an infinite number of times. For those reasons, people who make their living in the book trade find the very idea of a used e-book awfully scary, which is why they’ve worked to prevent that idea from becoming a reality. Books aren’t songs, but, still, the specter of perfect digital copies of books being traded endlessly over the Net is more than a little discomfiting for those who deal in words.

So why would Amazon patent a method for permitting used e-book sales? One theory is that it’s a defensive patent, intended simply to make it more difficult for others to come up with a viable method for selling used e-books. That seems far-fetched to me. It’s not how Amazon operates. Amazon’s all about the offense. A more plausible explanation is that Amazon wants to restrict the number of times an e-book can be copied — to prevent infinite copying and hence protect new-book sales. By establishing such restrictions on copying, you also, to a small degree, add an imperfection to the used copies: every time a copy is transferred, the e-book loses a little of its value because the number of times it can be transferred in the future is reduced by one.

More important to Amazon, though, is the prospect of being able to set up and control a marketplace for used e-book sales, just as it operates a lucrative marketplace for sales of used print books. Such a marketplace is particularly attractive to Amazon because it cuts publishers out of the picture, or at least provides Amazon with another source of leverage over publishers. Amazon’s long-term goal is to influence the power structure of markets, giving itself dominance. The used e-book could make a good tactical weapon in this struggle. At the very least, it’s a weapon you’d prefer to control rather than allowing control to fall into the hands of another player. Hence the patent.

But there’s another angle here. The fact that a used e-book is a (theoretical) perfect copy of a new e-book is a bad thing for those who produce and sell books. But software has its advantages. As soon as a physical book was sold, the author, publisher, and bookseller lost all control over it—indeed, all knowledge of it. It became the property of the buyer. That, for better or worse, is not at all the case with an e-book. An e-book remains tethered, electronically, to the seller. Amazon knows, for instance, what you do with a Kindle book. It knows when you read it (or don’t read it), it knows when you lend it, and it will know when you sell it—and it will also know when the new owner reads or lends or sells it, and on and on and on. Indeed, that kind of post-sale tracking and control is what made the Amazon patent possible in the first place. The patent would make no sense for print books.

This is where things get interesting. If, for instance, Amazon made it possible to resell (or even give away) the e-books it sells, it could also, via restrictions coded into the file, control the way the resale takes place and even the terms of the sale. So if you wanted to sell a Kindle edition, you might be required to sell it through the Amazon store and you might be required to pay a set fee or a set percentage to complete the transfer. In order to set up such a system, Amazon would need the cooperation of the rights holders (publishers and authors), but now it has a carrot to offer them: a cut of the transfer fee. In this way, the downside of allowing a customer to trade a perfect copy of an e-book is offset, at least to a degree, by the ability to participate in those trades, in perpetuity. So maybe the used e-book is, for the book business, not quite as ugly as it seems.
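The arithmetic of such a fee is easy to sketch. Everything in the example below is hypothetical: the function name, the 20 percent fee, and the even split between marketplace and rights holder are assumptions chosen only to show how one resale might be divided. Nothing here reflects actual Amazon terms.

```python
def split_resale(price_cents: int, fee_rate: float, rights_holder_share: float) -> dict:
    """Divide one used e-book sale among seller, rights holder, and marketplace.

    All three parameters are illustrative assumptions, not real terms.
    Amounts are in whole cents to avoid fractional money.
    """
    fee = round(price_cents * fee_rate)               # the transfer fee taken off the top
    to_rights_holder = round(fee * rights_holder_share)  # the "carrot" for publishers and authors
    return {
        "seller": price_cents - fee,
        "rights_holder": to_rights_holder,
        "marketplace": fee - to_rights_holder,
    }


# A made-up $5.00 resale with a made-up 20% fee, half of it passed to the rights holder.
shares = split_resale(500, fee_rate=0.20, rights_holder_share=0.50)
print(shares)
```

Because the cut is taken on every transfer, a single copy that changes hands several times would pay the rights holder several times, which is the "in perpetuity" part of the bargain.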

One thing that all of this makes clearer than ever is this: being able to exert control over the ultimate shape and workings of the e-book market — rules, rights, copy protocols, percentages, etc. — will be critical over the long run. This is a power game — a game of tomes. I’ve argued in the past that publishers should give away a downloadable electronic copy with every copy of a physical book that’s purchased. The e-book should be a complement to the print book. That would not only make a print book more valuable and hence more likely to be purchased. It would also give publishers more control over the future of the e-book market. By setting up their own mechanism for downloads, they’d also gain more control over the rights and restrictions built into e-books, including the ability to make money directly from future transfers. They would not cede to Amazon another important set of tactical weapons. In the short run, though, making an e-book a free add-on to a print book would almost certainly mean sacrificing some e-book sales. And that’s not something that’s easy to swallow.

The big danger for publishers is that their view does not seem to extend out as far as Amazon’s does. In a long war, that could well be a fatal disadvantage.

Photo by How I See Life.

The international federation of bees

Online crowdsourcing platforms like Amazon’s Mechanical Turk and the prettily named CrowdFlower provide companies and entrepreneurs with an easy way to tap into the so-called hive mind. Businesses can hire anonymous bee-laborers to perform what Amazon calls “human intelligence tasks,” or HITs, paying them microwages for cognitive piecework. Up to now, this has all been run as something of an underground economy, and the bee-laborers have had few rights and little recourse when their task masters, or “requesters,” stiff them, as the New Scientist’s Hal Hodson reports:

Mechanical Turk’s entire business model hinges on persuading large numbers of workers to do tiny tasks for pennies at a time. And it relies on turning its group of human workers into “a system that doesn’t talk back”, says Lilly Irani, a computer scientist at the University of California in Irvine. Turkers, as they are known, have no idea whether an individual “requester” is likely to pay them promptly for their work, or even at all, as requesters can choose to reject work without any repercussions. This is vital because around 20 per cent of Turkers say that they always or sometimes need money earned during crowd work to make ends meet, according to a small survey carried out by Irani. “There are people for whom this is a crucial source of income,” she says.

Now, though, an incipient movement is afoot to organize crowdworkers and give them a little more power, individually and collectively. Irani, for instance, has set up a service, Turkopticon, that provides a way for workers to share information about requesters. It makes the hive a little more transparent. Crowdworkers are also beginning to take legal action against the platforms, charging them with violating labor and minimum-wage laws, according to Hodson. Calls for unionization are even being heard:

Without legal redress for online workers these efforts count for little, says Trebor Scholz at New School University in New York City. “People fought for 100 years for the 8-hour work day and paid vacation, against child labour. All of that is wiped away in these digital environments,” he says, and calls for crowd workers to form a transnational union.

At the very least, these efforts call attention to a small but growing part of the economy that has operated in the shadows. Terms like “crowdsourcing” and “hive mind” can obscure the fact that what we’re really talking about are people.

Photo by The Co-operative.

SEO for scholarship

The way the creation of the Google search engine was inspired by the traditional method for measuring the value of scholarly works, with links becoming an analogue to citations, has become one of the web’s great origin myths. And the way the new search engine set off a rush to game the system, weakening the usefulness of links as markers of value, has become a lesson in the drawbacks of what might be called the automation of judgment. Every online currency inspires its own debasement, to one degree or another.

Now, in a perverse twist, the circle is completing itself, as Google provides web tools — Google Scholar Citations and Google Scholar Metrics — for tracking and measuring the value of academic articles and other scholarly works. The new tools offer a lot of benefits, but they also provide both the temptation and the means to game the scholarly citation system. Attempts to manipulate citations aren’t new, but now it’s possible to take the shenanigans to web scale, to bring black-hat techniques of search engine optimization to the ivory tower. Nat Torkington points to a 2012 paper, “Manipulating Google Scholar Citations and Google Scholar Metrics: Simple, Easy and Tempting,” in which three Spanish scholars describe how they used fake documents from a fake researcher to skew Google Scholar rankings and measures.

Over the course of a few hours, the researchers cobbled together six documents by cutting-and-pasting text and figures from other works. All the fake documents were attributed to the same, fake author. They included in each document citations to 129 other papers that were authored or coauthored by at least one member of the “EC3” research group to which they belong. They translated the documents into English using Google Translate. Then they created, within the University of Granada’s domain, a web page citing each of the six fake papers and including links to the full texts. At that point, they sat back and let Google take over:

Google indexed these documents nearly a month after they were uploaded, on 12 May, 2012. At that time the members of the research group [cited in the fake documents] along with the three co-authors of this paper, received an alert from GS Citations pointing out that [the fake scholar] had cited their Works. The citation explosion was thrilling, especially in the case of the youngest researchers where their citation rates were multiplied by six, notoriously increasing in size their profiles. …

The results of our experiment show how easy and simple it is to modify the citation profiles offered by Google. This exposes the dangers it may lead to in the hands of editors and researchers tempted to do “citations engineering.”

When the experiment was over, the researchers removed all trace of their work from the web, though the fake papers, and the fake author, lived on in the Google Scholar database. They conclude:

Even if we have previously argued in favour of Google Scholar as a research evaluation tool minimizing its biases and technical and methodological issues, in this paper we alert the research community over how easy it is to manipulate data and bibliometric indicators. Switching from a controlled environment where the production, dissemination and evaluation of scientific knowledge is monitored (even accepting all the shortcomings of peer review) to a environment that lacks any kind of control rather than researchers’ consciousness is a radical novelty that encounters many dangers. … [The Google tools] do not only awaken the Narcissus within researchers, but can unleash malpractices aiming at manipulating the orientation and meaning of numbers as a consequence of the ever growing pressure for publishing fuelled by the research evaluation exercises of each country.

Google, of course, only provides the temptation. It doesn’t force anyone to give in to it. Maybe, in the end, we’ll come to discover that Google was put on this earth to test our ethical mettle. That would give a deeper resonance to the origin myth.

Photo by Carlos Castillo.

Hot hands, cold data

David Brooks, in his Times column today, looks at the rise of “data-ism” — the rapidly spreading belief “that everything that can be measured should be measured; that data is a transparent and reliable lens that allows us to filter out emotionalism and ideology; that data will help us do remarkable things.” Brooks is wary of the worship of number-crunching. He worries, wisely, that as our stores of digital data swell we’ll “get carried away in our desire to reduce everything to the quantifiable.” But he grants that there are some obvious benefits to statistical analysis. For one thing, “it’s really good at exposing when our intuitive view of reality is wrong.”

As his prime example, he points to a perception shared by pretty much every sports fan and certainly every basketball fan: that sometimes players get in the zone and can do no wrong. They’re on fire. They’re on a tear. They’re HOT. Not true, says Brooks. The hot streak is a fiction, a figment born of a flaw in our mental makeup. Its existence was disproven, he says, in a famous paper from the 1980s:

Every person who plays basketball and nearly every person who watches it believes that players go through hot streaks, when they are in the groove, and cold streaks, when they are just not feeling it. But Thomas Gilovich, Amos Tversky and Robert Vallone found that a player who has made six consecutive foul shots has the same chance of making his seventh as if he had missed the previous six foul shots.

When a player has hit six shots in a row, we imagine that he has tapped into some elevated performance groove. In fact, it’s just random statistical noise, like having a coin flip come up tails repeatedly. Each individual shot’s success rate will still devolve back to the player’s career shooting percentage.

My own intuition howled in agony as I read this. I have, after all, watched Ray Allen go on clutch three-pointer sprees in the fourth quarter, effortlessly draining one bomb after another. Unconscious. And, sadly, I have seen the opposite: Ray Allen throwing bricks from the same spots in the same situations. But I’m no fool. I’m willing to accept the hard, spoil-sport facts. My intuition has to bow down to the stats.

Or does it? It turns out that this hot streak issue is not as clear-cut as Brooks makes it out to be. The data’s slippery.

The existence or nonexistence of the hot hand in basketball, and elsewhere, continues to be debated in statistical and economic circles, and there’s evidence to support both sides of the debate. Several studies have questioned the reliability of the Gilovich paper’s conclusions. This one, for instance, suggests that the sample size was too small, that the original researchers’ “statistical tests were of such low power that they could not have been expected to find a Hot Hand even if it were present.” And a series of recently published studies — this one, this one, this one — have found at least some evidence of a hot hand effect among basketball players. The most recent paper to call into question the Gilovich conclusions was published last year in The American Statistician. Written by Daniel Stone, a Bowdoin College economics professor, it presents evidence that “the widespread belief among players and fans in the hot hand is not necessarily a cognitive fallacy.”
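The low-power objection is easy to see in a toy simulation. The sketch below is not the method of any of the papers mentioned. It assumes a shooter who really does have a hot hand (a 55 percent chance after a make, 45 percent after a miss, numbers picked purely for illustration) and asks how often a simple one-sided two-proportion z-test notices. The function names and the significance threshold are likewise assumptions.

```python
import random
from math import sqrt


def simulate_shots(n, p_after_make, p_after_miss, rng):
    """Generate n shots where the chance of a make depends on the previous shot."""
    shots = [rng.random() < 0.5]
    for _ in range(n - 1):
        p = p_after_make if shots[-1] else p_after_miss
        shots.append(rng.random() < p)
    return shots


def detects_hot_hand(shots, z_crit=1.645):
    """One-sided two-proportion z-test: is P(make | make) > P(make | miss)?"""
    after_make = [b for a, b in zip(shots, shots[1:]) if a]
    after_miss = [b for a, b in zip(shots, shots[1:]) if not a]
    if not after_make or not after_miss:
        return False
    n1, n2 = len(after_make), len(after_miss)
    p1, p2 = sum(after_make) / n1, sum(after_miss) / n2
    pooled = (sum(after_make) + sum(after_miss)) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return False
    return (p1 - p2) / se > z_crit


def power(n, trials=500, seed=0):
    """Fraction of simulated shooters, all truly streaky, that the test flags."""
    rng = random.Random(seed)
    hits = sum(
        detects_hot_hand(simulate_shots(n, 0.55, 0.45, rng)) for _ in range(trials)
    )
    return hits / trials


# Compare a small sample of shots with a much larger one.
for n in (100, 2000):
    print(n, power(n))
```

In this toy setup the test misses the built-in streakiness most of the time when it sees only 100 shots and catches it almost always with 2,000. A null result from a small sample, in other words, says little either way.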

After reading Brooks’s column, I sent an email to Professor Stone asking if he had any reaction to it and also asking whether the hot-hand question is in fact considered settled, as Brooks suggests. He soon wrote back. “I saw the Brooks article and cringed,” he said, “– as the answer to your question is no, it’s not settled. There is recent research showing there is a hot hand in basketball (Arkes 2010), and mine shows analysis may greatly underestimate the effect. Put those together and there could be major hot hand.” Stone did emphasize that fans often see a hot hand where none exists — “people are too quick to infer a player is hot based on limited data” — but that doesn’t mean that players don’t sometimes go on real streaks.

Stone also pointed me to a recent article he wrote with another researcher, Jeremy Arkes, that, in addition to showing how the hot hand remains a bone of contention, provides a quick explanation of why we should be cautious about accepting the received statistical wisdom. They conclude: “Our overall conclusion – based on the intuition, experience and judgment of millions of bball fans/players (that, of course, we only have a sense of), what’s been found and not found in the data (from bball and other sports), and our recent theoretical analysis—is that behavioral scientists have been too quick to conclude that there is no hot hand in bball, and in fact it’s likely that players do occasionally get hot, to varying degrees.”

After nearly 30 years of intensive analysis, the hot hand remains mysterious. Our flawed intuition may be seeing something—something real—that the data is missing. This ends up, I think, underscoring Brooks’s sense that we have to be wary about data-ism and its promises. A transparent lens can also be a warped lens.

AFTERTHOUGHT: By the way, isn’t it kind of asinine to look at the free throw line for evidence of a hot hand? Free throws are the hothouse flowers of basketball. You have to look at field goals.

Photo by Keith Allison.

A new dent in the universe

The Rough Type headline of the day comes courtesy of TechCrunch:

Pet Boarding and ‘Dogbnb’ Startup Rover Raises $7M to Take On DogVacay

This bodes well for my new startup, RodentRyde, which will allow people to sell spare cycles on their hamster wheels. Here’s the pitch: “It’s Lyft meets DogVacay for micropets.”

Photo from Wikipedia.