Scoble: freedom fighter or data thief?

The great revolutionary activist of our day, Robert Scoble (or “Che Scoble,” as Mike Butcher says), is battling for the ideal of data freedom with the evil forces of Facebook. At issue, writes Kara Swisher, in a post titled “Free the Scoble 5,000!!,” is “how much control you should have over your own information online.” Mathew Ingram chimes in, saying “there’s no question that the information itself should belong to Scoble.”

Sounds black and white. Scoble: good. Facebook: evil. But it’s not quite that simple. When Scoble broke into Facebook’s databank, he opened a Pandora’s Box, and I would argue that neither he nor Facebook is in the right.

I mean, is this really about “your own information,” as Swisher terms it? Is there really “no question” that the data “belong to Scoble,” as Ingram assumes? I don’t think so. Scoble got himself kicked out of Facebook for using a software script to automatically “scrape” information from Facebook’s database and move it elsewhere. Far from being just “his own information,” however, the information included the names, email addresses, and birthdays of 5,000 Facebookers who had “friended” Scoble. The act of “friending” on a social network site, it’s important to remember, is a fairly cavalier act, often undertaken with little thought.

Now, if you happen to be one of those “friends,” would you think of your name, email address, and birthday as being “Scoble’s data” or as being “my data”? If you’re smart, you’ll think of it as being “my data,” and you’ll be very nervous about the ability of someone to easily suck it out of Facebook’s database and move it into another database without your knowledge or permission. After all, if someone has your name, email address, and birthday, they pretty much have your identity – not just your online identity, but your real-world identity.

Scoble says his purpose in swiping the data was benign. He just wanted to see which of his Facebook “friends” were also members of another site he uses, Plaxo. He learned, he writes, that “of the 5,000 people in my Facebook account about 1,800 were already on Plaxo. [The scraping software] did NOT look at anything else. Just this stuff, no social graph data. No personal information.” I have no doubt that Scoble didn’t mean any harm, but in what sense are names, email addresses, and birthdays not “personal information”? The important question isn’t what Scoble intended to do with the information. The important question is this: Will others who use such scraping scripts necessarily have benign intentions? And the answer is: No.

Facebook has an obligation to protect the data entrusted to it by its members. At the very least, members should have the right to decide whether or not their personal information can be scraped out of the Facebook database. Scoble did not give them that choice. That doesn’t mean that Facebook is the hero. It, like other social networks, happily scrapes information from members’ email accounts to identify possible new members. Facebook will scrape when it suits its commercial interest but will block scraping when it doesn’t. Still, in this particular case, Facebook did what it needed to do: protect the information and the interests of its members. Until controls are in place, unauthorized scraping of other members’ personal information shouldn’t be allowed.

What the Scoble affair reveals is that the issue of “data portability” is not a simple issue but a fraught one. Data scraping can make our lives easier, but it can also put us at risk.

UPDATE: Dare Obasanjo adds more detail to the discussion, while Ian Betteridge comments: “I think the point we have to ask is what the expectations are of users who friend someone. You clearly expect them to have access to whatever data you make available to them through Facebook. You don’t, though, expect them to take that data wholesale and sell it to a spammer. So while friending someone is saying ‘you can use this information,’ it’s not saying ‘you can use this information in any way you see fit.’”

23 thoughts on “Scoble: freedom fighter or data thief?”

  1. Simon

    That’s a great take on the situation.

    Perhaps the problem here is that we’re all becoming fairly blasé about putting sensitive information into the public domain for strangers or near-strangers to access.

    It seems curious that in real life, most of us live in houses with locked doors, security alarms etc. but online – and especially with social-networking sites – our mentalities change drastically.

    So many social networking site users seem to behave like they’re occupants of a mythical 1950s US small town where everyone knows your name and where you can leave your door open.

    Of course there is the very real threat of people trying to steal data and of accidental losses (recently 25m records relating to families in the UK were lost), but perhaps we all shouldn’t lull ourselves into thinking that online, Jimmy Stewart lives next door…

  2. BobWarfield

    Nick, you’re just flat out wrong on this, even while you’re right. Facebook is pursuing a terrible strategy in terms of customer relations, although they’re completely within their rights to do so.

    As for people welcoming Facebook’s protection, oh please, do you really think Facebook is doing anything at all to protect you in any meaningful way? Come on, the only protection is in who you decide to accept a friend invitation from. People need to accept responsibility for these actions.

    More on my blog:

  3. Kevin

    Very good contrarian view, Nick.

    While I think the data Scoble was trying to import was fairly innocuous, there clearly need to be limits and clear acceptance of what is personal information.

    To me, it is pretty clear that people should be able to see if friends use another social network and should be able to transfer their own photos, biographical info, etc to other networks. As for information that describes someone else, who knows? Could people consent to it somehow?

  4. Shelley

    The application he was using was using OCR technology times 5000 friends. I would have kicked his butt off the site, too.

    This was such a non-story. This was so unimportant. Is this what we’ve been reduced to? Robert Scoble can’t use Facebook OMG! The world we know is dying?

  5. Bob Morris

    Um, if I give a friend my email address written on my business card, I kind of assume he’ll enter it in any number of online apps. So why shouldn’t that data be easy to copy from one app to another?

    The bigger issue, as mentioned, is Facebook’s resolute determination to turn what could have been just a minor blip into yet another raging issue.

    Any company with even the slightest clue about their user base or PR image would have been on this 10 seconds after Scoble’s post appeared, doing damage control. And maybe could have turned it into a PR victory for them. “Robert, let’s have an open discussion about this on your blog, then you agree not to do it again and we give you back your data.”

    Instead, we get the usual crickets as response from Facebook.

  6. Ethan

    “A strange game, Facebook. The only winning move is not to play.”

    Indeed. FWIW, I cancelled my account, being that I wasn’t using it anyway. I hardly gave Facebook much in the way of personal information to begin with, but I figure if I ain’t using it, stuff it. Anyway, tempest, teapot, tell me what my opinion should be post-dustup.

  7. Prokofy Neva

    Wait a second. Just the other day everyone was complaining that Facebook was scraping all this advertising data by trailing people as they clicked on ads, and they had to apologize and undo it. Now people are mad because Facebook jammed when Scoble tried to port out what could well have been a mass-mailing spam list, not just his 5,000 “friends”. Or are these different sets of people complaining?

    I’m the first to rally against data-scraping in Second Life, for example, when bots go around harvesting not only basic data like the avatar’s name and start date but proximity data or land holdings data and group membership. I think this should be an opt-in, not opt-out.

    However, I think you have to demarcate where the privacy line really should be drawn, and recognize that it’s a slider for people who have different values.

    I think the basic information of Facebook, your name, email, and birthdate, are pretty much findable anyway on Google and basically your essential passport on the Internet.

    But the internal parts of Facebook, so to speak, *should* be private. That means all those vampire bites and trivia quizzes and videos linked and flogs updated and flowers sent. So the outside of the package with your name on it seems like a basic for the envelope that you can’t really expect to remain private, but the inside contents of the envelope, the letter for that particular site’s world, should belong only to you, the writer or the recipient. Like copyright belongs to the author.

    At some point this will be normalized, no? You will have a basic account with a slider and each time you join one of these social network thingies or virtual worlds or games you will calibrate it. At least, that’s how it should work. If people stop rhapsodizing about OpenID, which is annoying to try to use and doesn’t have any kind of calibration to it.

    Somebody has to come up with a way to have a suitcase you can pack and unpack as you wish to go between platforms and worlds. But which company could be trusted to be the suitcase vendor?

  8. mndoci

    This is such an interesting little problem, isn’t it? Your data, including your list of contacts, should be portable. Does that mean Facebook and others using my gmail address book to find friends is OK? Is someone else’s email address given to me my data?

    One thing that bothers me, and I am as big an open social network person as anyone else, is that the immediate question is not whether FB is closed or not, but the fact that Plaxo, a competitor as it were, went all covert to try and get info out of FB using means that they know violate FB’s terms of service. To me, that’s just plain unethical, if not illegal. That Scoble got baited into this is unfortunate, since I really like the fellow.

    The best thing that can come out of this is an acceleration around the entire issue of data portability and the definition of data and ownership.

  9. Simon Owens

    What would the difference be between Scoble scraping the information with this script and him manually visiting each friend and manually copying the information? Are we arguing over the fact that he scraped the material, or that he scraped it at a rapid rate?

  10. Ian Betteridge

    Simon says: “What would the difference be between Scoble scraping the information with this script and him manually visiting each friend and manually copying the information?”

    The issue really isn’t giving Robert access to the data. After all, I have already made available that data to him, by friending him on Facebook.

    The issue is what he does with it: and, for lots of people, putting into Plaxo is a violation of trust. I don’t have an issue with Plaxo, but lots of people do, and giving consent to someone to view my information on Facebook does not imply giving him the permission to import it into another service.

  11. Bertil

    Is Facebook the first service with such data? No: GMail, and many other SNSs have made an API to do just that. Is it problematic? See Fred Stutzman’s take on the question: one could run a fake list of contacts to reveal some identity. I would answer that the only way to handle this is to say that an e-mail is an ID, and make it clearer to new adopters.

    I don’t think you need thousands of birthdays to check on your friends’ use of Plaxo, and collecting them would be yet another breach in some people’s attempts to separate their public and private lives.

  12. Dennis D. McDonald

    I don’t see a way around this quandary without (a) defining and reaching a consensus on what constitutes “public” and “private” information and (b) creating a legal definition of ownership rights relating to private information. Until such unlikely events occur, the current Wild West situation will continue where anyone — with or without good intentions — can figure out ways around ineffective “terms of service” to scrape semi-public data from one system to another.

  13. ckeene

    As usual, you have a contrarian and well-reasoned point of view. Facebook is becoming the Roach Motel of Social Media.

    An interesting related point is that Google also prevents bots from scraping the data that they themselves have scraped from others. How long before the content providers wise up and charge scrapers like Google for the privilege of Hoovering their content?

    I predict that the 11th commandment for Web 2.0 is “Always be the scraper, never the scrapee.”

  14. Josh McHugh

    If anyone’s dying to do a little Scoblesque mining of their social network and webmail accounts, there’s a brilliant three-man company out of Kuala Lumpur called Octazen that makes fine scraping tools that are employed by many of the better-known social networks.

    Amen to ckeene: the name of the Web Deuce game is to scrape rapaciously but not get scraped: “See which of your friends are already on the network!…by giving your webmail username and password to our scraping bot, which will then proceed to impersonate you, log in and gut, er, import your webmail contacts. For a more detailed description of how the process works, please see our terms of service, which prohibit you from doing the exact same thing to us.”

    How big do we think Fbook’s user base would be if people had to add their contacts by hand? Not so big.

    More on scraping, social, antisocial, and otherwise, in my scraping story in this month’s Wired.

  15. Simon Wardley

    Is Nick (please excuse the pun) up to some wicked and devilish posting with an item on the SaaS future followed in quick succession by a subtle caveat emptor on data portability?

    So as SaaS becomes more mainstream, I’m in agreement with Christopher Hoff and James Urquhart that at some point we are going to face a major security or resilience issue with a SaaS Vendor and a repeat of second sourcing lessons.

    Could 2008 turn out to be the year we start to realise that without portability we face being seduced by an almost irresistible proposition which comes with some serious handcuffs? Is this when we start to recognise that we really need choice and competitive utility markets in the SaaS or HaaS or X as a Service fields? This was my concern at OSCON and Web 2.0.

    Of course, unlike electricity, where switching providers is simple, we have a relationship with our XaaS vendor as our data resides with them. However, portability (why stop at data; with HaaS vendors you need to port your code as well) is more than just open standards and APIs to transfer that data. They are necessary but not sufficient, and as you hint, it’s a big and complex issue.

    There is no point in portability if you have no other provider to go to, or if not all the data is covered by the standard, or if the data is not interpreted or used in the same way. To solve this invariably requires multiple providers running the same XaaS system but no provider is ever going to hand over strategic control of their business to another. The only realistic way we are ever likely to have an ecosystem of providers with portability between them is if we end up with open source XaaS technology complying to open standards and providers competing on operational implementation.

    But even that isn’t enough to ensure actual portability. Switching providers has got to be an easily achievable task, not buried in arcane concepts and obscurity.

    So could portability well turn out to be the real big issue of 2008? Well it’s going to be this year or the next, but then you already know that.

    Wonderful posts, can’t wait until the book arrives.

  16. benkepes

    Nick – I have to concur with Bob. What do people expect here – there is no compulsion to either sign up to Facebook or accept a friend request. You do it with the understanding that there are some benefits to be made, social or otherwise. Well sorry but publicly viewed information is just that – in the public domain and available to be scraped, mashed or rehashed in any number of ways.

    If Facebookers can’t handle the heat……
