The ethics of MOOC research

In writing my recent article on massive open online courses, I talked with the leaders of the Big Three in the nascent industry (Coursera, edX, and Udacity), and they all stressed the importance of large-scale data collection and analysis to their plans. By meticulously tracking the actions of students, they hope to build large behavioral databases that can then be mined for pedagogical insights. The findings, they believe, will help improve particular classes as well as bolster our general understanding of teaching and learning.

The MOOCs’ research agenda seems entirely wholesome. But it does raise some tricky ethical issues, as a correspondent from academia pointed out to me after my article appeared. “At most institutions,” he wrote, the kind of behavioral research the MOOCs are doing “would qualify as research on human subjects, and it would have to be approved and monitored by an institutional review board, yet I have heard nothing about that being the case with this new adventure in technology.” Universities are, for good reason, very careful about regulating, approving, and monitoring biological and behavioral research involving human subjects. In addition to the general ethical issues raised by such studies, there are strict federal regulations governing them. I am no expert on this subject, but my quick reading of some of the federal regulations suggests that certain kinds of purely pedagogical research are exempt from the government rules, and it may well be that the bulk of the MOOC research falls into that category.

Nevertheless, given the sensitivities involved, you’d think that schools partnering with the MOOC providers, particularly the for-profit providers, would be giving the research programs a thorough review and demanding some kind of ongoing oversight. Yet if you look at the contract between the University of Michigan and Coursera, a contract that Coursera says is similar to the ones it has with other institutions, you find almost nothing about data and research. There is a section (#14) establishing basic confidentiality safeguards for student data (names, email addresses, test scores), but it doesn’t say anything about research. The only other thing I saw was a short note in an exhibit appended to the contract, which says that Coursera “will administer assessments and make available to University certain aggregate analytics regarding End User behavior and performance, which will include information on any of the following: End User demographics, module usage, aggregate assessment scores (stratified by demographics) and reviews by demographics.” I saw nothing about any review, oversight, or restriction of research programs or of the use of the resulting data.

I also glanced through Coursera’s terms of service. They lay out, in broad terms, the “personal” and “non-personal” information that the company will collect from students. The personal information is mainly used for formal communications with students. The non-personal information is what’s collected for research and other purposes:

When users come to our Site, we may track, collect and aggregate Non-Personal Information indicating, among other things, which pages of our Site were visited, the order in which they were visited, when they were visited, and which hyperlinks were “clicked.” We also collect information from the URLs from which you linked to our Site. Collecting such information may involve logging the IP address, operating system and browser software used by each user of the Site. Although such information is not Personally Identifiable Information, we may be able to determine from an IP address a user’s Internet Service Provider and the geographic location of his or her point of connectivity. We also use or may use cookies and/or web beacons to help us determine and identify repeat visitors, the type of content and sites to which a user of our Site links, the length of time each user spends at any particular area of our Site, and the specific functionalities that users choose to use.

Coursera says it will use the information “in aggregate form to build higher quality, more useful services by performing statistical analyses of the collective characteristics and behavior of our users, and by measuring demographics and interests regarding specific areas of our Site.” But then the company also notes, “We may also use it for other business purposes.” That sounds like carte blanche.

I have no reason to think that Coursera, or any other MOOC provider, has anything but noble intentions when it comes to data collection and data mining. I certainly believe that the leaders of the companies are motivated by a desire to improve education. But Coursera is a for-profit business, backed by venture capitalists. Sooner or later, it will have to make money, and, given the current excitement in Silicon Valley and elsewhere about the commercial potential of “Big Data,” it seems inevitable that the company and its investors will explore “other business purposes” for its data, including ones that would bring in revenue.

In their excitement to join forces with MOOC providers, university administrators and professors may not be giving enough thought to all the data that’s going to be collected and all the research activities that are going to be pursued. It’s an oversight they may come to regret.

3 Responses to The ethics of MOOC research

  1. The user tracking performed by the MOOC providers sounds no different to me from what Facebook or Google are doing. Surely if their data collection is legal then so is the MOOCs’. Now, if the equivalent would require oversight when performed specifically for a research project… that would be rather ironic.

  2. Nick

    Yeah, that’s a good point. Then again, the legal, regulatory, and ethical strictures on private companies are different from those on schools, particularly when schools have public funding. So I’m not sure how far your point goes. Also, it doesn’t really address the questions of oversight and disclosure raised when a third party does research on data collected as a university delivers a class to students.

  3. Worse yet, the Gates Foundation is collecting confidential student info in a “data store” called the “Shared Learning Collaborative” to be provided to for-profit companies to help them develop their “learning products.”

    The data will include students’ names, addresses, grades, test scores, disciplinary records, and special education status, and the collection is being done with state and district assent but without parental knowledge or consent.
    For more on this, see http://shar.es/cx7IZ and http://shar.es/cx7SC
    Petition here: http://www.classsizematters.org/stop-slc-capture-kids-data/
    What do people think of the ethics of this?