A data set covering more than 3 million Facebook users, including a variety of their personal data, collected by Cambridge researchers was available for anyone to download for four years, New Scientist reports. It's likely just one of many places where such huge amounts of personal data were gathered during a period of permissive access rules at Facebook.
The data was collected by a personality quiz app, myPersonality, which according to its own wiki (now taken down) operated from 2007 to 2012, although some new data was added in August 2016. It started as a side project of David Stillwell (now deputy director) of the Cambridge Psychometrics Centre, but later grew into a more organized research effort. The project "has close academic ties," the site says, "but it's an independent business." (Presumably for liability purposes, the group has never requested access to the data.)
Despite "Cambridge" in the name, there is no real connection to Cambridge Analytica, only a very weak one through Aleksandr Kogan, which is explained below.
Like other quiz apps, it requested permission to access the user's profile. Combined with the answers to the questionnaires, this produced an extensive record with entries for millions of users. The data collected includes demographics, status updates, some profile pictures, likes and more, but not private messages or data from friends.
How many users are affected is a bit hard to say: the wiki claims that the database contains 6 million test results from 4 million profiles (hence the headline), although only 3.1
Although the data was stripped of identifying information such as users' real names, its sheer volume and breadth make the set vulnerable to de-anonymization, for lack of a better term. (I should add that there is no evidence that this has actually happened; simple anonymization processes applied to rich datasets are just fundamentally more vulnerable to this kind of reconstruction effort.)
The dataset was available to vetted academics who registered on the wiki and agreed to the team's terms and conditions. It has been used by hundreds of researchers from dozens of institutions and companies for numerous papers and projects, including some from Google, Microsoft, Yahoo and even Facebook itself. (I asked Facebook about this odd situation, and a representative told me that two researchers had signed up for the data before working at the company. It's unclear why the list I've seen gives Facebook as their affiliation, but there you have it.)
In myPersonality's case, the data was supposed to be distributed only to genuine researchers; Stillwell and his collaborator, Michal Kosinski, personally reviewed applications, which had to list the data needed and why, as this sample application shows:
I plan to use the following variables:
[... use and tell us how you plan to analyze them.]
However, a lecturer posted his login credentials on GitHub so that his students could use the data. According to New Scientist, those credentials gave anyone who found them access to the myPersonality database for about four years.
This seems to demonstrate how carelessly Facebook guarded the data it claimed to protect. Once that data had left the company's premises, there was no way for Facebook to control it, but the fact that a set of millions of entries was handed to any academic who asked for it, and to anyone with a publicly posted username and password, suggests that the company did not even try.
A Facebook researcher actually requested the data, in violation of his own company's policies. I'm not sure what to make of that, except that the company seems to have been totally uninterested in monitoring such sets and much more interested in protecting itself against any future liability. After all, if an app was in violation, Facebook can simply suspend it, as the company did last month, and put the entire burden on the violator.
"We suspended the myPersonality app almost a month ago because we believe it violated Facebook's policies," said Ime Archibong, Facebook's VP of Product Partnerships, in a statement. "We are currently investigating the app, and if myPersonality refuses to cooperate or fails our audit, we will ban it."
In a statement to TechCrunch, David Stillwell defended the myPersonality project's data collection and distribution:

"myPersonality researchers have published more than 100 social science papers on important topics that advance our understanding of the growing use and impact of social networks," he said. "We believe that academic research benefits from a properly controlled exchange of anonymized data within the research community."
In a separate email, Michal Kosinski also emphasized the importance of the research published using their dataset. A recent example examines how people's personalities are judged by themselves, by those who know them and by a computer designed for the task.
"Facebook has recognized and encouraged our research since at least 2011," he said. This is difficult to reconcile with Facebook's claim, as a company spokesperson told me, that the project was suspended for policy violations based on the language of its redistribution terms. The most likely explanation is that Facebook never looked closely until this kind of sharing of profile data became unpopular and its use and distribution among academics came under closer scrutiny.
Stillwell said (and the Centre has stated explicitly) that Aleksandr Kogan was not really associated with the project; however, he was one of the people who got access to the data, like researchers at many other institutions. Kogan appears to have confirmed that he did not use this data in his work with SCL and Cambridge Analytica.
The statement also says that the most recent data is six years old, which seems to be essentially true, except for a batch of data on nearly 800,000 users from the 2015 rainbow profile picture filter campaign, added in August 2016. That does not change much, but I think it's worth mentioning.
Facebook has suspended hundreds of apps and services and is investigating thousands more. As the Cambridge Analytica case made clear, data collected from users for one particular purpose was used by various other actors for all sorts of other purposes. One of the apps and services under scrutiny is a separate effort from the Cambridge Psychometrics Centre called Apply Magic Sauce; I have asked the researchers about its connection to myPersonality.
The small selection of these suspensions and collection methods made public so far suggests that for a long period (until about 2014) Facebook allowed data on countless users (the totals will only grow) to escape its control. And that data is still out there, completely beyond the company's reach and usable by almost anyone.
Researchers working with user data provided with consent are not the enemy, but the total inability of Facebook (and, to some degree, the researchers themselves) to exercise any meaningful control over this data points to grave missteps in digital privacy.
Ultimately, it seems that Facebook should be the one to take responsibility for this massive oversight, but as Mark Zuckerberg's performance on Capitol Hill emphasized, it's not really clear what taking responsibility looks like beyond gestures of repentance and promises to do better.