Anonymising data is ‘not enough to protect privacy’ as a person’s identity can easily be pieced together from bits of information, study warns
- Under GDPR rules, organisations can only sell personal data by ‘anonymising’ it
- This means stripping it of identifiable details, such as name and email address
- However, machine-learning could be used to reverse this by third-party buyers
Data privacy laws requiring the anonymisation of a person’s data are failing to prevent identification, a study warns, because the wealth of information snippets available can be assembled like a jigsaw to reveal someone’s true identity.
These small data nuggets, such as postcode, gender and date of birth, allow a wider picture to be pieced together and can often reveal a person’s name.
A study has warned that despite heightened privacy laws in the wake of GDPR, rolled out following the Cambridge Analytica scandal last year, people are still exposed.
Companies now often sell anonymised data to third parties for a variety of uses, including for analytics and reviewing audience participation.
Risk: Although companies such as Facebook are forced to strip personal information from any data they share, researchers from Imperial College London showed machine-learning could be used to reverse this process by third-party buyers – even with incomplete datasets (stock)
That is done by stripping the data of identifying characteristics like names and email addresses, so that individuals cannot, in theory, be identified.
After this process, the data is no longer subject to data protection regulations, so it can be freely used and sold.
But researchers from Imperial College London and the University of Louvain in Belgium showed machine-learning could be used to reverse this process.
They created an online computer tool that could correctly ‘re-identify’ 99.98 per cent of Americans in any available ‘anonymised’ dataset by using just 15 characteristics, including age, gender, and marital status.
Study first author Dr Luc Rocher, of UC Louvain, said: ‘While there might be a lot of people who are in their thirties, male, and living in New York City, far fewer of them were also born on January 5, are driving a red sports car, and live with two kids – both girls – and one dog.’
The tool first asked a user to put in the first part of their postcode, gender, and date of birth and estimated the probability they could be identified from an ‘anonymous’ dataset.
The estimate dramatically increased as the user gave more personal details such as marital status, number of vehicles, house ownership status and employment status.
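The principle behind the researchers’ tool can be illustrated with a minimal sketch: each extra attribute an attacker knows shrinks the pool of matching records in an ‘anonymised’ dataset, until only one candidate remains. The dataset, field names and function below are hypothetical, for illustration only, and are not the study’s actual data or code.

```python
# Hypothetical 'anonymised' dataset: names stripped, quasi-identifiers kept.
records = [
    {"postcode": "SW7", "gender": "M", "birth_year": 1985, "marital": "single"},
    {"postcode": "SW7", "gender": "M", "birth_year": 1985, "marital": "married"},
    {"postcode": "SW7", "gender": "F", "birth_year": 1985, "marital": "single"},
    {"postcode": "N1",  "gender": "M", "birth_year": 1990, "marital": "single"},
]

def match_probability(known, dataset):
    """Return 1/k, where k is the number of records consistent with
    everything the attacker knows; 1.0 means a unique match."""
    matches = [r for r in dataset
               if all(r.get(field) == value for field, value in known.items())]
    return 1.0 / len(matches) if matches else 0.0

# Knowing only postcode and gender leaves two candidates (50% confidence)...
print(match_probability({"postcode": "SW7", "gender": "M"}, records))
# ...but adding marital status narrows the pool to a single record.
print(match_probability(
    {"postcode": "SW7", "gender": "M", "marital": "single"}, records))
```

In a toy dataset of four rows, two attributes already halve the uncertainty and a third pins down one record; the study’s finding is that in real datasets about 15 such attributes suffice to single out almost anyone.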
Not enough? The EU General Data Protection Regulation (GDPR) was rolled out in the wake of the Facebook and Cambridge Analytica scandal, early last year, but may be insufficient (stock)
WHAT IS THE EU’S GENERAL DATA PROTECTION REGULATION?
The European Union’s General Data Protection Regulation (GDPR) is a new data protection law that entered into force on May 25, 2018.
It aims to strengthen and unify data protection for all individuals within the European Union (EU).
This means cracking down on how companies like Google and Facebook use and sell the data they collect on their users.
The law marks the biggest overhaul of personal data privacy rules since the birth of the internet.
Under GDPR, companies are required to report data breaches within 72 hours, as well as to allow customers to export their data and delete it.
Part of the expanded rights of data subjects outlined by the GDPR is the right for data subjects to obtain from the data controller confirmation as to whether or not personal data concerning them is being processed, where and for what purpose.
Further, the controller must provide a copy of the personal data, free of charge, in an electronic format.
This change is a dramatic shift to data transparency and empowerment of data subjects.
Under the right to be forgotten, also known as Data Erasure, data subjects are entitled to have the data controller erase their personal data, cease further dissemination of the data, and potentially have third parties halt processing of the data.
The conditions for erasure include the data no longer being relevant to its original purposes for processing, or a data subject withdrawing their consent.
This right requires controllers to compare the subjects’ rights to ‘the public interest in the availability of the data’ when considering such requests.
Study senior author Dr Yves-Alexandre de Montjoye, of Imperial’s Department of Computing, said: ‘This is pretty standard information for companies to ask for.
‘Although they are bound by GDPR guidelines, they’re free to sell the data to anyone once it’s anonymised. Our research shows just how easily – and how accurately – individuals can be traced once this happens.’
‘Companies and governments have downplayed the risk of re-identification by arguing that the datasets they sell are always incomplete.
‘Our findings contradict this and demonstrate that an attacker could easily and accurately estimate the likelihood that the record they found belongs to the person they are looking for.’
The findings were published in the journal Nature Communications.
WHAT ARE SOME OF GOOGLE’S PAST CONTROVERSIES?
March 2019: Google refused to scrap a Saudi government app which lets men track and control women.
The tech giant says that software allowing men to keep tabs on women meets all of its terms and conditions.
October 2018: A software bug in Google+ meant that the personal information of ‘hundreds of thousands’ of users was exposed. The issue reportedly affected users on the site between 2015 and March 2018.
The bug allowed app developers to access information like names, email addresses, occupation, gender and more.
Google announced it would be shutting down the Google+ social network permanently, partly as a result of the bug.
It also announced other security features requiring apps to tell users what data they will have access to; users must give ‘explicit permission’ before an app can access it.
August 2018: A new investigation led by the Associated Press found that some Google apps automatically store time-stamped location data without asking – even when Location History has been paused.
The investigation found that the following functions were enabled by default:
- The Maps app storing a snapshot of where the user is when it is open
- Automatic weather updates on Android phones pinpointing to where the user is each time the forecast is refreshed
- Simple searches, such as ‘chocolate chip cookies’ or ‘kids science kits’, tagging the user’s precise latitude and longitude – accurate to the square foot – and saving it to the Google account
This information was all logged as part of the ‘Web and App Activity’ feature, which does not specifically reference location information in its description.
July 2018: The EU fined Google $5 billion for shutting out competitors by forcing major phone manufacturers, including South Korea’s Samsung and China’s Huawei, to pre-install its search engine and Google Chrome browser by default.
July 2018: The Wall Street Journal revealed that Gmail’s data privacy practices made it common for third-party developers to read the contents of users’ Gmail messages.