Researchers: Anonymized data does little to protect user privacy

Providing third parties with data is a necessary cost of living in the 21st century. Whether it’s securing auto insurance, undergoing a routine examination at the dentist, or chatting with friends and relatives on Facebook, each of us will hand over about 1.7 MB of data per second next year, according to one recent report.

While our anxiety around how this data will be used has grown considerably in recent years, culminating with the launch of a federal probe by the DOJ in recent weeks, it’s done little to stop the flow of information from individuals to companies, or from one company to another. The data trade, in fact, has overtaken oil as the world’s fastest-growing commodity market according to some experts.

We’re reassured by the thought of our data being anonymized: crucial data points stored as individual blips in a massive database, one so large, with so many of these markers, that it’s nearly impossible to trace any of them back to a single human.

Or that’s what we were told, anyway. It has never been true. We’ve known as much since the mid-1990s, when Dr. Latanya Sweeney, Professor of Government and Technology in Residence at Harvard University, blew that notion to pieces by identifying the medical records of William Weld (then the Governor of Massachusetts) from just three data points in an anonymous database. Dr. Sweeney, who also heads the Data Privacy Lab at the Institute for Quantitative Social Science at Harvard, needed only Weld’s ZIP code, date of birth, and gender to correctly identify him among countless others.
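To make the mechanics concrete, here is a minimal sketch of this kind of linkage in Python. Everything in it is hypothetical: the records are invented, the second table stands in for any public list that carries names alongside the same three fields (a voter roll is one assumption), and the function names are illustrative rather than taken from Dr. Sweeney’s work.

```python
from collections import defaultdict

# Hypothetical "anonymized" medical records: names removed,
# but ZIP code, date of birth, and gender left in place.
medical_records = [
    {"zip": "02138", "dob": "1950-01-15", "gender": "M", "diagnosis": "condition A"},
    {"zip": "02138", "dob": "1962-03-02", "gender": "F", "diagnosis": "condition B"},
    {"zip": "02139", "dob": "1950-01-15", "gender": "M", "diagnosis": "condition C"},
]

# Hypothetical public list that pairs names with the same three fields.
public_roll = [
    {"name": "Person One", "zip": "02138", "dob": "1950-01-15", "gender": "M"},
    {"name": "Person Two", "zip": "02138", "dob": "1962-03-02", "gender": "F"},
    {"name": "Person Three", "zip": "02139", "dob": "1971-07-30", "gender": "F"},
]

QUASI_IDENTIFIERS = ("zip", "dob", "gender")


def quasi_key(record):
    """Return the combination of fields used to link the two tables."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymous, named):
    """Attach a name to every anonymous record that matches exactly one person."""
    names_by_key = defaultdict(list)
    for person in named:
        names_by_key[quasi_key(person)].append(person["name"])

    matches = {}
    for record in anonymous:
        candidates = names_by_key.get(quasi_key(record), [])
        if len(candidates) == 1:  # a single candidate means re-identification
            matches[candidates[0]] = record["diagnosis"]
    return matches


reidentified = link(medical_records, public_roll)
print(reidentified)
```

In this toy data, two of the three medical records match exactly one named person, so their diagnoses are exposed even though the medical table never contained a name.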

Pressed by NGOs and legislators to truly anonymize data before sharing it with others, companies started to rely on a new method called sampling. In a sample database, any individual or company would have access to only a small piece of an anonymous database, not the entire thing.

In theory, splitting the data into several smaller samples lowers the risk of re-identifying anonymous individuals. It becomes unlikely that any one person would be re-identified, because the anonymous data points on each person are spread across several databases, and no company or individual can access all of them.

According to the Office of the Australian Information Commissioner, sampling “[creates] uncertainty that any particular person is even included in the dataset.” Or, to put it simply, sampling will prevent re-identification of anonymous individuals. But this too is false.

According to a trio of European researchers, individuals in a sample database can be re-identified 83 percent of the time using just three data points: their gender, date of birth, and zip code. They created a handy tool (that doesn’t store collected data) that you can use to find out how likely you are to be re-identified by these three data points. For me that’s 45 percent of the time, much better than average, but still shockingly high.
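The question sampling raises is whether a record that looks unique in a sample is also unique in the wider population. The sketch below illustrates that question on invented data. It is not the researchers’ method: it checks the answer by looking at the full population directly, which an attacker holding only a sample usually cannot do. The records and the `unique_records` helper are hypothetical.

```python
from collections import Counter

# Hypothetical population, one tuple per person: (ZIP code, date of birth, gender).
population = [
    ("02138", "1950-01-15", "M"),
    ("02138", "1950-01-15", "M"),  # shares all three values with the person above
    ("02138", "1962-03-02", "F"),
    ("02139", "1971-07-30", "F"),
    ("02139", "1971-07-30", "F"),  # shares all three values with the person above
    ("02140", "1988-11-09", "M"),
]

# A released "sample": every second record from the population.
sample = population[::2]


def unique_records(records):
    """Return the set of records that appear exactly once."""
    counts = Counter(records)
    return {record for record, n in counts.items() if n == 1}


sample_unique = unique_records(sample)
population_unique = unique_records(population)

# Records that are unique in the sample AND in the population:
# for these, a match in the sample really does point to one person.
truly_unique = sample_unique & population_unique
share = len(truly_unique) / len(sample_unique)

print(f"{len(sample_unique)} records look unique in the sample")
print(f"{len(truly_unique)} of them are unique in the population ({share:.0%})")
```

Sampling protects only the records that look unique in the sample but are shared by several people in the population. The more attributes a record carries, the fewer such records there are.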

In an article published in Nature Communications, the team describes a statistical model that could correctly re-identify 99.98 percent of Americans in an anonymized dataset using 15 characteristics, including age, gender, and marital status.
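Why more characteristics mean more re-identification can be seen with a simple count. The following sketch uses six invented records and a hypothetical `fraction_unique` helper. It only counts exact matches and is not the statistical model from the paper, but it shows how the share of people who are unique grows with each attribute that is added.

```python
from collections import Counter

# Hypothetical records; the field names and values are illustrative only.
people = [
    {"age": 34, "gender": "F", "marital_status": "married", "zip": "02138"},
    {"age": 34, "gender": "F", "marital_status": "single", "zip": "02138"},
    {"age": 34, "gender": "F", "marital_status": "single", "zip": "02139"},
    {"age": 34, "gender": "M", "marital_status": "married", "zip": "02138"},
    {"age": 51, "gender": "M", "marital_status": "married", "zip": "02139"},
    {"age": 51, "gender": "M", "marital_status": "married", "zip": "02139"},
]


def fraction_unique(records, fields):
    """Share of records whose combination of `fields` appears exactly once."""
    keys = [tuple(record[field] for field in fields) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(records)


fields = ["age", "gender", "marital_status", "zip"]

# Uniqueness when using the first 1, 2, 3, and then all 4 attributes.
growth = [fraction_unique(people, fields[:n]) for n in range(1, len(fields) + 1)]

for n, share in enumerate(growth, start=1):
    print(f"{n} attribute(s): {share:.0%} of records are unique")
```

With age alone nobody in this toy table is unique. With all four attributes, four of the six people are.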

Fifteen characteristics may seem like an unrealistic number for a single company to have obtained, or for an individual to have provided. It’s not. Facebook, Google, and Amazon alone have hundreds, perhaps thousands, of data points to pull from: data that you’ve given up through your search history, the ads you click, and the purchases you make. At this point, the companies don’t even need you to give them this data, as they can make a reasonably accurate educated guess based on your behavior while using certain websites or applications.

And what they aren’t tracking, they’re buying. Data brokers are big business, and exist solely to provide competitive insights into everything from your household income to who you voted for in the last election.

According to the researchers:

Contrary to popular belief, sampling a dataset does not provide plausible deniability and does not effectively [protect] people’s privacy.

We believe that, in general, it is time to move away from de-identification and tighten the rules for what constitutes truly anonymized data. Making sure data can be used statistically, e.g., for medical research, is extremely important but cannot happen at the expense of people’s privacy. Datasets such as the NIGMS and NIH genetic data, the Washington State Health Data, the NYC Taxicab dataset, the Transport For London bike sharing dataset, and the Australian de-identified Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Schedule (PBS) datasets have been shown to be easily re-identifiable.

Anonymized data is better than the alternative, but it’s clear that we have some work to do in increasing our understanding of what’s collected and how it may be used against us.

About the author

E-Crypto News was developed to assist all cryptocurrency investors in developing profitable cryptocurrency portfolios through the provision of timely and much-needed information. Investments in cryptocurrency require a level of detail, sensitivity, and accuracy that isn’t required in any other market and as such, we’ve developed our databases to help fill in information gaps.
