Our lack of interest in data ethics will come back to haunt us
When was the last time you saw a creepy ad on Facebook that seemed to know about a product you were discussing with a coworker? Or when was the last time you noticed that your Google search had been modified to suit variables like your current location and personal interests?
These micro-events happen on a daily basis for most of us, and are reminders of how valuable and how ubiquitous our data really is. Data is the currency of the new world, and with 2.5 quintillion bytes of data created each day, that status isn’t going away anytime soon.
The problem is, data scientists and analysts are constantly talking about the potential uses of this data, but too few are talking about the ethics. If we’re going to keep pushing for better systems built on big data, we need to democratize and popularize the ethical conversation surrounding them.
The main concerns
Most people understand the privacy concerns that can arise with collecting and harnessing big data, but the ethical concerns run far deeper than that.
These are just a smattering of the ethical problems in big data:
- Ownership: Who really “owns” your personal data, like what your political preferences are, or which types of products you’ve bought in the past? Is it you? Or is it public information? What about people under the age of 18? What about information you’ve tried to keep private?
- Bias: Biases in algorithms can have potentially destructive effects. Everything from facial recognition to chatbots can be skewed to favor one demographic over another, or one set of values over another, based on the data used to power it.
- Transparency: Are companies required to disclose how they collect and use data? Or are they free to hide some of their efforts? More importantly, who gets to decide the answer here?
- Consent: What does it take to “consent” to having your data harvested? Is your passive use of a platform enough? What about agreeing to multi-page, complexly worded Terms and Conditions documents?
If you haven’t heard about these or haven’t thought much about them, I can’t really blame you. We aren’t bringing these questions to the forefront of the public discussion on big data, nor are big data companies going out of their way to discuss them before issuing new solutions.
“Oops, our bad”
One of the biggest problems we keep running into is what I call the “oops, our bad” effect. The idea here is that big companies and data scientists use and abuse data however they want, outside the public eye and without having an ethical discussion about their practices. If and when the public finds out that some egregious activity took place, there’s usually a short-lived public outcry, and the company issues an apology without really changing its practices or making up for the damage.
Facebook’s recent Cambridge Analytica data breach scandal is a perfect example. For many years, users’ personal information was made available to third-party apps indiscriminately, and Facebook made little effort to protect its users or publicly disclose these vulnerabilities. Instead, it merely reacted when the public started getting angry, saying the equivalent of “oops, our bad.”
Google has been through something similar on many occasions. For example, it was once revealed that Google was tracking Android users’ locations even when they had disabled “location services,” a manual step presumably meant to prevent exactly this kind of tracking. Again, Google only acknowledged this after being called out.
I’m also especially concerned about data ethics because of how important it’s going to be in the future. The global internet population keeps growing, from 2.5 billion users in 2012 to 3.7 billion users in 2017. On top of that, the average user creates more data every year, with more content, images, videos, and interactions becoming available for companies to mine. And of course, companies’ data mining and data-exploiting capabilities are improving every year as well.
This problem isn’t going away; it’s just getting bigger and more unwieldy with each passing year. The longer we wait to have a serious conversation, the harder it will be to implement an acceptable solution. And the more we grow accustomed to unethical data practices, the less we’ll be able to determine what should count as “normal” in our society.
The EU’s valiant efforts
I’ll admit we have made some progress, and the conversation around data ethics isn’t silenced in all areas of the world. In fact, the EU has made a concerted effort in recent years to regulate how companies can collect and use data. The “right to be forgotten” concept has allowed users to erase irrelevant and/or damaging mentions of themselves from Google search results (and similar web locations), and politicians are continuing to create (and enforce) stricter data protection rules for consumers.
The EU doesn’t have authority in all corners of the world, however, and in many ways, its approach fails to address the central dilemma here. The way I see it, it’s like building guardrails on a busy highway rather than educating drivers on how to drive more safely. We’re putting safeguards in place to keep things from getting out of hand, but we aren’t addressing the root of the problem: the incentive for companies to mine and use personal data because it can be monetized.
Outreach and attention
We can’t solve these ethical dilemmas by issuing judgments or making a few laws. After all, ethical discussions rarely result in a simple understanding of what’s “right” and what’s “wrong.” Instead, we should be concentrating our efforts on raising awareness of these ethical dilemmas, and facilitating more open, progressive conversations.
We need to democratize the conversation by encouraging consumers to demand greater ownership of, control over, and transparency into their own data. We need to hold companies accountable for their practices before they get out of hand. And we need the data scientists, entrepreneurs, and marketers of the world to think seriously about the consequences of their data-related efforts, and to avoid sacrificing ethical considerations in the name of profits.