Big Data Privacy Issues That Need To Be Addressed in Data Science

We live in an era of “big data”. The technology that lies at our fingertips allows us to approach problem solving in new and innovative ways. However, as great as it may be in widening the scope of what we can analyze, achieve, and do, it’s a double-edged sword. Human rights and privacy are potentially at stake.

What security risks does big data raise?

This article gives an overview of key topics in data science, the privacy concerns they raise, and potential solutions.

Artificial Intelligence

Computers can process data much faster than humans can, and Artificial Intelligence (AI) is widely used for automation that saves on human effort.

AI expands on the concept of big data. Practical examples include Google’s spam filters, optimization of commute times in ride-hailing services like Uber and Lyft, and Facebook’s facial recognition. But there are some critical ethical questions data scientists must answer. When can they use the harvested data without violating anyone’s privacy? For instance, is it okay that Facebook tags you in your friends’ photos, or is it creepy and invasive?

De-identifying and generalizing the collected data could be a solution.
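As a rough illustration of what that might look like in practice, here is a minimal sketch that drops direct identifiers and coarsens quasi-identifiers such as age and ZIP code. The column names and generalization bands are hypothetical, chosen only to show the idea, not a production-grade anonymization pipeline.

# Minimal sketch: de-identifying a dataset by dropping direct identifiers
# and generalizing quasi-identifiers. Column names are hypothetical.
import pandas as pd

def deidentify(df: pd.DataFrame) -> pd.DataFrame:
    out = df.drop(columns=["name", "email"])           # remove direct identifiers
    out["age"] = pd.cut(out["age"],                     # generalize exact ages into bands
                        bins=[0, 18, 30, 45, 60, 120],
                        labels=["<18", "18-29", "30-44", "45-59", "60+"])
    out["zip"] = out["zip"].str[:3] + "**"              # coarsen ZIP codes to 3 digits
    return out

users = pd.DataFrame({
    "name": ["Ada", "Grace"],
    "email": ["ada@example.com", "grace@example.com"],
    "age": [36, 44],
    "zip": ["94107", "10001"],
})
print(deidentify(users))

The idea is that the generalized records are still useful for analysis in aggregate, while no single row points back to an identifiable person.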

Algorithms and profiling

Profiling has always been a source of ethical controversy. For example, passenger screening at airports is based on airport agents’ opinions (which may vary) rather than a completely objective process. Airport security personnel may make assumptions based on a passenger’s race or clothing.

Profiling also happens outside of airports. Some companies use profiling to create buyer personas and identify their most valuable customers, and employers use profiling techniques when making hiring decisions and assessing “culture fit”. In some cases, predictive algorithms analyze resume data, but some companies take data collection and processing a step further and scrape public profiles on social media for information. This information is public, but who really owns it?

Transparency and clear usage policy could help combat the fear of profiling misuse. These policies should clearly outline what the institution, company, or organization can and cannot do with raw data as well as processed data that contains unique customer or employee insights.

Controlling IoT data

Significant big data privacy concerns are cropping up as more of our daily lives become connected to the internet. The biggest question is: who controls all of this data?

Let’s take a look at current products and business practices at Google. Users of Google Home can issue voice commands, turn on the lights in their home, and program the TV to turn on and off automatically. Although users can adjust privacy settings, Google admits that it shares some of the information gathered from your home devices with third-party services. This may enhance product quality, but at what cost?

A potential solution could be to standardize data encryption across IoT devices before they’re released to the public. According to an article in WIRED, IoT devices are built quickly and with poor security features, so big data privacy issues are often overlooked. These devices collect sensitive data about the number of devices you use, where you live, where you work, and what time you leave your home. When IoT products lack sufficient security, users are vulnerable to third-party data gathering or hacking.
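One hedged sketch of that idea, assuming a Python-capable device and the third-party cryptography package: the payload is encrypted on the device before anything is transmitted, so only the holder of the key can read it. The device name and payload fields are made up for illustration.

# Minimal sketch: encrypting an IoT sensor payload before it leaves the device.
# Uses the third-party "cryptography" package; the payload fields are hypothetical.
import json
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # in practice, provisioned securely per device
cipher = Fernet(key)

payload = json.dumps({"device_id": "thermostat-42", "temp_c": 21.5}).encode()
token = cipher.encrypt(payload)    # only this ciphertext is sent over the network

# The receiving service, holding the same key, can decrypt and parse it.
print(json.loads(cipher.decrypt(token)))

Even this simple step means that an eavesdropper on the network, or a third party that intercepts the traffic, sees only ciphertext rather than details about your home.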

Data privacy in the online world

Users can go the extra mile to shield their privacy by firing up a VPN, but the fact remains: their data is still being harvested on a large scale.

In search of a solution, the EU created the General Data Protection Regulation (GDPR). The GDPR provides a set of legal guidelines on what a company can and cannot do with personal data. For instance, every company that operates in the European Union and the European Economic Area must offer a way to opt out of data storage at any time, and the process must be outlined in plain language. Companies must also ask for permission to store personal data. For example, web analytics relies on cookie-based tracking, and under the GDPR the website visitor must be alerted and asked for consent.
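To make the opt-in and opt-out requirements concrete, here is a minimal, hypothetical sketch of consent-gated analytics: events are stored only after explicit consent, and opting out erases what was stored. It is not any particular vendor’s API, just an illustration of the principle.

# Minimal sketch: consent-gated analytics storage with a simple opt-out.
# The storage layer and field names are hypothetical.
from datetime import datetime, timezone

consent: dict[str, bool] = {}        # user_id -> has opted in to tracking
events: dict[str, list[dict]] = {}   # user_id -> stored analytics events

def record_event(user_id: str, name: str) -> None:
    if not consent.get(user_id):     # no recorded consent -> store nothing
        return
    events.setdefault(user_id, []).append(
        {"event": name, "at": datetime.now(timezone.utc).isoformat()}
    )

def opt_out(user_id: str) -> None:
    consent[user_id] = False
    events.pop(user_id, None)        # erase previously stored personal data

consent["u1"] = True                  # user gave explicit consent
record_event("u1", "page_view")
opt_out("u1")                         # opting out removes the stored events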

The bottom line

Addressing potential big data privacy issues and being transparent about data collection and analysis is no longer optional; it’s a necessity.