TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

Three Attack Vectors That Target Your Data

As you build your data science program, don't ignore that there are people working to disrupt your plans. Here are three attack vectors they can use and how to protect your data.

By Troy Hiltbrand
February 7, 2020

As companies implement data science programs, their target is often to use technology and advanced analytics to create a competitive advantage. At the same time, there are others in the market who want to prevent their success. This list of adversaries can range from industry competitors working in the same market segment to bad actors and enemy states whose goals are to disrupt the economy and society in general. No matter their intentions, these opponents desire to stop a company's progress.

For Further Reading:

9 Ways to Make your Company Network Secure

The Rise of the Data Security Scientist

Why Encryption Holds the Secret to Data Security

There are three major attack vectors used to disrupt a company's forward progress with its data science program -- both external and internal data poisoning as well as intellectual property theft.

External Data Poisoning

External data poisoning is the lowest-risk attack for an adversary to carry out because it can be executed from outside the boundaries of an organization.

In 2016, Microsoft deployed a state-of-the-art AI Twitter chatbot named Tay as an experiment in conversational understanding. The idea was to have the chatbot learn through interactions with the public and evolve over time to become smarter and more capable through playful communication. This experiment went horribly wrong as the Twitter community fed Tay with garbage and the result was a chatbot spewing misogynistic and racist banter. Microsoft quickly deactivated Tay and cleaned up the account.

This is a prime example of the concept of garbage in, garbage out. The model learned from a set of input data that was culturally corrupt and generated results that reflected this. A learning model is only as good as the data it is fed. This opens up an attack vector for your learning models. If a bad actor has an open interface to push data into your model, then it is susceptible to external data poisoning.

Models that use real-time data to evolve are most susceptible to this method of attack, and the inputs must be monitored and cleansed to ensure they do not have a negative impact on the model. Models that use a predefined training set of data at the time of construction are also susceptible, and care should be taken to cleanse the training data before starting modeling exercises.

Models most susceptible to this form of attack are those that are very sensitive to outliers in the data. These at-risk models include linear classifiers -- such as logistic regression and naive Bayes classifiers -- support vector machines, decision trees, boosted trees, random forest, neural networks, and nearest neighbor. Due to the nature of how these models are trained, the introduction of outliers can significantly skew their results.
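To make the effect of a single outlier concrete, here is a minimal sketch in Python. It fits an ordinary least-squares line, used here only as a simple stand-in for an outlier-sensitive model rather than one of the classifiers listed above. The data set and the injected record are hypothetical.

```python
from statistics import mean


def fit_line(points):
    """Return (slope, intercept) of the ordinary least-squares line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical clean data lying exactly on y = 2x + 1.
clean = [(x, 2 * x + 1) for x in range(10)]

# The same data plus one fabricated record injected by an attacker.
poisoned = clean + [(9, -100)]

clean_slope, _ = fit_line(clean)
poisoned_slope, _ = fit_line(poisoned)
print(f"slope on clean data:    {clean_slope:.2f}")
print(f"slope on poisoned data: {poisoned_slope:.2f}")
```

In this toy case, one injected record out of eleven is enough to flip the fitted slope from positive to negative. Real models and real data will behave differently, but the mechanism is the same.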

Solution:

-- Implement a data quality program to ensure data inputs are valid, specifically focusing on outliers

-- Continuously scan and monitor your input training data
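One way to approach the outlier screening described above is sketched below in Python. It is an illustration under stated assumptions, not a complete data quality program: the function name `flag_outliers`, the sample readings, and the cutoff are all hypothetical. It uses the modified z-score, which is based on the median and the median absolute deviation (MAD); the 0.6745 constant and the 3.5 cutoff are a commonly used convention, and you should tune the cutoff to your own data.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds threshold.

    The median and MAD are used because they are far less affected by
    extreme values than the mean and standard deviation.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values equal the median, so the score is
        # undefined. As a conservative fallback (an assumption of this
        # sketch), flag anything that differs from the median.
        return [v for v in values if v != med]
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical sensor readings with one suspicious record.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 55.0]
suspicious = flag_outliers(readings)
print(suspicious)
```

Flagged records would typically be quarantined for review rather than silently dropped, so that a legitimate but unusual value is not lost.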

Internal Data Poisoning

The second attack vector is internal data poisoning. This entails having a bad actor breach your perimeter security, directly access your corporate data, and alter the data to the point that it has a detrimental impact on your data science models.

When the conversation turns to internal data and system manipulation, your mind might go to the 1995 Angelina Jolie movie Hackers, which takes the audience flying through its representation of an internal network as the hackers strive to interrupt a company's computer operations. In reality, network penetration is much less glamorous than the movie portrays, but it can have significant negative consequences for business operations.

Network security teams across the world can attest to the fact that their networks are continually being scanned for gaps in security. When gaps are identified, bad actors exploit them to access resources that are intended to be protected -- including your historical data sets. If hackers can poison this data by altering it at the source, the derived models will produce weaker results or even become completely misleading and steer your company in the wrong direction.
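Perimeter defenses aside, one way to notice that historical data has been altered at the source is to compare it against a previously recorded set of checksums. The Python sketch below illustrates the idea using SHA-256 digests from the standard library. The function names and directory layout are hypothetical, and it assumes the baseline manifest is stored somewhere an intruder cannot modify.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's path (relative to data_dir) to its digest."""
    root = Path(data_dir)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(baseline, current):
    """Return paths added, removed, or modified since the baseline."""
    paths = set(baseline) | set(current)
    return sorted(p for p in paths if baseline.get(p) != current.get(p))
```

A typical use would be to call `build_manifest` when a data set is approved for modeling, then rebuild and compare it with `changed_files` before each training run. Any unexpected difference is a signal to investigate before the data is used.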

To protect against this, network teams need to ensure that up-to-date systems are installed to secure the perimeter and that they are patched promptly to close recently identified security holes, including those first exploited as zero-days.

Direct attacks that penetrate the network perimeter are not the only option bad actors have at their disposal. We see an increase in phishing and even targeted spear-phishing, where misleading email messages result in an employee installing malicious software on their computer. If the employee has the appropriate level of data access, the software can silently corrupt that data without the user even knowing it.

Malicious software can also prevent access to the data altogether. We have seen a rise in the use of ransomware, which encrypts data and demands payment in exchange for the key to decrypt the data. If this happens on your corporate network, you will be locked out of your data, the fuel for your data science program. To protect yourself, it is important that your desktop and mobile devices have current and effective antivirus software. It is also important to train your employees to recognize and respond appropriately to phishing attempts.

Solution:

-- Implement modern perimeter security and desktop management solutions and keep their software patched and up to date

-- Train your employees to recognize and respond to phishing attacks

Model Theft

The third and final attack vector is the theft of your models. Unlike the theft of physical assets, the theft of models does not leave your company without a model to work with. It does, however, create a copy that can be studied and replicated by a competitor, eliminating the competitive advantage your company is striving for with its data science program. To protect against this, companies need to apply robust access controls to their models. This includes central storage and management of these models with user- and role-based protection to prevent unauthorized individuals from accessing them.
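As a rough illustration of central storage with role-based protection, the Python sketch below keeps models in a single registry and checks the caller's role before any read or write. Everything here is hypothetical: the role names, the permission table, and the `ModelRegistry` class are assumptions for the sake of the example, and a production system would rely on your organization's identity and access management tooling rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field

# Hypothetical roles and the actions each may perform on a model.
ROLE_PERMISSIONS = {
    "data_scientist": {"read", "write"},
    "analyst": {"predict"},
    "auditor": {"read"},
}


@dataclass
class ModelRegistry:
    """Central store that checks a user's role before releasing a model."""

    _models: dict = field(default_factory=dict)
    _user_roles: dict = field(default_factory=dict)

    def assign_role(self, user, role):
        if role not in ROLE_PERMISSIONS:
            raise ValueError(f"unknown role: {role}")
        self._user_roles[user] = role

    def _require(self, user, action):
        # Users with no assigned role have no permissions at all.
        role = self._user_roles.get(user)
        if action not in ROLE_PERMISSIONS.get(role, set()):
            raise PermissionError(f"{user} may not {action} models")

    def save(self, user, name, model):
        self._require(user, "write")
        self._models[name] = model

    def load(self, user, name):
        self._require(user, "read")
        return self._models[name]
```

The design choice worth noting is deny by default: a user without a role, or a role without the requested action, is refused, so access has to be granted explicitly.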

As is the case with any intellectual property, one of the most effective methods of stealing models is for a competitor to hire away the staff who built them. If they can hire your knowledge base away, they may be able to replicate that intellectual property and rob you of its future evolution and benefits.

The best way to protect your enterprise is through sound human resource processes. This includes having enforceable intellectual property agreements with your employees that prohibit them from disclosing company secrets if they are lured away. Another way to limit this type of intellectual migration is to ensure your employees have a high level of job satisfaction. This can include strategically planning the compensation, benefits, and culture you offer. Strong company culture and employee loyalty can be among the best protections against the attack vector of intellectual property theft.

Solution:

-- Implement central model storage and role-based access controls to protect the models

-- Build contractual intellectual property controls into employment contracts

-- Implement human resource programs that promote active, healthy employee engagement and minimize attrition to other companies

A Final Word

Although your data science program is heavily focused on identifying ways to use data to generate a competitive advantage, you need to be aware that others are looking for ways to disrupt your success. Whether your enemies are competitors or external bad actors, you can take steps to protect yourself and ensure your work has a lasting impact. If you focus on these attack vectors and work jointly with other departments, you can reduce the probability of an attack being successful and minimize the impact when an attack happens.

About the Author

Troy Hiltbrand is the senior vice president of digital product management and analytics at Partner.co where he is responsible for its enterprise analytics and digital product strategy. You can reach the author via email.
