Understanding the Differences Between Data Science and BI
Knowing the differences between the two technologies is more than just a matter of semantics.
- By Stan Pugsley
- December 5, 2017
Defining the terms data science and business intelligence -- and the relationship between them -- has long been the subject of heated debate. Although these terms are related, failing to grasp the separate and distinct concepts behind them can have significant consequences.
For example, hundreds of thousands of data science (DS) and business intelligence (BI) jobs will open up in the next few years. The pool of candidates can seem impossibly small or surprisingly large depending on how relevant and useful you judge the required skills of each position to be. Can a BI expert successfully transition to a DS role? Is DS important to BI positions?
In 2018, business executives are going to evaluate billions of dollars of new projects. What gets the green light and what gets shelved will be affected by how executives and their teams understand and define the two terms. Project champions are quick to attach industry buzzwords to their projects to ride the latest trend, but are such projects just a rebranding of older ideas?
Figure 1 offers one perspective on the BI and DS landscape:
Figure 1: The BI and data science landscape.
The Foundations Haven't Changed
Data management and data visualization are still at the core of any effort to understand and plan a business. These involve technologies and processes to capture, clean, standardize, integrate, visualize, and secure data in a high-performing way. Excel is not enough. A pretty dashboard is not enough. You must commit long term to preserve data as an asset, and you need discipline to build and maintain data lake and data warehouse environments.
The crucial point is that any DS or BI initiative that does not have a solid data foundation will be unsustainable. Anything built on manual, inconsistent processes will be slow, untrustworthy, and resource intensive. Eventually such efforts need to mature with professional IT assistance, or they will fall apart under their own weight.
Since the 1980s, nearly every company has tried to use computers and databases to manage and understand its historical data. However, here is the thing -- after almost 40 years, nobody has truly mastered it. Every year, companies add or replace software systems, and the IT department can rarely keep up, so enterprises end up constantly prioritizing projects to see which data gets attention and which data gets ignored (sorry, marketing department).
You will recognize business intelligence by its charts, dashboards, database diagrams, and data integration projects. It is expensive and frustrating -- but indispensable.
BI has a permanent advantage over DS because it has concrete data points; few, simple assumptions; self-explanatory metrics; and automated processes. Furthermore, BI will never go away. It will always be a work in progress because you will never stop changing your business or upgrading and replacing the source systems.
Looking in the rearview mirror of data is important and helpful, but it's limited and will never get you where you want to go. At some point you need to look ahead. BI needs to be accompanied by data science.
DS is a complicated, sophisticated form of planning and optimization. Examples include:
- Predicting in real time which product a customer is most likely to buy
- Forming a weighted network between business micro events and micro responses so that decisions can be made without human intervention, then updating that network with every outcome so that it learns as it acts
- Forecasting at the SKU level, by day, with every sale
- Identifying and predicting rare events, such as credit card fraud, and sending automatic notifications to customers and/or staff
- Creating clusters of customers based on dozens of attributes and behavior, then targeting them with custom messaging
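To make the rare-event example above a little more concrete, here is a minimal sketch in Python. It is not a production fraud model: it swaps in a simple robust outlier rule (the modified z-score based on the median absolute deviation) and runs on made-up transaction amounts. The function name `flag_unusual_amounts`, the sample data, and the 3.5 threshold are assumptions chosen for illustration.

```python
import statistics

def flag_unusual_amounts(amounts, threshold=3.5):
    """Return amounts whose modified z-score exceeds the threshold.

    Uses the median and the median absolute deviation (MAD), which are
    less distorted by the very outliers we are trying to find than the
    mean and standard deviation would be.
    """
    median = statistics.median(amounts)
    mad = statistics.median(abs(a - median) for a in amounts)
    if mad == 0:
        # Every value is (nearly) identical; this rule cannot score anything.
        return []
    return [a for a in amounts if abs(0.6745 * (a - median) / mad) > threshold]

# Hypothetical card transactions for one customer (illustrative values only).
transactions = [20, 25, 22, 19, 24, 21, 23, 500]
flagged = flag_unusual_amounts(transactions)
print(flagged)
```

In a real deployment, a flagged transaction would feed the automatic notification step, and the rule itself would likely be replaced by a model trained on labeled fraud cases.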
Where traditional planning is done in discrete, human-directed sessions, DS techniques should result in planning and optimization steps that are embedded into software and run as part of automated processes. The model is trained on historical data, with a subset set aside to validate the accuracy of its predictions. If the results are promising, then the model is deployed and monitored, often using BI reports.
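The train-then-validate step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the history is synthetic (ad spend versus units sold), the model is a plain least-squares line, and the helper names `train_holdout_split`, `fit_line`, and `mean_absolute_error` are hypothetical. Real projects would typically use a statistics or machine learning library and, for forecasting, a time-ordered split rather than a random one.

```python
import random

def train_holdout_split(rows, holdout_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and set aside a fraction for validation."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]

def fit_line(rows):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_absolute_error(model, rows):
    """Average absolute gap between predicted and actual y values."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in rows) / len(rows)

# Synthetic history: (ad_spend, units_sold) with a small alternating wobble.
history = [(float(x), 3.0 * x + 10.0 + (1.0 if x % 2 else -1.0))
           for x in range(1, 21)]

train, holdout = train_holdout_split(history)
model = fit_line(train)                            # train on historical data
holdout_mae = mean_absolute_error(model, holdout)  # validate on unseen rows
print(f"slope={model[0]:.2f} intercept={model[1]:.2f} holdout MAE={holdout_mae:.2f}")
```

If the holdout error is acceptable, the fitted model is what gets deployed; the same error metric can then be tracked over time in a BI report to monitor it.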
Finding business sponsorship for DS projects is a challenge because the techniques are difficult to explain and visualize. The projects are difficult to manage and often involve unstructured or semistructured data, complex assumptions, statistical models, exploratory projects that come to dead ends, and limited or confusing visualizations.
As a result, pilot projects can be slow to start and frustrating to sustain. The results you achieve may be sporadic because of the unpredictability of your work. Predicting the future is never going to be simple.
Although BI projects are often completed by a single person, DS projects require extensive cooperation between employees who don't usually speak the same language, including data engineers, statisticians, business experts, and software developers. The competencies of each position require many years to master. Data scientists often have deep expertise in statistics but only elementary software development skills and limited business expertise. DS teams need to partner with IT and business departments to create truly integrated solutions.
Making It All Work
The bottom line is that the difference between the terms is a matter of whether you need to look back (BI) or look forward (data science). BI collects data to understand events in the past. DS generates data to model events that have not yet occurred.
Knowing the difference between the practices is vital to approving or rejecting a proposed project, hiring staff with the skills needed for a BI or a DS project, and architecting a data management platform that can support both. We should avoid looking at them as competing initiatives or as fads that will pass. Both data science and business intelligence are here for the long term and will be major differentiators for businesses that harness their potential.
Stan Pugsley is an independent data warehouse and analytics consultant based in Salt Lake City, UT. He is also an Assistant Professor of Information Systems at the University of Utah Eccles School of Business. You can reach the author via email.