Three Paths to Value in Data Science
Organizations are using many strategies to gain bottom-line value from big data and data science. Three major trends stood out in TDWI's recent research.
- By Fern Halper
- February 2, 2017
We recently published TDWI Best Practices Report: Data Science and Big Data: Enterprise Paths to Success. One of the premises of the research is that there are numerous paths to success; organizations are utilizing different strategies for platforms, tools, and organizational models to get big data projects off the ground and keep them advancing.
On the technology front, these include a mix of on-premises and cloud technologies, open source and commercial software options, data warehouses along with Hadoop and Spark, and the use of more advanced analytics. It is truly an evolving ecosystem. On the organizational front, companies are using multiple strategies to obtain the skill sets they need, looking to business analysts to supplement data science skills, and experimenting with different organizational and leadership models.
As I researched and wrote the report, three trends stood out to me.
Path to Value #1: Open Source
The open source model is a collaborative development model where code is freely available. Many believe that it fosters a community of innovation at a low cost. Open source has become quite popular, especially for big data and data science.
In the survey for our report, two of the top three technologies that respondents thought would be important in 2017 for big data and data science were Hadoop and R -- both open source technologies. Spark came in at number four on the list.
What was even more striking was that close to 50 percent of respondents felt that open source technologies could be deployed in production. Fewer than 20 percent felt that open source was good for experimentation but not production. That said, 60 percent of respondents preferred open source with added innovations that make it reliable and scalable (i.e., commercialized open source).
Of course, that doesn't mean that commercial software is going away -- a majority of respondents said that one way they would deliver on their big data analytics projects was through commercialized software. However, it does speak to the popularity of open source and the fact that it is being used by many organizations today as a path to value.
Path to Value #2: Centers of Excellence
About one-third of respondents to the survey had a center of excellence (CoE) -- sometimes called a competency center -- to provide leadership in big data analytics and data science. Another 25 percent were planning to put one in place in the coming year.
Some CoEs are companywide and may have teams within the centers that serve different business areas. Other organizations distribute the analytics expertise in the lines of business throughout the organization -- sometimes reporting back to the CoE, but more often reporting into the line of business. The best organizational structure is still being debated.
However, what was notable in the research was the correlation between deploying a center of excellence and measuring top- and bottom-line impact from data science efforts. Although correlation does not mean causation, there is some evidence to suggest that having an organizational function that provides expertise in data science and analytics helps to drive measurable value. Of course, the culture of each organization determines how employees accept a CoE and how the CoE builds trust.
Path to Value #3: Analyzing Disparate Data Types
I've been a proponent of using disparate "new" kinds of data for analysis for quite some time. These include internal and external text data as well as geospatial data and streaming data.
Typically, the vast majority of respondents are using structured data from their data stores for big data analysis, and this was the case in this report. Of course, analyzing structured data is powerful, but there is also significant value in analyzing unstructured data -- for example, text data using NLP and text mining software.
In fact, in this study we saw that enterprises that measured top- or bottom-line impact were more likely to be collecting/analyzing disparate data types than those that did not. I have seen this in other research as well. It speaks to the fact that organizations that become more mature tend to incorporate more advanced analytics across different data types, running on both old and new technologies. It is something that organizations starting out on their big data and data science journeys need to consider and plan for as a path to value.
Learn More at the Leadership Summit
You can learn more about data science at TDWI's Accelerate conference April 3-5, 2017 in Boston. Details are available at https://tdwi.org/events/accelerate/boston.aspx. Among the sessions are classes on core data science skills.
About the Author
Fern Halper, Ph.D., is well known in the analytics community, having published hundreds of articles, research reports, speeches, webinars, and more on data mining and information technology over the past 20 years. Halper is also co-author of several “Dummies” books on cloud computing, hybrid cloud, and big data. She is VP and senior research director, advanced analytics at TDWI Research, focusing on predictive analytics, social media analysis, text analytics, cloud computing, and “big data” analytics approaches. She has been a partner at industry analyst firm Hurwitz & Associates and a lead analyst for Bell Labs. Her Ph.D. is from Texas A&M University. You can reach her at [email protected], on Twitter @fhalper, and on LinkedIn at linkedin.com/in/fbhalper.