Forecasting a Key Role for the Data Warehouse in Advanced Analytics
The data warehouse still has a big part to play in the way companies develop, deliver, and consume analytics. Data scientist James Bird and his team built a forecasting application based on a mix of open source software and an existing data warehouse.
- By Steve Swoyer
- October 7, 2016
Data scientist James Bird knows that the data warehouse has a big part to play in the way his company develops, delivers, and consumes analytics. Bird leads the advanced analytics group with NXP Semiconductors N.V., one of the largest suppliers of non-memory semiconductors in the world.
Recently, he and his team built a new forecasting application that accelerates the rate at which NXP is able to make product-demand forecasts available to its lines of business. NXP's existing Teradata data warehouse provides most of the source data that powers the application. Bird spoke about this experience at Teradata's 2016 Partners Conference, held in September in Atlanta.
Moving to Advanced Predictive Modeling
Like many companies, NXP is shifting its decision support infrastructure from conventional business intelligence (BI) reporting to advanced analytics. The demand-forecasting app Bird helped build is part of this shift.
The idea is to transform the company's information culture from reactive to predictive: "We want to help move the needle on analytics in NXP, moving ... from BI reporting into more advanced analytics with predictive and prescriptive modeling," he says.
The app Bird and his team developed is a case in point. It delivers demand forecasts to the lines of business much more quickly. Under the old process, NXP's analysts had to wait weeks for even the slightest insight into monthly demand.
"This was a five-person team from each business unit in the company. It took them two weeks just to build the models and build the graphs. The data would be manually pulled from the warehouse; they would manually ETL the data; it was all Excel-based, not very efficient at all," he told attendees, noting that analysts used colored tabs in Excel to indicate changes or priorities. "It was just very labor intensive ... and not very scientific."
How NXP's New Demand-Forecasting Application Works
First, the application automates the extraction, loading, and transformation of data from NXP's Teradata data warehouse. Under the covers, NXP is running R -- the ubiquitous programming language for statistical analysis -- in the context of the Teradata database engine.
Bird and his team also used "Shiny," an event-driven Web application framework for R, to build a Web browser-based user interface for the app. Shiny facilitates bidirectional communication between client Web browsers and R. "In the end, we got a very slick-looking interactive app that ... doesn't take a lot of extra effort beyond your original R code," Bird said.
The app provides at-a-glance projections of near-term and long-term demand.
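To make the automated extraction step concrete, here is a minimal sketch of the general idea: let the database aggregate the data so that no one has to pull and reshape rows by hand. It is not NXP's code. The article describes R running alongside Teradata; this sketch swaps in Python and an in-memory SQLite database as stand-ins, and the `orders` table, its columns, and the sample rows are all invented for illustration.

```python
import sqlite3

# Stand-in for the warehouse: an in-memory SQLite database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_month TEXT, product_family TEXT, units INTEGER)"
)
# Made-up sample rows, not NXP data.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2016-01", "A", 10),
        ("2016-01", "A", 5),
        ("2016-01", "B", 7),
        ("2016-02", "A", 12),
    ],
)


def extract_monthly_demand(connection):
    """Aggregate order lines to one row per month and product family.

    The GROUP BY runs inside the database, so only the summarized
    result set leaves it.
    """
    query = """
        SELECT order_month, product_family, SUM(units) AS units
        FROM orders
        GROUP BY order_month, product_family
        ORDER BY order_month, product_family
    """
    return connection.execute(query).fetchall()


for row in extract_monthly_demand(conn):
    print(row)
```

Scheduling a function like this replaces the manual pull-and-reshape routine that analysts previously carried out in Excel.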
Benefits of the New Application
"The forecasting application provides a high-level comparison of the most recently locked-down model and the latest daily forecasts over an 18-month look-ahead window," Bird said. "This gives the demand forecast team the ability to quickly see where deviations exist [such that] they can quickly drill into those deviations to understand ... if they need to make any adjustments."
Business units can specify custom seasonality settings; individual analysts can likewise customize forecasts to suit their own needs -- e.g., excluding certain product lines or product families within a product line. "There's seasonality settings defined per business group along with other user-based choices, and after those [filters] are applied, the application builds the models ... and displays graphs interactively," he explained, noting that analysts can also drill down into the details of the forecast model.
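The workflow described above (apply user filters, build a model with a per-group seasonality setting, then compare the result with the locked-down forecast over 18 months) can be sketched roughly as follows. The article does not say which models NXP uses, so this sketch substitutes a simple seasonal-naive forecast. The season lengths, product family names, the 10 percent threshold, and all numbers are hypothetical.

```python
from collections import defaultdict

HORIZON_MONTHS = 18  # look-ahead window mentioned in the article

# Hypothetical season lengths (in months) per business group.
SEASON_LENGTH = {"automotive": 12, "mobile": 4}


def monthly_totals(rows, excluded_families=frozenset()):
    """Sum units per month, skipping any excluded product families."""
    totals = defaultdict(float)
    for month, family, units in rows:
        if family not in excluded_families:
            totals[month] += units
    return [totals[m] for m in sorted(totals)]


def seasonal_naive(history, period, horizon):
    """Forecast each future month as the value one season earlier."""
    if len(history) < period:
        raise ValueError("need at least one full season of history")
    series = list(history)
    for _ in range(horizon):
        series.append(series[-period])
    return series[len(history):]


def flag_deviations(locked, latest, threshold=0.10):
    """Return (month_offset, relative_change) pairs where the latest
    forecast differs from the locked one by more than `threshold`."""
    flagged = []
    for offset, (base, new) in enumerate(zip(locked, latest), start=1):
        if base == 0:
            continue  # relative change is undefined; skipped in this sketch
        change = (new - base) / base
        if abs(change) > threshold:
            flagged.append((offset, round(change, 4)))
    return flagged


# Made-up demand history: eight months, two product families.
units_a = [10, 20, 30, 40, 12, 22, 32, 42]
rows = [(m, "family_a", u) for m, u in enumerate(units_a)]
rows += [(m, "family_b", 5) for m in range(8)]

# An analyst excludes one product family, then forecasts 18 months ahead.
history = monthly_totals(rows, excluded_families={"family_b"})
latest = seasonal_naive(history, SEASON_LENGTH["mobile"], HORIZON_MONTHS)

# Pretend the locked-down forecast expected more demand in month 3.
locked = list(latest)
locked[2] = 40.0

print(flag_deviations(locked, latest))
```

In this toy run only month 3 is flagged, which is the kind of deviation an analyst would then drill into.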
The difference is nothing short of transformative, Bird says. "The [forecast] dashboard improved cycle time. It used to take five people two weeks just to create what's available to [analysts] instantaneously now." NXP was able to "move forecast meetings with business groups ahead by one week in the planning cycle ... which really improved the forecast ability."
Collaboration and Open Source Made Improvement Possible
Most important, Bird stressed, is that what he and his team accomplished isn't a product of data science wizardry. They used freely available open source software in tandem with their existing Teradata data warehouse.
"Data science and machine learning algorithms are readily available today. Python and R are the two most common languages, and ... there are over 9,000 publicly available packages today in the R environment. Whatever solution, whatever problem you're trying to solve, someone has probably already solved it and probably already released a package."
Bird had some final advice for attendees. Statistical analysis and machine learning are powerful technologies -- but communication and collaboration are even more powerful.
"You really need to understand the business question you're trying to answer and you really need to work with your business partners to make sure you're answering the right question. Sometimes that isn't obvious. It may take a while to figure out [what that question is] and deliver what they want."
Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about.