Q&A: Streaming Data and Analytics Deliver Real-Time Insights

Real-time streaming data and visual data discovery are changing the world of BI.

Two big changes are taking place in the world of BI and analytics, says Datawatch VP of product marketing Dan Potter. First is the move to visual data discovery rather than more traditional BI approaches; second is the advent of real-time streaming analytics.

Both issues are about getting usable data to business users faster, Potter explains. In fact, according to analyst firm Gartner, data discovery will soon be the dominant approach to BI; in the next three years, over 50 percent of analytics implementations will make use of streaming data. In this interview, Potter discusses how visual data discovery and streaming analytics represent the future of BI.

Potter leads Datawatch's product marketing and go-to-market strategy. He has held senior positions at IBM, Oracle, Progress Software, and Attunity, and has worked on solutions across a variety of emerging markets including cloud computing, visual data discovery, real-time data streaming, federated data, and e-commerce.

BI This Week: Can you recap what's happening with real-time analytics?

Dan Potter: There are two big changes happening in the world of BI and analytics. The first one doesn't directly have to do with real-time and streaming; it's the move to a visual data discovery paradigm versus a traditional BI approach.

Here's what's driving the move to put the tools in the hands of the business user and to enable them to create and share their own visualizations and analytics: it's all about time. Business users have been clamoring for better information faster. In traditional BI, it was a bottom-up approach. Data was gathered, aggregated, cleansed, and stored in a warehouse. IT would build models, then dashboards and reports and scorecards that were disseminated to business users.

That model hasn't worked for the majority of people, hence the rise of visual data discovery vendors, which has really taken the BI world by storm.

Again, the key driver here is speed -- how do we get the right information to users faster?

The other sea change is also about speed. It's the use of event data streams to move data faster between applications, between devices, between sensors -- basically, to move information from customer feedback directly into analytics. A whole new class of technologies is emerging to support that concept of real-time streaming data in the organization.

What started as traditional messaging middleware has evolved to layers of complex event processing and event stream engines, and to a new class of streaming infrastructure from vendors like IBM with its InfoSphere Streams, with Amazon Kinesis, [and] with Informatica Vibe. There's a whole new class of infrastructure to support that rapid data movement throughout the organization.

Where do these two major shifts that you've talked about come together in the enterprise?

Where they come together is when you can apply that visual data discovery technology directly to data in motion. Again, it's about speed -- empowering the business user to get the right information faster, to be able to visually uncover new insights, or new opportunities or threats, and to be able to take action faster.

How does in-memory computing fit into all this?

Again, it's about speed. In-memory computing is part of the move on the back end to better support faster data movement and faster insights. If you think about the evolution of databases and data warehousing, it started with a data warehouse that might have been updated on a daily or weekly basis. Then we started to move to trickle feeds and more operational data warehousing -- pushing, again, to make the data available faster, and to allow analytics to happen faster.

That, in turn, evolved into the distributed or hybrid database -- moving more and more of that data into memory so aggregations and calculations can be done on the fly. All of the major database and infrastructure vendors have some sort of in-memory strategy -- either moving data partially into memory or moving all of it into memory. Again, it's all about speed. Having data in memory definitely helps the speed at which a query can be processed, but it's that real-time streaming infrastructure that is updating those databases faster.

Streaming technology and event-driven technology tend to get used to push information faster into a database, whether that data is in-memory or not. The idea here is using both technologies. If I can visualize data as it's moving, as well as go back and analyze data as it gets moved into the database, that's really the best of both worlds. I have continual monitoring and I can also do analysis later.

What other things are being done on the data side to speed analytics?

One interesting example is SAP HANA. It brings together online transaction processing with online analytic processing. In the past, you'd have to take a transactional system used for transaction processing and either move data into an OLAP warehouse or use relational OLAP and provide materialized views or do some other aggregation to make the data available for analytics. There's a time delay in either of those.

What HANA is doing is acting as a hybrid. It enables and optimizes the use of a database for both writing transactions and querying that data. That really helps to remove latency and make data available for analytics. HANA takes it one step further by moving some of the data directly into memory to speed things up as well. It's a great example of that evolution in practice.

How can companies prepare for streaming analytics? What's the first step?

For streaming analytics, you need a streaming infrastructure. As I said, every major platform vendor from IBM to Oracle to SAP has a streaming infrastructure as part of its core platform. There's the data movement piece, the event generation piece, and complex event processing (CEP), which provides some aggregation and filtering and the ability to push that query down to where the data is being moved.

Streaming analytics is a part of all standard platforms today, including the Apache platform with Storm and Spark. Also, there are some interesting emerging technologies in this space, including Amazon Kinesis. Even what we've always thought of as a more traditional ETL vendor such as Informatica has a very solid strategy around data streaming with Ultra Messaging and Vibe.
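The filtering and aggregation role that Potter assigns to complex event processing can be illustrated with a minimal Python sketch. It is not any vendor's API: the `Event` type and the `filter_events` and `windowed_average` functions are hypothetical names chosen for illustration, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, NamedTuple, Tuple


class Event(NamedTuple):
    """A hypothetical stream event (not tied to any product)."""
    timestamp: float  # seconds since some reference point
    source: str       # e.g. a sensor or device identifier
    value: float      # the measurement carried by the event


def filter_events(events: Iterable[Event], source: str) -> Iterator[Event]:
    """Filtering step: pass through only events from one source."""
    for event in events:
        if event.source == source:
            yield event


def windowed_average(
    events: Iterable[Event], window_seconds: float
) -> Iterator[Tuple[float, float]]:
    """Aggregation step: emit a sliding-window average per incoming event.

    Assumes events arrive in non-decreasing timestamp order.
    """
    window: Deque[Event] = deque()
    for event in events:
        window.append(event)
        # Evict events that are window_seconds or more older than this one.
        while window[0].timestamp <= event.timestamp - window_seconds:
            window.popleft()
        average = sum(e.value for e in window) / len(window)
        yield event.timestamp, average
```

Chaining the two generators, as in `windowed_average(filter_events(stream, "rig-7"), 60)`, processes each event as it arrives rather than after it lands in a database, which is the point of pushing the query down to where the data is moving.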

Can you name some industries that are using streaming analytics already?

We see a lot of telecommunications. We have customers who are using streaming analytics to monitor network traffic. They need to continuously analyze network bandwidth and quality of service. We see it in telecom contact centers that want to monitor and manage the customer experience, including all the different aspects of a customer calling a contact center -- how long they are on hold, the abandon rate, how long it takes someone to reach an agent, how long the agent spends processing the call, staff and infrastructure utilization, interactive voice response systems, and so forth.

We also see streaming analytics in the energy sector. It's all part of "the Internet of things" and providing sensors and instrumentation on a wide variety of devices. There are many things to monitor in the oil and gas industry, including pipelines, wells, and drilling platforms.

Here's an interesting IBM InfoSphere Streams example. One of our customers is using streaming technology to better understand in real time the effect of icebergs. They need to know what to do in terms of oil platform movement and preparation in the event that an iceberg might come across their path. They monitor weather, the ocean currents, where the platforms are, and so forth so that they can move the platform if needed.

What's the importance of visual data discovery to this discussion?

Traditionally, people think about a dashboard, which has pretty charts and allows a user to drill up and drill down, but that's about it. A dashboard is good at answering those questions you think to ask, such as "How are my sales by territory?" What makes visual discovery different is that it can provide those common dashboard capabilities, but what it really does is empower the user to explore the data interactively. Not only can users drill up or drill down, but they can slice and dice the data in different ways. They can bring together different data sets in the analysis, so they're not limited to what has been pre-modeled.

For example, if you're an operations analyst and you have weather and rig information coming from a corporate source, you may also have some other information that's critical for your analysis. It may be delivered in a PDF or Excel document. It's not corporate-modeled and managed, but it's absolutely critical to your analysis. You can bring that data in and merge it directly with those other sets of data, and start to visually explore things that you couldn't see with a traditional dashboard.

One of the big differences with visual data discovery is that it empowers the business user. They still have access to governed data sets, as with a traditional warehouse, but they can also bring in other sets of data and combine and merge them. Users can build any sort of visual analysis that they need to do their job. They're not confined to what someone else has built or thinks will help them run the business.

What does Datawatch bring to the picture?

We're unique in the analytics world in that we're the only vendor who brings visual discovery together with the ability to directly consume data in motion. We're changing that traditional BI paradigm, in which I ask a question of data at rest, do my analytics, and get a response back. Instead, we allow users to provide a continuous query. You can continually monitor for changes in data as it happens, well before it comes to rest in a data warehouse or repository.

That means the business user gets notified immediately and directly when something important happens. You can then go back and do the traditional queries on data at rest. That can all be done across a wide variety of sources -- traditional and non-traditional, including Hadoop and NoSQL, even a print spool or a PDF document or HTML copy. We can pull all of that data together for analysis.
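The contrast between a continuous query on data in motion and a traditional query on data at rest can be sketched in a few lines of Python. This is an illustrative assumption, not Datawatch's implementation: `StreamMonitor`, `register`, `publish`, and `query_at_rest` are hypothetical names, and an in-memory list stands in for the warehouse.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]
Handler = Callable[[Record], None]


class StreamMonitor:
    """Hypothetical sketch: continuous queries plus a store for later analysis."""

    def __init__(self) -> None:
        self._queries: List[Tuple[Predicate, Handler]] = []
        self._store: List[Record] = []  # stands in for a warehouse or repository

    def register(self, predicate: Predicate, on_match: Handler) -> None:
        """Register a continuous query: a standing condition and a notifier."""
        self._queries.append((predicate, on_match))

    def publish(self, record: Record) -> None:
        """Evaluate every standing query as the record arrives, then store it."""
        for predicate, on_match in self._queries:
            if predicate(record):
                on_match(record)  # notify before the data comes to rest
        self._store.append(record)

    def query_at_rest(self, predicate: Predicate) -> List[Record]:
        """Traditional query: scan what has already been stored."""
        return [record for record in self._store if predicate(record)]
```

A contact center could, for example, register `lambda r: r["hold_seconds"] > 120` with a handler that alerts a supervisor; the alert fires inside `publish`, while `query_at_rest` remains available for after-the-fact analysis of the same records.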

It really is a sea change in terms of BI and analytics. It's all about time, and putting tools in the hands of the business user that enable them to create and share their own visualizations and analytics throughout the organization.
