Q&A: Using Big Data Analytics for Better Customer Intelligence

Building a better understanding of how analytics can be applied to big data can lead to a more customer-centric organization.

In this discussion of big data and analytics, Hannah Smalltree with Treasure Data talks about strategies for working with big data to uncover customer intelligence and suggests how to get started on an analytics project using big data. Smalltree is a director with Treasure Data, focusing on strategy and customer programs for the company, which offers an end-to-end, fully managed service in the cloud for big data. She spent over a decade as a technology journalist, interviewing hundreds of companies about their data and analytics projects. Smalltree recently spoke at a TDWI Webinar, "Big Data Analytics for Better Customer Intelligence: Steps to Success."

BI This Week: When we talk about developing a big data strategy, what should the top priorities be for companies, given how quickly things can move? What should companies be doing to work more effectively with big data?

Hannah Smalltree: In terms of big data strategies, many companies start with a top-down approach, looking at what they could be doing [by collecting and storing] different data types. However, with big data in particular, I think it's often more effective to start with a bottom-up approach, using a tactical strategy. That starts with looking at what data your organization already has, what data your organization has access to that you may or may not be storing, and what data your organization has access to that you're not storing at all. That final category, for example, might include things like log data. People are just now realizing that there's value in those data types.

Rather than starting with a more traditional top-down approach ("If we had this data, we could do this and this"), think about experimenting and looking at some tactical projects ("We already have this data, so let's look at what we can do with it"). You can start to build out and consume more data types. You can start with a small project -- maybe just one data set. Start there and see what you can learn.

So you're suggesting a tactical approach that proves the value, then building from there?

Part of the reason that I suggest that is data collection takes time. If you're going to start collecting a new type of data that you haven't been collecting before, you're going to need some time to build history. Practically speaking, it makes sense to start with some data that you already have or data that you can easily get access to. You can start playing with that and finding some interesting combinations.

Often, the most interesting experiments combine big data with historical data such as customer data and transaction data. Historical data gives you insights into the behavior of your high-value, long-time customers. How are they using your product? How are they using your Web site? If you're tracking those behaviors, you can combine that with other data sets and find some interesting insights.

What about data quality, which is also a big challenge when we talk about big data? What can companies do to make sure big data in particular is reliable data?

Data quality specialists may cringe when I say this, but I think there have to be levels of data quality that allow some agility. I covered master data management for years as a reporter, and I know that with customer records, yes, you do need to spend time and energy on data governance and data stewardship to get that golden record. It's worth it to really get to know and understand your customers.

However, with big data, it's so big and adds up so quickly -- and changes so quickly -- that you often have to find that agile balance point. Your data governance strategy -- and hopefully you have one -- has to be expanded to accommodate these new data types. You need to set different levels of governance and quality standards, and those levels might change as data sets become integrated.

It's a difficult problem with big data and with certain types of big data in particular. It's important to be able to relax your standards, if you will, around certain data types because the value you're getting is greater than the effort you'd have to spend cleansing and maintaining it.

Obviously, for any data that is subject to regulatory compliance, you need very rigorous standards for data management. However, if you're talking about log records coming from a device that generates millions of records a day, maybe it's OK that some of those records are bad. Maybe 80 percent good data is OK across that data set. Remember that a lot of the value in big data is moving fast with it. It's agility. You need to be able to take that data and glean insights from it and act on it, so you don't want weeks and weeks to go by as you cleanse it and so forth.
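As an editorial illustration rather than anything Smalltree described, the idea of different quality levels per data type can be sketched in a few lines of Python. Everything here is an assumption made for the example: the tier names, the record fields (`device_id`, `reading`), and the thresholds, with 0.8 borrowed from the "80 percent good data" figure above and 1.0 standing in for "very rigorous."

```python
# Hypothetical quality tiers: the names and thresholds are illustrative
# assumptions, not values recommended in the interview.
MIN_VALID_RATIO = {
    "regulated": 1.0,    # compliance data: every record must pass
    "device_logs": 0.8,  # high-volume logs: tolerate some bad records
}


def is_valid(record):
    """Treat a record as good if it has a device id and a numeric reading."""
    has_id = bool(record.get("device_id"))
    has_reading = isinstance(record.get("reading"), (int, float))
    return has_id and has_reading


def valid_ratio(records):
    """Return the fraction of records that pass is_valid (0.0 if empty)."""
    records = list(records)
    if not records:
        return 0.0
    return sum(is_valid(r) for r in records) / len(records)


def accept_batch(records, tier):
    """Accept a batch when its valid ratio meets the tier's threshold."""
    return valid_ratio(records) >= MIN_VALID_RATIO[tier]


batch = [
    {"device_id": "a1", "reading": 20.5},
    {"device_id": "a2", "reading": 19.0},
    {"device_id": "a3", "reading": 21.2},
    {"device_id": "a4", "reading": 18.7},
    {"device_id": "", "reading": None},  # one bad record out of five
]
```

With this batch, `accept_batch(batch, "device_logs")` passes while `accept_batch(batch, "regulated")` does not, so the same data can be usable for fast exploration and still be held back from stricter uses.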

You mentioned blending big data with data already stored within the company to gain useful insights. How and where does that tend to happen?

The blending generally takes place behind the firewall, although in some companies, big data is being aggregated outside the firewall on some kind of cloud service. The valuable data is then moved inside the firewall, where it's being combined with other data types and explored. Often raw data needs to go through some kind of aggregation stage just to get it in some kind of shape to be viably combined with your enterprise data. ...
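The aggregate-then-combine pattern mentioned above can be sketched as follows. This is a minimal illustration under assumed names, not a description of any particular product: the event fields (`customer_id`, `type`), the `page_view` event type, and the customer record layout are all hypothetical.

```python
from collections import Counter


def aggregate_page_views(raw_events):
    """Reduce raw events to one page-view count per customer.

    Events without a customer_id, or of another type, are skipped.
    """
    counts = Counter()
    for event in raw_events:
        customer_id = event.get("customer_id")
        if customer_id is not None and event.get("type") == "page_view":
            counts[customer_id] += 1
    return counts


def blend(customers, view_counts):
    """Attach the aggregated count to each customer record (0 if absent)."""
    return [
        {**customer, "page_views": view_counts.get(customer["customer_id"], 0)}
        for customer in customers
    ]


# Hypothetical raw log events, e.g. collected outside the firewall.
raw_events = [
    {"customer_id": 1, "type": "page_view"},
    {"customer_id": 1, "type": "page_view"},
    {"customer_id": 2, "type": "click"},
    {"customer_id": None, "type": "page_view"},
]

# Hypothetical enterprise customer records kept behind the firewall.
customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]

blended = blend(customers, aggregate_page_views(raw_events))
```

Only the small aggregated table crosses into the enterprise environment in this sketch; the raw events stay where they were collected.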

You also have plenty of companies doing things outside the firewall, sometimes operating entirely in the cloud. However, there's still some trepidation in enterprises right now about where customer data is stored and where it's sitting. Practically speaking, larger companies still tend to be more comfortable with their customer data behind the firewall.

Are concerns about data privacy holding companies back from doing more with customer data?

Definitely. For one thing, it's holding them back from using the cloud for efficient processing. It's also holding their budget back, you could say, because a lot of people are spending more than they perhaps need to, in my opinion. There is so much fear and angst around data privacy, although it's an emotional reaction in many ways when it comes to fear of the cloud.

As a cloud provider, how do you address fears around the cloud?

Our job as a cloud provider is to keep the data safe at all times so that only people who are allowed to access the data can access it. That also means keeping data safe during transit with encryption. That said, although we are managing the data for our customers, it is their data. They need to do wise things with it, such as being upfront with customers about how data will be stored, treated, and used.

Let's bring analytics into the conversation. Specific to big data, is it challenging to find the ROI in analytics?

It's very challenging. Finding the return on investment often requires experimentation. As a journalist, I always found writing about the ROI of business intelligence a challenge. That hasn't changed at all with big data. In fact, the ROI is probably even more elusive. That's why you almost have to do some level of experimentation. It's generally hard to prove ROI up front, although some people certainly do. You really need to get practical and play with the data, to innovate and see what new things you might be able to do.

When we look at machine data, for example, and the rise of companies such as Splunk, we didn't use to see that much value in machine data or network data. Now, we realize that analyzing that data can make for a much more efficient operation. Companies are drawing correlations that they never expected, and a lot of it has to start with some level of experimentation. Then you can start finding the ROI, but it's very much a trial-and-error process.

What can organizations do to improve speed-to-insight for customer intelligence?

That's a good question. One answer is to look for well-bounded problems. I mentioned earlier this notion of behavioral segmentation as an interesting place for a lot of people to start with big data. Especially with retail, looking across channels is an interesting place to begin. You can combine channels, for example, pulling customer data from the Web, data from stores, and so forth. Segmentation and comparison -- what people are doing in stores, offline versus online -- can be really interesting for retail companies.
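A very small sketch of cross-channel segmentation, offered as an illustration under assumptions: it supposes each channel can supply a list of customer identifiers that already match across systems, which in practice is often the hard part. The segment names are hypothetical.

```python
def segment_by_channel(web_customers, store_customers):
    """Split customers into segments by the channels they were seen in."""
    web = set(web_customers)
    store = set(store_customers)
    return {
        "both": web & store,          # seen online and in stores
        "online_only": web - store,   # seen online only
        "store_only": store - web,    # seen in stores only
    }


# Hypothetical customer ids observed in each channel.
segments = segment_by_channel(
    web_customers=["c1", "c2", "c3"],
    store_customers=["c2", "c3", "c4"],
)
```

Comparing behavior between these segments, such as order size or visit frequency, is one well-bounded starting question of the kind described above.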

What should organizations be focusing on to expand their use of big data and analytics?

I'd have to say understanding your data. As we discussed, it comes back around to knowing what data you have and what data you have that you're not doing anything with. Gartner calls that dark data. It could be your data or it could be data from customers or partners. You really need to do some kind of assessment to see what you have. From there, you can back into what you want or what kinds of questions you might be able to ask if you had that data.

Just from a practical standpoint, keeping an inventory -- a list of questions you'd like to ask -- is often a great way to spark ideas about what you should be collecting down the road.

Also, look at where the challenges are in your business. Put those two things on a whiteboard and then be creative about how you can combine data in interesting ways. Hopefully, you're using tools that aren't holding you back. You don't want the tools that you're using to define what you're able to do. If you're feeling bounded by technology, there are a lot of options out there, and the cloud is one of them. It's one of those things that can get you up and running quickly.
