Blog by Philip Russom
Research Director for Data Management, TDWI
I just got off the phone with Ellie Fields, the director of product marketing at Tableau Software. Ellie has a lot to say about the intersections among big data, analytics, and data visualization. So allow me to recount the high points of the conversation.
Philip Russom: Tableau is often pigeon-holed as a data visualization vendor. But the Tableau users I’ve met are using the tool for analytics. How does Tableau position itself?
Ellie Fields: Our customers use Tableau in different ways. For example, many use us as their primary, enterprise BI platform. Others use us for specific BI applications within a department. Still other customers use Tableau for fast analytics, as a complement to a legacy BI platform. Given the breadth of use, we see ourselves as a multi-purpose BI platform.
Philip Russom: I’ve seen demonstrations of the Tableau tool, so I know that ease-of-use is high. But is it high enough to enable self-service BI?
Ellie Fields: The Tableau tool was designed with self-service in mind for a broad range of BI users. For example, with a few mouse clicks, a user can access a database, identify data structures of interest, and bring data into server memory for reporting or analysis. The user needs to know the basics of enterprise data, but doesn’t need to wait for assistance from IT. With a few more clicks, you can publish your work for colleagues to use. Going back to your question about positioning, we describe this quick and easy method as “rapid fire business intelligence.”
Philip Russom: What’s the relationship between data visualization and big data?
Ellie Fields: As you know, Tableau is strongly visual. In fact, the visual images representing data are an extension of the user interface, in that you grab your mouse and – with simple drag-and-drop methods – you interact directly with the visualization and other visual controls to form queries, reports, and analyses. Analysis is iterative, and iterations need to flow fast. The drag-and-drop environment enables an analyst to work quickly, without losing the train of thought, and even to collaborate with others on live data. So, we’re fast with results – even against big data.
When working with big data, all of our visualizations scale up and down, in that they can represent ten data points from a spreadsheet or ten million rows of big data. And when working with big data, visualization is even more important. It’s how humans explore and consume information to arrive at a conclusion. Analytics without good visualization is hamstrung from the beginning.
Philip Russom: What types of analytic applications have you seen in your customer base recently?
Ellie Fields: Many of our customers practice what we call “exploratory analytics.” This is especially important with big data, where the point is to explore and discover things you didn’t already know. For example, we have a lot of Web companies as customers, and they depend on advertising for revenue. As they explore big data, they’re answering analytic questions like “How do small ads compare to big ones?” or “Which colors in an ad sell the most?” Yahoo! is a customer, and they analyze online ads by many dimensions, including size, color, location, frequency, Web site locations, revenue, and so on.
High tech manufacturing stands out as a growing area, especially analytics for monitoring product and supply quality. Healthcare, finance, and education companies have also adopted Tableau. One healthcare client analyzes its supply chain to be sure all locations are equipped adequately. Another hospital uses analytics to optimize nurse staffing. And a university client analyzes trends in SAT scores to enlighten decisions about recruitment, scholarships, and educational curricula.
So, what do you think, folks? Let me know. Thanks!
Posted by Philip Russom, Ph.D. on May 19, 2011
When you’re 100 years old, as IBM is this year, it would be easy to think that you’ve seen it all. What could possibly be new to Big Blue about “big data”? In the view of Robert LeBlanc, SVP of Middleware Software for the IBM Software Group, quite a bit.
The new problem set, defined by business opportunities opening up due to the availability of new sources of information, cannot be solved with traditional data systems alone. Kicking off the IBM Big Data Symposium for industry analysts at the Yorktown Research Center on May 11, LeBlanc itemized a number of challenges, including multi-channel customer sentiment and experience analysis, detection of life-threatening conditions at hospitals in time to intervene, Medicare fraud interdiction before payment, and weather pattern predictions to optimize wind turbine locations. (Note: The next TDWI Solution Summit, September 25-27 in San Diego, will feature case studies focused on the theme of “Deep Analytics for Big Data.”)
Posted by David Stodder on May 17, 2011
Blog by Philip Russom, Research Director for Data Management, TDWI
I recently had a great phone conversation with Mike Olson, the CEO of Cloudera. Mike has a gift for explaining new and complex technologies and their emerging best practices. Let me share a few of Mike’s insights.
Philip Russom: My understanding is that Cloudera makes a business by distributing open source software, namely MapReduce-based Apache Hadoop. Is that right?
Mike Olson: Well, that’s part of it. Cloudera does a lot more than simply distribute open source Hadoop. We make Hadoop viable for serious enterprise users by also providing technical support, upgrades, administrative tools for Hadoop clusters, professional services, training, and Hadoop certification. Furthermore, our distribution package includes more than Hadoop itself: Cloudera collects and develops additional components to strengthen and extend Hadoop.
Philip Russom: So, what is Hadoop?
Mike Olson: Essentially there are two pieces in Hadoop. First, there’s the Hadoop Distributed File System (or HDFS), which can manage big data on clusters of many nodes. Our customers typically start with twenty nodes or so, then quickly grow to fifty or more. Some of our customers have thousands of nodes, managing petabytes of data. A many-node cluster enables big data management, plus other nice benefits like scalability, performance, and high availability. But the ramification is that data is heavily distributed.
That’s where the second piece comes in, namely MapReduce. Thanks to this capability of Hadoop, you can define a data operation, like a query or analysis, and the platform ‘maps’ the operation across all relevant nodes, for distributed processing and data collection. The platform then consolidates and reduces the responses that come back. Due to the distributed processing of MapReduce, analytics against very big data is possible, and with good performance.
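(Editor's aside: for readers who want to see the map-then-reduce idea in miniature, here is a rough sketch in plain Python. It is not Hadoop code and uses no Hadoop API. The "nodes" are just in-memory lists, and the names NODES, map_phase, shuffle, reduce_phase, and run_job are hypothetical, chosen only for illustration. The map step runs separately on each node's records, and the results are then grouped by key and reduced to totals.)

```python
from collections import defaultdict

# Hypothetical data: each inner list stands in for the records stored on one node.
NODES = [
    ["error disk", "ok", "error net"],
    ["ok", "ok", "error disk"],
    ["error net", "ok"],
]


def map_phase(records):
    """Runs independently per node: emit a (word, 1) pair for every word."""
    return [(word, 1) for record in records for word in record.split()]


def shuffle(mapped_outputs):
    """Group the emitted values by key across the outputs of all nodes."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(nodes):
    # In a real cluster the map calls would run in parallel on separate
    # machines; here they run one after another in a single process.
    mapped = [map_phase(records) for records in nodes]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(run_job(NODES))
```

The point of the split is that map_phase only ever needs the records local to one node, so the heavy lifting can happen where the data lives, and only the small intermediate pairs have to be brought together for the reduce step.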
Philip Russom: What kind of analytics?
Mike Olson: Hadoop excels in discovering patterns in big data, patterns that you didn’t know were there, in data that you probably don’t know very well. That makes Hadoop the opposite of your average data warehouse query against well-understood relational data. Since Hadoop and a traditional data warehouse are complementary, putting them together gives you a very broad range of business intelligence capabilities.
Philip Russom: What data types and data models are your customers managing?
Mike Olson: In Hadoop, you can mix and match data types to your heart’s content. Hadoop will store anything without requiring a data type declaration. Also, Hadoop is amazingly tolerant of messy data. For example, our customers manage any kind of file you can think of in the HDFS, and these can have just about any kind of data model. This also includes human language text and complex data types. So, big data’s not just big. It’s also highly diverse and complicated. And Hadoop excels in handling data of such extreme size, diversity, and complexity for the purposes of analytics.
So, what do you think, folks? Let me know. Thanks!
Posted by Philip Russom, Ph.D. on May 12, 2011
I’ve recently been interviewing users and business sponsors, asking them about their new practices with advanced analytics, plus the special role of big data. When I ask people to talk about critical factors that make or break success, they usually come around to a common issue that needs sorting out. It’s the fact that most analytic applications are departmentally focused (often departmentally owned and funded) and they satisfy department requirements, not enterprise ones.
Posted by Philip Russom, Ph.D. on May 10, 2011
I recently started work on a new TDWI Best Practices Report with the working title: Deep Analytics with Big Data. The report is a tad schizophrenic, in that it’s really about two things – big data and analytics – plus how the two have teamed up to create one of the most profound trends in business intelligence (BI) today. Let me share some of the thinking behind the schizophrenia. Please reply to this blog to tell me whether this makes sense or not.
Advanced Analytics
Posted by Philip Russom, Ph.D. on April 25, 2011
A few days ago, I presented a TDWI Webinar based on my newly published TDWI Best Practices report about “Next Generation Data Integration” (NGDI). Almost three hundred people attended the broadcast, and (with such a large turnout) I got a ton of great questions from the audience about data integration (DI).
I’d like to share some of those questions with you (and my responses to Webinar attendees who asked them), as a way of expanding and clarifying the research findings of the report. If you care about DI, this should be interesting for you.
Posted by Philip Russom, Ph.D. on April 19, 2011