Traditional Tools Still Big Part of Big Data
Imagine a big data project in which old-tech tools such as relational databases and enterprise applications are prominent players. It's more popular than you might think, according to a new survey.
- By Stephen Swoyer
- September 11, 2012
A new survey from open source software (OSS) business intelligence (BI) specialist JasperSoft Inc. sheds light on how and why adopters are using big data, and in so doing raises a bevy of important questions.
JasperSoft's Web survey of 631 respondents indicates that almost two-thirds (62 percent) have already "deployed" big data projects, while -- among those that haven't -- "a lack of understanding about Big Data" is cited as the chief impediment.
Respondents fell into one of five roles: application developers (63 percent), report developers (13 percent), BI administrators (6 percent), and business users (6 percent). (The ubiquitous "Other" category accounted for 12 percent of responses.)
According to director of product marketing Mike Boyarski, JasperSoft was surprised by the number of respondents who say they're already working with big data. "That for us was interesting because we've been talking about it for a while, but ... we didn't expect to see such a large [number of] production projects," he comments.
Among respondents who plan to deploy, a sizeable percentage have already secured budgets.
This, too, is surprising -- and significant, says Boyarski.
"There's this suspicion around the ROI with these types of data projects ... [but] it's clear that the business sponsors are recognizing an opportunity and they're okay-ing it, whether it's time and effort or dollars and budget to go and implement [it]."
The survey unearthed a few ostensible surprises.
Take, for example, the high use of ETL, which almost three-fifths (59 percent) of respondents said was "very important" to their big data projects. JasperSoft, says Boyarski, was surprised that ETL -- or other traditional data integration (DI) tools -- figured so largely in big data projects. "We were surprised at how frequent [is] the use of ETL ... or the desired use of ETL within the context of big data [projects]," he said, speculating that respondents could be using ETL as a "sort of intermediary tool looking to put some structure into the data."
Industry veteran Marc Demarest, a principal with management consultancy Noumenal Inc., says he isn't surprised by this. Demarest says that in most cases, Hadoop is being used to pool large amounts of file-oriented data, while MapReduce and conventional DI tools -- such as ETL and ELT -- "are being used to extract data from Hadoop -- or to get data into Hive or something similar, which then becomes a 'source system' for ETL, ELT," or other traditional DI tools.
Another seemingly surprising finding was the prominence of conventional relational databases in many (so-called) "big data" projects. Given the use of ETL, however, this shouldn't come as a surprise. In fact, nearly the same proportion -- three-fifths (60 percent) -- of respondents are using vanilla relational databases as are using ETL in their big data projects.
Respondents were able to select multiple repositories, and what is surprising is the comparatively low representation of Hadoop and NoSQL repositories in the survey data. Fewer than one in five (18 percent) respondents say they're using Hadoop -- the archetypal platform for big data -- and slightly more (19 percent) say they're using MongoDB, a NoSQL data store that's also touted for use with big data projects. Other well-known NoSQL solutions included Apache Cassandra (used by just 7 percent of respondents), CouchDB (3 percent), and DynamoDB (4 percent). Elsewhere, analytic database platforms such as those marketed by Teradata Inc., IBM Netezza, and ParAccel Inc. (among others) were used in 11 percent of big data projects.
This seems counter-intuitive. After all, platforms such as Hadoop, MongoDB, Cassandra, and others are marketed as solving the shortcomings of conventional relational platforms in a big data context.
What conclusions can we draw from the data? It's hard to say. As Boyarski concedes, the structuring of the survey question (viz., "What Big Data stores are you using for your project?") doesn't tell us much.
Nor does the lack of any (detailed) follow-up question, such as -- specifically -- which relational platforms were in use. Boyarski suggests that perhaps "a lot of these [big data analytic] projects are ... combining data from various places and trying to supplement what they have, so that's probably where the relational data is coming into play."
Demarest, on the other hand, says he doesn't find this surprising.
After all, several of his clients are primarily using relational databases with their big data projects, he indicates. "'Big data' technologies are just [being used as a] pre-ETL pooling technology for [relational databases]," Demarest explains. In this scheme, he continues, data that's "persisted in Hadoop ends up in my 'normal' [data warehousing or business intelligence] infrastructure, in relational form, for normal consumption through normal mechanisms by normal users."
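The pattern Demarest describes can be sketched in a few lines. This is a hedged illustration only -- nothing here comes from the survey, and the data, table, and field names are invented: raw, loosely structured records (standing in for files pooled in Hadoop or Hive) are given a schema by a small ETL step and loaded into a relational store (SQLite standing in for the warehouse), where they can be consumed "through normal mechanisms."

```python
# Illustrative sketch of the "Hadoop as pre-ETL pooling layer" pattern.
# SQLite stands in for the relational EDW; a list of raw JSON records
# stands in for the pooled file-oriented data. All names are hypothetical.
import json
import sqlite3

# Raw, semi-structured event records (invented example data).
raw_pool = [
    '{"user": "a17", "action": "purchase", "amount": "19.99"}',
    '{"user": "b22", "action": "view"}',
    '{"user": "a17", "action": "purchase", "amount": "5.00"}',
]

def transform(line):
    # ETL step: parse, impose a schema, coerce types -- "putting some
    # structure into the data," in Boyarski's phrase.
    rec = json.loads(line)
    return (rec["user"], rec["action"], float(rec.get("amount", 0.0)))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, action TEXT, amount REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)",
                 (transform(l) for l in raw_pool))

# "Normal consumption through normal mechanisms": plain SQL over the
# relational copy of the pooled data.
total = conn.execute(
    "SELECT SUM(amount) FROM events WHERE action = 'purchase'"
).fetchone()[0]
print(round(total, 2))  # 24.99
```

The point of the sketch is the direction of flow: the "big data" layer is only a staging area, and the relational database remains the system of record that downstream BI tools query.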
Similarly surprising -- again, at first glance -- is the representation of traditional enterprise applications in the big data mix. Almost four-fifths (79 percent) of respondents say they're piping enterprise application data -- from e-commerce, financial, ERP, CRM, SCM, PLM, and other applications -- into their big data projects. This came as a surprise to JasperSoft.
"It's interesting that the number-one source was application data, number two was machine-generated, and ... number three [was] human-generated," Boyarski comments.
Veteran data warehouse architect Mark Madsen, a principal with information management consultancy Third Nature Inc., says this doesn't quite jibe with his experience.
Although the idea of analyzing big data information in context with traditional enterprise data is commonly touted as the end-game of the (nascent) big data paradigm shift, most such efforts are believed to be in the early-adopter stage. Given the nebulousness of some survey questions -- and the lack of any questions asking how enterprise application data is being used in big data projects -- it's difficult to draw any conclusions, Madsen says.
"I find it odd that enterprise sources [such as OLTP applications] are a major source, because that's not what I've seen, but then it depends on the market," he comments. "Banks, retailers, insurance companies, etc. are doing analytics that were expensive [and/or] resource-constrained in the old environment, so transferring the data makes sense."
Once again, Noumenal's Demarest says he isn't surprised.
Outside of cutting-edge or unconventional sectors -- such as social media -- many adopters are approaching "new" (i.e., big) data the same way they approached ... "old" data, and with good reason, he argues: they've invested millions in developing "old" data skills -- and "old" data infrastructures -- for starters. From the perspective of many big data adopters, he argues, "there's nothing about the 'new' data that invalidates how I deal with the 'old' data." True, Demarest concedes, "there are some cases in which the 'new' data overwhelms the 'old' infrastructure," but -- because these are exception scenarios -- "why should I use new stuff?"
For this reason, and as JasperSoft's survey suggests, the venerable enterprise data warehouse (EDW) is going to be powerfully difficult to dislodge from its place of primacy.
"In all cases, the centerpiece of the all-encompassing architecture is still the relational 'EDW' and its dependent marts," Demarest concludes, adding that "[w]hether this is sidelining the EDW or cementing it in place remains to be seen longer-term."