Big Data Management Comes of Age

Several start-ups market products designed to address the lack of data management features for big data platforms. For example, Diyotta provides a scalable architecture for big data integration including old-world enterprise amenities such as metadata management.

The challenge with big data integration is getting data -- the right data -- into and out of big data platforms. The difficulty stems from the immaturity of critical data management services for big data systems, such as robust metadata management and resilient automation features.

This is true of the big data ingest cycle, which lacks anything comparable to the automated, repeatable ETL and/or ELT processes that undergird data ingest in the data warehousing world.
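
To make the contrast concrete, here is a minimal sketch of the kind of parameterized, idempotent ingest job the warehousing world takes for granted. It's written in PySpark; the paths and the run_date parameter are hypothetical, and it represents the general pattern rather than any particular vendor's implementation.

    import sys
    from pyspark.sql import SparkSession

    def ingest(run_date: str) -> None:
        """Load one day's slice of landing-zone JSON into the warehouse."""
        spark = SparkSession.builder.appName("daily_ingest").getOrCreate()
        # Hypothetical landing-zone layout: one directory per day.
        src = spark.read.json(f"s3://landing/events/{run_date}/")
        # Overwriting the date-scoped target path makes the job idempotent:
        # re-running the same date replaces that slice rather than duplicating it.
        src.write.mode("overwrite").parquet(
            f"hdfs:///warehouse/events/dt={run_date}/")
        spark.stop()

    if __name__ == "____main__".replace("____", "__"):  # i.e., __main__
        ingest(sys.argv[1])  # e.g., spark-submit ingest.py 2016-04-01

Because the job is driven entirely by its date parameter, a scheduler can run -- or re-run -- any slice without manual cleanup, which is precisely the automated, repeatable property that's at issue here.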

It's even more true of the big data extraction process, said Krish Krishnan, president and founder of information management consultancy Six Sense Advisors. There's a big difference between querying against or manually extracting data from Hadoop, MongoDB, Cassandra, Spark, and other big data environments and doing so in the context of a consistent, resilient, repeatable process.

"The problem ... that I am seeing [with respect to big data] is that ... people don't know either how to push data in effectively or pull data out effectively," Krishnan said during a recent analyst call.

Krishnan noted that one of his clients ingested approximately 600 TB of data into its Hadoop system -- only to discover that it's far easier to load data into Hadoop than to get it back out again.

"The problem is that they don't know how to make it a repeat[able] process and they don't know how to pull data out. In the world of data integration, there's a blockage that I'm seeing in the world of Hadoop. They have no clue on how to do repeat[able] ETL/ELT work, they have no clue how to do CDC [change data capture]," Krishnan said.

"More important, once the data is in Hadoop, people are struggling about how to operationalize on top of it because they don't know how to get the data back out successfully. There are a lot of tools ... but the answers that people want versus what they're getting, there's a huge gap. I can talk about Spark all day long. I can talk about SQL-on-Hadoop all day long. The question is how do you get this across [to prospective adopters]."

Upstart Solutions

This isn't a new problem. It's arguably the primary reason Teradata Corp. acquired the former Revelytix almost two years ago; Teradata now markets Revelytix's metadata management and big data preparation assets as Teradata Loom. It's also the reason data integration (DI) powerhouse Informatica Corp. announced a new Informatica Big Data Management offering late last year, complete with metadata management, security and governance, and data lineage-tracking services for big data platforms.

It's the backstory behind the emergence of start-ups such as Alation, which markets metadata management technology for Hadoop and other big data platforms, and Tamr, which markets technology for profiling, cataloging, and cleansing data that's siloed in traditional and big data sources. It's likewise the logic behind several open source projects, including Apache Atlas. (Alation recently announced a strategic partnership with Teradata whereby "Alation will be the primary solution for Teradata customers seeking a product to help increase the productivity of their data consumers as well as the effectiveness of their data stewards," according to a blog post by Alation CEO Satyen Sangani. Interestingly, Alation's software overlaps, in whole or in part, with Teradata's own Loom technology.)

Finally, it's one of the reasons Diyotta exists. The start-up claims to offer a scalable, resilient architecture that's fit for any data integration scenario. "Our core architectural principle is to use the platform for what it is best for," said CEO Sanjay Vyas, referring to his company's flagship offering, the Diyotta Modern Data Integration Suite.

"It could be just [used for] sending the data. Let's say mobile [data sources are] sending JSON data: we collect it, we process it, and we send it to Hadoop -- or it could be cloud. You could have a data lake in the cloud, you could have [an] MPP [database] sitting on prem[ises]. You could easily use our agent-based architecture [in this scenario]," Vyas said.

Data Integration Challenges

Diyotta's Modern Data Integration Suite addresses three thoroughly modern DI challenges, Vyas pointed out. The first is that of cloud to on-premises data integration -- and vice versa. Apps in the cloud need data from on-premises systems; on-premises apps -- particularly business intelligence (BI) and analytic apps -- need data from cloud apps.

The second DI challenge is what Vyas calls the "digital revolution": integrating and managing polystructured data, such as data from social media sources.

The third challenge has to do with the collapsing distinction between on- and off-premises modes of data integration. Now more than ever, organizations are keen to integrate data from multiple on-premises locations -- some Diyotta customers have presences in every U.S. state, in addition to regions around the world -- as well as from a mix of cloud and social media sources. The challenge is to integrate, as seamlessly as possible, across all of these contexts. It is compounded by the fact that many vendors in this space -- Vyas mentions a prominent DI vendor by name -- require customers to license several different versions of their products to cover traditional, cloud, and big data integration scenarios.

So much for the explicit DI challenges. Diyotta's Modern Data Integration Suite also brings critical data management services to big data integration, Vyas maintained. "Some of the big-data value creation ... is not only about the new world of data, it's also about how you bring the old enterprise standards ... [such as] automation, orchestration, enterprise data governance, data glossary, all of those things [to big data]," he explained. "It is also about speed and agility. How fast you can get those things done, how fast can you deliver these new projects, along with the automation aspects."

The most important challenge of all, Vyas argued, is future-proofing. Big data is a protean beast, and it's likely to remain one for some time to come. Three years ago, the Hadoop platform was ascendant and MongoDB had just eclipsed a $1 billion venture-capital valuation. In just the last 12 months, Apache Spark became The New Thing, Apache Cassandra came on strong, and MongoDB's valuation surged to $1.6 billion. A truly resilient, big data-ready DI platform must tolerate platform migration or even outright architectural transformation, he argued.

"We don't need developers to know Pig [Latin] or Spark SQL or anything related to the new [disruptive platforms]. Even though it matters, what we believe is that we bring the best practices to the table so you leverage that rather than creating some siloed framework which would not scale to the next level in the near future. Today we are talking Spark, but tomorrow it could be something else," he concluded. "[T]oday when you want to, you can port from MapReduce to Spark. For us, it's just a matter of porting your existing code into Spark, or from Spark to a future engine."

About the Author

Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at [email protected].

