TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

TDWI Articles

Wanted: A Data Architecture for On-Demand Data Access

The way we integrate and provision data is incompatible with the requirements of new use cases such as data science.

May 5, 2017

There's a consensus that the way we integrate and provision data is incompatible with the requirements of new, "exploratory" use cases, such as data science.

There is no consensus as to what to do about this, although some people have a few ideas.

"If we think about what [the new] data architecture should look like, we can isolate a few key attributes: it must facilitate abstraction, promote reuse, be center-less, and be parallel-aware," says Mark Madsen, a research analyst with information management consultancy Third Nature.

The Problem of On-Demand Data Access

The most complex piece of the data architecture puzzle is the piece that enables ad hoc or on-demand data access. After all, the needs of data scientists and similar power users are largely unpredictable; in many cases, they're one-off. Data scientists want to access data whenever they need it from the platform or environment they're most comfortable working in.

This is why next-generation data architectures will likely borrow key features and concepts from the self-service data preparation and data virtualization (DV) paradigms. However, neither paradigm addresses the complete range of key attributes identified by Madsen.

Self-service data prep tools, for example, are end-user-oriented offerings, designed primarily for business analysts and data scientists. By contrast, a next-gen data architecture must be flexible enough to serve both exploratory users (data scientists, business analysts), whose needs are unpredictable, and traditional users (information consumers), whose data access needs are more schedulable. Like it or not, something like data fabric middleware is required to support both of these constituencies.

At this point, self-service data prep tools also do little to substantively address the core needs -- reuse and repeatability -- that are prerequisites for manageable, governed access at scale. These are likewise problems the software category we call "middleware" evolved to address.

DV, on the other hand, is a more plausible contender for a next-gen data architecture. It enables a virtual abstraction layer designed to mask the complexities -- e.g., physical location, instance type (physical, virtual, cloud) -- of data sources. DV is notionally middleware, albeit with a twist; think of it as a kind of middleware that is also its own architecture.

However, DV architecture isn't in any sense center-less: its virtual abstraction layer is enabled by a DV engine at its center, orchestrating queries, events, and messages.

Another problem is that DV isn't a commoditized technology: best-in-class products are expensive to license, install, and maintain, and there's a dearth of robust open source DV-like technologies. A DV-only architecture risks some degree of vendor lock-in.

Adding Complexity: Platform and Data Source Distribution, Massive Data Volumes

On-demand access isn't strictly a problem of connecting -- via reliable, predictable internal network transport -- to physical instances of data sources running elsewhere in the on-premises enterprise. On-demand access is complicated by the inevitability of data source and data platform distribution, to say nothing of the phenomenon of ever-increasing data volumes. The principal challenges include:

User-initiated access: The needs of data scientists and other exploratory users are unpredictable, so access must be on demand.

Disparate sources: Data sources are distributed across the enterprise, cloud services, the Internet, etc., so data access requires negotiating stateful (such as ODBC or JDBC), stateless (such as RESTful cloud APIs), and other types of connections to data sources.

Minimized movement: Because it can be practically impossible to move data at petabyte-scale volumes, particularly over the Internet, the data access solution must minimize data movement.

Smart parallelism: Data access must be smart about when and where it processes data. If a user needs to access data on an upstream system, the solution should be able to exploit upstream parallelism, either on the data source itself or on a system that is local to it. This is true, for example, of data stored in Amazon's S3 cloud storage service. Instead of extracting and moving data in bulk from S3 storage, a smart on-demand access solution would exploit local parallelism (e.g., Amazon's Elastic MapReduce service) to reduce the size of the data set before moving it.
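The "minimized movement" and "smart parallelism" points can be illustrated with a minimal Python sketch. It is not drawn from Madsen or from any vendor's product: the in-memory `PARTITIONS` list is a hypothetical stand-in for a partitioned remote source, and a thread pool stands in for whatever upstream parallelism the source actually offers. The sketch only compares how many rows cross the "network" when aggregation happens after a bulk extract versus at the source.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical remote source: three partitions of 1,000 (region, amount) rows.
REGIONS = ("us", "eu", "apac")
PARTITIONS = [
    [(REGIONS[i % 3], i % 7) for i in range(start, start + 1000)]
    for start in (0, 1000, 2000)
]


def aggregate(rows):
    """Sum amounts per region for an iterable of (region, amount) rows."""
    totals = Counter()
    for region, amount in rows:
        totals[region] += amount
    return totals


def bulk_extract(partitions):
    """Move every row to the caller first, then aggregate locally."""
    moved = [row for part in partitions for row in part]
    return dict(aggregate(moved)), len(moved)


def pushdown_extract(partitions):
    """Aggregate each partition 'at the source' in parallel, then move
    only the per-partition partial totals and combine them."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(aggregate, partitions))
    moved = sum(len(partial) for partial in partials)
    totals = Counter()
    for partial in partials:
        totals.update(partial)  # Counter.update adds counts together
    return dict(totals), moved


bulk_totals, bulk_moved = bulk_extract(PARTITIONS)
push_totals, push_moved = pushdown_extract(PARTITIONS)
print(bulk_moved, push_moved)
```

Both paths produce the same totals; the difference is that the pushdown path ships a handful of partial results rather than every row, which is the effect the article attributes to reducing a data set before moving it.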

Madsen's solution for on-demand data access borrows from DV's concept of abstraction between physical sources and targets and also makes use of self-service features and concepts. It exploits the equivalent of query federation -- the underlying technology that enables DV -- to knit together distributed data sources, be they local or far-flung.

This means a data scientist who wants to use a Spark cluster to perform her analysis should be able to initiate access to the data she needs from Spark, no matter where this data is located. She shouldn't have to open up a separate program or go to another system. The solution would bring the data she needs to her, along the lines described above.

In Madsen's view, on-demand data access is implemented as data-fabric middleware that lets self-service users (or IT itself) monitor data flows and identify jobs that should be instantiated as scheduled processes. (This also promotes reuse, along with repeatability.) On-demand access is center-less, too, in that the middleware knits together all systems, such that a user can initiate data access from any system to any other. Finally, access is parallel-aware, in that the middleware exploits upstream or downstream parallelism where available.
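One way to picture the monitoring idea is a small Python sketch that records ad hoc data flows and flags the ones repeated often enough to deserve a scheduled job. Everything here is an assumption made for illustration: the `FlowMonitor` class, its `promote_after` threshold, and the system names are hypothetical and do not describe Madsen's design or any product.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FlowMonitor:
    """Counts ad hoc flows between systems and flags repeat offenders."""

    promote_after: int = 3  # hypothetical threshold for "should be scheduled"
    runs: Counter = field(default_factory=Counter, init=False)

    def record(self, source, target, query):
        # Normalize case and whitespace so trivially different query
        # texts count as the same flow.
        normalized = " ".join(query.lower().split())
        self.runs[(source, target, normalized)] += 1

    def candidates(self):
        """Return flows seen at least `promote_after` times, sorted."""
        return sorted(
            key for key, count in self.runs.items()
            if count >= self.promote_after
        )


monitor = FlowMonitor()
# Any system can be a source or a target; there is no privileged hub.
monitor.record("object_store", "spark", "SELECT region, SUM(amount) FROM sales GROUP BY region")
monitor.record("object_store", "spark", "select region, sum(amount)  from sales group by region")
monitor.record("object_store", "spark", "SELECT region, SUM(amount) FROM sales GROUP BY region")
monitor.record("warehouse", "object_store", "SELECT * FROM customers")

for source, target, query in monitor.candidates():
    print(f"promote to scheduled job: {source} -> {target}: {query}")
```

Because flows are keyed by an arbitrary (source, target) pair, the sketch treats every system symmetrically, which loosely mirrors the "any system to any other" property described above.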

The Very Model for an On-Demand Access Solution for Exploratory Use Cases

One good model for the on-demand access solution Madsen envisions is Teradata's QueryGrid technology. QueryGrid enables the equivalent of a center-less data fabric that serves both the more predictable needs of traditional information consumers and those of exploratory users, who require on-demand, self-initiated access to data.

"The query federation approach Teradata takes with QueryGrid is suitable for ad hoc use, where data requirements cannot be anticipated in advance. With QueryGrid, the user can [initiate] access from [the environment] they're most comfortable in. They write a series of queries to explore the data so they can identify the data they actually need. If necessary, they can [use queries to] move the [relevant] data or create views [i.e., presentations of the data]," Madsen says, noting that users can instantiate these as repeatable and reusable data flows.

QueryGrid addresses several other critical issues, too, Madsen says. He points to Teradata's focus on high concurrency and QueryGrid's support for platform-specific optimizations as examples. The upshot, he argues, is that even if a consensus modern data architecture doesn't yet exist, QueryGrid is emerging as a credible, albeit Teradata-specific, alternative. "QueryGrid hides the technical complexity of accessing data on multiple platforms. Access [is via] a single SQL dialect that is available on any connected platform," Madsen says.

"This [abstraction of complexity] extends to data movement, too, so QueryGrid automatically performs type conversion if data is moved between systems, or linked [from one system to another]. It's parallel- and location-aware, too, so [it] tries to move data as efficiently as possible. All of this makes for an easier-to-use environment [for exploratory users]. That's the goal."
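As a rough illustration of what automatic type conversion during data movement involves, the following Python sketch converts a row of string values into the types a target system declares. It does not use or represent QueryGrid's interface; the `CONVERTERS` table, the type names, and the sample schema are hypothetical.

```python
from datetime import date
from decimal import Decimal

# Hypothetical mapping from a target system's declared column types
# to Python callables that perform the conversion.
CONVERTERS = {
    "integer": int,
    "decimal": Decimal,
    "date": date.fromisoformat,
    "string": str,
}


def convert_row(row, target_schema):
    """Convert each value in `row` to the type named in `target_schema`."""
    return {
        column: CONVERTERS[target_schema[column]](value)
        for column, value in row.items()
    }


# A row as it might arrive from a source that stores everything as text.
source_row = {"id": "42", "amount": "19.99", "day": "2017-05-05"}
target_schema = {"id": "integer", "amount": "decimal", "day": "date"}

converted = convert_row(source_row, target_schema)
print(converted)
```

The point of hiding this step in middleware is that the analyst never writes a mapping like `CONVERTERS` by hand; the sketch simply makes visible the work being abstracted away.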
