
Four Data Preparation Trends to Watch in 2019

From privacy to pricing, scalability to self-driving technology, 2019 will be a crucial year for data prep advancements.

As 2018 draws to a close, it is a perfect time to reflect on the trends that emerged this year and take note of what's coming in the next one. Although data preparation (DP) is not a new technology, the industry around it is evolving rapidly, and DP is now considered a vital organizational competency in a world where winners and losers are determined by the speed and quality of their data and analytical processes. Before we look at what's to come in 2019, it makes sense to look at what was accomplished this year.

2018: Recognition of Data Prep as a Key Part of a Modern Data Architecture

One of the milestones for the data prep market in 2018 was its recognition as a key component in transforming data into information on demand for analytics and other modern data initiatives. Numerous analysts now include data prep in their frameworks for modern data architecture.

Intelligent data ingestion and processing. Data prep tools got smarter. By using artificial intelligence (AI) and machine learning (ML), DP tools can dynamically read, interpret, and flatten complex data structures to streamline traditional data preparation workflows. This is critical because the number of data sources and formats feeding analytics and data science initiatives is expanding. Also notable was the explosion of exploratory versus operational analysis. Although predefined operational reports and dashboards remain, there is a massive appetite for exploratory styles of analysis where the questions are not predefined.
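To make "flattening" concrete, here is a minimal sketch using the open source pandas library. It illustrates the general technique only, not any vendor's implementation, and the sample record and field names are hypothetical.

    # Minimal sketch: flatten a nested JSON record into a flat table with pandas.
    # The sample order record and its field names are hypothetical.
    import pandas as pd

    records = [
        {
            "order_id": 1001,
            "customer": {"name": "Acme Corp", "region": "EMEA"},
            "lines": [
                {"sku": "A-100", "qty": 2},
                {"sku": "B-200", "qty": 5},
            ],
        }
    ]

    # json_normalize expands nested objects into dotted columns and
    # explodes the "lines" array into one row per line item.
    flat = pd.json_normalize(
        records,
        record_path="lines",
        meta=["order_id", ["customer", "name"], ["customer", "region"]],
    )
    print(flat)
    # Columns: sku, qty, order_id, customer.name, customer.region

A data prep tool performs this kind of transformation automatically across many formats; the point of the sketch is simply what "flattening a complex structure" produces for the analyst.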

One-step data profiling. Data prep tools increasingly served as the "first eyes on data" in data lakes and other data repositories that are poorly described or documented. This enables business consumers to find their data easily and understand what it contains and what it means, which is the first step toward better data outcomes.
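As a rough illustration of what such profiling surfaces (a hedged sketch, not any product's behavior), the following pandas snippet computes the kinds of summaries a data prep tool would show automatically: column types, null counts, and cardinality. The file name is hypothetical.

    # Minimal profiling sketch with pandas; "customers.csv" is a hypothetical file.
    import pandas as pd

    df = pd.read_csv("customers.csv")

    profile = pd.DataFrame({
        "dtype": df.dtypes.astype(str),   # inferred data type per column
        "non_null": df.notna().sum(),     # how many values are present
        "nulls": df.isna().sum(),         # how many values are missing
        "distinct": df.nunique(),         # cardinality of each column
    })
    print(profile)

    # Numeric distributions (min, max, mean, quartiles) in one call.
    print(df.describe())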

Collaborative data prep. Most earlier data prep efforts were based on individuals working with data in Excel, Access, or some other desktop tool, but 2018 marked the point where data prep became a team sport. Today, data sets and recipes are shared with others, enabling peers to collaboratively develop and review data prep projects using Google Sheets-style joint editing.

Cloud data prep takes center stage. As data prep became mission-critical, moving it to the most powerful and trusted infrastructure became a key enabler. The cloud has emerged as the go-to platform for data projects at most organizations. The data center of gravity is no longer on premises but in the cloud. More data lakes are moving to the cloud, and Snowflake and other cloud-native technologies are displacing the traditional on-premises enterprise data warehouse (EDW).

2019 and Beyond: What Lies Ahead for Data Prep

The next few years will be pivotal as data technologies mature and move massively into the cloud. Here are a few trends to keep on your radar.

The move toward consumption-based pricing models for data architectures and technologies. Pricing models come and go with industry cycles, but we are on the precipice of one of the most interesting trends in recent years -- the move to consumption-based pricing. Thanks to the cloud and companies such as Snowflake and Databricks, paying only for the computational resources you use (as on AWS) is becoming a reality. This is a great fit if you only run a few queries a week, but if your organization needs to run hundreds of data tasks a day, the business is left to absorb the mounting cost of the pay-as-you-go model. Bottom line: some normalization is coming, and the market is moving toward models that strike the right balance between fixed and variable pricing.
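A back-of-the-envelope comparison makes the trade-off concrete. The rates below are entirely hypothetical, not any vendor's pricing; they simply show how pay-per-run costs scale with workload while a fixed subscription does not.

    # Hypothetical comparison of consumption-based vs. fixed pricing.
    # All rates are made-up illustrations, not real vendor prices.
    COST_PER_RUN = 0.75         # dollars per data task under pay-as-you-go
    FIXED_MONTHLY_FEE = 3000.0  # dollars per month under a fixed subscription

    def monthly_consumption_cost(runs_per_day, days=30):
        """Total pay-as-you-go cost for a month of data tasks."""
        return runs_per_day * days * COST_PER_RUN

    for runs_per_day in (5, 50, 500):
        variable = monthly_consumption_cost(runs_per_day)
        cheaper = "pay-as-you-go" if variable < FIXED_MONTHLY_FEE else "fixed"
        print(f"{runs_per_day:>4} runs/day -> ${variable:>9,.2f} variable vs. "
              f"${FIXED_MONTHLY_FEE:,.2f} fixed ({cheaper} wins)")

Under these assumed rates, a handful of runs a day strongly favors consumption pricing, while hundreds of runs a day make the fixed fee the better deal -- which is exactly why hybrid models are likely to emerge.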

Containerization and adoption of Kubernetes to deliver massive scale-out, elasticity, and manageability. Today's cloud systems come in a variety of delivery models, including infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-a-service (SaaS). However, each approach requires complex cluster management, and 2019 will usher in the age of serverless, low-touch administration and management. In fact, cloud vendors such as Google and AWS are already setting the example for how users can consume technology without having to own the complex cluster management previously required. Container technology such as Kubernetes enables vendors to provide managed services to the end user where all of the configuration is packaged and managed in a simple way. Bottom line: all parties win because vendors no longer need to provide continuous hands-on support, and customers can get the value of the service without having to keep it up and running.

Self-Service 2.0 -- the shift to self-driving technology. Just as self-service applications are all the rage, the industry is preparing for their next incarnation: self-driving technology, where machines take the lead and humans validate the results and teach the machines the exceptions. Today, users benefit from machine learning but still do most of the heavy lifting when it comes to manipulating data. However, now that organizations are deriving value from ML, it's time for a role reversal. Instead of humans doing 80 percent of the work and machines handling 20 percent, companies such as DataRobot are setting a new standard with self-driving paradigms in which machines assume 80 percent of the heavy lifting, leaving humans more time for thoughtful analysis. Bottom line: self-driving technology stands to improve productivity and broaden the types of users who can interact with data on their own.

Customer privacy in the new age of data democracy. Just as in politics, democracy does not exist without rules and constitutions. Similarly, data democracy does not exist without guidelines covering governance, security, data lineage, and collaboration. Data is at the center of almost every strategy, including customer experience, product development, and the optimization of business processes. To support these initiatives, organizations are democratizing access to information, making it available to more business users. This can yield impressive operational results, but it also means more copies of data are being made and more people have access to it; with the advent of GDPR and other consumer privacy regulations, the risks for organizations are increasing.

As users across the enterprise ask for data on customers, purchase patterns, or renewal patterns, organizations must establish standards that balance self-service and democratization with governance, security, and the enterprisewide ability to track data lineage and usage. Bottom line: chief data officers, chief analytics officers, and IT leaders must bring balance to the enterprise and drive the adoption of standardized data technologies and approaches.
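As one hedged illustration of that balance (a sketch, not a compliance recipe), the snippet below pseudonymizes a direct customer identifier before a data set is shared for self-service analysis. The column names and salt value are hypothetical, and hashing alone is not a full GDPR strategy.

    # Hypothetical sketch: pseudonymize a customer identifier before sharing data
    # for self-service analysis. Column names and the salt are made up.
    import hashlib
    import pandas as pd

    SALT = "replace-with-a-secret-salt"

    def pseudonymize(value: str) -> str:
        """Return a stable, non-reversible token for a customer identifier."""
        return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()[:16]

    customers = pd.DataFrame({
        "email": ["ann@example.com", "bob@example.com"],
        "region": ["EMEA", "APAC"],
        "renewals": [3, 1],
    })

    shared = customers.copy()
    shared["customer_key"] = shared["email"].map(pseudonymize)
    shared = shared.drop(columns=["email"])  # drop the direct identifier
    print(shared)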

About the Author

Piet Loubser is senior vice president, global head of marketing at Paxata, a pioneer and leader in enterprise-grade self-service data preparation for analytics.

