Data Engineering with Apache Hadoop: Highlights from the Cloudera Engineering Blog
June 21, 2016
Data engineering is the process of building analytics data infrastructure or internal data products that support the collection, cleansing, storage, and processing of data, in batch or in real time, to answer business questions (usually posed by a data scientist, a statistician, or someone in a similar role, although in some cases these functions overlap). Examples include:
- The construction of data pipelines that aggregate data from multiple sources (see the sketch following this list)
- The productionization, at scale, of machine-learning models designed by data scientists
- The creation of pre-built tools that assist data scientists in the query process (e.g., UDFs or entire applications)
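To make the first example concrete, below is a minimal sketch of such a pipeline using Spark's DataFrame API. The HDFS paths, column names (user_id, order_date, order_total), and the choice of JSON and Parquet sources are hypothetical, not drawn from any specific post in this collection:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

object DailyRevenuePipeline {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("daily-revenue-pipeline"))
    val sqlContext = new SQLContext(sc)

    // Two hypothetical raw sources landed in HDFS: clickstream events (JSON)
    // and order records (Parquet).
    val clicks = sqlContext.read.json("hdfs:///data/raw/clicks/")
    val orders = sqlContext.read.parquet("hdfs:///data/raw/orders/")

    // Join the sources on a shared key, then aggregate revenue per day.
    val dailyRevenue = clicks
      .join(orders, Seq("user_id"))
      .groupBy("order_date")
      .agg(sum("order_total").as("revenue"))

    // Persist the result where downstream tools (e.g., Hive or Impala) can query it.
    dailyRevenue.write.mode("overwrite").parquet("hdfs:///data/marts/daily_revenue/")

    sc.stop()
  }
}
```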
Data engineers rely on Apache Hadoop ecosystem components, such as Apache Spark, Apache Kafka, and Apache Flume, as a foundation for this infrastructure. Regardless of use case or components involved, this infrastructure should be compliance-ready with respect to security, data lineage, and metadata management.
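As one illustration of how these components fit together, ingestion into such a pipeline often begins with a Kafka producer. The sketch below uses the standard Kafka producer client; the broker address, topic name, and event payload are hypothetical placeholders:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object EventProducer {
  def main(args: Array[String]): Unit = {
    // Minimal producer configuration; "broker1:9092" is a placeholder address.
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Publish one JSON event to a hypothetical "web-logs" topic; a real
      // pipeline would stream many such records from applications or Flume agents.
      producer.send(new ProducerRecord[String, String]("web-logs", "user_42", """{"action":"click"}"""))
    } finally {
      producer.close()
    }
  }
}
```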
This white paper collects posts from the Cloudera Engineering Blog that cover key concepts for building and maintaining analytics data infrastructure on a Hadoop-powered enterprise data hub.