Data engineering is the process of building analytics data infrastructure, or internal data products, that supports the collection, cleansing, storage, and processing (in batch or in real time) of data used to answer business questions. Those questions are usually posed by a data scientist, a statistician, or someone in a similar role, although in some cases these functions overlap.
Data engineers rely on Apache Hadoop ecosystem components, such as Apache Spark, Apache Kafka, and Apache Flume, as the foundation for this infrastructure. Regardless of the use case or the components involved, the infrastructure should be compliance-ready with respect to security, data lineage, and metadata management.
This white paper collects selected posts from the Cloudera Engineering Blog covering key concepts in building and maintaining analytics data infrastructure on a Hadoop-powered enterprise data hub.
Sponsored by Cloudera