"Data-driven" is the modern mantra of business management, but becoming a data-driven organization is complex and challenging. Abundant data sources and multiple use cases result in many data pipelines, possibly as many as one for each use case. Capabilities to find the right data, manage data flow and workflow, and deliver the right data in the right forms for analysis are essential for all organizations that seek to become data-driven.
Multiple, complex data pipelines can quickly become chaotic under pressure from agile development, democratization, self-service, and organizational pockets of analytics. Difficult governance and uncertainty about how data is used are only the beginning of the troubles. Data pipeline management must therefore ensure that data analysis results are traceable, reproducible, and of production strength, whether enterprise-level or self-service. Robust pipeline management works across a variety of platforms, from relational to Hadoop, and recognizes today’s bidirectional data flows, in which any data store may function in both source and target roles.
You Will Learn
- The challenges and complexities of modern data pipelines
- Why data flow and workflow are critical parts of your analytics architecture, and how they fit into it
- How to define and design data pipelines
- The roles and functions of metadata in pipeline management
- The important relationships between pipeline management and data governance
- The state of tools and technologies to support pipeline management
Geared To
- Analytics architects, BI architects, data warehouse architects, data architects, and anyone in an architect role that intersects with data
- Data engineers who define, design, and develop data warehouses, data lakes, operational data stores, data sandboxes, master data hubs, or other enterprise data stores
- Data integration and preparation professionals who define, design, and develop the processes that move data through pathways from sources to consumers