Prerequisite: None
This presentation gives an overview of the architectural steps taken to move from initial cloud data lake creation to efficient support for big data analytics.
High-level descriptions of a data lake usually focus on putting data in but rarely cover the steps needed to get data out efficiently for big data analytics. To use the data lake effectively for analytics, additional capabilities and management need to be put in place.
This presentation will outline one pharmaceutical company’s experience with raising a cloud data lake’s maturity from a raw data store to analytics enablement:
- Initial implementation: Ingesting data into a data lake
- Additional tools: Allowing fine-grained security, integration, filtering, and masking
- Managing the data lake: Creating an inventory and classifying privacy, security, and responsibilities
- Enabling efficient analytics: Search, request, approve, provision
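
As a rough illustration of how the last two stages might fit together, the sketch below models a tiny in-memory catalog: datasets are inventoried with an owner and a privacy classification, analysts search and request access, the owner approves, and provisioning returns a masked copy. This is not the company's implementation. Every name here (Catalog, Dataset, AccessRequest, the "***" mask, the sample data) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    REQUESTED = "requested"
    APPROVED = "approved"
    PROVISIONED = "provisioned"


@dataclass
class Dataset:
    """One inventory entry (hypothetical schema)."""
    name: str
    owner: str               # person responsible for approving access
    tags: set                # search keywords
    sensitive_columns: set   # columns classified as private
    rows: list               # list of dict records


@dataclass
class AccessRequest:
    dataset: Dataset
    requester: str
    status: Status = Status.REQUESTED


class Catalog:
    """Minimal search / request / approve / provision workflow."""

    def __init__(self):
        self._datasets = {}

    def register(self, dataset):
        # Inventory step: record the dataset and its classification.
        self._datasets[dataset.name] = dataset

    def search(self, tag):
        # Return the names of all datasets carrying the given tag.
        return sorted(n for n, d in self._datasets.items() if tag in d.tags)

    def request(self, name, requester):
        return AccessRequest(self._datasets[name], requester)

    def approve(self, req, approver):
        # Only the responsible owner may approve (an assumed policy).
        if approver != req.dataset.owner:
            raise PermissionError(f"{approver} does not own {req.dataset.name}")
        req.status = Status.APPROVED

    def provision(self, req):
        # Hand out a copy with sensitive columns masked.
        if req.status is not Status.APPROVED:
            raise RuntimeError("request has not been approved")
        req.status = Status.PROVISIONED
        masked = req.dataset.sensitive_columns
        return [
            {k: ("***" if k in masked else v) for k, v in row.items()}
            for row in req.dataset.rows
        ]


# Hypothetical sample data, for illustration only.
catalog = Catalog()
catalog.register(Dataset(
    name="trial_results",
    owner="data_steward",
    tags={"clinical", "trials"},
    sensitive_columns={"patient_id"},
    rows=[{"patient_id": "P-001", "dose_mg": 50}],
))

req = catalog.request("trial_results", requester="analyst")
catalog.approve(req, approver="data_steward")
print(catalog.provision(req))
```

In a real cloud data lake these responsibilities would typically be split across a data catalog, an approval workflow, and the platform's access-control and masking services. The single class here only shows the order of the steps and where the classification is used.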