The abundance, unprecedented growth, and variety of data in large enterprises have led to the proliferation of data lakes as a data management solution. A data lake is a storage repository that holds a vast amount of raw data in its native format and facilitates data analysis through machine learning and data science applications. However, unless the meaning of the data is captured in a coherent model that can be used to normalize and cleanse the data prior to analysis, analysts will continue to spend the majority of their time preparing the data before their algorithms can yield any meaningful value.
Semantic technologies, i.e., Semantic Web languages and ontologies, together with networked data models, are the right tools for building interconnected logical models of the data and for ensuring that it can be curated, governed, and connected to the business to meet the needs of data scientists and enterprise stakeholders alike. In this talk I will give a general overview of what we mean by semantic technologies and networked data, discuss the roles they play in capturing the meaning of the data and connecting it with business glossaries, and examine the emerging trend of building Semantic Data Lakes, i.e., data lakes that expose the data through semantic models and tie those models directly to the business concepts of interest to enterprise stakeholders.