TDWI Research Report Examines Emerging Best Practices for Data Lakes
Report reveals organizations’ experiences with and readiness for data lakes, quantifies related trends, and discusses emerging best practices, enabling technologies, and real-world use cases.
SEATTLE, WA, March 29, 2017—TDWI Research has released its newest Best Practices Report, Data Lakes: Purposes, Practices, Patterns, and Platforms. One of the more popular subjects in data modernization today is the addition of data lakes to many different ecosystems. This original, survey-based report defines data lake types and discusses emerging best practices, enabling technologies, and real-world use cases.
Most users (82%) are dealing with data that is rapidly evolving in terms of its types, structures, sources, and volumes—including both big data and exploding volumes of traditional enterprise data. At the same time, many analytics applications demand that old and new data be consolidated at scale, which could explain why the vast majority of survey respondents (85%) consider the data lake an opportunity.
Philip Russom, senior director of TDWI Research for data management, explains that most users are finding it challenging to manage their data with only relational data warehouses. They are considering data lakes “because lakes provision the kind of raw data that users need for data exploration and discovery-oriented forms of advanced analytics.”
Russom adds that a data lake can also be “a consolidation point for both new and traditional data, thereby enabling analytics correlations across all data. ... The chief beneficiaries of data lakes as identified by this report’s survey are analytics, new self-service data practices, value from big data, and warehouse modernization.”
Only a quarter of surveyed organizations have at least one data lake in production, but another quarter plan to enter production within a year. The report finds that Hadoop is the preferred platform (53%), with another 24% of data lakes deployed on both Hadoop and a relational database management system.
This comprehensive report reveals:
- The most commonly anticipated benefits from a data lake are advanced analytics (49%) and data discovery (49%), but survey respondents also expect many other benefits, including value from big data and data warehouse modernization.
- The most likely barrier to adoption of a Hadoop-based data lake is the lack of governance (41%).
- Straightforward questions about relational requirements can help a team select the best platform for a new data lake.
- Data lakes deployed today tend to fall into recurring categories based on the larger application or data ecosystem the lake integrates with, the data domain managed by the lake, the department that commissioned the lake, or the industry the lake serves.
- Cross-training existing personnel and engaging consultants have been the most effective approaches to hiring and training for data lake skills.
Russom offers suggestions and best practices that can guide user organizations through the successful implementation of a data lake, including choosing tools to supplement Hadoop and planning internal zones within the data lake.
This research was sponsored by Diyotta, HPE Security, IBM, SAS, and Talend.
About the Author
Philip Russom, Ph.D., is senior director of TDWI Research for data management and is a well-known figure in data warehousing, integration, and quality, having published over 550 research reports, magazine articles, opinion columns, and speeches over a 20-year period. Before joining TDWI in 2005, Russom was an industry analyst covering data management at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and consultant, was a contributing editor with leading IT magazines, and was a product manager at database vendors. His Ph.D. is from Yale. You can reach him at firstname.lastname@example.org, @prussom on Twitter, and on LinkedIn at linkedin.com/in/philiprussom.
About TDWI
For 20 years, TDWI has provided individuals and teams with a comprehensive portfolio of business and technical education and research about all things data. The in-depth, best-practices-based knowledge TDWI offers can be quickly applied to develop world-class talent across your organization’s business and IT functions to enhance analytical, data-driven decision making and performance. TDWI advances the art and science of realizing business value from data by providing an objective forum where industry experts, solution providers, and practitioners can explore and enhance data competencies, practices, and technologies. TDWI offers six major conferences as well as topical seminars, onsite education, membership, certification, live webinars, resource-filled publications, industry news, and in-depth research. See tdwi.org or follow us on Twitter @TDWI.
About 1105 Media
1105 Media, Inc., is a leading provider of integrated information and media in targeted business-to-business markets, including specialized sectors of the information technology community; industrial health, safety, and compliance; security; environmental protection; and home healthcare. 1105's offerings span print and online magazines, journals, and newsletters; seminars, conferences, and trade shows; training courseware; and web-based services. 1105 Media is based in Chatsworth, California, with offices throughout the U.S.