Get More Out of Hadoop by Building Your User Community
If you are launching Hadoop in your enterprise, you'll want to get the most out of its rich data. Users across the enterprise have important roles to play in your eventual Hadoop user community.
If your company is standing up Hadoop, getting the most out of its rich data doesn't end with the first user group, let alone the first user. As with data warehouses, users will eventually fall into multiple categories. Although the early Hadoop user base is often heavy on data scientists, the teams building Hadoop should cultivate users across the enterprise, because the data they are making available is valuable to all of them.
Four categories will make up your Hadoop user community, and each one interacts with Hadoop in a specific way.
Data Scientists
Data scientists have the expertise in statistics and applied mathematics to analyze data for insights, that is, to extract signal from noise. They are more critical than ever in Hadoop environments, because most Hadoop projects involve processing big data to find a relatively small amount of signal.
Hadoop's ability to rapidly crunch through enormous amounts of data makes it economically feasible to extract these insights. In addition, data scientists may discover patterns that aren't evident in smaller data sets.
These members of your team investigate the value of various big data sources, which in Hadoop environments means mastering a wider range of tools and analytic techniques.
By creating queries and guiding machine-learning algorithms, they discover data patterns and relationships that could be useful for BI or for building predictive or descriptive analytic models. They determine which data looks interesting enough to justify further analysis, and they build logical views (e.g., Hive tables) on top of that data to make it easier for themselves and other users to query.
Data Analysts
As in the traditional data warehouse environment, data analysts run queries to answer questions and produce reports. Thanks to today's SQL-on-Hadoop solutions, most of their skills transfer directly, and Hadoop's processing performance gives them a welcome boost in productivity.
What's different for Hadoop is that the role of data analyst tilts more toward internal consulting.
You need these data experts to point business analysts and other business users to the right sources and to guide less technical staff in accessing, integrating, and analyzing data for specific business goals. Data analysts can also help by creating data visualizations that are stored in shared libraries and reused in different contexts, such as BI tools, mobile apps, and Web pages.
Business Analysts
In Hadoop environments, business analysts still fulfill their critical functions: looking for ways to improve business processes, posing questions that can lead to strategies for competitive advantage, and helping to specify requirements for new products and services.
They're empowered in all of these responsibilities by a growing range of role-based, self-service graphical tools that greatly expand their ability to access, integrate, and analyze big data. For example, data integration solutions on Hadoop include hundreds of prebuilt connectors to big data sources.
These tools also expand the business analyst's role to encompass data preparation, including contributing to data profiling, cleansing, and validation. Business analysts may also help specify and maintain data quality rules under the oversight of the data steward.
Citizen Developers
Hadoop is ushering in a new era of ubiquitous analytics, where nearly every job in the enterprise involves working with data in some respect -- ideally via self-service tools or using familiar apps.
Forward-looking organizations understand the competitive potential of infusing everyday tasks with evidence, insights, and predictions. They're doing everything they can to put big data analytics in the hands of enterprise citizens.
The Bottom Line
As you set up or build out your own Hadoop environment, keep end users -- and the tremendous leverage they can exert as big data consumers -- clearly in your sights.
About the Author
McKnight Consulting Group is led by William McKnight. He serves as strategist, lead enterprise information architect, and program manager for sites worldwide utilizing the disciplines of data warehousing, master data management, business intelligence, and big data. Many of his clients have gone public with their success stories. McKnight has published hundreds of articles and white papers and given hundreds of international keynotes and public seminars. His teams’ implementations from both IT and consultant positions have won awards for best practices. William is a former IT VP of a Fortune 50 company and a former engineer of DB2 at IBM, and holds an MBA. He is author of the book Information Management: Strategies for Gaining a Competitive Advantage with Data.