Seven Recommendations for Becoming Big Data Ready
New big data sources and data types – and the need to get business value from new data – are forcing organizations to evolve their data management practices.
By Philip Russom, TDWI Research Director for Data Management
I recently participated as a core speaker in the Informatica Big Data Ready Virtual Summit, sharing a session with Amit Walia, the Chief Product Officer at Informatica Corporation. Amit and I had an interactive conversation where we discussed one of the most pressing questions in data management today, namely: How should an organization get ready to capture and leverage big data? This is an important question, because many organizations in many industries are facing big data, with its new data sources, data types, large volumes, and fast generation rates. Organizations need to modernize their data integration (DI) infrastructure, so they can capture and leverage the new data for new business insights and analytics.
Amit Walia and I boiled down this complex issue to seven recommendations, which I will now summarize:
Achieve agility and autonomy, as required by big data and analytics. The creation of data management solutions must keep up with the pace of business by adopting agile and lean development methods. New tool functions that assist with agility and autonomy include those for data exploration and profiling, self-service data access, and rapid dataset prototyping (or “data prep”).
Govern big data, as you would any enterprise data asset. Big data has a bit of a “hall pass” today, because it’s new and exotic. But eventually, it will be assimilated as yet another category of enterprise data. Prepare for that day by assuming that new data demands governance, stewardship, privacy, security, quality, and standards.
Include Hadoop in your data integration infrastructure. Hadoop can replace some of the database management systems and file systems you’re using today, while scaling at a reasonable cost and handling new data types. Modern users’ DI architectures already include Hadoop for landing, staging, push-down processing, archiving, hubs, and lakes.
Integrate fit-for-purpose data to enable data exploration and profiling. The trend is to integrate big data in its raw, original state into a big data platform, such as Hadoop or a large relational MPP implementation. That way, users can explore and profile new big data to determine its business value. Later, users can repurpose discovered data in many ways, sometimes at runtime, as new requirements arise for analytics or operations.
Embrace real-time data ingestion, as required by some forms of big data and analytics. A modern DI infrastructure supports many speeds and frequencies of data ingestion, because diverse data sources and business processes have diverse requirements relative to time. A new challenge for DI is to capture and process streaming data in real time, to enable near-real-time analytics and business operations.
Prepare to integrate big data by upgrading skills and team structures. TDWI surveys say that a lack of skill is the biggest barrier to success with new big data. Data management professionals need training for Hadoop, NoSQL, natural language processing, and new data types (e.g., JSON, social media, streams). These competencies should be added to those of existing DI competency centers.
Modernize data management solution development by combining agile, stewardship, and collaborative methods. Both agile and stewardship methods recommend the use of a pair of specialists, working together closely: a data specialist and a business representative (or steward). This “dynamic duo” accelerates requirements gathering, ensures data-to-business alignment, and delivers solutions faster than ever.
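To make the exploration and profiling recommendation a little more concrete, here is a minimal sketch in Python of what profiling raw data might look like. It assumes the raw data arrives as newline-delimited JSON objects; the function name `profile_records`, the sample records, and the field names are hypothetical, chosen only for illustration. A real DI or data prep tool would do far more than this.

```python
import json
from collections import Counter, defaultdict


def profile_records(lines):
    """Profile newline-delimited JSON objects (an assumed input format).

    For each field, report how often it is present (fill rate) and
    which value types were observed.
    """
    total = 0
    present = Counter()            # field -> number of records containing it
    types = defaultdict(Counter)   # field -> counts of observed Python type names

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        total += 1
        for field, value in record.items():
            present[field] += 1
            types[field][type(value).__name__] += 1

    return {
        field: {
            "fill_rate": present[field] / total,
            "types": dict(types[field]),
        }
        for field in present
    }


# Hypothetical raw records, landed as-is with no upfront cleansing.
sample = [
    '{"user": "a1", "clicks": 3}',
    '{"user": "b2", "clicks": "7", "referrer": "search"}',
    '{"user": "c3"}',
]
profile = profile_records(sample)
```

In this sketch, the profile would show that `clicks` arrives with mixed types and that `referrer` is sparsely populated. That is the kind of finding that helps users judge the business value of new data before they repurpose it for analytics or operations.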
If you’d like to hear more of my discussion with Informatica’s Amit Walia (and hear other expert speakers in the Informatica Big Data Ready Virtual Summit, too), please replay the Informatica Webinar by clicking here.
Posted on January 6, 2016