Data Exploration and Data Profiling Can Make Data Integration More Agile
Agility comes from self-service data access, rapid dataset prototyping, and data stewardship.
- By Philip Russom, Ph.D.
- September 23, 2014
Data integration development is caught between the proverbial rock and a hard place. Two important factors are responsible.
The accelerating pace of business today. Organizations cannot wait three months for a dataset the way they used to. In a growing number of situations, business people need a dataset (and the reports, analyses, etc. based on it) in weeks or days so they can quickly react to a competitive challenge, address a shortfall in revenue, embrace a new customer segment, or bring a new partner on board.
The time it takes to do development right. Despite the need for speed, it takes time to understand business requirements, find appropriate data sources, model a target database, develop transformation logic, test, deploy, and so on.
Like everyone else in IT, data integration developers are under pressure to do it right and do it fast. To that end, data integration specialists are applying agile methods to their development regime to achieve shorter project cycles, which in turn means the business can put the results to use sooner.
The question is: how can a data integration specialist achieve greater agility?
The answer is that some data integration specialists now depend on new tool functions. Many of today's data management and analytics tools support self-service access to data in the form of data exploration and data profiling functions.
For example, data exploration is commonly built into tools for data visualization and advanced analytics, whereas data profiling is built into most up-to-date tools for data quality and data integration. Self-service data exploration and data profiling can kick-start agility in at least three areas of data integration development:
- Self-service data access. Self-service exploration and profiling are useful to technical and business people alike, enabling them to quickly find appropriate data sources and assess the condition of the discovered data.
- Rapid dataset prototyping. In most tools, the functions just described go beyond exploration and profiling: they enable users to extract data and apply simple transformations and modeling to the extracted data. These capabilities don't replace the advanced features of mature ETL and data modeling tools, but in many cases they suffice for the rapid prototyping of datasets. Having a prototype early in a project is important to any agile methodology, and data exploration and profiling functions now make rapid dataset prototyping faster and easier (see the sketch below).
- Data (integration) stewardship. Many tools assume that multiple users will collaborate over data, so users of data exploration and profiling functions can work simultaneously or independently to identify and assess data for a new project. For example, imagine a data integration specialist and a data steward collaborating to create early dataset prototypes.
This direct approach -- based on real data instead of tedious discussions of business needs -- is the new requirements gathering. It shaves weeks off the planning process for a new data integration project. With the technology lead and the steward working side by side, the alignment of data integration work to business requirements is more accurate. Proper expectations are set because they're working with real data instead of wishing for data that doesn't exist.
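To make the profile-and-prototype idea concrete, here is a rough sketch of the kind of work these self-service functions automate, written by hand in Python with pandas. The file names, columns, and summary rules are hypothetical, and an actual tool would surface the same steps through point-and-click exploration, profiling, and transformation features rather than code.

```python
# A minimal profile-and-prototype sketch in Python/pandas. File names, column
# names, and the aggregation rules below are hypothetical placeholders; real
# self-service tools expose this kind of work through built-in functions.
import pandas as pd

# Step 1: profile a candidate source to assess its condition.
source = pd.read_csv("candidate_customer_extract.csv")  # hypothetical source file
profile = pd.DataFrame({
    "dtype": source.dtypes.astype(str),
    "non_null": source.notna().sum(),
    "null_pct": (source.isna().mean() * 100).round(1),
    "distinct": source.nunique(),
})
print(profile)  # a quick read on completeness and cardinality, column by column

# Step 2: rough out a prototype dataset with a few simple transformations,
# enough to show the business what a target dataset could look like.
prototype = (
    source
    .dropna(subset=["customer_id"])  # drop rows missing the business key
    .assign(region=lambda df: df["region"].str.strip().str.upper())  # light standardization
    .groupby(["region", "segment"], as_index=False)
    .agg(customers=("customer_id", "nunique"),
         revenue=("order_total", "sum"))
)
prototype.to_csv("prototype_customer_summary.csv", index=False)
```

Even a rough summary like this gives the integration specialist and the data steward something concrete to react to well before the production ETL jobs are designed.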
For more information, replay the archived TDWI webinar The Three Pillars of Agile Data Integration, available on tdwi.org. [Editor's note: A short registration is required for users downloading from tdwi.org for the first time.]