Information- or Process-centric? The Answer Lies Somewhere In Between
Why informationistas must consider process.
- By Barry Devlin
- June 29, 2018
Are you an information expert or a process pundit? Do you find joy in the elegant structure of a data model or are you energized by pitch-perfect programming?
Such partitioning of skill and focus is widespread in IT. Deep and detailed expertise is required on both sides. As businesses ingest ever more external data, understanding its structure and extracting its real meaning demand advanced data expertise. Programmers of machine learning algorithms are redefining -- in depth and breadth -- the processes that will run the world.
Taken to the extreme, however, either approach can be problematic. On the information side, we end up with the intransigent data modeler whose enterprise data model is never ready for production. On the process side, the pernickety programmer is forever reworking some special subroutine for fastest performance or most extreme accuracy.
I don't consider myself an extremist but, as you can probably guess, I run with the data/information crowd. Thirty years of data warehousing does that to you.
To me, information is the basis of business computing, at the heart of digital transformation. In my Business unIntelligence conceptual architecture, information is the foundational thinking space, the starting point for all consideration of analytics and informational needs.
However, recent discussions with a company that has forty years of process focus gave me pause for thought. Innovative Routines International (IRI), Inc. delivered the world's first commercial non-mainframe sort software in 1978 and has continuously extended it to handle a variety of conversion, transformation, cleansing, and integration functions. Its metadata- and engine-based approach shows how process orientation can deliver results quickly and efficiently: in parallel, in a single pass of the data, and without staging to storage.
Why is this important? The mindset of a product designer or system developer -- information- or process-focused -- can lead to very different implementation choices.
Why Informationistas Must Consider Process
Most data warehouse implementation methodologies begin with a logical architecture that already contains a variety of data stores arranged in layers -- an operational data store (ODS), a staging area, an enterprise data warehouse (EDW), and multiple data marts -- each with its own unique definition and objectives. Each layer has a good reason for its existence, often based on many years of design experience and painful memories.
A good example is the rationale for an ODS, which Wikipedia describes as "used for operational reporting, controls, and decision making." The article goes on to position the ODS as a feed for the EDW and a place for some "loose integration" of data. The information-centric view accepts operational reporting, controls, and decision making as valid requirements, and it sees some value in loosely integrating data before it enters the EDW.
Similar, and somewhat overlapping, arguments can justify a staging area in the information-centric view. The result, in many cases, is four layers of data stores, each feeding data to the one above it. The impact on data timeliness is clear: every additional layer adds another load step before the data reaches the businessperson. Agility in development and, especially, in maintenance also suffers, because a change to the data must be carried through every layer.
In contrast, a process-centric view of the ODS requirements and objectives tries first to satisfy them in code rather than in storage structures. Data can be loosely integrated in flight and passed directly to the app the businessperson uses for operational reporting. Fewer near-copies of the same data exist, data management is eased, maintenance agility improves, and, most important, the business gets its data earlier.
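To make the process-centric option more concrete, here is a minimal Python sketch. It assumes two hypothetical operational sources whose order records name the same fields differently. The data and the function names (from_web, from_store, loosely_integrate, open_order_report) are illustrative only and are not taken from any product. The loose integration happens in code as the records flow, and the operational report consumes them directly. No ODS table sits in between.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical extracts from two operational systems that name the same fields differently.
WEB_ORDERS = [
    {"order_no": "W-1", "cust": "C001", "state": "OPEN", "amt": 120.0},
    {"order_no": "W-2", "cust": "C002", "state": "SHIPPED", "amt": 75.5},
]
STORE_ORDERS = [
    {"id": "S-9", "customer_id": "c001", "status": "open", "total": 40.0},
]


def from_web(rows: Iterable[dict]) -> Iterator[dict]:
    """Map web-system rows onto a shared, loosely agreed record shape."""
    for r in rows:
        yield {"order_id": r["order_no"], "customer_id": r["cust"].upper(),
               "status": r["state"].upper(), "amount": r["amt"]}


def from_store(rows: Iterable[dict]) -> Iterator[dict]:
    """Map store-system rows onto the same shape, normalizing key case."""
    for r in rows:
        yield {"order_id": r["id"], "customer_id": r["customer_id"].upper(),
               "status": r["status"].upper(), "amount": r["total"]}


def loosely_integrate(*streams: Iterable[dict]) -> Iterator[dict]:
    """Concatenate the mapped streams; no deduplication or cross-source matching."""
    for stream in streams:
        yield from stream


def open_order_report(records: Iterable[dict]) -> Dict[str, float]:
    """Operational report: total value of open orders per customer."""
    totals: Dict[str, float] = {}
    for rec in records:
        if rec["status"] == "OPEN":
            totals[rec["customer_id"]] = totals.get(rec["customer_id"], 0.0) + rec["amount"]
    return totals


report = open_order_report(
    loosely_integrate(from_web(WEB_ORDERS), from_store(STORE_ORDERS))
)
print(report)  # {'C001': 160.0}
```

The generators pass each record along as it is read, so the report reflects the sources at the moment it runs. The trade-off is that every run repeats the mapping work that an ODS would have done once and stored.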
A similar analysis can be applied to each layer in a traditional data warehouse architecture. Whereas the informationista may implement four layers, the processite would implement none at all, with no storage structures between the data sources and the businesspeople using the data.
Of course, neither extreme is ideal -- in medio stat virtus. Trade-offs in data management, performance, and cost (among other factors) will lead to a design with fewer layers and, indeed, with data of different qualities taking alternative paths through the layering and bypassing one or more layers. An integrated set of customer reference data, for example, may be deemed worth storing in the EDW before passing to the business user, whereas a list of open orders may be passed directly through to the same user, with the two sets of data being joined "on the fly."
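The mixed path in the customer/open-orders example can be sketched in a few lines of Python. This is a minimal illustration that rests on three assumptions. First, an in-memory dictionary stands in for the integrated customer reference data stored in the EDW. Second, a generator stands in for open orders read directly from the source system. Third, join_on_the_fly and all of the data are hypothetical. The join happens as each order is read, so no intermediate table is written.

```python
from typing import Dict, Iterable, Iterator, Optional

# Stands in for integrated customer reference data already persisted in the EDW (hypothetical).
CUSTOMER_REFERENCE: Dict[str, dict] = {
    "C001": {"name": "Acme Ltd", "segment": "Enterprise"},
    "C002": {"name": "Bolt & Co", "segment": "SMB"},
}


def open_orders() -> Iterator[dict]:
    """Stands in for open orders read straight from the order-entry system (hypothetical)."""
    yield {"order_id": "W-1", "customer_id": "C001", "amount": 120.0}
    yield {"order_id": "W-7", "customer_id": "C002", "amount": 15.0}
    yield {"order_id": "W-8", "customer_id": "C999", "amount": 9.0}


def join_on_the_fly(orders: Iterable[dict], customers: Dict[str, dict]) -> Iterator[dict]:
    """Enrich each order with stored reference data at read time; nothing is staged."""
    for order in orders:
        ref: Optional[dict] = customers.get(order["customer_id"])
        # Left-join semantics: an order whose customer is not yet in the reference set is kept.
        yield {
            **order,
            "customer_name": ref["name"] if ref else None,
            "segment": ref["segment"] if ref else None,
        }


rows = list(join_on_the_fly(open_orders(), CUSTOMER_REFERENCE))
for row in rows:
    print(row)
```

The left join is a design choice. An open order for a customer not yet loaded into the EDW still reaches the user, and its empty reference fields flag the gap instead of the order being silently dropped.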
Data lake population design and operational analytics model management are other areas where process-centric thinking helps to simplify the data integration and management environment. I discuss these and other scenarios further elsewhere.
The Middle Line
If you are an informationista, the real value of process-centric thinking is that it allows (or forces) you to consider data warehouse designs other than the traditional, fully layered approach. If you are a processite, as many data lake builders are, consider how the information-centric view of your friendly BI Center of Competence can help put some structure on the data lake.
Dr. Barry Devlin defined the first data warehouse architecture in 1985 and is among the world’s foremost authorities on BI, big data, and beyond. His 2013 book, Business unIntelligence, offers a new architecture for modern information use and management.