An Open Letter to Santa: Thinking about BI “Subjects”
Today’s integrated tools can analyze data, execute business rules, and move data. What’s needed are tools that don’t work on the record level but rather examine “subjects.”
To: Santa Claus
My name is Gian Di Loreto and I live in Illinois. I have been a very good boy this year (well, mostly good). I would like to ask you for a very special present. This present does not exist, so you’ll have to have your elves create it from scratch (although there is some help out there -- I’ll tell you more in a minute). I apologize for the late notice, but I’m sure you can make it happen.
What I want for Christmas this year is a software application that will allow me to analyze (view, study, and profile) my client’s data, create and execute business rules against it, move it (extract, transform, and load), and cleanse it. Please don’t send me one of the dozens of currently available tools that claim they can handle this task. They are fine products in their own right, but they don’t work the way I need them to.
Although it’s true that there are many integrated tools available, I need my tool to perform these tasks from a fundamentally different perspective. All the tools on the market today look at data as records in a table. They study one record, decide if it’s “clean” or not, create an output record or perhaps increment a counter on their dashboard, and then move on to the next record. There is no product that accounts for the fact that the records in these tables almost always describe another entity, which I like to call the “subject” -- the thing that the data describes.
Santa, I would like my tool to be built around the subject rather than around the data itself. Once the subject concept is integrated into the tool, the first step in any data-centric exercise would be to point the tool at one or more data sources and load the subjects into the tool. Therefore, the tool you build for me will need a built-in staging area.
All of the data can then be tied back to the subject that it describes and loaded into the staging area as well. The staging area becomes a repository not only for the data itself but also for newly created metadata, starting with the subject table.
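Santa, in case it helps the elves, here is a minimal sketch of what I mean by a staging area built around subjects. It is only an illustration under my own assumptions: the names `Subject`, `StagingArea`, and `customer_key` are hypothetical, the subject is taken to be a customer, and a normalized e-mail address is assumed to identify that customer. A real tool would need far smarter matching.

```python
from dataclasses import dataclass, field


@dataclass
class Subject:
    """One real-world entity (here, a customer) that the records describe."""
    subject_id: str
    # Every record tied back to this subject, as (source_name, record) pairs.
    records: list = field(default_factory=list)


class StagingArea:
    """Hypothetical staging area: the subject table plus the data linked to it."""

    def __init__(self, key_func):
        self.key_func = key_func  # maps a raw record to its subject key
        self.subjects = {}        # subject_id -> Subject (the "subject table")

    def load(self, source_name, records):
        """Load records from one source, attaching each to its subject."""
        for record in records:
            subject_id = self.key_func(record)
            subject = self.subjects.setdefault(subject_id, Subject(subject_id))
            subject.records.append((source_name, record))


def customer_key(record):
    # Assumption: a trimmed, lower-cased e-mail address identifies the customer.
    return record["email"].strip().lower()


# Two made-up data sources that describe overlapping customers.
billing = [
    {"email": "Ann@Example.com", "balance": 120.0},
    {"email": "bob@example.com", "balance": 0.0},
]
shipping = [
    {"email": "ann@example.com ", "city": "Chicago"},
]

staging = StagingArea(customer_key)
staging.load("billing", billing)
staging.load("shipping", shipping)

for subject in staging.subjects.values():
    print(subject.subject_id, len(subject.records))
```

Three records go in, but the tool now knows about two subjects, and Ann’s billing and shipping records sit side by side under the one customer they both describe.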
Once this infrastructure is in place, the data can be analyzed at the subject level. Business rules can be created around the subject, which is a concept everyone understands, not just the IT department. Dashboards and scorecards can be created and populated with subject-level data, which in turn provides a holistic picture of the situation: a picture not just of the data itself but of whatever it is that the data describes, which in the end is what we all care about.
Finally, Santa, the quality of the data can be assessed and the data can ultimately be cleansed at the subject level rather than record by record. Let’s face it, a record-by-record data cleansing exercise is just one small step ahead of a manual data-cleansing project.
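To show the elves the difference, here is a small sketch that contrasts a record-level check with a subject-level rule. Everything in it is made up for illustration: the field names, the sample customers, and the rule itself (one customer should have exactly one birth date) are my assumptions, not features of any existing product.

```python
def record_is_clean(record):
    # Record-level check: is this one row, taken alone, well formed?
    return bool(record.get("customer_id")) and bool(record.get("birth_date"))


def subject_is_consistent(records):
    # Subject-level rule: a customer has exactly one birth date
    # across all of the records that describe that customer.
    return len({r["birth_date"] for r in records}) == 1


def group_by_subject(records, key="customer_id"):
    # Gather the records that describe the same subject.
    subjects = {}
    for record in records:
        subjects.setdefault(record[key], []).append(record)
    return subjects


# Made-up sample data: customer C1 appears in two source systems.
records = [
    {"customer_id": "C1", "source": "billing", "birth_date": "1970-01-05"},
    {"customer_id": "C1", "source": "support", "birth_date": "1970-05-01"},
    {"customer_id": "C2", "source": "billing", "birth_date": "1982-11-30"},
]

subjects = group_by_subject(records)

# Every record passes when examined one at a time...
all_records_clean = all(record_is_clean(r) for r in records)

# ...but the subject-level rule catches the customer whose records disagree.
failing_subjects = sorted(
    sid for sid, recs in subjects.items() if not subject_is_consistent(recs)
)

# A scorecard in the language of the business: customers, not rows.
scorecard = {
    "subjects": len(subjects),
    "subjects_failing": len(failing_subjects),
}
print(all_records_clean, failing_subjects, scorecard)
```

A record-by-record tool would report three clean rows and move on. The subject-level view reports that one customer out of two has a problem, which is a statement a business leader can act on.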
Programmatically, I don’t think this is a huge task, but it requires a sea change regarding the way this problem is viewed both by the manufacturers and the consumers of data management tools.
I think you could start by taking one of the fine existing record-level tools as an example and rebuilding it, sticking to the concept of the subject at each turn.
The end result would be a truly wonderful thing: a tool we could use to cleanse data using language everyone speaks. It would be able to report results that could be digested by everyone from IT up to the non-technical business leaders.
Santa, if this is not possible this year, I will try harder to be an extra-good boy next year.
Gian Di Loreto
- - -
Gian is the owner of Loreto Services and Technologies, where he is responsible for everything from sales to implementation to planning the holiday party. You can contact the author at firstname.lastname@example.org.