RESEARCH & RESOURCES

SQL-Based Agile Analytics

TDWI sees SQL-based agile analytics increasingly used as a primary analytic method, not just a reactive one, so it’s possible that you’ll be doing it soon if you aren't already.

As you've no doubt noticed for the last few years, many enterprises are moving strongly into various forms of advanced analytics. They do it for the usual reasons: to understand constantly changing business environments, to identify bottom-line costs, and to discover new top-line opportunities for sales and product development. These are key to surviving and thriving in any economy, yet intensified by the ongoing recession.

In response, many firms are turning to three different technologies and practices:

Advanced Analytics. There are different analytic methods users can choose as they move beyond basic OLAP-based methods and into advanced analytics. Some users choose predictive analytic methods based on data mining, statistics, clustering, and so on. The majority of users, however, seem to be choosing SQL-based methods.

Extreme SQL. Users are diving deeper into SQL because it’s a technology they know and trust, and they can leverage the SQL-based tools and skills they already have. The catch is, to make SQL analytic, you have to push it to extremes. Extreme SQL, as I call it, typically involves complex SQL routines with multi-pass logic, distributed queries, and lots of clauses, developed using an iterative method that starts with ad hoc queries.

Agile BI. This is a user-oriented best practice that seeks to speed up DW/BI development and delivery by jettisoning team bureaucracy and liberating single DW/BI team members to work directly and iteratively with business users who can articulate the business requirements clearly.

(Commentary continues below)


Related Resources

White Papers:
The Top 10 Reasons for Choosing Open Source Data Integration
Is Your Approach to Multi-domain Master Data Management Upside-Down?

Webinars:
Unique Requirements of Product Data Quality
Strategies for Designing and Architecting Performance Dashboards


(Commentary continues)

My point is that users -- for a few years now -- have been putting these three practices together to spawn a new practice that I call SQL-based agile analytics. I’ve also heard it called load-and-go analytics. Regardless of what you call it, here’s what it does in terms of data preparation:

  • Users quickly load a few terabytes of data, often in response to a recent business event that needs study.
  • This large volume of data is mostly raw operational data, still in its original schema. It’s probably drawn from multiple sources, each with a different schema.
  • Note that this reactive method doesn’t allow time and resources up front for data transformation, cleansing, or remodeling. Instead, these are expressed via complex SQL, and analysts may do these later, once they gain an initial understanding of the data and the answers it can yield.

Being SQL-based, this form of agile analytics depends on lots of querying:

  • Analysts execute many ad hoc queries until the data reveals the answers they need
  • The queries get more complex with each iteration by a business analyst
  • Analysts apply SQL’s more advanced capabilities to compensate for the raw state of data, resulting in lots of WHERE clauses, table joins (often multi-way), etc.

The point of SQL-based agile analytics is to get answers to analytic questions as soon as possible after a critical business event has occurred. For example, new forms of customer churn keep popping up, and it behooves a firm to understand each one ASAP to retain customers before they’re lost. Likewise, drops in grid performance (in telco) or production yield (in manufacturing) are common topics for advanced analytics.

SQL-based agile analytics is all about agility. When a new business problem or opportunity comes up, the BI team must quickly shift focus to it. In response, the business team must react as soon as possible, based on insight from the BI team. For example, TDWI has interviewed BI team members who learned of a critical issue one morning and had a terabyte-scale analytic dataset with preliminary query results within hours. In turn, this enabled the business to respond that same afternoon with new incentives to retain customers.

Despite its current focus on tactical problems, TDWI sees SQL-based agile analytics increasingly used as a primary analytic method, not just a reactive one, so it’s possible that you’ll be doing it soon if you aren't already. This column just reveals the tip of the iceberg. To learn more, read TDWI’s recent Checklist Report on Data Requirements for Advanced Analytics, available at http://tdwi.org/research.


Related Resources

White Papers:

   
The Top 10 Reasons for Choosing
Open Source Data Integration
    Is Your Approach to Multi-domain
Master Data Management Upside-Down?

Webinars:


Unique Requirements of Product Data Quality
September 8, 2010
Speaker: Philip Russom


Strategies for Designing and Architecting Performance Dashboards
September 14, 2010
Speaker: Wayne Eckerson

TDWI Membership

Get immediate access to training discounts, video library, research, and more.

Find the right level of Membership for you.