
TDWI Blog

Agile BI Blog Posts

See the most recent Agile BI-related items below.


Agile BI and DW: Dynamic, Continuous, and Never Done

Delivering value sooner and being adaptable to business change are two of the most important objectives today in business intelligence (BI) and data warehouse (DW) development. They are also two of the most difficult objectives to achieve. “Agility,” the theme of the upcoming TDWI World Conference and BI Executive Summit, to be held together the week of August 7 in San Diego, is about implementing methodologies and tools that will shorten the distance to business value and make it easier to keep adding value throughout development and maintenance cycles.

We’re very excited about the programs for these two educational events. Earlier this week, I had the pleasure of moderating a Webinar aimed at giving attendees a preview of how the agility theme will play out during the week’s keynotes and sessions. The Webinar featured Paul Kautza, TDWI Director of Education, and two Agile experts who will be speaking and leading seminars at the conference: Ken Collier and Ralph Hughes.

Agile methodology has become a mainstream trend in software development circles, but it is much less mature in BI and DW. A Webinar attendee asked whether any Agile-trained expert could do Agile BI. “No,” answered Ken Collier. “Agile BI/DW training requires both Agile expertise as well as BI/DW expertise due to the nuances of commercial off-the-shelf (COTS) system integration, disparate skill sets and technologies, and large data volumes.” Ralph Hughes agreed, adding that “generic Agile folks can do crazy things and run their teams right into the ground.” Ralph then offered several innovations that he sees as necessary, including planning work against the warehouse’s reference architecture and pipelining work functions so everyone has a full sprint to work their specialty. He also advocated small, mandated test data sets for functional demos and full-volume data sets for loading and re-demo-ing after the iteration.

If you are just getting interested in Agile or are in the thick of implementing Agile for BI and DW projects, I would recommend listening to the Webinar, during which Ken and Ralph offered many wise bits of advice that they will explain in greater depth at the conference. The BI Executive Summit will feature management-oriented sessions on Agile, including a session by Ralph, but will also take a broader view of how innovations in BI and DW are enabling these systems to better support business requirements for greater agility, flexibility, and adaptability. These innovations include mobile, self-service, and cloud-based BI.

As working with information becomes integral to more lines of business and operations, patience with long development and deployment cycles will get increasingly thin. The time is ripe for organizations to explore what Agile methodologies as well as recent technology innovations can do to deliver business value sooner and continuously, in a virtuous cycle that does not end. In Ken Collier’s words, “The most effective Agile teams view the life of a BI/DW system as a dynamic system that is never done.”

Posted by David Stodder on July 14, 2011


The Spanner: The Next Generation BI Developer

To succeed with business intelligence (BI), sometimes you have to buck tradition, especially if you work at a fast-paced company in a volatile industry.

And that’s what Eric Colson did when he took the helm of Netflix’s BI team last year. He quickly discovered that his team of BI specialists moved too slowly to successfully meet business needs. “Coordination costs [among our BI specialists] were killing us,” says Colson.

Subsequently, Colson introduced the notion of a “spanner”—a BI developer who builds an entire BI solution singlehandedly. The person “spans” all BI domains, from gathering requirements to sourcing, profiling, and modeling data to ETL and report development to metadata management and QA testing.

Colson claims that one spanner works much faster and more effectively than a team of specialists. They work faster because they don’t have to wait for other people or teams to complete tasks or spend time in meetings coordinating development. They work more effectively because they are not biased to any one layer of the BI stack and thus embed rules where most appropriate. “A traditional BI team often makes changes in the wrong layer because no one sees the big picture,” Colson says.

Also, since spanners aren’t bound by a written contract (i.e., requirements document) created by someone else, they are free to make course corrections as they go along and “discover” the optimal solution as it unfolds. This degree of autonomy also means that spanners have higher job satisfaction and are more dedicated and accountable. One final benefit: there’s no finger-pointing if something fails.

Not For Everyone

Of course, there are downsides to spanning. First, not every developer is capable of spanning. Some don’t have the skills, and others don’t have the interest. “We have lost some people,” admits Colson. Finding the right people isn’t easy, and you must pay a premium in salary to attract and retain them. Plus, software license costs increase because each spanner needs a full license to each BI tool in your stack.

Second, not every company is well suited to spanners. Many companies won’t allocate enough money to attract and retain spanners. And mature companies in regulated or risk-averse industries may work better with a traditional BI organization and development approach.

Simplicity

Nonetheless, experience shows that the simplest solution is often the best one. In that regard, spanners could be the wave of the future.

Colson says that using spanners eliminates much of the complexity of running BI programs and development projects. The only things you need are a unifying data model, a BI platform, and a set of common principles, such as “avoid putting logic in code” or “account ID is a fundamental unifier.” The rest falls into the hands of the spanners, who rely on their skills, experience, and judgment to create robust local applications within an enterprise architecture. Thus, with spanners, you no longer need business requirement analysts or requirements documents, a BI methodology, project managers, or a QA team, says Colson.

This is certainly pretty radical stuff, but Colson has proven that thinking and acting outside the box works, at least at Netflix. Perhaps it’s time you consider following suit!

Posted on October 21, 2010


Do We Really Need Semantic Layers?

It used to be that a semantic layer was the sine qua non of a sophisticated BI deployment and program. Today, I’m not so sure.

A semantic layer is a set of predefined business objects that represent corporate data in a form that is accessible to business users. These business objects, such as metrics, dimensions, and attributes, shield users from the data complexity of schema, tables, and columns in one or more back-end databases. But a semantic layer takes time to build and slows down deployment of an initial BI solution. Business Objects (now part of SAP) took its name from this notion of a semantic layer, which was the company’s chief differentiator at its inception in the early 1990s.
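
To make the idea concrete, here is a minimal sketch of what a semantic layer boils down to: named business objects mapped onto a physical schema, with the SQL generated behind the scenes. The tables, columns, and metric names below are hypothetical, not any vendor’s actual format.

```python
# A minimal, illustrative semantic layer. Business users pick objects by
# name ("Revenue" by "Region"); the layer hides the schema and builds SQL.
# All table and column names are assumptions for the sake of the example.

SEMANTIC_LAYER = {
    "metrics": {
        "Revenue": "SUM(f.sales_amount)",
        "Order Count": "COUNT(DISTINCT f.order_id)",
    },
    "dimensions": {
        "Region": "d.region_name",
        "Fiscal Quarter": "d.fiscal_quarter",
    },
    # How the physical tables join -- invisible to the business user.
    "from_clause": "fact_sales f JOIN dim_geography d ON f.geo_key = d.geo_key",
}

def build_query(metric: str, dimension: str) -> str:
    """Translate a business question ('Revenue by Region') into SQL."""
    m = SEMANTIC_LAYER["metrics"][metric]
    d = SEMANTIC_LAYER["dimensions"][dimension]
    return (
        f'SELECT {d} AS "{dimension}", {m} AS "{metric}" '
        f'FROM {SEMANTIC_LAYER["from_clause"]} GROUP BY {d}'
    )

print(build_query("Revenue", "Region"))
```

Building and governing those mappings for hundreds of objects is exactly the up-front effort that slows initial deployment.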

A semantic layer is critical for supporting ad hoc queries by non-IT professionals. As such, it’s a vital part of supporting self-service BI, which is all the rage today. So what’s my beef? Well, roughly 80% of BI users don’t need to create ad hoc queries. The self-service requirements of these “casual users” are easily fulfilled by parameterized reports or interactive dashboards, which do not require a semantic layer to build or deploy.

Accordingly, most pure-play dashboard vendors don’t incorporate a semantic layer. Corda, iDashboards, Dundas, and others are fairly quick to install and deploy precisely because they have a lightweight architecture (i.e., no semantic layer). Granted, most are best used for departmental rather than enterprise deployments, but nonetheless, these low-cost, agile products often support sophisticated BI solutions.

Besides casual users, there are “power users” who constitute about 20% of total users. Most power users are business analysts who typically query a range of databases, including external sources. From my experience, most bona fide analysts feel constrained by a semantic layer, preferring to use SQL to examine and extract source data directly.

So is there a role for a semantic layer today? Yes, but not in the traditional sense of providing “BI to the masses” via ad hoc query and reporting tools. Since the “masses” don’t need such tools, the question becomes who does?

Super Users. The most important reason to build a semantic layer is to support a network of “super users.” Super users are technically savvy business people in each department who gravitate to BI tools and wind up building ad hoc reports on behalf of colleagues. Since super users aren’t IT professionals with formal SQL training, they need more assistance and guiderails than a typical application developer. A semantic layer ensures super users conform to standard data definitions and create accurate reports that align with enterprise standards.

Federation. Another reason a semantic layer might be warranted is when you have a federated BI architecture where power users regularly query the same sets of data from multiple sources to support a specific application. For example, a product analyst may query historical data from a warehouse, current data from sales and inventory applications, and market data from a syndicated data feed. If this usage is consistent, then the value of building a semantic layer outweighs its costs.

Distributed Development. Mature BI teams often get to the point where they become the bottleneck for development. To alleviate the backlog of projects, they distribute development tasks back out to IT professionals in each department who are capable of building data marts and complex reports and dashboards. To make distributed development work, the corporate BI team needs to establish standards for data and metric definitions, operational procedures, software development, project management, and technology. A semantic layer ensures that all developers use the same definitions for enterprise metrics, dimensions, and other business objects.

Semi-legitimate Power Users. Some power users are inexperienced: they don’t know how to write proper SQL and aren’t very familiar with the source systems they want to access. This type of power user is probably more akin to a super user than a business analyst and would be a good candidate for a semantic layer. However, before outfitting these users with ad hoc query tools, first determine whether a parameterized report, an interactive dashboard, or a visual analysis tool (e.g., Tableau) can meet their needs.

So there you have it. Semantic layers facilitate ad hoc query and reporting. But the only people who need ad hoc query and reporting tools these days are super users and distributed IT developers. However, if you are trying to deliver BI to the masses of casual users, then a semantic layer might not be worth the effort. Do you agree?

Posted by Wayne Eckerson on July 28, 2010


The Key to Analytics: Ask the Right Questions

People think analytics is about getting the right answers. In truth, it’s about asking the right questions.

Analysts can find the answer to just about any question. So, the difference between a good analyst and a mediocre one is the questions they choose to ask. The best questions test long-held assumptions about what makes the business tick. The answers to these questions drive concrete changes to processes, resulting in lower costs, higher revenue, or better customer service.

Often, the obvious metrics don’t correlate with sought-after results, so it’s a waste of time focusing on them, says Ken Rudin, general manager of analytics at Zynga and a keynote speaker at TDWI’s upcoming BI Executive Summit in San Diego on August 16-18.

Challenge Assumptions

For instance, many companies evaluate the effectiveness of their Web sites by calculating the number of page hits. Although a standard Web metric, total page hits often doesn’t correlate with higher profits, revenues, registrations, or other business objectives. So, it’s important to dig deeper, to challenge assumptions rather than take them at face value. For example, a better Web metric might be the number of hits that come from referral sites (versus search engines) or time spent on the Web site or time spent on specific pages.

TDWI Example. Here’s another example closer to home. TDWI always mails conference brochures 12 weeks before an event. Why? No one really knows; that’s how it’s always been done. Ideally, we should conduct periodic experiments. Before one event, we should send a small set of brochures 11 weeks beforehand and another small set 13 weeks prior. And while we’re at it, we should test the impact of direct mail versus electronic delivery on response rates.
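
For what it’s worth, here is a rough sketch, in Python with made-up numbers, of how such an experiment could be evaluated: mail two small waves at different lead times, then compare response rates with a simple two-proportion test.

```python
import math

def two_proportion_z(resp_a, n_a, resp_b, n_b):
    """Compare response rates of two mailing waves (normal approximation)."""
    p_a, p_b = resp_a / n_a, resp_b / n_b
    p_pool = (resp_a + resp_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return p_a, p_b, (p_a - p_b) / se

# Illustrative numbers only: 5,000 brochures mailed 11 weeks out and
# 5,000 mailed 13 weeks out, with hypothetical registration counts.
p11, p13, z = two_proportion_z(resp_a=140, n_a=5000, resp_b=110, n_b=5000)
print(f"11-week rate = {p11:.2%}, 13-week rate = {p13:.2%}, z = {z:.2f}")
# |z| > 1.96 would suggest the timing difference is real at roughly 95% confidence.
```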

These types of analyses don’t require sophisticated mathematical software or expensive analysts--just time, effort, and a willingness to challenge long-held assumptions. And the results are always worth the effort: they can either validate or radically alter the way we think our business operates. Either way, the information impels us to fine-tune or restructure core business processes, which can lead to better bottom-line results.

Analysts are typically bright people with strong statistical skills who are good at crunching numbers. Yet, the real intelligence required for analytics is a strong dose of common sense combined with a fearlessness to challenge long-held assumptions. “The key to analytics is not getting the right answers,” says Rudin. “It’s asking the right questions.”

Posted by Wayne Eckerson on June 10, 2010


Three Tiers of Analytic Sandboxes: New Techniques to Empower Business Analysts

Analytic sandboxes are proving to be a key tactic in liberating business analysts to explore data while preventing the proliferation of spreadmarts and renegade data marts. Many BI teams already provide sandboxes of some sort, but few recognize that there are three tiers of sandboxes that can be deployed individually or in concert to meet the unique needs of every organization.

Analytic sandboxes adhere to the maxim, “If you can’t beat them, join them.” They provide a “safe haven” for business analysts to explore enterprise data, combine it with local and external data, and then massage and package the resulting data sets without jeopardizing an organization’s proverbial “single version of truth” or adversely affecting performance for general DW users.

By definition, analytic sandboxes are designed for exploratory analysis, not production reporting or generalized distribution. Ideally, sandboxes come with an expiration date (e.g. 90 days), reinforcing the notion that they are designed for ad hoc analyses, not application development. If analysts want to convert what they’ve created into a scheduled report or application, they need to turn it over to the BI team to “productionize” it.

Unfortunately, analytic sandboxes can’t enforce information policies. Analysts can still export data sets to their desktop machines, email results to colleagues, and create unauthorized production applications. Ultimately, organizations that establish sandboxes must also establish policies and procedures for managing information in a consistent manner and provide sufficient education about proper ways to produce and distribute information. Nonetheless, many BI teams are employing analytic sandboxes with reasonable success.

Tiers of Sandboxes

1. DW-Centric Sandboxes. The traditional analytic sandbox carves out a partition within the data warehouse database, upwards of 100GB in size, in which business analysts can create their own data sets by combining DW data with data they upload from their desktops or import from external sources. These DW-centric sandboxes preserve a single instance of enterprise data (i.e., they don’t replicate DW data), make it easier for database and DW administrators to observe what analysts are doing, and help analysts become more comfortable working in a corporate data environment. It’s also easier for the BI team to convert analyses into production applications since the analytic output is already housed in the DW.

However, a DW-centric sandbox can be difficult to manage from a systems perspective. Database administrators must create and maintain partitions and access rights and tune workload management utilities to ensure adequate performance for both general DW users and business analysts. An organization that has dozens or hundreds of analysts, each of whom wants to create large data sets and run complex queries, may bog down performance even with workload management rules in place. Inevitably, the BI team may need to upgrade the DW platform at considerable expense to support the additional workload.
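
To make this first tier a bit more concrete, here is a minimal provisioning sketch. The DDL is generic and illustrative only; quota, grant, and expiration syntax varies by warehouse platform, and the schema, user, and size values are assumptions.

```python
# Illustrative provisioning script for a DW-centric sandbox.
# "sandbox_jsmith", "edw", and the 100 GB cap are hypothetical values;
# the quota clause is platform-specific and shown only as a placeholder.

SANDBOX_DDL = [
    # A dedicated schema keeps analyst work separate from production tables.
    "CREATE SCHEMA sandbox_jsmith AUTHORIZATION jsmith",
    # Read-only access to warehouse data; write access only in the sandbox.
    "GRANT SELECT ON ALL TABLES IN SCHEMA edw TO jsmith",
    "GRANT CREATE ON SCHEMA sandbox_jsmith TO jsmith",
    # Platform-specific: cap the space the sandbox can consume.
    "-- ALTER SCHEMA sandbox_jsmith SET QUOTA '100 GB'  (syntax varies)",
]

def provision_sandbox(cursor) -> None:
    """Run the DDL with whatever DB-API cursor the warehouse driver provides."""
    for statement in SANDBOX_DDL:
        if not statement.startswith("--"):  # skip the commented placeholder
            cursor.execute(statement)
```

A scheduled job that drops the schema after the agreed expiration date (e.g., 90 days) would round out the arrangement.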

2. Replicated Sandboxes. One way to avoid performance problems and systems management complexities is to replicate the DW to a separate platform designed exclusively for analysts. Many companies have begun to physically separate the production DW from ad hoc analytical activity by purchasing specialized DW appliances.

This approach offloads complex, ad hoc queries issued by a handful of people to a separate machine, leaving the production DW to support standardized report delivery, among other things. DW performance improves significantly without a costly upgrade, and analysts get free rein of a box designed exclusively for their use.

Of course, the downside to this is cost and duplication of data. Organizations must purchase, install, and maintain a separate database platform--which may or may not run the same database and server hardware as the DW. Executives may question why they need a separate machine to handle tasks they thought the DW was going to handle.

In addition, the BI team must establish and maintain a utility to replicate the data to the sandbox, which may take considerable expertise to create and maintain. The replication can be done at the source systems, the ETL layer, the DW layer (via mirrored backup), or the DW storage system. Also, with multiple copies of data, it’s easy for the two systems to get out of sync and for analysts to work with outdated information.

3. Managed Excel Sandboxes. The third tier of analytic sandbox runs on the desktop. New Excel-based analytical tools, such as Microsoft’s PowerPivot and Lyzasoft’s Lyza Workstation, contain in-memory columnar databases that run on desktop machines, giving analysts unprecedented power to access, massage, and analyze large volumes of data in a manner that conforms to the way they’ve traditionally done such work (i.e., using Excel rather than SQL).

Although these spreadsheets-on-steroids seem like a BI manager’s worst nightmare, there is a silver lining: analysts who want to share their results have to publish through a server managed by corporate IT. This is why I call this type of sandbox a “managed Excel” environment.

For example, with Microsoft PowerPivot, analysts publish their results to Microsoft SharePoint, which makes the results available to other users via Excel Services, a browser-based version of Excel. Excel Services prevents users from changing or downloading the report, which blocks unauthorized distribution. In the same way, Lyzasoft lets analysts publish data to the Lyza Commons, where others can view and comment on the output via browser-based collaborative tools.

Of course, where there is a will there is a way: business analysts can and will find ways to circumvent the publishing and distribution features built into PowerPivot and Lyza workbooks and other managed Excel environments. But the collaborative features of these server-based environments are so powerful and compelling that I suspect most business analysts will take the path of least resistance and share information in this controlled manner.

Combining Sandboxes. A managed Excel sandbox might work well in conjunction with the other two sandboxes, especially if the corporate sandboxes have performance or size constraints. For example, analysts could download a subset of data from a centralized sandbox to their managed Excel application, combine it with local data on their desktops, and conduct their analyses using Excel. If they liked what they discovered, they could then run the analysis against the entire DW within the confines of a centralized sandbox.

Our industry is in the early stages of learning how to make most effective use of analytic sandboxes to liberate power users without undermining information consistency. With three (and perhaps more) types of analytic sandboxes, BI teams can tailor the sandbox experience to meet the unique needs of their organization.

Posted by Wayne Eckerson on March 22, 2010


Zen BI: The Wisdom of Letting Go

One of my takeaways from last week’s BI Executive Summit in Las Vegas is that veteran BI directors are worried about the pace of change at the departmental level. More specifically, they are worried about how to support the business’ desire for new tools and data stores without undermining the data warehousing architecture and single version of truth they have worked so hard to deliver.

At the same time, many have recognized that the corporate BI team they manage has become a bottleneck. They know that if they don’t deliver solutions faster and reduce their project backlog, departments will circumvent them and develop renegade BI solutions that undermine the architectural integrity of the data warehousing environment.

The Wisdom of Letting Go

In terms of TDWI’s BI Maturity Model, these DW veterans have achieved adulthood (i.e. centralized development and EDW) and are on the cusp of landing in the Sage stage. However, to achieve true BI wisdom (i.e. Sage stage), they must do something that is both counterintuitive and terrifying: they must let go. They must empower departments and business units to build their own DW and BI solutions.

Entrusting departments to do the right thing is a terrifying prospect for most BI veterans. They fear that the departments will create islands of analytical information and undermine the data consistency they have worked so hard to achieve. The thought of empowering departments makes them grip the proverbial BI steering wheel tighter. But asserting control at this stage usually backfires. The only option is to adopt a Zen-like attitude and let go.

Trust in Standards

I'm reminded of the advice that Yoda in the movie “Star Wars” provides his Jedi warriors-in-training: “Let go and trust the force.” But, in this case, DW veterans need to trust their standards. That is, the BI standards that they’ve developed in the BI Competency Center, including definitions for business objects (i.e. business entities and metrics), processes for managing BI projects, techniques for developing BI software, and processes and procedures for managing ETL jobs and handling errors, among other things.

Some DW veterans who have gone down this path add the caveat: “trust but verify.” Although educating and training departmental IT personnel about proper BI development is critical, it’s also important to create validation routines where possible to ensure business units conform to standards.

Engage Departmental Analysts

The cagiest veterans also recognize that the key to making distributed BI development work is to recruit key analysts in each department to serve on a BI Working Committee. The Working Committee defines DW and BI standards, technologies, and architectures and essentially drives the BI effort, reporting its recommendations to a BI Steering Committee of business sponsors for approval. Engaging the analysts who are most apt to create renegade BI systems ensures the DW serves their needs and helps secure buy-in and support.

By adopting a Zen-like approach to BI, veteran DW managers can eliminate project backlogs, ensure a high level of customer satisfaction, and achieve BI nirvana.

Posted by Wayne Eckerson on February 28, 2010


Mashboards: New Tools for Self-Service BI

There is an emerging type of dashboard product that enables power users to craft ad hoc dashboards for themselves and peers by piecing together elements from existing reports and external Web pages. I’m calling these “Mashboards” because they “mash” together existing charts and tables within a dashboard framework. Other potential terms are “Report Portal,” “Metrics Portal,” and “Dashmart.”

I see Mashboards as the dashboard equivalent of the ad hoc report, which has spearheaded the self-service BI movement in recent years. Vendors began delivering ad hoc reporting tools to ease the report backlog that afflicts most BI deployments and dampens sales of BI tools. Ad hoc reports rely on a semantic layer that enables power users to drag and drop predefined business objects onto a WYSIWYG reporting canvas to create a simple report.

Likewise, Mashboards enable power users to select from predefined report “parts” (e.g., charts, tables, selectors) and drag and drop them onto a WYSIWYG dashboard canvas. Before you can create a Mashboard, IT developers need to create reports using the vendor’s standard report authoring environment. The “report parts” are often self-contained pieces of XML code--or gadgets--that are wired to display predefined sets of data or can be easily associated with data from a semantic layer. Power users can apply filters and selectors to the gadgets without coding.
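
As a rough illustration of the “report part” idea (my own sketch, not any vendor’s actual gadget format), think of each part as a small descriptor bound to an existing report’s data, and a Mashboard as an arrangement of those parts with shared filters. All names and fields below are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Gadget:
    """One report part lifted from an existing, IT-built report."""
    title: str          # e.g., a chart users already know
    source_report: str  # the report it was authored in
    query_ref: str      # reference to the predefined data set behind the part
    filters: dict = field(default_factory=dict)

@dataclass
class Mashboard:
    """An ad hoc arrangement of gadgets assembled by a power user."""
    name: str
    gadgets: list

    def apply_filter(self, key: str, value: str) -> None:
        """A shared selector dropped on the canvas: no coding required."""
        for g in self.gadgets:
            g.filters[key] = value

board = Mashboard(
    name="Regional Sales Snapshot",
    gadgets=[
        Gadget("Revenue by Region", "Q3 Sales Report", "rpt_q3_sales#chart1"),
        Gadget("Top 10 Accounts", "Account Review", "rpt_accounts#table2"),
    ],
)
board.apply_filter("region", "West")
```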

Mashboards are a great way for organizations to augment enterprise or executive dashboards, which are designed to deliver 60% to 80% of casual users’ information needs. Mashboards can be used to address the other 20% to 40% of those needs on an ad hoc basis or to deliver a highly personalized dashboard for an executive or manager. (I should note that enterprise dashboards should be personalizable as well.)

Dashmarts? However, there is a danger that Mashboards will end up becoming just another analytical silo. Their flexibility lends itself to creating visual spreadmarts, which is why I’m tempted to call them Dashmarts. That said, Mashboards that require power users to source all data elements from existing reports and parts should minimize this risk to some degree.

All in all, Mashboards are a great addition to a BI portfolio. They provide a new type of ad hoc report that is more visual and easily consumed by casual users. And they are a clever way for vendors to extend the value of their existing reporting and analysis tools.

Posted by Wayne Eckerson on February 16, 2010


Sleep Well at Night: Abstract Your Source Systems

It’s odd that our industry has established a best practice for creating a layer of abstraction between business users and the data warehouse (i.e., a semantic layer or business objects), but we have not done the same thing on the back end.

Today, when a database administrator adds, changes, or deletes fields in a source system, it breaks the feeds to the data warehouse. Usually, source system owners don’t notify the data warehousing team of the changes, forcing us to scramble to track down the source of the errors, rerun ETL routines, and patch any residual problems before business users awake in the morning and demand to see their error-free reports.

It’s time we get some sleep at night and create a layer of abstraction that insulates our ETL routines from the vicissitudes of source systems changes. This sounds great, but how?

Insulating Netflix

Eric Colson, whose novel approaches to BI appeared two weeks ago in my blog “Revolutionary BI: When Agile is Not Fast Enough,” has found a simple way to abstract source systems at Netflix. Rather than pulling data directly from source systems, Colson’s BI team pulls data from a file that source systems teams publish to. It’s kind of an old-fashioned publish-and-subscribe messaging system that insulates both sides from changes in the other.

“This has worked wonderfully with the [source systems] teams that are using it so far,” says Colson, who believes this layer of abstraction is critical when source systems change at breakneck speed, like they do at Netflix. “The benefit for the source systems team is that they get to go as fast as they want and don’t have to communicate changes to us. One team migrated a system to the cloud and never even told us! The move was totally transparent.”

On the flip side, the publish-and-subscribe system relieves Colson’s team from having to 1) access source systems, 2) run queries on those systems, 3) know the names and logic governing tables and columns in those systems, and 4) keep up with changes in those systems. They also get much better quality data from source systems this way.

Changing Mindsets

However, Colson admits that he might get push back from some source systems teams. “We are asking them to do more work and take responsibility for the quality of data they publish into the file,” says Colson. “But this gives them a lot more flexibility to make changes without having to coordinate with us.” If the source team wants to add a column, it simply appends it to the end of the file. 
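
Here is a minimal sketch of how the subscribing side could stay insulated (my own illustration of the idea, not Netflix’s actual implementation): read the published file by column name, take only the columns you subscribe to, and anything the source team appends later is simply ignored. The column names and file path are hypothetical.

```python
import csv

# Columns the warehouse side has agreed to consume (assumed names).
EXPECTED_COLUMNS = ["account_id", "event_date", "plan_type", "amount"]

def read_published_file(path: str):
    """Yield one dict per row, restricted to the subscribed columns only."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)  # the header row names the columns
        for row in reader:
            # Newly appended columns never break this loop; they are ignored.
            yield {col: row.get(col) for col in EXPECTED_COLUMNS}

# Usage sketch: load the rows into staging without touching the source system.
# for record in read_published_file("published/subscriptions_20100208.csv"):
#     load_into_staging(record)   # hypothetical loader
```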

This approach is a big mindset change from the way most data warehousing teams interface with source systems teams. The mentality is: “We will fix whatever you give us.” Colson’s technique, on the other hand, forces the source systems teams to design their databases and implement changes with downstream analysis in mind. For example, says Colson, “they will inevitably avoid adding proprietary logic and other weird stuff that would be hard to encapsulate in the file.”

Time to Deploy

Call me a BI rube, but I’ve always assumed that BI teams by default create such an insulating layer between their ETL tools and source systems. Perhaps for companies that don’t operate at the speed of Netflix, ETL tools offer enough abstraction. But, it seems to me that Colson’s solution is a simple, low-cost way to improve the adaptability and quality of data warehousing environments that everyone can and should implement.

Let me know what you think!

Posted by Wayne Eckerson on February 8, 2010


Revolutionary BI: When Agile is Not Fast Enough

Developers of BI unite! It is time that we liberate the means of BI production from our industrial past.

Too many BI teams are shackled by outdated modes of industrial organization. In our quest for efficiency, we’ve created rigid fiefdoms of specialization that have hijacked the development process (and, frankly, sucked all the enjoyment out of it as well).

We’ve created an insidious assembly line in which business specialists document user requirements that they throw over the wall to data management specialists who create data models that they throw over the wall to data acquisition specialists who capture and transform data that they throw over the wall to reporting specialists who create reports for end users that they throw over the wall to a support team who helps users understand and troubleshoot reports. The distance from user need to fulfillment is longer than Odysseus' journey home from Troy and just as fraught with peril.


Flattened BI Teams

Contrary to standard beliefs, linear development based on specialization is highly inefficient. “Coordination [between BI groups] was killing us,” says Eric Colson, director of BI at Netflix. Colson inherited an industrialized BI team set up by managers who came from a banking environment. The first thing he did when he took over was tear down the walls and cross-train everyone on the BI staff. “Everyone now can handle the entire stack--from requirements to database to ETL to BI tools.”

Likewise, the data warehousing team at the University of Illinois found its project backlog growing bigger each year until it reorganized itself into nine small, self-governing interdisciplinary groups. By cross-training its staff and giving members the ability to switch groups every year, the data warehousing team doubled the number of projects it handles with the same staff.

The Power of One

Going one step further, Colson believes that even small teams are too slow. “What some people call agile is actually quite slow.” Colson believes that one developer trained in all facets of a BI stack can work faster and more effectively than a team. For example, it’s easier and quicker for one person to decide whether to apply a calculation in the ETL or BI layer than a small team, he says.

Furthermore, Colson doesn’t believe in requirements documents or quality assurance (QA) testing. He disbanded those groups when he took charge. He believes developers should work directly with users, which is something I posited in a recent blog post titled “Principle of Proximity.” And he thinks QA testing actually lowers quality because it relieves developers from having to understand the context of the data with which they are working.

It’s safe to say that Colson is not afraid to shake up the establishment. He admits, however, that his approach may not work everywhere: Netflix is a dynamic environment where source systems change daily, so flexibility and fluidity are keys to BI success. He also reports directly to the CEO and has strong support as long as he delivers results.

Both the University of Illinois and Netflix have discovered that agility comes from a flexible organizational model and versatile individuals who have the skills and inclination to deliver complete solutions. They are BI revolutionaries who have successfully unshackled their BI organizations from the bondage of industrial era organizational models and assembly line development processes.

Posted by Wayne Eckerson on January 27, 2010


Principle of Proximity: THE Best Practice in BI

After 15 years in the business intelligence industry, I’ve hit the mother lode: I’ve discovered the true secret to BI success. It’s really quite simple, and it’s been staring at us for years. It’s the principle of proximity.

By proximity, I mean seating your BI developers next to your business experts. Not just in a joint-application design session, a requirements interview, or scrum stand-up, but ALL THE TIME! Make them work side by side, elbow to elbow, nose to nose. It doesn’t work to merely locate them on the same campus or in the same building. You need to put them in the same cubicle block, or better yet, in one big room with no walls so everyone can see, hear, smell, and touch everyone else all the time. Radical, but effective.

And don’t mistake me: I’m not talking about business requirements analysts--I’m talking about developers who write the code and design the models. Yes, make the developers get the requirements right from the horse’s mouth. Don’t force them to learn requirements secondhand through a business requirements analyst. Trust me, something always gets lost in translation.

To develop awesome BI applications, you have to function like a small startup where there are no departments or organizational boundaries, no separate jargon or incentives, no separate managers or objectives, and NO WALLS. Just one big, messy, energetic, on-the-same-wavelength family that gets things done. And fast.

Role of Agile. I like agile software development methods. They come as close as any methodology to approximating the principle of proximity. If nothing else, go agile. Create a small team of business and technical people and make them do stand-up meetings daily, if not hourly! And hold them jointly accountable for the outcome.

But as good as agile can be, proximity is better. Why? When you place developers and business experts in the same room, they almost don’t need to talk. They absorb what they need to know by osmosis, and they learn to respect what each group needs to do to succeed. And fewer meetings make happier, more productive people.

Several years ago, Wes Flores, a technology manager at Verizon, told me the secret of his group’s success: “We sit side by side with business people and report into the same leadership. The only difference is that we specialize in the data and they specialize in the business process.”

So if you want to succeed at BI, reassign your business requirements analysts and immerse your BI developers in the physical heart of the business by applying the principle of proximity.

Posted by Wayne Eckerson on January 7, 2010