TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

Collaborators Are the Foundation for Healthy Analytics

Your organization already has the beginnings of an effective information stewardship program. It's time to recognize how important your data collaborators are.

By Satyen Sangani
June 19, 2018

Within every organization, there are precious few individuals who are exceptionally good collaborators. Research published in the Harvard Business Review examined collaboration across 300 organizations and found that "Up to a third of value-added collaborations come from only 3% to 5% of employees." Within data teams, these individuals identify useful data sets, help analysts find the data assets they need, and share their knowledge with others. If a new analyst is lucky, one of these "super collaborators" will introduce themselves as part of the onboarding process, or a colleague will point them out as a great resource.

For Further Reading:

Balancing Self-Service, Governed BI, and Analytics

Is Collaboration the Critical Success Factor for Data Science?

5 Tips for Getting Your Team Thinking About Data

Ideally, these individuals are recognized as valuable resources, but that's not always the case. The same Harvard Business Review research found that those who are deemed to be the best sources for information in an organization likely have the lowest engagement and career satisfaction scores. Should they ever leave the organization because of their lack of engagement and dissatisfaction, their knowledge will likely be lost, surviving only in a mishmash of email, chat, meeting notes, and half-remembered conversations.

Although most of us probably think of our super collaborators as nice-to-haves -- we're grateful when they flag something useful and relieved when they save us days or weeks of searching -- in reality, they are critically important to our data organizations. These individuals are the seeds for achieving trust in data, delivered through programmatic information stewardship. These super collaborator roles must be formalized and backed with authority to create the best practices that make analytics repeatable across audiences. Their extra effort can't continue to go unnoticed.

The Rise of the Data Steward

Super collaborators are needed in today's data-driven enterprise like never before. The quantity of data has increased exponentially with the rise of self-service analytics over the past few years. Successful self-service analytics initiatives have also quickly democratized and decentralized data analysis.

IT traditionally acted as the data dam, releasing only dashboards and semantic layers that formed the basis of trusted data. With the recent flood of data, IT is now ceding this role and becoming the data enabler, providing direct access to data and sometimes providing context about how to use it. Self-service has opened up data access to more people, and traditional, top-down data governance models have become outdated vestiges of the old world.

Providing greater access to data moves organizations closer to the ideal outcome of data-driven decisions, but there are still problems with the new model, which must rely on providing trusted context rather than providing trusted data. Even in the new world, there is simply too much data to cover. People have access to so much data that they have trouble discerning what is accurate and what is the "right" data for their analysis. Consequently, the final analysis is often flawed, riddled with inaccuracies, and embedded with bias.

According to a 2016 survey by 451 Research, "Roughly one-third of respondents had some doubt about whether the data they were using was the correct data for their purposes." This lack of trust demonstrates the need for stewards who can add structure and consistency -- a role that is underrepresented in organizations and often does not exist at all. To close the trust gap, knowledgeable individuals must be empowered to become information stewards, with the authority to contextualize and certify gold-standard data assets in the enterprise, establish analytics best practices, and enforce rules for consistent data usage.

Information stewards are also important to the scale and overall effectiveness of a data organization. Like the salons of Paris where intellectuals would gather to share information and debate theory, data teams today share knowledge in small, informal groups. This creates pockets of siloed understanding. The information is usually there, but not everyone is privy to it. To maximize their understanding of data and promote knowledge sharing across the organization, data teams should function like universities rather than salons. In the university of data, information stewards are the professors and teaching assistants, with processes and tools in place to allow them to share knowledge across the organization.

In reality, most stewards don't see themselves as stewards, and they have no desire to add more formal responsibilities to their existing collaborative workload. Traditional governance and stewardship solutions require complex, formalized business processes. One has to pre-define each role, formalize exceptions in a workflow, and ensure compliance. Traditional governance is a hammer when all you need is a thumbtack to hang a poster. Given the volume of data and the paucity of stewards, organizations need to make it easier, not harder, to steward the data, using the carrot of recognition rather than the stick of complicated workflows.

A New Way to Govern

Information stewards, however, aren't a silver bullet for healthy analytics. Everyone who works with data must take responsibility for their analytics projects and embrace data curation. There is no longer a single answer to most analytics questions -- in truth there probably never has been. Data produced from website log files and other semistructured sources can be interpreted in many valid ways by multiple teams.

The definition of a website session, for example, is context that changes from department to department. A product team may have a different, equally valid definition for the starting and ending events of a session than the marketing or sales team. Understanding these nuances of both data and its context requires raising the data literacy bar.
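To make the point concrete, here is a minimal sketch of how two teams could count different numbers of sessions from the same log data. It assumes, purely for illustration, that the teams differ only in their inactivity timeout (30 minutes for product, 60 minutes for marketing); the function name, the timeouts, and the sample timestamps are all hypothetical and not taken from any particular analytics tool.

```python
from datetime import datetime, timedelta


def count_sessions(timestamps, inactivity_gap):
    """Count sessions for one visitor.

    A new session starts at the first event, and again whenever the time
    since the previous event is longer than inactivity_gap.
    """
    sessions = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous > inactivity_gap:
            sessions += 1
        previous = ts
    return sessions


# Hypothetical page-view timestamps for a single visitor.
events = [
    datetime(2018, 6, 19, 9, 0),
    datetime(2018, 6, 19, 9, 10),
    datetime(2018, 6, 19, 9, 50),
    datetime(2018, 6, 19, 11, 0),
]

# Assumed definitions: each team picks its own inactivity timeout.
product_sessions = count_sessions(events, timedelta(minutes=30))
marketing_sessions = count_sessions(events, timedelta(minutes=60))

print("product:", product_sessions)
print("marketing:", marketing_sessions)
```

Neither count is wrong. The two numbers answer different questions, which is why the definition behind a metric needs to be documented alongside the data.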

Organizations need both programmatic information stewardship -- driven by your super collaborators turned information stewards -- and broad, democratic curation carried out by every user of data. If companies take one lesson from the Facebook and Cambridge Analytica situation, it should be that it is not enough to provide API access to the data. The context of that data, how it was collected, and what the assumptions are for fair use are critical pieces of information that should be explicitly documented for anyone with access to that data. Without a culture of curation and programmatic information stewardship, data will eventually be overused and misused.

The Road Forward

A healthy self-service data organization with proper data governance requires a combination of tools, culture, and processes:

-- An automated technology to support transparency in analysis. A data catalog is one example: it allows organizations to track data, its context, and the path of data usage (known to data management experts as lineage) in a self-service environment. This technology base allows your analysts to show their work and review the work of others. If an organization doesn't know what data it has and how it's used, misuse is inevitable.

-- A culture of curation. Tools, of course, aren't effective if they aren't used. People who touch data must take responsibility for their analysis: using tools to check and double-check the accuracy of their assumptions, asking for peer review, and taking the extra step to share their perspective by noting the business context in which a data set has been useful or where it might include inaccuracies. To get to that point, teams must be educated on the nuances of when to trust data and when not to.

-- Information stewardship and the social contract for self-service analysis. The culture flows from your information stewards -- the super collaborators who are empowered to create the programs that promote sharing and reusability and who raise the bar on data literacy. One way information stewards can inspire data curation is to create a social contract. With this contract, analysts understand that they are responsible for the veracity of their analytics. The simple act of making this responsibility known can have a dramatic effect on the analytics culture.
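The three ingredients above can be sketched in a few lines of code. The following is a toy, in-memory illustration only, not how any real data catalog product works: the `CatalogEntry` and `Catalog` classes, their method names, and the sample data sets are all hypothetical. It shows lineage tracking (tools), user-contributed notes (curation), and certification by a named steward (stewardship).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CatalogEntry:
    name: str
    description: str
    upstream: List[str] = field(default_factory=list)  # direct source data sets
    notes: List[str] = field(default_factory=list)     # curation notes from any user
    certified_by: Optional[str] = None                 # steward who certified the entry


class Catalog:
    def __init__(self) -> None:
        self.entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self.entries[entry.name] = entry

    def add_note(self, name: str, author: str, text: str) -> None:
        """Any user can record the context in which a data set is (or is not) reliable."""
        self.entries[name].notes.append(f"{author}: {text}")

    def certify(self, name: str, steward: str) -> None:
        """A steward marks a data set as a trusted asset."""
        self.entries[name].certified_by = steward

    def lineage(self, name: str) -> List[str]:
        """Return every upstream ancestor of a data set, nearest first."""
        seen: List[str] = []
        queue = list(self.entries[name].upstream)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            if current in self.entries:
                queue.extend(self.entries[current].upstream)
        return seen


# Hypothetical data sets, for illustration only.
catalog = Catalog()
catalog.register(CatalogEntry("web_logs_raw", "Raw website log files"))
catalog.register(CatalogEntry("sessions", "Sessionized web traffic", upstream=["web_logs_raw"]))
catalog.register(CatalogEntry("weekly_traffic", "Weekly traffic summary", upstream=["sessions"]))

catalog.add_note("sessions", "analyst_a", "Uses the product team's session definition.")
catalog.certify("sessions", "steward_b")

print(catalog.lineage("weekly_traffic"))
print(catalog.entries["sessions"].certified_by)
```

The design choice worth noting is that notes are open to everyone while certification is tied to a named steward, mirroring the split between broad curation and programmatic stewardship.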

The good news is that your organization already has the beginnings of an effective information stewardship program. Your analytics super collaborators are walking the halls spreading their knowledge, helping new analysts get up to speed, and sharing useful data. It is time to recognize how important these individuals really are.

About the Author

Satyen Sangani is the CEO and a co-founder of Alation. Before Alation, Satyen spent nearly a decade at Oracle, where he ran the financial services warehousing and performance management business. Prior to Oracle, he was an associate at private investment firm Texas Pacific Group and an analyst with Morgan Stanley & Co.
