TDWI Blog

Philip Russom, Ph.D., is senior director of TDWI Research for data management and is a well-known figure in data warehousing, integration, and quality, having published over 550 research reports, magazine articles, opinion columns, and speeches over a 20-year period. Before joining TDWI in 2005, Russom was an industry analyst covering data management at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and consultant, was a contributing editor with leading IT magazines, and was a product manager at database vendors. His Ph.D. is from Yale. You can reach him by email ([email protected]), on Twitter (twitter.com/prussom), and on LinkedIn (linkedin.com/in/philiprussom).


The Role of Centralization and Self-Service in a Successful Data Hub

A hub should centralize governance, standards, and other data controls, plus provide self-service data access and data prep for a wide range of user types.

By Philip Russom, Senior Research Director for Data Management, TDWI

I recently spoke in a webinar run by Informatica Corporation, sharing the stage with Informatica’s Scott Hedrick and Ron van Bruchem, a business architect at Rabobank. We three had an interactive conversation where we discussed the technology and business requirements of data hubs, as faced today by data management professionals and the organizations they serve. There’s a lot to say about data hubs, but we focused on the roles played by centralization and self-service, because these are two of the most pressing requirements. Please allow me to summarize my portion of the webinar.

A data hub is a data platform that serves as a distribution hub.

Data comes into a central hub, where it is collected and repurposed. Data is then distributed out to users, applications, business units, and so on.

The feature sets of data hubs vary. Home-grown hubs tend to be feature-poor, because there are limits to what the average user organization can build on its own. By comparison, vendor-built data hubs are more feature-rich, scalable, and modern.

A true data hub provides many useful functions. Two of the highest priority functions are:

  • Centralized control of data access for compliance, governance, security
  • Self-service access to data for user autonomy and productivity

A comprehensive data hub integrates with tools that provide many data management functions, especially those for data integration, data quality, technical and business metadata, and so on. The hallmark of a high-end hub is the publish-and-subscribe workflow, which certifies incoming data and automates broad but controlled outbound data use.

A data hub provides architecture for data and its management.

A quality data hub will assume a hub-and-spoke architecture, but be flexible enough that users can customize the architecture to match their current data realities and future plans. Hub-and-spoke is the preferred architecture for integration technologies (for both data management and applications) because it falls into obvious, predictable patterns that are easy to learn, design, optimize, and maintain. Furthermore, a hub-and-spoke architecture greatly reduces the number of interfaces deployed as compared to a point-to-point approach, which in turn reduces complexity for greater ease of use and maintainability.
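As a quick back-of-the-envelope illustration of that interface-count argument (my own arithmetic, not a figure from the webinar), connecting n systems point-to-point requires roughly n(n-1)/2 interfaces, while hub-and-spoke needs only n, one per spoke:

```python
# Compare interface counts for point-to-point vs. hub-and-spoke integration.
def point_to_point_interfaces(n_systems: int) -> int:
    # Every pair of systems gets its own interface: n * (n - 1) / 2
    return n_systems * (n_systems - 1) // 2

def hub_and_spoke_interfaces(n_systems: int) -> int:
    # Each system connects once, to the hub.
    return n_systems

for n in (5, 10, 25):
    print(n, point_to_point_interfaces(n), hub_and_spoke_interfaces(n))
# 5 systems -> 10 vs. 5 interfaces; 10 -> 45 vs. 10; 25 -> 300 vs. 25
```

The gap widens quickly, which is why point-to-point "hairballs" become unmaintainable as the number of connected systems grows.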

A data hub centralizes control functions for data management.

When a data hub follows a hub-and-spoke architecture, it provides a single point of integration that fosters technical standards for data structures, data architecture, data management solutions, and multi-department data sharing. That single point also simplifies important business control functions, such as governance, compliance, and collaboration around data. Hence, a true data hub centralizes and facilitates multiple forms of control, for both the data itself and its usage.

A data hub enables self-service for controlled data access.

Self-service is very important, because it’s what your “internal customers” want most from a data hub. (That said, many technical users benefit from self-service, too.) Self-service has many manifestations and benefits:

  • Self-service access to data makes users autonomous, because they needn’t wait for IT or the data management team to prepare data for them.
  • Self-service creation of datasets makes users productive.
  • Self-service data exploration enables a wide range of user types to study data from new sources and discover new facts about the business.

These kinds of self-service are enabled by an emerging piece of functionality called data prep, which is short for data preparation and is sometimes called data wrangling or data munging. Instead of overwhelming mildly technical or non-technical users with the full richness of data integration functionality, data prep boils it down to a key subset of functions. Data prep’s simplicity and ease of use yield speed and agility. It empowers data analysts, data scientists, DM developers, and some business users to construct a dataset with spontaneity and speed. With data prep, users can quickly create a prototype dataset, improve it iteratively, and publish it or push it into production.
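To make the idea concrete, here is a minimal sketch of the kind of lightweight data prep described above, written with pandas; the file and column names are hypothetical, and real data prep tools wrap steps like these in a visual, self-service interface:

```python
import pandas as pd

# Load a raw extract (hypothetical file and columns) for quick, iterative prep.
raw = pd.read_csv("customer_extract.csv")

# Typical prep steps: drop duplicates, standardize a column, filter, derive a field.
prepped = (
    raw.drop_duplicates(subset=["customer_id"])
       .assign(region=lambda df: df["region"].str.strip().str.upper())
       .query("status == 'ACTIVE'")
       .assign(tenure_years=lambda df: df["tenure_days"] / 365)
)

# Publish the prototype dataset so analysts or the hub can pick it up.
prepped.to_csv("customer_prepped.csv", index=False)
```

The point is the iteration loop: run it, inspect the result, adjust a step, and republish, rather than waiting on a full data integration development cycle.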

Hence, data prep and self-service work together to make modern use cases possible, such as data exploration, discovery, visualization, and analytics. Data prep and self-service are also inherently agile and lean, thus promoting productive development and nimble business.

A quality hub supports publish and subscribe methods.

Centralization and self-service come together in one of the most important functions found in a true data hub, namely publish-and-subscribe (or simply pub/sub). This type of function is sometimes called a data workflow or data orchestration.

Here’s how pub/sub works: Data entering the hub is certified and cataloged on the way in, so that the data is in canonical form, high quality, and audited, ready for repurposing and reuse. The catalog and its user-friendly business metadata then make it easy for users and applications to subscribe to specific datasets and generic categories of data. That way, users get quality data they can trust, but within the governance parameters of centralized control.
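Here is a minimal, generic sketch of the pub/sub pattern (my own simplification, not any vendor's implementation): incoming datasets are certified and cataloged on the way in, and only certified data reaches subscribers on the way out:

```python
from typing import Callable, Dict, List

class DataHub:
    """Toy data hub: certify and catalog on the way in, notify subscribers on the way out."""

    def __init__(self):
        self.catalog: Dict[str, dict] = {}                 # business metadata per dataset
        self.subscribers: Dict[str, List[Callable]] = {}   # delivery callbacks per data category

    def subscribe(self, category: str, deliver: Callable):
        self.subscribers.setdefault(category, []).append(deliver)

    def publish(self, name: str, category: str, records: list, certify: Callable[[list], bool]):
        if not certify(records):
            raise ValueError(f"Dataset {name} failed certification; not published")
        self.catalog[name] = {"category": category, "rows": len(records)}
        for deliver in self.subscribers.get(category, []):
            deliver(name, records)

hub = DataHub()
hub.subscribe("customer", lambda name, recs: print(f"received {name}: {len(recs)} rows"))
hub.publish("daily_customers", "customer",
            [{"id": 1, "email": "a@example.com"}],
            certify=lambda recs: all(r.get("email") for r in recs))
```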

Summary and Recommendations.

  • Establish a data architecture and stick with it. Rely on a data hub based around a hub-and-spoke architecture, not point-to-point hairballs.
  • Adopt a data hub for the business benefits. At the top of the list would be self-service for data access, data exploration, and diverse analytics, followed by centralized functions for data governance and stewardship.
  • Deploy a data hub for technical advancement. A hub can organize and modernize your infrastructure for data integration and data management, as well as centralize technical standards for data and development.
  • Consider a vendor-built data hub. Home-grown hubs tend to be feature-poor compared to vendor-built ones. When it comes to data hubs, buy it, don’t build it.
  • Demand the important, differentiating functions, especially those you can’t build yourself. This includes pub/sub, self-service data access, data prep, business metadata, and data certification.
  • A modern data hub potentially has many features and functions. Choose and use the ones that fit your requirements today, then grow into others over time.

If you’d like to hear more of my discussion with Informatica’s Scott Hedrick and Rabobank’s Ron van Bruchem, please click here to replay the Informatica Webinar.

Posted on July 12, 2016


Comprehensive and Agile End-to-End Data Management

The trend toward integrated platforms of multiple tools and functions enables broader designs and practices that satisfy new requirements.

By Philip Russom, Senior Research Director for Data Management, TDWI

Earlier this week, I spoke in a webinar run by Informatica Corporation and moderated by Informatica’s Roger Nolan. I talked about trends in user practices and vendor tools that are leading us toward what I call end-to-end (E2E) data management (DM). My talk was based on three assumptions:

  1. Data is diversifying into many structures from new and diverse sources.
  2. Business wants to diversify analytics and other data-driven practices.
  3. End-to-end data management can cope with the diversification of data, analytics, and business requirements in a comprehensive and agile manner.

In our webinar, we answered a number of questions pertinent to comprehensive and agile end-to-end data management. Allow me to summarize some of the answers for you:

What is end-to-end (E2E) data management (DM)?

End-to-end data management is one way to adapt to data’s new requirements. In this context, “end-to-end” has multiple meanings:

End-to-end DM functions. Today’s diverse data needs diverse functions for data integration, quality, profiling, event processing, replication, data sync, MDM, and more.

End-to-end tool platform. Diverse DM functions (and their user best practices) must be enabled by a portfolio of many tools, which are unified in a single integrated platform.

End-to-end agility. With a rich set of DM functions in one integrated toolset, developers can very quickly on-board data, profile it, and iteratively prototype, in the spirit of today’s agile methods.

End-to-end DM solutions. With multiple tools integrated in one platform, users can design single solutions that bring to bear multiple DM disciplines.

End-to-end range of use cases. With a feature-rich tool platform and equally diverse user skills, organizations can build solutions for diverse use cases, including data warehousing, analytics, data migrations, and data sync across applications.

End-to-end data governance. When all or most DM functions flow through one platform, governance, stewardship, compliance, and data standards are greatly simplified.

End-to-end enterprise scope. End-to-end DM draws a big picture that enables the design and maintenance of enterprise-scope data architecture and DM infrastructure.

What is the point of E2E DM?

End-to-end (E2E) data management (DM) is all about being comprehensive and agile:

  • Comprehensive -- All data management functions are integrated for development and deployment, with extras for diverse data structures and business-to-DM collaboration.
  • Agile -- Developers can very quickly on-board diverse data and profile it, while business and technical people iteratively prototype and collaborate, in today’s agile spirit.

What’s an integrated tool platform? What’s it for?

An integrated platform supports many DM tool types, but with tight integration across them. The end-to-end functionality seen in an integrated DM platform typically has a data integration and/or data quality tool at its core, with additional tools for master data management, metadata management, stewardship, changed data capture, replication, event processing, data exchange, data profiling, and so on.

An integrated platform supports modern DM architectures. For example, the old way of architecting a DM solution is to create a plague of small jobs, then integrate and deploy them via scheduling. The new way (which requires an integrated toolset) architects fewer but more complex solutions, where a single data flow calls many different tools and DM functions in a controlled and feature-rich fashion.
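A hedged sketch of the "one data flow instead of many small jobs" idea follows; each function stands in for a tool on the integrated platform, and the step names are illustrative rather than a vendor API:

```python
# One data flow that chains multiple DM functions (illustrative step names).
def profile(records):
    # Profiling step: count missing values per field before anything else runs.
    fields = {k for r in records for k in r}
    return {f: sum(1 for r in records if r.get(f) is None) for f in fields}

def cleanse(records):
    # Data quality step: standardize casing and drop rows missing a key.
    return [
        {**r, "name": r["name"].title()}
        for r in records
        if r.get("customer_id") is not None and r.get("name")
    ]

def integrate(records, reference):
    # Integration step: enrich with a reference lookup (e.g., region by customer).
    return [{**r, "region": reference.get(r["customer_id"], "UNKNOWN")} for r in records]

source = [{"customer_id": 1, "name": "ada lovelace"}, {"customer_id": None, "name": "x"}]
print(profile(source))
clean = cleanse(source)
print(integrate(clean, reference={1: "EMEA"}))
```

The value of the integrated toolset is that these steps share one design environment, one metadata layer, and one deployment, instead of being separate scheduled jobs.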

An integrated tool platform supports many, diverse use cases. Furthermore, the multiple integrated tools of the end-to-end platform support the agile reuse of people, skills, and development artifacts across use cases. Important use cases include: data warehousing, analytics, application modernization, data migration, complete customer views, right-time data, and real-time data warehousing.

How does an integrated toolset empower agile methods?

Multiple data disciplines supported in one integrated toolset means that developers can design one data flow (instead of dozens of jobs) that includes operations for integration, quality, master data, federation, and more.

The reuse of development artifacts is far more likely with one integrated toolset than working with tools from multiple vendors.

Daily collaboration between a business subject-matter expert and a technical developer is the hallmark of agile development; an integrated DM platform supports this.

Feature-rich metadata management propels the collaboration of a business person (acting as a data steward) and a data management professional, plus self-service data access.

Self-service data access and data prep presented in a visual environment (as seen in mature integrated toolsets) can likewise propel the early prototyping and iterative development assumed of agile methods.

Automated testing and data validation can accelerate development. Manual testing distracts from the true mission, which is to build custom DM solutions that support the business.

Develop once, deploy at any latency. Reuse development artifacts, but deploy them at the speed required by specific business processes, whether batch, trickle feed, or real time.
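One way to picture "develop once, deploy at any latency" (a sketch under my own assumptions, not a specific product feature) is a single transformation function reused by both a batch job and a record-at-a-time streaming handler:

```python
import time

def transform(record: dict) -> dict:
    # The mapping developed once and reused everywhere.
    return {"id": record["id"], "amount_usd": round(record["amount"] * record["fx_rate"], 2)}

def run_batch(records):
    # Batch deployment: apply the mapping to a whole extract at once.
    return [transform(r) for r in records]

def run_streaming(next_record, stop_after=3):
    # Streaming deployment: apply the same mapping one record at a time.
    for _ in range(stop_after):
        print(transform(next_record()))
        time.sleep(0.1)  # stand-in for waiting on a feed

print(run_batch([{"id": 1, "amount": 10.0, "fx_rate": 1.1}]))
run_streaming(lambda: {"id": 2, "amount": 5.0, "fx_rate": 0.9})
```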

Reinventing the wheel bogs down development. Mature integrated toolsets include rich libraries of pre-built interfaces, mappings, and templates that plug and play to boost developer productivity and agility.

What’s the role of self-service in agile development methods?

Self-service data access for business users. For example, think of a business person who also serves as a data steward and therefore needs to browse data. Or consider a business analyst who is capable of ad hoc queries, when given the right tools.

Data prep for business users, analytics, and agility. Users want to work fast and independently – at the speed of thought – without need for time-consuming data management development. To enable this new best practice, the tools and platforms that support self-service data access now also support data prep, which is a form of data integration, but trimmed down for reasons of agility, usability, and performance.

Self-service and data prep for technical users. For example, self-service data exploration can be a prelude to the detailed data profiling of new data. As another example, the modern, agile approach to requirements gathering involves a business person (perhaps a steward) and a data professional, working side-by-side to explore data and decide how best to get business value from the data.

What’s the role of metadata in self-service and agile functionality?

We need complete, trusted metadata to accomplish anything in DM. And DM isn’t agile when development time is burned up creating metadata. Hence, a comprehensive E2E DM platform must support multiple forms of metadata (a minimal sketch follows the list below):

  • Technical metadata – documents properties of data for integrity purposes. Required for computerized processes and their interfaces.
  • Business metadata – describes data in ways business people understand. Absolutely required for self-service data access, team collaboration, and development agility.
  • Operational metadata – records access by users and apps. Provides an audit trail for assuring compliance, privacy, security, and governance relative to data.
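A minimal sketch of how those three kinds of metadata might sit side by side for one dataset (field names are hypothetical, chosen only to make the distinctions concrete):

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class DatasetMetadata:
    # Technical metadata: properties a program needs to read the data correctly.
    columns: dict                      # e.g., {"customer_id": "INTEGER", "email": "VARCHAR(255)"}
    row_count: int
    # Business metadata: what the data means, in business terms.
    business_name: str                 # e.g., "Active retail customers"
    steward: str                       # who to ask about definitions and quality
    # Operational metadata: who touched the data and when (the audit trail).
    access_log: List[str] = field(default_factory=list)

    def record_access(self, user: str):
        self.access_log.append(f"{user} read this dataset at {datetime.utcnow().isoformat()}")

md = DatasetMetadata(
    columns={"customer_id": "INTEGER", "email": "VARCHAR(255)"},
    row_count=120_000,
    business_name="Active retail customers",
    steward="jane.doe",
)
md.record_access("analytics_app")
print(md.access_log)
```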

If you’d like to hear more, please click here to replay the Informatica Webinar.

Posted on June 30, 2016


Data Warehouse Modernization: An Overview in 30 Tweets

By Philip Russom, Senior Research Director for Data Management, TDWI

To help you better understand what data warehouse (DW) modernization is, what variations it takes, who’s doing it, and why, I’d like to share with you the series of 30 tweets I recently issued on the topic. I think you’ll find the tweets interesting, because they provide an overview of data warehouse modernization in a form that’s compact, yet amazingly comprehensive.

Each tweet below is a short sound bite or stat bite drawn from the recent TDWI report “Data Warehouse Modernization in the Age of Big Data and Analytics,” which I researched and wrote. Many of the tweets focus on a statistic cited in the report, while other tweets are definitions stated in the report.

I left in the arcane acronyms, abbreviations, and incomplete sentences typical of tweets, because I think that all of you already know them or can figure them out. Even so, I deleted a few tiny URLs, hashtags, and repetitive phrases. I issued the tweets in groups, on related topics; so I’ve added some headings to this blog to show that organization. Otherwise, these are raw tweets. Enjoy!

Introduction to Data Warehouse Modernization
1. #DataWarehouse #Modernization ranges widely: upgrades; new subject areas; more platforms etc.
2. #DataWarehouse #Modernization is real. 76% of DWs are evolving dramatically or moderately.
3. 89% of #TDWI survey respondents say #DataWarehouse #Modernization is opp for innovation.

State of Data Warehouse Modernization
4. 91% of users surveyed find #DataWarehouse #Modernization extremely or moderately important.
5. Half of users surveyed say #DataWarehouse is up-to-date. Other half is behind. Both need modernizing.
6. 88% of users surveyed say #DataWarehouse still relevant to how mgt runs biz.

Drivers of Data Warehouse Modernization
7. #DWE #Modernization drivers = aligning DW w/biz; scaling to #BigData; new analytic apps; new tools & data types.
8. #DataWarehouse #Modernization fixes problems w/ DW focus, design, architecture, platform.
9. Modernize DW to leverage new types of data (unstruc,sensors,GPS) & tools (#Hadoop,CEP, cloud,SaaS).

Types of Data Warehouse Modernization
10. Continuous #Modernization is about regular recurring updates & extensions of a #DataWarehouse.
11. Disruptive #DataWarehouse #Modernization is about rip-&-replace of major datasets, platforms, tools.
12. Optimization #Modernization is about remodeling data, interfaces, processing for DW performance.

Benefits and Barriers for Data Warehouse Modernization
13. Leading beneficiaries of DW #Modernization = analytics; biz mgt; #RealTime operations.
14. Leading barriers to DW mod = problems w/ governance, staffing, funding, designs & platforms.
15. #Modernization also needed for systems DW integrates with = reporting, #analytics, #DataIntegration.

Trends in Data Warehouse Modernization
16. No.1 #Modernization trend is toward #DataWarehouse Environments (#DWEs) with multiple standalone data platforms.
17. Improving DW system arch (adding/replacing data platforms) is most common DW #modernization.
18. Platforms added to #DWE are based on column, appliance, event proc, adv’d analytics, #Hadoop.

User Plans for Data Warehouse Modernization
19. Half of org’s surveyed plan to leave current DW platform in place & add complementary platforms.
20. Half of org’s surveyed plan to rip out current DW platform & replace it within 3 to 4 years.
21. Very few users surveyed lack a plan or strategy for #DataWarehouse #Modernization.

Data Warehouse Modernization’s Effect on Architecture
22. #Modernization has reduced the number of single-DBMS-instance #DataWarehouses. Down to 19%.
23. Multi-platform #DataWarehouse Environment (#DWE) is norm for DW sys arch; 34% today.
24. Extreme #DataWarehouse Environment (#DWE) with LOTS of platforms will become sys arch norm in 3 yrs.

Hadoop’s Role in Data Warehouse Modernization, Part 1
25. #Hadoop is often deployed to modernize a DW or #DWE. Orgs w/#Hadoop in #DWE will double in 3yrs.
26. For early adopters, #Hadoop is DW/#DWE complement, not replacement.
27. #Modernization via #Hadoop helps address “exotic” data: non-relational, unstruc, social, sensors.

Hadoop’s Role in Data Warehouse Modernization, Part 2
28. Modern DW of future will still have relational DBMS at core. But probably integrate w/#Hadoop too.
29. #Hadoop’s relational functions will improve greatly; more likely as DW replacement in 3 to 5 yrs.
30. A few users surveyed think #Hadoop will grow larger than DW but not replace it. 2% now; 14% in 3yrs.

Want to learn more about Data Warehouse Modernization?
For a more detailed discussion – in a traditional publication! – get the TDWI Best Practices Report, titled “Data Warehouse Modernization in the Age of Big Data and Analytics,” which is available in a PDF file via a free download.

You can also register for and replay the TDWI Webinar, where I discussed the findings of the TDWI report.

Posted on June 8, 2016


Highlights from Informatica World 2016

Bigger than ever, with more user speakers and an impressive executive vision for product R&D

By Philip Russom, Senior Research Director for Data Management, TDWI

I just spent three days attending and speaking at Informatica World 2016 in San Francisco’s Moscone Center. Compared to previous years, this year’s event was bigger than ever, with over three thousand people in attendance and five or more simultaneous break-out tracks.

The change this year that I like most is the increased number of user case study speakers – almost double last year! To be honest, that’s my favorite part of any event, although I also like hearing executives explain their product vision and direction. With that in mind, allow me to share some highlights in those two areas, based on sessions I was able to attend at Informatica World 2016.

User Case Studies

I had the honor of sharing the stage with data integration veteran Tom Kato of Republic Services. Based on my research at TDWI, I talked about users’ trends toward integrated platforms that include tools for many data disciplines from a single vendor, as opposed to silo’d tools from multiple vendors. Tom talked about how an integrated tool strategy has played out successfully for his team at Republic Services. By adopting a comprehensive end-to-end toolset from Informatica, it was easier for them to design a comprehensive data architecture, with information lifecycle management that extends from data creation to purge.

I heard great tips by a speaker from Siemens about how their data lake is successful due to policies governing who can put data in the lake, what kind of data is allowed, and how the data is tagged and cataloged. “We saved six to twelve months by using simple flat schema in the data lake,” he said. “Eventually, we’ll add virtual dimensional models to some parts of the data lake to make it more like a data warehouse.”

A speaker from Harvard Business Publishing described a three-year migration and consolidation project, where they moved dozens of applications and datasets to clouds, both on premises and off-site (including AWS). They feel that Informatica Cloud and PowerCenter helped them move to clouds very quickly, which reduced the time that old and new systems ran concurrently with synchronization, which in turn reduced the costs and risks of migration.

Red Hat’s data warehouse architect explained his strategy for data warehouse modernization, based on modern data platforms, hybrid mixtures of clouds, complete views of customers, virtual technologies, and agile methods. Among those, clouds are the secret sauce – including Informatica Cloud, AWS, Redshift, and EC2 – because they provide the elasticity and performance Red Hat needs for the variety of analytic, reporting, and virtual workloads they run.

A dynamic duo from Verizon’s data warehouse team laid out their methods for success with clickstream analytics. They follow Gartner’s Bimodal IT approach, where old and new systems coexist and integrate. New tools capture and process clickstreams, and these are correlated with historic data in the older data warehouse. This is enabled by a hybrid architecture that integrates a mature Teradata implementation and a new Hadoop cluster, via data integration infrastructure by Informatica.

Another dynamic duo explained why and how they use Informatica Data Integration Hub (or simply DI Hub). “As a best practice, a data integration hub should connect four key entities,” said one of the Humana reps. “Those are source applications, publications of data, people who subscribe to the data, and a catalog of topics represented in the data.” Humana chose Informatica DI Hub because it suits their intended best practice, plus it supports additional requirements for a data fabric, virtual views, canonical model, data audit, and self-service.

Executive Vision for Product R&D

The general sessions mostly featured keynote addresses by executives from Informatica and leading partner firms. For example, Informatica’s CEO Anil Chakravarthy discussed how Informatica technology is supporting Data 3.0, an emerging shift in data’s sources, types, technical management, and business use.

All the executive speakers were good, but I got the most out of the talk by Amit Walia, Informatica’s Chief Product Officer. It was like drinking from the proverbial fire hose. Walia announced one new product, release, or capability after the next, including new releases of Informatica Cloud, Big Data Management, Data Integration Hub, and Master Data Management (with a cloud edition). Platform realignments are seen in Informatica Intelligent Data Platform (with Hadoop as a compute engine, controlled by a new Smart Executor) and Informatica Intelligent Streaming (based on Hadoop, Spark, Kafka, and Blaze); these reveal a deep commitment to modern open source software (OSS) in Informatica’s tool development strategy. One of Walia’s biggest announcements was the new Live Data Map, which will provide a large-scale framework for complex, multi-platform data integration, as is increasingly the case with modern data ecosystems.

That’s just a sample of what Amit Walia rolled out, and yet it’s a tsunami of new products and releases. So, what’s up with that? Well, to me it means that the acquisition of Informatica last year (which made it a private company) gave Informatica back the mojo that made it famous, namely a zeal and deep financial commitment to product research and development (R&D). Informatica already has a broad and comprehensive integrated platform, which addresses just about anything you’d do in traditional data management. But, with the old mojo for R&D back, I think we’ll soon see that portfolio broaden and deepen to address new requirements around big data, machine data, analytics, IoT, cloud, mobile, social media, hubs, open source, and security.

Informatica customers have always been the sort to keep growing into more data disciplines, more data types and sources, and the business value supported by those. In the near future, those users will have even more options and possibilities to grow into.

Further Learning

To get a feel for Informatica World 2016, start with a one-minute overview video.

However, I strongly recommend that you “drink from the fire hose” by hearing Amit Walia’s 40-minute keynote, which includes his amazing catalog of new products and releases.

You might also go to www.YouTube.com and search for “Informatica World 2016,” where you’ll find many useful speeches and sessions that you can replay. For something uplifting, search for Jessica Jackley’s keynote about micro loans in the third world.

Posted on May 31, 2016


Modernizing Business-to-Business Data Exchange

Keep pace with evolving data and data management technologies, plus the evolving ecosystem of firms with whom you do business.

By Philip Russom, TDWI Research Director for Data Management

Earlier this week, I spoke in a webinar run by Informatica Corporation, along with Informatica’s Daniel Rezac and Alan Lundberg. Dan, Alan, and I talked about trends and directions in a very interesting data management discipline, namely business-to-business (B2B) data exchange (DE). Like all data management disciplines, B2B DE is modernizing to keep pace with evolving data types, data platforms, and data management practices, as well as evolving ways that businesses leverage exchanged data to onboard new partners and clients, build up accounts, improve operational efficiency, and analyze supply quality, partner profitability, procurement costs, and so on.

In our webinar, we answered a number of questions pertinent to the modernization of B2B DE. Allow me to summarize those for you:

What is business-to-business (B2B) data exchange (DE)?

It is the exchange of data among operational processes and their applications, whether in one enterprise or across multiple ones. A common example would be a manufacturing firm and the ecosystem of supplier and distributor companies around it. In such examples, many enterprises are involved. However, large firms with multiple, independent business units often practice B2B DE as part of their inter-unit communications within a single enterprise. Hence, B2B DE scales up to global partner ecosystems, but it also scales down to multiple business units of the same enterprise.

B2B DE integrates data across two or more businesses, whether internal or external. But it also integrates an ecosystem of organizations as it integrates data. Therefore, B2B DE is a kind of multi-organizational collaboration. And the collaboration is enabled by the transfer of datasets, documents, and files that are high quality, trusted, and standardized. Hence, there’s more than data flowing through B2B data exchange infrastructure. Your business flows through it, as well.

What are common industries and use cases for B2B DE?

The business ecosystems enabled by B2B DE are often industry-specific, as with a manufacturer and its suppliers. The manufacturing ecosystem becomes quite complex when we consider that it can include several manufacturers (who may work together on complex products, like automobiles) and that many suppliers are also manufacturers. Then there are financiers, insurers, contractors, consultants, distributors, shippers, and so on. The data and documents shared via B2B DE are key to establishing these diverse business relationships, then growing and competing within the business ecosystem.

The retail ecosystem is equally complex. A retailer does daily business with wholesalers and distributors, plus may buy goods directly from manufacturers. All these partners may also work with other retailers. A solid hub for B2B DE can provide communications and integration infrastructure for all.

Other examples of modern business practices relying on B2B DE include subrogation in insurance, trade exchanges in various industries, and the electronic medical record, HL7 standards, and payer activities in healthcare.

Why is B2B DE important?

In the industries and use cases referenced above, much of the business flows through B2B DE; therefore, users should lavish ample resources and modernization upon it. Furthermore, B2B DE involves numerous technical interfaces, but it is also a metaphorical interface to the companies with whom you need to do business.

What’s the state of B2B DE?

There are two main problems with the current state:

B2B DE is still low-tech or no-tech in many firms. It involves paper, faxes, FedEx packages, poorly structured flat files, and ancient interfaces like electronic data interchange (EDI) and file transfer protocol (FTP). These are all useful, but they should not be the primary media. Instead, a modern B2B DE solution is online and synchronous, ideally operating in real time or close to it, while handling a wide range of data and document formats. Without these modern abilities, B2B relationships are slow to onboard and inflexible over time.

B2B DE is still too silo’d. Whether packaged or home-grown, applications for supply chain and procurement are usually designed to be silos, with little or no interaction with other apps. One way to modernize these apps is to deploy a fully functional data integration (DI) infrastructure that integrates data from supply chain, procurement, and related apps with other enterprise applications, whether for operations or analytics. With a DI foundation, modernized B2B DE can contribute information to other apps (for a more complete view of partners, supplies, etc.) and analytic data (for insights into B2B relationships and activities).

What’s driving users to modernize B2B DE?

Business ecosystems create different kinds of “peer pressure.” For example, if your partners and clients are modernizing, you must too, so you can keep doing business with them and grow their accounts. Likewise, if competitors in the ecosystem are modernizing, you must too, to prevent them from stealing your business. Similarly, data standards and technical platforms for communicating data and documents evolve over time. To continue to be a “player” in an ecosystem, you must modernize to keep pace with the evolution.

Cost is also an important driver. This is why many firms are scaling down their dependence on expensive EDI-based legacy applications and the value-added networks (VANs) they often require. The consensus is that systems built around XML, JSON, and other modern standards are more feature-rich, more agile, and easier to integrate with the rest of the enterprise.

Note that some time-sensitive business practices aren’t possible without B2B DE operating in near real time, such as just-in-time inventory in the retail industry and outsourced material management in manufacturing. For this reason, the goal of many modernizations is to add more real-time functions to a B2B DE solution.

Self-service is a driver, too. Business people who are domain experts in supply chain, procurement, material management, manufacturing, etc. need self-service access, so they can browse orders, negotiations, shipments, build plans, and more, as represented in B2B documents and data. Those documents and datasets are infamous for data quality problems, noncompliance with standards, and other issues demanding human intervention; so domain experts need to remediate, onboard, and route them in a self-service fashion.

Why are data standards and translations key to success with B2B DE?

The way your organization models data is probably quite different from how your partners and clients do it. For this reason, B2B DE is regularly accomplished via an exchange data model and/or document type. Many of these are industry-specific, as with SWIFT for financial services and HL7 for healthcare. Many are “de jure” in that they are adjudicated by a standards body, such as the American National Standards Institute (ANSI) or the International Organization for Standardization (ISO). However, it’s equally common that partners come together and design their own ad hoc standards.

With all that in mind, your platform for B2B DE should support as many de jure standards as possible, out of the box. But it must also have a development environment where you can implement ad hoc standards. In addition, translating between multiple standards can be a critical success factor; so your platform should include several pre-built translators, as well as development tools for creating ad hoc translations.
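To illustrate the translation idea, here is a simplified sketch that maps an internal flat record to a JSON exchange format; both the internal layout and the exchange schema are invented for the example, not a real de jure standard:

```python
import json

# Internal representation (hypothetical field names used by one partner's system).
internal_order = {"po_no": "PO-1001", "supplier_cd": "ACME", "qty": 250, "unit_price": 4.75}

# Mapping to the exchange schema the partners agreed on (also hypothetical).
def to_exchange_format(order: dict) -> dict:
    return {
        "purchaseOrder": {
            "number": order["po_no"],
            "supplier": {"code": order["supplier_cd"]},
            "line": {"quantity": order["qty"], "unitPrice": order["unit_price"]},
        }
    }

# A B2B DE platform would validate this payload against the agreed schema before sending it.
print(json.dumps(to_exchange_format(internal_order), indent=2))
```

In practice, a platform supplies many such mappings pre-built for de jure standards and lets developers define ad hoc ones like this for partner-specific formats.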

What are some best practices and critical success factors for B2B DE?

  • Business-to-business data exchange is critical to your business. So give it ample business and technical resources, and modernize it to remain competitive in your business ecosystem.
  • Remember that B2B DE is not just about you. Balance the requirements of clients, partners, competitors, and (lastly) your organization.
  • Poll the ecosystem you operate in to keep up with its changes. As partners, clients, and competitors adopt new standards and tools, consider doing the same.
  • Mix old and new B2B technologies and practices. Older low-tech and EDI-based systems will linger. But you should still build new solutions on more modern platforms and data standards. The catch is to integrate old and new, so you support all parties, regardless of the vintage of tech they require.
  • Build a business case for B2B data exchange. To get support for modernization, identify a high-value use case (e.g., enterprise integration, real time, pressure from partners and competition), and find a business sponsor who also sees the value.

If you’d like to hear more of my discussion with Informatica’s Daniel Rezac and Alan Lundberg, you can replay the Informatica Webinar.

Posted on April 29, 2016


Modernizing Data Integration and Data Warehousing with Data Hubs

As data and its management continue to evolve, users should consider a variety of modernization strategies, including data hubs.

By Philip Russom, TDWI Research Director for Data Management

This week, I spoke in a webinar run by Informatica Corporation, sharing the stage with Informatica’s Scott Hedrick. Scott and I had an interactive conversation where we discussed modernization trends and options, as faced today by data management professionals and the organizations they serve. Since data hubs are a common strategy for capturing modern data and for modernizing data integration architectures, we included a special focus on hubs in our conversation. We also drilled into how modern hubs can boost analytics as well as data integration across operational applications.

Scott and I organized the webinar around a series of questions. Please allow me to summarize the webinar by posing the questions with brief answers:

What is data management modernization?

It’s the improvement of tools, platforms, and solutions for data integration and other data management disciplines, plus the modernization of both technical and business users’ skills for working with data. Modernization is usually selective, in that it may focus on server upgrades, new datasets, new data types, or how all the aforementioned satisfy new data-driven business requirements for new analytics, complete views, and integrating data across multiple operational applications.

What trends in data management drive modernization?

Just about everything in and around data management is evolving. Data itself is evolving into more massive volumes of greater structural diversity, coming from more sources than ever and generated faster and more frequently than ever. The way we capture and manage data is likewise evolving, with new data platforms (appliances, columnar databases, Hadoop, etc.) and new techniques (data exploration, discovery, prep, lakes, etc.). Businesses are evolving, too, as they seek greater business value and organizational advantage from growing and diversifying data – often through analytics.

What is the business value of modernizing data management?

A survey run by TDWI in late 2015 asked users to identify the top benefits of modernizing data. In priority order, they noted improvements in analytics, decision making (both strategic and operational), real-time reporting and analytics, operational efficiency, agile tech and nimble business, competitive advantage, new business requirements, and complete views of customers and other important business entities.

What are common challenges to modernizing data management?

The TDWI survey mentioned above uncovered the following challenges (in priority order): poor stewardship or governance, poor quality data or metadata, inadequate staffing or skills, funding or sponsorship, and the growing complexity of data management architectures.

What are the best practices for modernizing data management?

First and foremost, everyone must assure that the modernization of data management aligns with the stated goals of the organization, which in turn assures sponsorship and a return on the investment. Replace, update, or redesign one component of data management infrastructure at a time, to avoid a risky big bang project. Don’t forget to modernize your people by training them in new skills and officially supporting new competencies on your development team. Modernization may lead you to embrace best practices that are new to you. Common ones today include: agile development, light-weight data prep, right-time data movement, multiple ingestion techniques, non-traditional data, and new data platform types.

As a special case, TDWI sees various types of data hubs playing substantial roles in data management modernization, because they can support a wide range of datasets (from landing to complete views to analytics) and do so with better and easier data governance, audit trail, and collaboration. Plus, modernizing your data management infrastructure by adding a data hub is an incremental improvement, instead of a risky, disruptive rip-and-replace project.

What’s driving users toward the use of modern data hubs?

Data integration based on a data hub replaces two of the most problematic practices in data management design and development: point-to-point interfaces (which limit reuse and standards and are nearly impossible to maintain or optimize) and traditional waterfall or similar development methods (which take months to complete and are difficult to keep aligned with business goals).

What functions and benefits should users expect from a vendor-built data hub?

Vendor-built data hubs support advanced functions that are impossible for most user organizations to build themselves. These functions include: controlled and governable publish and subscribe methods; the orchestration of workflows and data flows across multiple systems; easy-to-use GUIs and wizards that enable self-service data access; and visibility and collaboration for both technical and business people across a range of data.

Data hubs are great for analytics. But what about data hubs for operational applications and their data?

Instead of consolidating large operational applications in a multi-month or multi-year project, some users integrate and modernize them quickly at the data level via a shared data hub, perhaps in the cloud. For organizations with multiple customer-facing applications for customer relationship management (CRM) and salesforce automation (SFA), a data hub can provide a single, trusted version of customer data, which is replicated and synchronized across all these applications. A data hub also adds functions that users of operational applications can use to extend their jobs, namely self-service data access and collaboration over operational data.
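Here is a toy sketch of that synchronization role (my own simplification), assuming a hub that holds one golden customer record per ID and pushes changes out to registered applications:

```python
from typing import Callable, Dict

class CustomerHub:
    """Holds one trusted customer record per ID and pushes updates to registered apps."""

    def __init__(self):
        self.golden: Dict[str, dict] = {}
        self.apps: Dict[str, Callable[[str, dict], None]] = {}

    def register_app(self, app_name: str, apply_update: Callable[[str, dict], None]):
        self.apps[app_name] = apply_update

    def update_customer(self, customer_id: str, changes: dict):
        # Merge changes into the golden record, then replicate to every application.
        record = self.golden.setdefault(customer_id, {})
        record.update(changes)
        for app_name, apply_update in self.apps.items():
            apply_update(customer_id, dict(record))

hub = CustomerHub()
hub.register_app("crm", lambda cid, rec: print("CRM updated", cid, rec))
hub.register_app("sfa", lambda cid, rec: print("SFA updated", cid, rec))
hub.update_customer("C-42", {"email": "new@example.com", "segment": "enterprise"})
```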

What does a truly modern data hub offer as storage options?

Almost all home-grown data hubs and most vendor-built hubs are based on one brand of relational database management system, despite the fact that data’s schema, formats, models, structures, and file types are diversifying aggressively. A modern data hub must support relational databases (because these continue to be vital for data management), but also support newer databases, file systems, and – very importantly – Hadoop.

If you’d like to hear more of my discussion with Informatica’s Scott Hedrick, please click here to replay the Informatica Webinar.

Posted on March 29, 2016