TDWI Articles

Lack of Modern Data Integration Hindering Business Objectives, Survey Finds

Outdated tools and technologies make it difficult to get timely information to business users, corporate executives say.

Organizations struggle to make informed decisions, achieve smarter customer engagement, and enjoy agile and efficient operations without good data integration, according to a survey of current data integration trends conducted by TDWI Research and sponsored by Qlik. There has never been as much pressure on data integration as there is today, with users demanding faster data flows, better performance, and access to big and increasingly diverse data. A forward-thinking data integration strategy can be the linchpin to business success. As critical as data integration technology is to how fast businesses can react to customers and market needs, the survey indicates many organizations are not there yet.

For Further Reading:

Seven High-Priority Areas for Data Integration Modernization

Q&A: Emerging Tech Trends Pave Path to Integration-Platform-as-a-Service

Don't Let Data Integration Be the Downfall of Your Cloud Data Lake

Cloud Trends and the Impact on Data Integration

On-premises systems remain important, but many organizations are shifting the focus of data integration to deliver data quickly to cloud platforms from both new data sources and legacy systems, the survey found. Forward-looking organizations are innovating by deploying automated data pipelines, real-time data streaming, and data lakehouses that integrate cloud data warehouses and data lakes. These innovations are delivering agility to the business. Other organizations are struggling with outdated tools and technologies, and even spreadsheets, for data integration. Despite the roadblocks, trends are moving in the right direction with cloud migration and the convergence of data warehouses and data lakes.

Cloud Data Management Trends for BI, Analytics

Migration to the cloud is clearly happening for a vast majority of enterprises. Most of the respondents are using a cloud platform for at least some of their organization's data management for BI and analytics. Only 7 percent said they were not using the cloud. However, the respondents are not finding that moving analytics to the cloud is trouble free. Responding to the statement: "We are moving to the cloud, but most or all of the same on-premises challenges remain," 18 percent said that was very true, and a majority (51 percent) said it was somewhat true. Only 19 percent said it wasn't true.

Among the cloud providers in use by respondents, AWS (38 percent) and Microsoft Azure (closely following at 36 percent) were dominant. Other cloud vendors were in single digits: Google Cloud Platform (8 percent), Oracle Cloud Infrastructure (4 percent), and IBM Cloud (1 percent). [Source Q4]

Among those who say AWS is their primary platform:

  • 65 percent have a cloud data warehouse (compared to 57 percent for all respondents); 62 percent have an on-premises data warehouse
  • 58 percent have a cloud data lake (versus 45 percent for all respondents); 29 percent have an on-premises data lake

Among those who say Microsoft Azure is their primary platform:

  • 53 percent have a cloud data warehouse; 81 percent have an on-premises data warehouse
  • 49 percent have a cloud data lake; 32 percent have an on-premises data lake

Despite the hype about growing demands for data analytics, only a little more than one-fifth of survey respondents (22 percent) called it a significant driver of cloud migration when asked: "How significant a driver is growth in the number of users doing analytics, as well as the data they require, in accelerating your organization's adoption of cloud data management?" A lukewarm response came from 45 percent, who said it was "about as important as other drivers." Another fifth (20 percent) indicated it was "not quite as important as other drivers," and 5 percent said it wasn't "a significant driver at all."

Although analytics on cloud platforms may not be trouble-free, a vast majority of the respondents (79 percent) plan to expand data management for BI and analytics currently on cloud platforms -- with 36 percent checking the box "increase significantly" and 43 percent indicating "increase somewhat." Another 12 percent of cloud users plan to "stay about the same." Only one percent plan to decrease such use.

These results highlight three takeaways:

  • Data integration strategies need to support a spectrum of use cases in the cloud, not just pure analytics. For some organizations, migration to the cloud as part of the digital transformation of business applications and operations is at least as important as analytics. Digital transformation typically increases and enhances the role of data; both users and automated functions need continuous and timely data to drive smarter decisions. Data integration must be agile to address daily operational data needs as well as those for data science and analytics.

  • Some organizations still prefer to keep analytics on premises because data sets and models are the crown jewels. Many of these organizations remain concerned about security and availability. They worry about hacking, unauthorized access, and potential service outages with cloud provider platforms, even as these platforms display increasingly dependable security and availability. If these concerned organizations use the cloud, they often prefer private cloud arrangements.

  • Our research, however, finds major commitment to the cloud among organizations that say analytics is their most significant driver. Four out of five of these respondents (80 percent) have a cloud data warehouse and 65 percent have a cloud data lake. Nearly three out of five (59 percent) expect their cloud data management to increase significantly in the next 12 months.

The Convergence of the Data Warehouse and Data Lake

Having a single data architecture is a goal endorsed by more than seven in 10 respondents (72 percent) who agreed "This is an opportunity for us as it could provide more options for managing an increasingly diverse range of data structures, end user types, and business use cases."

Only 11 percent agreed with the statement: "This would create more problems because of the complexity of the resulting architecture and the work required to build a data lake and/or modernize a data warehouse." On the fence were 12 percent, who said that it wouldn't help or hinder their data-related needs.

Data lakes appear to be problem areas for the 45 percent of respondents who found truth in the statement "Our data lake is basically a data swamp." Ten percent indicated that was "very true," 35 percent said it was "somewhat true," and another 35 percent said it was not true. Data warehouses fared better: a total of 30 percent found truth in the statement: "Our data warehouse is where data goes to die." For 7 percent that was very true and 23 percent indicated it was somewhat true, although nearly two-thirds (65 percent) rejected the idea.

Data Integration

Enterprises use a variety of approaches to integrating their data, with the greatest portion (73 percent of survey respondents) indicating they are using traditional ETL. Real-time replication and/or change data capture was a distant second at 35 percent of respondents. Just under a fifth (18 percent) were making use of data virtualization or federation.

Asked to give a "truth value" to the statement: "Most data integration is still done via spreadsheets," 14 percent checked "very true" and a third (33 percent) said "somewhat true." Half (50 percent) checked "not true." Spreadsheet use often leads to data consistency and quality problems because each user might develop and execute processes differently. Spreadsheet programs also fall short as data volume and complexity increase.

Data integration is clearly an impediment to enterprises, with spreadsheet use likely contributing to problems. Over half (57 percent) indicated that the following statement was either "very true" (13 percent) or "somewhat true" (44 percent): "By the time our data integration request has been fulfilled, the business has moved on to other priorities." Only 39 percent said that was not true.

Timely data is also key to the best data insights. Among those who report data refresh rates of under an hour, 49 percent are using real-time replication, 34 percent are using ELT, and 34 percent are using data preparation.

Standard ETL is often a source of latency because it takes time to extract the right data from source systems, send it to a staging area for transformation, and then move it once more to a target data warehouse or BI platform. Acceleration and scalability are major reasons why organizations are evaluating whether workloads would benefit from changing the sequence to ELT. Organizations can then take advantage of powerful and scalable processing in the cloud to run data transformation, cleansing, and enrichment processes. It's likely that traditional ETL will continue to be an important option for traditional BI reporting, but as users seek to access and integrate more numerous and varied data sources and have more complex queries, ELT could become the preferred option.
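The difference in sequencing described above can be illustrated with a minimal Python sketch. It is not drawn from the survey or from any particular product: the sample rows, table names, and helper functions are hypothetical, and an in-memory SQLite database merely stands in for a cloud data warehouse. In the ETL path the pipeline cleans the data before loading it; in the ELT path the raw data is loaded first and the target's own SQL engine does the transformation.

```python
import sqlite3

# Hypothetical raw export: (customer, amount stored as text), with messy
# casing and stray whitespace of the kind a source system might produce.
SOURCE_ROWS = [("acme", " 120.50 "), ("Acme", "79.50"), ("globex", "10")]


def run_etl(rows):
    """ETL: transform in the pipeline, then load only the cleaned rows."""
    cleaned = [(name.strip().lower(), float(amount)) for name, amount in rows]
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)
    return db


def run_elt(rows):
    """ELT: load the raw rows first, then transform inside the target with SQL."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT)")
    db.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)
    # The transformation runs on the target's engine, not in the pipeline.
    db.execute(
        "CREATE TABLE sales AS "
        "SELECT lower(trim(customer)) AS customer, "
        "CAST(trim(amount) AS REAL) AS amount FROM raw_sales"
    )
    return db


def totals(db):
    """Return total sales per customer from the cleaned table."""
    query = (
        "SELECT customer, SUM(amount) FROM sales "
        "GROUP BY customer ORDER BY customer"
    )
    return db.execute(query).fetchall()


if __name__ == "__main__":
    print(totals(run_etl(SOURCE_ROWS)))
    print(totals(run_elt(SOURCE_ROWS)))
```

Both paths produce the same cleaned table. The practical difference is where the work happens: in the ELT variant the raw table remains available in the target for later reprocessing, and the transformation can scale with the target platform rather than with the pipeline.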

ELT also fits into unified data architecture strategies. Rather than having to manage the data warehouse and data lake separately (not to mention other data silos), our research shows that organizations are very interested in having a more unified data architecture. This holistic platform could support ELT, real-time data streaming, data pipelines, and other data-preparation processes on processing systems that offer speed, faster data refresh rates, and higher scalability.

Difficulty Using Data from Sources

It is probably not surprising that getting data from mainframe/legacy applications was rated most problematic by survey respondents, with 55 percent rating it either very difficult (22 percent) or somewhat difficult (33 percent). Another traditional source, on-premises business applications, was the next most problematic, with 43 percent finding those sources either very difficult (7 percent) or somewhat difficult (36 percent). Organizations need solutions that enable easier and more automated access to these sources to reduce delays and dependence on custom programming.

However, newer sources are not without problems: social media -- including Facebook and Twitter -- was considered very difficult by 10 percent and somewhat difficult by 22 percent. On the plus side, only 2 percent checked the box marked very difficult for cloud data storage or data lakes, on-premises data warehouses, and cloud data warehouses.

Fresh Is Best: Data Refresh Rates

Despite all the focus on real-time analytics and using the freshest data possible, almost half of respondents (48 percent) still use nightly batch processing to refresh data for BI and analytics. Only 6 percent refresh in real time or in less than one minute. The results are low for all of the more timely ways of updating data: every 15 minutes (8 percent), hourly (14 percent), or twice a day (11 percent). We can't ignore the 5 percent of respondents who checked the box labeled "Less frequently than nightly batch." Lack of current data is definitely a problem.

Given those results, it is no wonder respondents reported that most users are not very satisfied with data refresh rates. "Very satisfied" ratings ranged from a high of 18 percent for users of operational BI dashboards to a low of 5 percent for those working with AI and machine learning, where a survey low of 29 percent were in the "somewhat satisfied" category. A majority of enterprise BI query and reporting users (54 percent) were rated somewhat satisfied, followed in that category by users of operational BI dashboards (49 percent) and self-service business analytics users (47 percent).

When the responses were filtered to include only business and IT executives, the satisfaction rates were not markedly different. Clearly, satisfaction with data refresh rates is not very high across the board, and we found that line-of-business respondents were somewhat less satisfied overall.

Adding Data Sources

Another indicator of the difficulty in getting timely data for business is the time it takes to add new data sources to a BI/analytics platform. The largest share of respondents (43 percent) indicated it takes more than a day but less than a month. The next most common time frame was more than an hour but less than 24 hours, cited by 18 percent. One month and more than one month each got 14 percent. At the other end of the scale, only 5 percent checked "less than an hour."

Bringing on new data sources hasn't become faster for most respondents: 61 percent indicated that it is taking "about the same amount of time." Only 19 percent said it was taking less time, and 13 percent indicated it was taking more time.

Getting Data to Users

When it comes to the common practice of getting new data sets to data consumers in the organization, the survey indicates that it's still being done the old-fashioned way: "They submit a request to IT, the BI team, or another centralized group" was checked by 66 percent of respondents. Only 20 percent were getting data themselves using SQL and technical tools, and only 9 percent were getting what they needed via a data catalog. In many organizations, this is due to data security and governance policies; IT restricts access to approved and authenticated users and monitors how data is used and shared. This suggests the importance of IT having modern data integration solutions that automate repeated processes and improve scalability for numerous and varied workloads.

Challenges and Objectives

What's hindering data integration? Perhaps not surprisingly, 77 percent of those surveyed recognized that "IT cannot keep up with business user requests for data": 33 percent indicated it was a very significant issue and 44 percent rated it somewhat significant. The significant and somewhat significant responses for "Users cannot find the data they need" were 36 percent and 42 percent, respectively.

Other roadblocks to more timely use of data included taking too long to add new data to an existing data warehouse, with 33 percent rating it significant and 49 percent saying somewhat significant. The largest roadblock was "Significant time and effort required to provide access to new data sources," with 28 percent saying it was significant and 57 percent saying it was somewhat significant, for a total of 85 percent of respondents. Getting data to users is a major challenge.

Business Objectives Made Easier

If data integration were made easier, survey respondents see business advantages as obtainable objectives. Better customer service was cited by 74 percent. Sharpened data decision making was foreseen by 68 percent. Improved operational excellence and reduced cost both garnered 64 percent, with increased productivity cited by 62 percent. Clear majorities see the business value in improving data integration with more modern tools and technologies.

Recommendations and Conclusion

Given these findings, what steps should enterprises take when it comes to data integration?

First, recognize that there is no one-size-fits-all course of action. Different users (both technical and consumers) and workloads demand different data refresh rates and varying access to new data. Self-service data integration helps users meet immediate and dynamic business requirements. However, IT-driven data integration also needs modernization to handle carefully governed, enterprise-level workloads as well as new projects that require IT experience. You must anticipate using a variety of data integration/management strategies.

Second, enterprises are moving to the cloud, but they aren't leaving their current environments behind. Yes, cloud migration is strong, but hybrid environments will continue for the foreseeable future. Your data integration strategies must enable user access across platforms.

Third, with IT struggling to keep up with changing requirements and additional data sources, enterprises may find benefits from automation. Users want (perhaps demand is a better word) speed that matches the pace of business; automation can reduce the delays they face today.

Finally, remember the importance of unity. Evaluate how to unify your data architecture. Siloed data warehouses and data lakes increase complexity, including complexity in your data integration strategy and tactics.

Data integration can be a source of frustration when there are too many delays, too much inflexibility, and chaos brought on by data integration silos that bring inconsistency and inefficiency. However, this means that data integration modernization can have a big impact. The results of this research project show that most organizations have a strong interest in modernization, particularly to accelerate data refresh and the addition of new data. Solving data integration problems such as these can go a long way toward enabling more informed, aware, and agile business processes and analytics.

- - -

Survey Methodology and Demographics

In March and April 2021, TDWI emailed invitations to the TDWI professional community; social media invitations were issued as well. We received responses from 244 people, all of whom completed the survey.

Nearly three in ten respondents are business or IT executives (28 percent). These include corporate executives (titles such as CEO, CDO, CFO, CMO, COO, CIO, CTO, and CSO) and VPs or directors of BI, analytics, and data warehouse. The second-largest group consisted of developers and architects (22 percent), which includes data engineers, architects, and application and solution designers and developers.

Most respondents come from large or very large organizations; 45 percent work in organizations with 1,000 to 9,999 employees; a third (34 percent) come from organizations with 10,000 to 99,999 employees.

Half (50 percent) of respondents have personal experience with data integration technology, including work on data warehouses and data lakes. Almost one-fifth (19 percent) had worked implementing projects from the non-technical perspective such as project management. Sixteen percent use data integration tools in their job.

On-premises data warehouses are in use by 71 percent of respondents, 57 percent work with a cloud data warehouse, and 45 percent have a cloud data lake. Other platforms noted by respondents, who could check as many platforms as they wished, included on-premises data mart or OLAP cube (43 percent), on-premises data lake (29 percent), and cloud data mart or OLAP cube (23 percent).
