Why Enterprises Should Take a Layered Approach to Their Data Strategy
H.Y. Li of Alluxio explains why the layers of your data platform are an essential part of your data strategy.
- By James E. Powell
- February 18, 2020
Haoyuan (H.Y.) Li is the founder, chairman, and CTO of Alluxio. He received his Ph.D. in computer science from UC Berkeley, where at the AMPLab he co-created the Alluxio (formerly Tachyon) open source data orchestration system, co-created Apache Spark Streaming, and became a founding committer of Apache Spark. He spoke with Upside about how to build an enterprise data strategy.
Upside: What technology or methodology must be part of an enterprise's data strategy if it wants to be competitive today? Why?
Li: I believe that a layered approach is best when it comes to the methodologies and technologies for today's enterprise data strategy. Specifically, it's all about the layers of the data platform that underpins your data strategy. Those layers -- from top to bottom -- are the dashboard (user-facing), compute, data, and storage layers.
This methodology also dictates that the stack be workload-oriented: each layer should be able to support the particular types of workloads you have. For example, in your compute layer, if your workloads require interactive OLAP, you may want to use Presto. If you want to support easy ETL, you may want to use Spark. If you want to deploy ML, you may want to use TensorFlow. Being able to use any of these technologies when you need them requires a flexible and agile data platform.
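As a rough illustration of a workload-oriented compute layer, the Python sketch below routes each workload type to an engine using the pairings mentioned above (Presto for interactive OLAP, Spark for ETL, TensorFlow for ML). The names `ENGINE_FOR`, `register_engine`, and `pick_engine`, and the workload keys, are hypothetical and chosen for illustration; no engine is actually invoked.

```python
# Hypothetical routing table for a workload-oriented compute layer.
# Keys are workload types; values are the engines named in the interview.
ENGINE_FOR: dict[str, str] = {
    "interactive_olap": "Presto",
    "etl": "Spark",
    "ml": "TensorFlow",
}


def register_engine(workload: str, engine: str) -> None:
    """Add or swap the engine for a workload type without touching other layers."""
    ENGINE_FOR[workload] = engine


def pick_engine(workload: str) -> str:
    """Return the compute engine assigned to a workload type."""
    if workload not in ENGINE_FOR:
        raise ValueError(f"no compute engine registered for workload {workload!r}")
    return ENGINE_FOR[workload]


if __name__ == "__main__":
    print(pick_engine("etl"))  # prints "Spark"
```

Keeping the mapping in one table means supporting a new workload is a single `register_engine` call, which is one way to read the "flexible and agile" requirement; a real platform would also need scheduling, resource management, and data access behind each engine.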
Other important technologies are those your users will be most familiar with based on their skill sets. For the dashboard layer, technologies such as Tableau, Superset, and Airflow all have significant market share and plenty of users. The choice of compute layer is driven by your workloads plus the skill sets you have internally. For storage, it's a combination of cloud and on-premises storage.
Finally, the data layer may be the most critical part of constructing a holistic and coherent data strategy for your data platform. It's vital to have a data orchestration technology that lets you move data from any compute or storage framework to wherever you need it.
What one emerging technology are you most excited about and think has the greatest potential? What's so special about this technology?
For me, it's less about one specific technology and more about the range of emerging technologies I'm seeing in the data space. With a data platform built on top of data orchestration technology, it's easier to harness the value of data through compute technologies such as Presto, Spark, TensorFlow, and others. Beyond these existing technologies, I'm personally very excited about the new workloads people will tackle with technologies being built at this very moment.
What is the single biggest challenge enterprises face today? How do most enterprises respond (and is it working)?
Although there are many big challenges associated with an enterprise's data strategy, fundamentally it comes down to the gap between product offerings and the skill sets of the industry workforce. As more companies talk about the importance of data and data technology in their organizations, they'll need to hire data, AI, and cloud engineers and architects to build it out. However, there aren't enough engineers with the right level of expertise in these technologies. The key skill these engineers need to develop is the ability to understand data, both structured and unstructured, and to pick the right approach for analyzing it.
So far, the response has come mostly from vendors. Vendors are addressing part of the skills gap by creating easier-to-consume product offerings and by gradually growing knowledge in the data platform space. Still, until the knowledge gap closes, we'll continue to see a shortage of these types of engineers, and in turn, enterprises will continue to fall short of their promise of data everywhere.
Is there a new technology in data and analytics that is creating more challenges than most people realize? How should enterprises adjust their approach to it?
In my opinion, it's not so much a particular technology that's creating more challenges; rather, harnessing value from data is much harder than people thought. I believe it fundamentally boils down to creating a coherent technology stack that enables your internal teams to easily use the tools they need to create value from the data they have.
Resolving this issue may not require an "adjustment" to the approach so much as a natural evolution in how enterprises build their data platforms. Typically, enterprises build a data platform in an ad hoc way, driven mostly by project-specific needs. We're seeing the beginning of an evolution toward thinking through the data platform holistically rather than project by project. To truly enable data and compute as a service (the underpinnings of today's data platform), enterprises need both data orchestration and container orchestration technologies. We're beginning to see these types of tools used in the enterprise more frequently than ever before.
Describe your solution and the problem it solves for enterprises.
Alluxio is an open source data orchestration technology that helps bring your data closer to compute. As more enterprise architectures shift to hybrid and multicloud environments, compute and storage become separated, which creates new challenges in how data is managed and orchestrated across frameworks, clouds, and storage systems. Alluxio enables this separation of compute and storage and brings speed and agility to big data and AI workloads, so you get data locality, data accessibility, and data on demand. With Alluxio, you can access your data wherever it's stored for faster compute performance and zero-copy burst to the cloud.
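To make the idea of "bringing data closer to compute" concrete, here is a toy read-through cache in Python: the first read of a path fetches it from a slow remote store, and later reads are served from a local tier co-located with compute. This is a hypothetical sketch of the general caching idea only, not Alluxio's implementation or API; the class name, the `fetch_remote` callback, and the example path are assumptions.

```python
from typing import Callable


class ReadThroughCache:
    """Toy data layer: serve reads locally, fetching from remote storage on a miss.

    Hypothetical illustration only; not Alluxio's API or implementation.
    """

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote   # slow path, e.g. a remote object store
        self._local: dict[str, bytes] = {}  # fast path, co-located with compute
        self.remote_reads = 0               # how many times we hit remote storage

    def read(self, path: str) -> bytes:
        if path not in self._local:
            # Cache miss: pull the data once and keep it near compute.
            self._local[path] = self._fetch_remote(path)
            self.remote_reads += 1
        return self._local[path]


if __name__ == "__main__":
    remote = {"sales/2020-01.parquet": b"rows"}  # stand-in for remote storage
    cache = ReadThroughCache(remote.__getitem__)
    cache.read("sales/2020-01.parquet")
    cache.read("sales/2020-01.parquet")
    print(cache.remote_reads)  # prints 1
```

A production system would also handle eviction, consistency with the underlying store, and distribution across nodes, none of which this sketch attempts.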
About the Author
James E. Powell is the editorial director of TDWI, including research reports, the Business Intelligence Journal, and the Upside newsletter.