Understanding the Engineer’s Data Access Challenge
Data management has long focused on storage, with access as an afterthought. We must take the reverse approach and think in terms of data access first.
- By Manuel Terranova
- March 17, 2016
Engineers -- particularly those involved in designing and maintaining product lines of expensive, critical machinery such as airliners or mainline propulsion systems for large ships -- have unique data access needs. They need to do more than understand the thinking behind a design -- they also must know the thinking behind the thinking. What were the discussions designers were having three decades ago? Where did they get inspiration? Which old designs were they remaking, and what could be reused in 2016?
Questions like these require access to multiple streams of data that run far wider and deeper than today's enterprise systems provide. Even the seemingly simple task of finding the most important datasets related to an industrial product line remains elusive.
It's tempting to assume that engineers working with ERP, asset management, and non-conformance systems have all the data they need, but that's just not the case. These systems contain structured information about a design that accounts for only a small percentage of a company's collective intellectual property about a product or system. A significant and growing portion of the content surrounding a design resides in unstructured form, and may include telemetry or simulation data, AutoCAD files, or even design notes that explain the reasoning behind certain design decisions. Ashish Nadkarni of IDC predicts that by 2017 such unstructured data will "account for 79.2% of capacity shipped and 57.3% of revenue."
Unstructured data is challenging to manage. It doesn't fit neatly into a row-and-column database, and it may reside in dispersed locations outside of standard repositories. Consequently, it frequently "goes dark" with hardware tech refresh cycles, leaving engineers with an incomplete picture of the decisions that informed a particular design. In an industrial design setting, unstructured data files are often very large and widely varied in format, making them even harder to aggregate.
Recreating the Design Process Requires a Broader View of Data Access
We hear plenty today about real-time data analysis, but often older files are left out of the conversation. Engineers should care about files that may be decades old because much of the industrial equipment that keeps the world economy running has a very long lifespan. These product lines require ongoing redesign, testing and maintenance, and engineers may need to find files years or decades after they are created. Real-time data from sensors on this equipment will be of limited value unless it is correctly paired with relevant historical data.
The design and manufacturing process involves an ongoing series of tradeoffs. For example, with an automotive microcontroller, the design team may have chosen to trade higher leakage for the maximum performance boost that comes with a shorter channel length. If a design is being revisited, it's important for the team to understand why a particular tradeoff was made in order to make sound decisions going forward. This applies across the board, whether you're working with geometry, simulation, or telemetry files.
Without tradeoff knowledge, design teams are partially blind, yet such information is often known only to those involved in the original design. Informal notes and meeting discussions between members of engineering teams are usually buried in siloed computers. Without a paper (or digital) trail, future teams are handicapped, especially when the original engineers leave the company and take their knowledge with them. Later teams are forced to attempt a reconstruction of the earlier tradeoff reasoning, a time-consuming process that rarely proves adequate for the current team's needs.
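To make the idea of a digital trail concrete, a tradeoff can be captured as a small structured record filed alongside the design assets. The Python sketch below is purely illustrative; the field names, schema, and sample values are assumptions for this example, not a prescribed standard:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DesignTradeoff:
    """A lightweight record of one design decision and its rationale.

    Filing records like this alongside the design files preserves the
    'thinking behind the thinking' for teams decades later.
    """
    component: str             # what was being designed
    decision: str              # what was chosen
    alternatives: list[str]    # options considered and rejected
    rationale: str             # why the tradeoff was made
    authors: list[str]         # who was in the room
    decided_on: date
    related_files: list[str] = field(default_factory=list)  # notes, sims, CAD

# The microcontroller example from the text, as a record (values hypothetical):
tradeoff = DesignTradeoff(
    component="automotive microcontroller",
    decision="shorter channel length for maximum performance boost",
    alternatives=["longer channel length with lower leakage"],
    rationale="performance requirement outweighed the leakage penalty",
    authors=["J. Doe"],
    decided_on=date(1996, 5, 14),
    related_files=["sims/leakage_sweep.csv", "notes/design_review_1996.pdf"],
)
```

Even a record this minimal would spare a future team the time-consuming work of reverse-engineering why higher leakage was accepted in exchange for performance.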
A View of the Thinking Behind Design Stages Could Allow Us to Do Everything Better
Many companies assume the data access problem is something they have to live with and overlook what could be gained by preserving access to this unstructured data. To begin with, companies would see a tremendous productivity boost within engineering teams, which currently have to spend a significant amount of time tracking down or attempting to reconstruct lost data assets.
With the ability to easily access, aggregate, and analyze critical datasets -- structured and unstructured -- that have accumulated around a design over an extended period of time, engineering teams would be positioned to more easily perform advanced analytics that yield new innovations. These could take the form of design improvements, service models that lead to new revenue streams, or leaps forward in predictive maintenance. We've read a lot about the promise of advanced data analytics, cloud computing, and big data, but delivery on those promises hinges on the ability to implement a sophisticated data access strategy.
The Cornerstones of Data Access for the 21st Century
Companies should consider several factors in putting together a data strategy. First, they must make it quick and easy to store many different types of data in a common structure. Engineers and designers work with a wide variety of file types, ranging from Office, HTML, and PDF to AutoCAD, DXF, and DWG formats. Filing these into the appropriate datasets can be time-consuming, and it's tempting for engineers to take the path of least resistance and store files wherever happens to be most convenient.
By contrast, storing such varied files in a centralized management system requires a measure of discipline, so it makes sense to ensure the system demands as little discipline as possible if engineers are to follow procedure. Ideally, filing a document should be as easy as saving a Word file to the desktop, and retrieving it just as easy.
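One way to keep that discipline cost low is to wrap every file, regardless of format, in a small uniform metadata envelope at the moment it is stored. The Python sketch below is a minimal illustration; the `FileRecord` fields and the `ingest` helper are assumptions for this example, not a reference to any specific product:

```python
import hashlib
import mimetypes
from dataclasses import dataclass
from pathlib import Path

@dataclass
class FileRecord:
    """Uniform envelope for any engineering file: Office, PDF, DXF, DWG, etc."""
    logical_path: str   # stable, human-meaningful path (see the next section)
    content_hash: str   # fingerprint of the file contents
    media_type: str     # best-guess format
    size_bytes: int

def ingest(source: Path, logical_path: str) -> FileRecord:
    """Hypothetical helper: one call, no format-specific handling, so filing
    a DWG is no harder than saving a Word document to the desktop."""
    data = source.read_bytes()
    media_type, _ = mimetypes.guess_type(source.name)
    return FileRecord(
        logical_path=logical_path,
        content_hash=hashlib.sha256(data).hexdigest(),
        media_type=media_type or "application/octet-stream",
        size_bytes=len(data),
    )

# Usage (hypothetical paths):
# record = ingest(Path("turbine_rev2.dwg"), "/designs/turbine/rev2/model.dwg")
```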
Second, this system must remain stable over the product life cycle. This is a long time when you consider that an airliner might be in service longer than the entire career of a single engineer. One engineer might store a file under a particular pathname, only to have IT move the file to different hardware in order to optimize storage resources. This process might occur several times in a product life cycle, breaking the pathname permanently.
Tech refreshes are going to happen. To avoid an IT-engineering impasse, the pathname under which a file is saved must be "decoupled" from the hardware on which the file resides. This will enable IT to move a dataset without breaking the original pathname under which it is saved. Storage can be optimized without disrupting engineers' workflow. Engineers will know that the pathname will remain the same for the indefinite future or be quickly retrievable through search regardless of where IT may have actually moved it. This can be a resource-intensive process when you're talking about very large files, but it is achievable through virtualization.
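A minimal sketch of that decoupling, assuming a simple in-memory catalog (a real system would persist the mapping and handle concurrency and search), might look like this: engineers address files by a stable logical pathname, and a tech refresh changes only the mapping, never the pathname.

```python
class VirtualNamespace:
    """Maps stable logical pathnames to movable physical locations.

    Engineers address files by logical pathname indefinitely; IT relocates
    the bytes during tech refreshes by updating only the mapping.
    """
    def __init__(self) -> None:
        self._catalog: dict[str, str] = {}  # logical path -> physical location

    def register(self, logical_path: str, physical_location: str) -> None:
        self._catalog[logical_path] = physical_location

    def resolve(self, logical_path: str) -> str:
        """What the engineer's tools call: the pathname never changes."""
        return self._catalog[logical_path]

    def migrate(self, logical_path: str, new_location: str) -> None:
        """IT-side tech refresh: move the bytes, update one entry."""
        self._catalog[logical_path] = new_location

# A decades-old design note survives a hardware refresh under one pathname
# (locations are hypothetical):
ns = VirtualNamespace()
ns.register("/designs/airframe/wing_rev3/notes.pdf", "nas-01:/vol7/abc123")
ns.migrate("/designs/airframe/wing_rev3/notes.pdf", "objstore:/bucket9/abc123")
assert ns.resolve("/designs/airframe/wing_rev3/notes.pdf") == "objstore:/bucket9/abc123"
```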
Finally, the architecture must be highly distributed in order to manage extremely large and varied datasets without the "performance arc" in which the more data you access, the slower the system runs. It must provide high availability through replication while scaling out across a large number of servers.
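One common technique for achieving this kind of scale-out with replication is consistent hashing, in which a file's key determines which servers hold its replicas and lookups stay fast no matter how large the dataset grows. The sketch below is a generic illustration of that technique, not a description of any particular system:

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash placement with N-way replication.

    Adding servers redistributes only a fraction of the keys, so the
    cluster scales out without lookups slowing as data accumulates.
    """
    def __init__(self, nodes: list[str], replicas: int = 3, vnodes: int = 64):
        self.replicas = replicas
        self._ring: list[tuple[int, str]] = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)   # virtual nodes smooth the distribution
        )

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def locate(self, file_key: str) -> list[str]:
        """Return the distinct servers holding this file's replicas."""
        start = bisect.bisect(self._ring, (self._hash(file_key), ""))
        holders: list[str] = []
        for pos in range(start, start + len(self._ring)):
            node = self._ring[pos % len(self._ring)][1]
            if node not in holders:
                holders.append(node)
            if len(holders) == self.replicas:
                break
        return holders

# Hypothetical four-server cluster; each file lands on three of them:
ring = HashRing(["srv-a", "srv-b", "srv-c", "srv-d"])
print(ring.locate("/designs/propulsion/shaft_rev7/telemetry.bin"))
```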
Opening New Frontiers in Engineering through Data Access
Data management of the past has primarily focused on storage while access remained an afterthought. As companies move forward with advanced data analytics initiatives, many will find that they hit a wall -- data analysis technologies are only as good as data access. Moving forward, we must take the reverse approach, thinking in terms of data access first. By having all of this critical information accessible in one place -- not just the IoT telemetry that may be streaming in, but also the thinking behind the thinking that informed the original design -- organizations can achieve the true promise of the big data era.
About the Author
Peaxy CEO and president Manuel Terranova is a technology veteran with a long track record of bringing emerging technologies to market. Before co-founding Peaxy, Terranova was a senior vice president at General Electric's Drilling and Production business. During his 13 years at GE, he led a number of successful software development efforts, including a GIS software business, remote pipeline monitoring, and SupportCentral, a knowledge-based portal he co-founded that grew to become the company's second most used application worldwide. Terranova served as CIO of GE's Oil & Gas division from 2002 to 2006. In that role he led efforts to migrate the entire business from legacy applications to ERP and a contemporary application stack. You can contact the author at [email protected].