Deep Trouble for Deep Learning: Hidden Technical Debt
In order to take advantage of the full potential of deep learning, enterprises will need platforms that can help resolve significant challenges with technical debt.
- By Serkan Piantino
- December 2, 2021
In the last few years, deep learning (DL) has grown significantly, driven by trends such as COVID-19, digital transformation, and IoT adoption. It's moving from concept to applications across industries, enabling powerful new capabilities for natural language processing, machine vision, and voice recognition and synthesis.
Thus, it isn't surprising that the market is expected to grow by $7B between 2020 and 2024. The increasing complexity of deep learning is creating an explosion of technical debt that can only be reduced by introducing an underlying infrastructure platform for AI orchestration and automation that eliminates costly, error-prone ad hoc coding and procedures.
Defining AI Debt
The term "technical debt" traditionally describes the cost of closing the gap between acceptable and optimal functionality in conventional application development (AD). It is the consequence of human decisions favoring delivery speed over code quality and the subsequent need to refactor the code to improve functionality, performance, or reliability at a later date. Therefore, the cost of AD technical debt can be reduced over time.
Artificial intelligence (AI) development introduces a new kind of technical debt in the form of a multitude of ancillary coding and procedural tasks concurrent with the formulation, training, delivery, and deployment of an AI model. It is the consequence of the arbitrary needs of the AI itself in relation to its data, computation, and predictions as it forms, learns, and acts. The price of AI technical debt is constant -- and costly. This kind of technical debt is manifest in complex manual configuration procedures for cloud infrastructure, ad hoc programming for optimizing deep learning efficiency, and operational management oversight and reporting activities for model deployment and monitoring.
AD technical debt is like a low-interest home mortgage; AI technical debt is like a high-interest payday loan.
Building and deploying conventional applications and software systems is a deterministic, unidirectional process of forward iteration where needed changes and technical debt can be predicted, planned, and reduced (release by release) over time.
For AI applications, and especially for DL, managing technical debt in this way is simply not possible. The neural net and transformer algorithms used to create AI models and their performance "in the wild" are effectively stochastic, continuously manifesting unpredictable requirements and behaviors.
Technical debt for DL builds continuously and unpredictably throughout the model's lifespan, and developers cannot manage it through ad hoc iteration using manual tools and predefined processes.
Fortunately, AI orchestration and automation platforms (AIOAPs) can help.
Tackling Technical Debt with AIOAP
AIOAPs help reduce AI technical debt by replacing ad hoc coding and repetitive procedures with an active infrastructure that automatically responds to the dynamic, unpredictable needs of developing and deploying AI models.
Presently, AIOAP vendors are solving these problems for the more established market of traditional ML, not for DL applications. Although the two are broadly similar in process, they differ greatly in terms of AI technical debt.
ML models use less (and simpler) training data, require far fewer experiments, consume far fewer compute resources, and are more easily deployed, integrated, and managed than deep learning models. ML algorithms such as linear and logistic regression, decision tree, and random forest behave more predictably with much simpler workflows than the neural net and transformer algorithms of deep learning. This all results in DL incurring much higher technical debt than ML.
The AIOAP Decision
Today, there are three types of AIOAP solutions available for deep learning.
#1: Do It Yourself
With sufficiently advanced system-level engineering skills, it is possible to build in house the middleware components needed to perform many of the key AIOAP functions. Some very large enterprises and extraordinarily well-funded AI start-ups do this effectively, while many others fail miserably, crushed by unanticipated cost and complexity. In-house middleware can also create its own, potentially large stream of conventional AD technical debt, which can offset or even exceed the technical debt the AIOAP eliminates.
#2: Cloud-Provider Platforms
AWS, GCP, and Azure each offer AIOAP functions for deep learning that operate only on their respective clouds. The three differ in fundamental ways and provide no portability for workloads and payloads between clouds. However, this is a viable option for companies or groups that wish to develop and deploy all of their DL models on a single cloud platform.
DL in the cloud has one characteristic not shared with traditional ML that may affect a cloud-specific AIOAP decision: the need for costly GPU compute resources for model training. GPU availability and pricing vary widely across cloud regions, both between clouds and within a single cloud. With a cloud-specific AIOAP, if a DL model must be built in a particular region for regulatory or other reasons, it will not be possible to shop for lower GPU costs from another cloud provider within that region. Likewise, if a model is developed or deployed across multiple regions, it will not be possible to use a multicloud infrastructure optimized for GPU costs.
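To make the region constraint concrete, here is a minimal sketch of the kind of cross-provider price comparison a multicloud platform can perform even when a model is locked to one region; a single-cloud AIOAP can only ever see one provider's prices. The provider names are real, but the region labels and hourly prices are hypothetical placeholders, not actual cloud quotes:

```python
# Hypothetical hourly GPU prices (USD/hr) keyed by (provider, region).
# Illustrative numbers only; not real cloud quotes.
GPU_PRICES = {
    ("aws", "europe"): 3.10,
    ("aws", "us-east"): 2.50,
    ("gcp", "europe"): 2.90,
    ("gcp", "us-east"): 2.30,
    ("azure", "europe"): 3.40,
    ("azure", "us-east"): 2.70,
}

def cheapest_gpu(required_region=None):
    """Return the (provider, region, price) with the lowest hourly price.

    If required_region is set (e.g., for data-residency rules), only
    offerings in that region are considered: providers can still be
    compared against each other, but cheaper regions elsewhere are
    off the table.
    """
    candidates = {
        key: price for key, price in GPU_PRICES.items()
        if required_region is None or key[1] == required_region
    }
    (provider, region), price = min(candidates.items(), key=lambda kv: kv[1])
    return provider, region, price

# Unconstrained: the cheapest offering anywhere wins.
print(cheapest_gpu())           # ('gcp', 'us-east', 2.3)
# Region-locked (e.g., EU data residency): only European offerings qualify,
# but a multicloud platform can still pick the cheapest provider there.
print(cheapest_gpu("europe"))   # ('gcp', 'europe', 2.9)
```

The point of the sketch is the second call: even under a hard region constraint, comparing across providers can still recover savings, which is exactly the option a single-cloud AIOAP forecloses.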
#3: Prebuilt Multicloud Platforms
For the majority of companies seeking an AIOAP solution, the best choice is a prebuilt multicloud platform that allows easy integration of existing tools and systems. This alternative provides the flexibility of DIY at a much lower cost and can take full advantage of all cloud platforms without being locked into any one.
Although there are only a few vendors offering such solutions today, the choices are growing, with many innovative start-ups and several established enterprise software vendors now entering the deep learning AIOAP market.
Industries from manufacturing and hospitality to energy and cybersecurity will rely heavily on deep learning. Resolving the technical debt challenge is necessary to grow the number of successful players in this area of AI -- and ultimately to enable innovative new use cases.
AIOAPs can eliminate AI technical debt, accelerate time-to-value, improve ROI, and help ensure regulatory compliance for all kinds of deep learning. They should become an essential element of every company's infrastructure strategy.
Serkan Piantino is an entrepreneur and technology leader based in New York City. Before founding Spell, he was founder and site director of Facebook New York and co-founder of Facebook AI Research. During his nine years at Facebook, he designed and led the development of many of Facebook's products and infrastructure including News Feed, Edge Rank, Timeline, and Messenger. He is an active advocate for the technology industry in New York City, previously on Mayor Bloomberg's Council on Technology and Innovation, and currently on the boards of Tech:NYC and AFSE. Serkan holds a B.S. in computer science from Carnegie Mellon University. He can be reached via email or LinkedIn.