Using Machine Learning for Automatic Database Tuning
Can ML help database administrators improve the performance of the systems they oversee? We asked Andy Pavlo about his research at Carnegie Mellon University and the future of automated tools.
- By James E. Powell
- September 7, 2021
With the expansion of self-service analytics and the growth of big data, it’s no wonder that performance is key to maximizing the bottom-line benefits of enterprise data. Andy Pavlo of Carnegie Mellon University explains how machine learning can be applied to fine-tuning databases to wring out every last bit of performance automatically.
Upside: Given your teaching and research at Carnegie Mellon, tell us a bit of the history of autonomous database systems and the challenges to date.
Andy Pavlo: Researchers have been working on autonomous databases for decades. Ever since the first relational database management systems (RDBMSs) in the 1970s, people have dreamed of having a system that could handle all aspects of configuring, tuning, and optimizing itself.
The first autonomous DBMSs in the 1970s were called self-adaptive systems. They focused on automatic physical database design, in particular index selection and table partitioning. In the late 1990s and the 2000s, we saw a new wave of research to develop self-tuning (also "auto-tuning") DBMSs. Leading this movement was the seminal work in Microsoft Research's AutoAdmin project. They built advisory tools that recommended optimal indexes, materialized views, and partitioning schemes for a database. Other vendors, such as Oracle and IBM, created similar tools for their respective systems.
The 2000s also saw the beginning of research on automating knob configuration. Such knobs allow you to control various aspects of the DBMS's runtime behavior. For example, they can set how much memory the system allocates for data caching versus the transaction log buffer.
The high-level idea of how these early DBMSs and tools worked is essentially the same as how modern tools work today. The system would first collect metrics on how the application accesses data. It then searches for what changes to make to improve performance based on a cost model. In the case of the early knob tuning tools, they were rule-based: for a given set of inputs about the database (e.g., number of cores, amount of RAM), the tool would pick basic settings for a small number of knobs using a formula written by humans.
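A minimal sketch of what such a rule-based tuner might look like (the knob names and formulas here are hypothetical illustrations, not any vendor's actual rules):

```python
def recommend_knobs(ram_gb: float, num_cores: int) -> dict:
    """Rule-based knob recommendation: fixed, human-written formulas
    that map hardware facts to a handful of settings.
    (Hypothetical knob names and ratios, for illustration only.)"""
    return {
        # A common rule of thumb: give ~25% of RAM to the data cache.
        "buffer_pool_gb": round(ram_gb * 0.25, 1),
        # One worker thread per core, capped at a fixed maximum.
        "worker_threads": min(num_cores, 64),
        # A small, fixed slice of RAM for the transaction log buffer.
        "log_buffer_mb": min(int(ram_gb * 16), 256),
    }

config = recommend_knobs(ram_gb=32, num_cores=8)
```

Because the formulas are static, such a tool cannot account for how the workload actually behaves, which is the gap the ML-based approaches described below aim to close.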
What is different about modern tools and what I have focused on with my research at Carnegie Mellon University is applying machine learning (ML) methods for automatic database tuning. The allure of ML is that it can potentially uncover patterns and handle complex problems beyond the abilities of humans. It can also use information collected from tuning databases in the past and apply that knowledge to new databases in the future.
Using ML to solve database problems that people have worked on for decades has only just begun (although IBM and Microsoft did some pioneering work in the 2000s). However, the advantages are clear, and some of the early results (including our work with OtterTune) are promising.
What are the trends and most important factors driving the need for an automatic database tuning service?
I believe that DBMSs are the most important software systems of any data-intensive application. They are often the backbone of every workload across any application domain and within any organization (e.g., business, research, government).
Modern DBMSs are notoriously challenging to manage because they have hundreds of configuration knobs that control all aspects of their runtime behavior. They assume that the person running the DBMS will know how to tune it just right for their application. Our research has shown that the performance difference for a PostgreSQL database using the default configuration from Amazon RDS versus a configuration explicitly tuned for that database's application can be nearly threefold. That kind of improvement from just the optimized configuration means that somebody could execute more queries faster using the same hardware. They could also move that database to less-expensive hardware to reduce their cloud costs.
The challenge, of course, is how to come up with that optimized database configuration. Organizations often hire experts to help with tuning activities, but experts are prohibitively expensive for many. Even if an organization could afford to hire an expert, there is still the problem that humans do not scale. Optimizing a single database can take days or even weeks, and that assumes the application's workload is stable, which is usually not the case.
Now, if the workload changes (e.g., the upstream application deploys new features or loads new data), then the tuning process may have to start all over again. This effort is also just for one database. The problem is further exacerbated if an organization has dozens or even hundreds of databases.
We have repeatedly seen this problem at multiple companies, which is a good argument for the need for an automated database tuning service.
How can one use ML to automate knob configuration tuning and what are the benefits to doing so?
The goal of knob tuning is to figure out the optimal configuration settings for a DBMS given its database, workload, and hardware. For example, there is a knob that controls the amount of memory that PostgreSQL uses for sorting data, and that can affect query performance if it is incorrectly set. We want to select a value for that knob that maximizes PostgreSQL’s performance.
Of course, one does not need ML to figure out how to do this tuning if the DBMS only has one or two knobs. The real challenge arises when a DBMS has dozens of knobs with implicit dependencies between them, or when an organization has hundreds of databases running different workloads, each needing its own configuration.
The high-level idea of how ML-based tuning works is that the tool predicts how the DBMS will perform with different configuration settings and then selects the best values for the DBMS’s knobs based on these predictions. To do this, the tuner first observes the runtime behavior of the DBMS while it executes queries by collecting the system’s metrics. These metrics are internal performance counters that every DBMS maintains about how much work it did during query execution (e.g., disk reads/writes).
For example, MySQL keeps this information in its Performance Schema, while PostgreSQL tracks it in its Statistics Collector. The tuner also retrieves the DBMS’s current knob configuration.
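In code, one observation from this step might be captured as a simple record pairing the current knob configuration with the metric deltas collected over the observation window (the knob and metric names below are illustrative; real ones would come from, e.g., PostgreSQL's statistics views):

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """One tuning observation: what the knobs were set to, what the
    DBMS's internal counters reported, and the resulting performance."""
    knobs: dict        # knob name -> value in effect during the window
    metrics: dict      # performance-counter deltas over the window
    throughput: float  # the target objective, e.g. queries/sec

# Illustrative values only, not measurements from a real system.
obs = Observation(
    knobs={"work_mem_mb": 64, "shared_buffers_mb": 4096},
    metrics={"blocks_read": 120_433, "blocks_hit": 9_802_114},
    throughput=1850.0,
)
```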
The tuner then trains an ML model on this data and runs a recommendation algorithm to generate a new configuration, which it applies to the DBMS. It continues to observe the DBMS to determine whether the new configuration improves the system’s performance.
This observation and tuning cycle may take multiple iterations, but with each observation the tuner collects more training data that helps it improve its recommendations. This is one of the advantages of ML-based tuning methods: they can leverage knowledge gained from tuning previous DBMS deployments to tune new ones. Reusing data from all these tunings reduces the amount of time and resources needed to tune a new DBMS deployment.
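The observe-recommend-apply cycle described above can be sketched as a loop. Here a toy hill-climbing search over a single hypothetical knob, run against a simulated DBMS, stands in for the real ML model and recommendation algorithm:

```python
import random

def simulated_perf(work_mem_mb: float) -> float:
    """Stand-in for observing the real DBMS: performance peaks at
    some unknown best memory setting (here, 256 MB)."""
    return 1000.0 - (work_mem_mb - 256.0) ** 2 / 100.0

def tune(iterations: int = 50, seed: int = 1) -> tuple[float, float]:
    rng = random.Random(seed)
    # Start from a default configuration and observe it.
    best_knob, best_perf = 64.0, simulated_perf(64.0)
    history = [(best_knob, best_perf)]  # accumulated "training data"
    for _ in range(iterations):
        # Recommend: perturb the best-known setting. (A real tuner would
        # instead query a model trained on `history`.)
        candidate = max(1.0, best_knob + rng.uniform(-64, 64))
        # Apply the candidate configuration and observe the result.
        perf = simulated_perf(candidate)
        history.append((candidate, perf))
        if perf > best_perf:
            best_knob, best_perf = candidate, perf
    return best_knob, best_perf

knob, perf = tune()
```

The loop converges toward the simulated optimum while accumulating a history of (configuration, performance) pairs, which is exactly the kind of data an ML-based tuner reuses across deployments.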
How will ML-driven services impact, augment, or change the role of DBAs within an organization?
We have found in customer deployments of OtterTune that DBAs are by no means replaced, nor is it our intention to replace them. Using ML to automate DBMS tuning takes away the pain of managing these systems from the DBAs.
We’ve also found that it can make them heroes at their companies. In one instance, OtterTune recommended a configuration for a major bank that increased their DBMS’s throughput by 45 percent and helped reduce their Oracle license costs. We also worked with another customer to tune their MySQL DBMSs to improve their efficiency and reduce the number of servers they needed. We have also shown that we can reduce monthly Amazon AWS costs for a PostgreSQL database by 50 percent while maintaining the same performance.
We have learned a lot about how an ML-based DBMS tuner fits into an organization’s established database administration policies and practices. These experiences have led us to develop new operational settings that allow humans to control aspects of the automated tuning process. These settings have nothing to do with the ML algorithms but rather with how a service such as OtterTune can co-exist with DBAs to make their lives easier.
For example, we allow the DBA to specify time windows when the service is allowed to change the DBMS’s configuration or restart the DBMS. These controls are to avoid unexpected problems when the DBA may not be available. Likewise, we also allow the DBA to specify which knobs the service cannot configure, and the permitted value ranges for certain knobs. These restrictions are to avoid exogenous problems that are not readily measurable by the service.
For example, the algorithms will learn that turning off disk writes for transaction commits makes the DBMS run faster, but if the system crashes, it might lose data because it no longer flushes data to the disk. The tuner’s algorithm has no way of knowing whether that is a problem because that is an external cost it cannot measure. A human has to make that value judgment. Thus, we disallow OtterTune from making those types of changes to a DBMS’s configuration.
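A sketch of how such guardrails might be enforced before a recommended configuration is applied (the knob names and policy structure here are hypothetical, not OtterTune's actual interface):

```python
def apply_guardrails(candidate: dict, policy: dict) -> dict:
    """Drop or clamp recommended knob changes that violate the
    DBA-specified policy. (Hypothetical policy for illustration.)"""
    safe = {}
    for knob, value in candidate.items():
        if knob in policy["forbidden_knobs"]:
            continue  # e.g. never touch durability-related knobs
        lo, hi = policy["ranges"].get(knob, (float("-inf"), float("inf")))
        safe[knob] = min(max(value, lo), hi)  # clamp into permitted range
    return safe

policy = {
    # Durability trade-offs are a human value judgment: off-limits.
    "forbidden_knobs": {"synchronous_commit"},
    # Permitted value range for a memory knob.
    "ranges": {"work_mem_mb": (4, 1024)},
}
safe = apply_guardrails(
    {"synchronous_commit": "off", "work_mem_mb": 4096}, policy
)
```

Here the recommendation to disable durable commits is silently discarded and the memory setting is clamped to the DBA's ceiling, keeping the tuner inside the boundaries a human has judged safe.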
What does the future hold for ML-driven database tuning?
I am bullish on the application of ML technologies for databases. In the next five years, I anticipate that more external black-box tuning tools will be released. Microsoft, Oracle, and IBM already have their proprietary tools, but new tools will be available for the most popular open-source DBMSs.
I also think that these new tools will use ML to handle other aspects of database tuning, including index selection, query tuning, and capacity planning. The critical element in building these tools is access to a large corpus of training data. Public cloud platforms make it easier than ever to integrate such tools, so getting access to this data is less challenging than it used to be.
In a decade, I foresee the database community reaching the true goal of all this ML research to build completely autonomous "self-driving" DBMSs. I define a self-driving DBMS to be one that can configure, deploy, and optimize almost all aspects of itself automatically without human intervention or guidance. Such a system will rely on ML to predict future workload requirements, how a DBMS will behave in different scenarios, and what actions to take to ensure the system runs smoothly.
I do not think achieving this high level of automation will be possible with external tools. As such, my research also explores white-box methods for autonomous DBMSs that require developers to integrate ML components in the system's internal architecture. With this approach, I believe that a DBMS can perform all the tuning tasks that humans do today as well as open up new opportunities for more fine-grained optimizations that are only achievable with software. For example, the DBMS could perform "micro-tuning" on a per-query basis, where the system changes its behavior for each query at the moment the application executes it.