Three Steps To A Productive Relationship That Ensures Optimal Database Performance
When database performance issues arise, the finger of blame is often pointed at the storage infrastructure, and this can lead to conflict between the database administrator and the storage architect. Neither side is able to gain clear insight into the true nature of the problem, so often the solution is to “throw hardware” at the problem, frequently new storage. This approach overlooks an opportunity to achieve less disruptive and more substantial results by focusing on database performance first to discover the root cause of the problem.
The Blind Leading The Blind
All too often, critical response teams assigned to solve database performance issues involve a blind DBA leading a blind storage administrator. Each can run their own myopic database vendor- or storage vendor-provided performance analysis tools. Because those tools sometimes don’t provide an end-to-end view of the interaction between the database, the storage network and the storage system, the true nature of any I/O performance problem is never really identified and the most optimal solution is frequently not found.
The reality is that the storage architect is the least prepared to determine where a database performance problem actually lies. To compound the issue, DBAs and storage administrators speak different languages. It’s not surprising, then, that throwing hardware at the problem is so tempting, because it can be the path of least resistance.
Addressing database performance issues should be more analogous to tuning the individual instruments in an orchestra so that they can play together most harmonically. The database must interact with all the other components—network, memory, storage, CPU—to perform optimally. If not tuned efficiently, individual components can be over-provisioned.
Not having the ability to identify the specific component in the architecture causing the database performance problem leads many IT departments to “fix” everything by purchasing upgrades throughout the architecture. Over-provisioning may at first seem like it resolved the performance issue, but in reality it can serve to mask the real problem, and over time, the original performance issue will most certainly rear its ugly head again. To make matters worse over-provisioning tends to waste IT budget dollars.
Sometimes Flash Storage Is Not The Answer
The poster child of the “throw hardware at it” problem-solving method is a flash storage solution. Flash can be implemented in a variety of ways, including server-side caching and All-Flash Arrays. The problem is that flash can be the ultimate deception. The technology’s innate speed and low latency can mask many performance issues caused by poor database design. Flash storage, however, is priced at a premium compared to traditional hard disk technology and should really be one of the last items on the performance tuning checklist, instead of being the first. Finally, a masked database problem is likely to reoccur in the future and prevent the flash investment from reaching its full potential.
Step 1 – What Changes Can Be Made To Database Design?
Anytime the subject of database re-design is brought up, many stakeholders rightfully cringe. Database and application architects resist the request and push the DBAs to just make it work. Business stakeholders focus on the short-term delays this might cause in getting other things done. And DBAs can get stuck in the middle, negotiating with all sides to do what’s in the best interest of the long-term health and performance of the database. But the reality is that in some situations, there are changes that can be made to either the database processes or to the underlying database architecture, that can actually forestall the need to move to a flash based storage solution in the first place.
At the very least, these changes could allow the move to flash-based storage to pay even greater dividends as the database infrastructure grows. Changes to database design can range from simple to more complex, but may well prove more effective at addressing performance issues than a storage hardware upgrade, and should at least be considered. It comes back to making sure you first understand the nature of the performance problem clearly before you make a plan of action to resolve it.
The most simple of these changes is to examine when certain queries are run and how that impacts the storage load over time. Queries are a good example of the myopic nature of database design. They are often run when it makes sense for an end user, not when it makes sense for the storage infrastructure. At any given time, there may be processes running from other applications, even other databases,that consume storage I/O resources. This can cause enough storage I/O contention to impact specific query and overall storage system performance. DBAs may not have access to that information, and may even encounter resistance in getting it from the storage administrator.
DBAs and storage administrators can alleviate this problem by using a database performance analysis tool like SolarWinds Database Performance Analyzer (DPA, formerly Confio Ignite), to monitor running queries. Armed with information about the I/O wait times of those queries, they can work together to compare this to the other I/O demands of that database, other applications and the storage infrastructure in general. In our experience, we find that a high percentage of the time, these queries can be run at another time when there is less demand on the storage infrastructure. In some cases, the query can be eliminated altogether because it is an old query that is no longer needed.
Of course, sometimes a slow-running query is indeed necessary and needs to run at the specific time it is run. In other cases, the query needs to be available “on-the-fly”. In these cases, it might be possible to modifying the indexes to make the query return data more efficiently. It’s important to consider that every index has a cost and modifying an index can have unintended consequences such as making an execution plan worse. While modifying indexes does incur overhead, the capacity consumption and extra processing required to build and maintain them is often manageable, especially during defined maintenance windows.
Database query tuning analyzers can be helpful with index management. For the less experience DBA, they can provide the insight required to understand exactly which columns should be indexed and why. They can even provide information on where those indexes would be best stored. Investing in a very small flash storage area just for indexes is significantly less expensive than replacing the entire storage system with flash. For example, one might add a covering index that causes a subset of queries to execute more efficiently and putting those on flash makes it even more efficient, again at a fraction of the cost of a total storage system upgrade.
Step 2 – Changes to The Database Code
The next step is to explore the database code itself. For some organizations, this step may not be possible because the DBA doesn’t have access to the database code or no longer has the expertise in-house to fix that code. This is often an area of friction between the storage and DBA, since the storage administrator does not understand SQL coding and the DBA may be unaware of potential problem areas of the code.
It is important to point out that problems in the code may not be the DBA’s fault. Most often, the DBA team is separate from the database architecture and database developer teams. The scope of the database application often will have expanded well beyond its original intent. And finally, the underlying storage architecture may be dramatically different than when the database was first deployed. This expansion of scope combined with storage architecture changes may result in a database that is totally out of sync with its environment. In general, most designs are not “bad”. The design likely was a good design in the past, and became less optimized when circumstances were changed or when some design assumption became invalid in the wake of a change.
In this event, the DBA and storage administrator must count on external insight to guide them. Using a solution such as DPA allows the database to be as exhaustively analyzed in its current, live environment as is needed, without any bias to skew the problem resolution.
Step 3 – Throw The RIGHT Hardware At The Problem
If in analyzing the root cause of the performance issue, it’s determined that the storage infrastructure is a critical component of the solution, the key is to select the right hardware and to place it at the right point in the storage infrastructure to solve the right problem.
In this case, it’s more important than ever to bring objective performance analysis to the table, and not rely solely on the storage hardware vendor to provide guidance as to which hardware solutions to buy. Storage vendors, like the storage administrator, are largely flying blind when it comes to understanding the impact on database performance. For example, many storage vendors today immediately point to flash as the solution for all storage problems. In reality, an upgrade to a few more hard disks or improving the storage network may be all that is needed to optimize database performance.
First – Server Flash/SSD
If the performance problem is very specific and can be isolated to a single or limited number of servers, then a server-side flash device should be the first consideration. This is especially true if the performance bottleneck is caused by a read heavy operation, something that the database analysis tool can again provide insight into. In these cases, a simple caching software solution can be combined with either a solid state disk (SSD) or PCIe Flash board and deliver a remarkable increase in performance. These server-side solutions also reduce storage latency that may be contributing to the performance problem. Server-side solutions are also the least expensive of the flash performance upgrades when purchased in small quantities.
Second – Shared Flash
If the performance problem is widespread, not only for the number of applications but also the I/O pattern (reads and writes), then a traditional array architecture that is either all-flash or flash assisted may make the most sense. Like any traditional array, these systems provide universal access to storage via the storage area network (SAN). Now they can provide universal access to high performance flash through the SAN fabric. This allows flash capacity to be allocated as granularly as needed to each specific server hosting the database application. A shared all-flash or flash assisted system may also be able to provide an equal performance boost to other IT platforms like server virtualization, desktop virtualization and applications like email.
A key decision to be made here is deciding between all-flash and flash assisted devices. All-Flash Arrays are traditional storage systems that are made exclusively from flash. Flash assisted devices, also called Hybrid Arrays, typically combine flash with hard disk drives (HDD) to offer a better price per GB than All-Flash systems. The decision point between these two options typically comes down to predictability of the I/O pattern which, importantly, a database analysis tool can provide insight into.
Providing predictable performance is a concern with Hybrid Arrays. This is because they try to maximize flash capacity by moving data between the HDD and flash storage types. If the database is expecting flash performance, but for some reason has to fetch data from the HDD tier, then that will obviously impact performance. This may not be an issue if a predictable pattern to I/O can be determined by the database analysis tool, because many Hybrid solutions have the ability to pin a portion of the I/O in the flash storage tier and provide continuous and consistent flash performance for critical applications. Others can even schedule when that pinning should occur and migrate database data into cache prior to a query execution taking place.
All-Flash Arrays and most Hybrid Arrays are also ideal for database write I/O. This is because the write I/O is now stored outside of the server on a shared device with typically no single point of failure. This is unlike server side solutions where if writes are being cached at the server they are vulnerable to server failure. The one area of concern is to make sure that the Hybrid system does indeed cache all inbound write traffic to the flash storage tier and has the appropriate protection of that tier. Not all do.
A Strong Relationship between DBAs and Storage Administrators Leads to Better Database Performance
When database performance issues arise, the DBA and the storage administrator together often hold the key to resolving the issue. To foster a productive relationship between the DBAs and storage administrators, it’s crucial to have a common tool that provides analytics they can both understand. This tool needs to provide an historical perspective that shows a direct correlation between the database and the storage infrastructure and the ability to help pinpoint the true cause of the issue. Making sure the database is configured and tuned to its best potential will reduce the cost of making changes or upgrading the storage infrastructure. And, by working closely together using a shared view of real-time and historical information, more effective decisions can be made about storage infrastructure investments that will truly impact performance.
SolarWinds is a client of Storage Switzerland