High Performance MySQL (Book Review)

DeJia Wang
Nov 30, 2021
3 min read

Updated: Jan 27, 2022

Databasing is almost a hard precondition of almost all modern social activities, coporate accounting, and practically used in any type of high-stakes management system & application whereby organization use them to store very large numbers of records.

(Figure 1: What Poor Database Performance Leads To)

Now, when interacted in high volumes, this can slow the database down. when there are too many queries to process at once, the infrastructure will bottleneck, resulting in a slow database.

This is where Database Optimization comes in to the rescue.

Optimization

To most, database performance are dictated by 5 main factors:

1) Query tuning (i.e. ask the question better)

2) Indexing (i.e. ask if it's been asked before)

3) Server tuning (ask a better person)

4) Replication (ask several people),

5) Benchmarking (ask trick questions).

(Figure 2: The Book)

The Database Bible

However, in the book 'OREILLY's High Performance MySQL: Optimization, Backups, and Replication', the definition of performance is measured by the time required to complete a task. In other words, performance is response time. This is a very important principle.

"Measure performance by tasks and time, not by resources. A database server’s purpose is to execute SQL statements, so the tasks we care about are queries or statements — the bread-and-butter SELECT, UPDATE, INSERT, and so on. 

A database server’s performance is measured by query response time, and the unit of measurement is time per query. Now for another rhetorical question: what is optimization? Many people are very confused about this. 

If you think performance optimization requires you to reduce CPU utilization, for example, you’re thinking about reducing resource consumption. But this is a trap. Resources are there to be consumed. Sometimes making things faster requires that you increase resource consumption." 

- OREILLY's High Performance MySQL, Pg 100

Ask 10 people to define performance and you’ll probably get 10 different answers, filled with terms such as “queries per second,” “CPU utilization,” “scalability,” and so on. This is fine for most purposes, because people understand performance differently in different contexts, but we will use a formal definition.

(Figure 3: Over-Engineering)

Many of Database developers (me included), when trying to optimize something, spend the bulk of their time changing things and very little time measuring. In contrast, we aim to spend most of our time—perhaps upwards of 90%—measuring where the response time is spent.

If we don’t find the answer, we might not have measured correctly or completely. When we gather complete and properly scoped measurements about server activity, performance problems usually can’t hide, and the solution often becomes trivially obvious.

Measuring can be a challenge, however, and it can also be hard to know what to do with the results once we have them—measuring where the time is spent is not the same thing as understanding why the time is spent. We mentioned proper scoping, but what does that mean?

A properly scoped measurement is one that measures only the activity we want to optimize. There are two common ways that we can capture something irrelevant:

• We can begin and end our measurements at the wrong time.

• We can measure things in aggregate instead of specifically targeting the activity itself.

For example, a common mistake is to observe a slow query, and then look at the whole server’s behavior to try to find what’s wrong. If the query is slow, then it’s best to measure the query, not the whole server. And it’s best to measure from the beginning of the query to the end, not before or after.

The time required to execute a task is spent either executing, or waiting. The best way to reduce the time required to execute is to identify and measure the subtasks, and then do one or more of the following: eliminate subtasks completely, make them happen less often, or make them happen more efficiently. Reducing waiting is a more complex exercise, because waiting can be caused by “collateral damage” from other activities on the system, and thus there can be interaction between the task and other tasks that might be contending for access to resources such as the disk or CPU.

Yet we may still need to use different techniques or tools, depending on whether the time is spent executing or waiting to identify and optimize subtasks. But that’s an oversimplification. Infrequent or short subtasks might contribute so little to overall response time that it’s not worth our time to optimize them.

In conclusion, according to the authors of the book; CPU utilization is a symptom, not a goal, and it’s best to measure the goal, or we may get derailed.

Reference:

Schwartz, B., Zaitsev, P., & Tkachenko, V. (2012). High Performance MySQL: Optimization, Backups, and Replication (Third ed.). O’Reilly Media.

Tags: #SQL

High Performance MySQL (Book Review)

Databasing is almost a hard precondition of almost all modern social activities, coporate accounting, and practically used in any type of high-stakes management system & application whereby organization use them to store very large numbers of records.

Optimization

The Database Bible

Recent Posts

Comments