I was surprised last week by a Reddit thread about MySQL performance. My previous post explained one of the problems in the thread: too little context to give a solid answer. A few people tried to answer, but only one of them identified the question as unanswerable. The other problem was a bit more nefarious.
The people who did offer a guess centered on memory – that the additional memory on the upgraded machine was what caused the loaded server to drop from 100% CPU usage to 5% CPU usage. While we can’t tell exactly what caused the drop in CPU usage, we can be pretty certain that it was not the extra memory.
The guesses in the thread centered on paging: that with more memory available, the machine wouldn’t need to page to disk as much, and that would reduce CPU load. Thing is, it simply won’t.
Certainly, with more memory (and a solid configuration) we’d expect MySQL to keep more pages hot in memory. If it can, performance will be better, because getting data from memory is a couple of orders of magnitude faster than getting it from disk. Spinning storage is several orders of magnitude slower than memory, and even SSDs are a lot slower than having the data already available in memory.
But the symptom reported doesn’t match that diagnosis. It wasn’t clear if the contributors thought the system was paging inside of MySQL – that is, MySQL itself was evicting pages from its own cache and re-reading them from the database files – or if MySQL was allocating too much physical memory for itself, causing the operating system to page fault and swap memory around.
Let’s think of the case where MySQL is moving pages around directly. If it needs to load a page, it will ask the OS to read it. The OS will suspend the thread that made the read request and go process the read. When the read is satisfied, the thread can run again. During the time the thread is blocked on the I/O request, it’s not busy. That is, it isn’t contributing to the percentage of busy CPU reported by any of the management tools you’d use to find such a number.
It’s possible that the OS has more work to do, and would immediately schedule the blocked thread to do that work. But we know that’s not the case because the CPU usage went substantially lower on the faster machine. And in that case, the speedup is because more (and presumably faster) processors are available after the upgrade and not because more memory is available.
What if the OS was paging itself due to high memory load? The same reasoning applies: if a thread causes a page fault, then it is suspended until the OS satisfies the request. And the same caveat about remaining work applies, too. The thread might be scheduled if more CPU is available, but that’s a CPU-bound situation and not a memory-bound scenario.
It’s not too hard to write a program that demonstrates the effect of swapping. I used my Raspberry Pi to do some tests. I don’t have many slow spinning drives available, and it seems like none are attached to a Linux machine. The Raspberry Pi uses a pretty slow USB-based I/O channel to communicate with its flash card, so when it swaps it is pretty slow to satisfy the paging request … though not as slow as a spinning drive. Just the same, the effect is quite noticeable.
swapper.cpp is a program in C++ that takes a single command line argument – a number of pages to allocate. A page of memory is 4096 bytes, and the program allocates those pages using a regular calloc() call. calloc() has a bit of overhead – it’s minimal, maybe 8 or 16 bytes – but it does mean that we’re allocating more than 4K for each page. That doesn’t really matter for this test, but it’s something to consider.
Once all the pages are allocated, the program randomly starts touching them. It is, after all, the act of accessing the allocated memory that causes paging to happen. The program reads from and writes to the pages so that they’re dirty and can’t simply be replaced by zero-backed storage. The loop that randomly touches pages visits each page in random order until every page has been read and written; then it starts again. The order is the same each iteration.
If we run swapper with a small number of pages – say 1000, for 4 megabytes of memory – it will run very quickly and be CPU bound. The OS can easily find 1000 pages and keep them in memory, no paging happens, and the loop executes very quickly. Hundreds of thousands of page visits per second are possible because everything is in memory and no I/O is involved. In this case, swapper uses 100% of one CPU core.
My Raspberry Pi has about a gigabyte of memory, so I can run swapper and ask for about 220000 pages before the OS would shut it down for running out of memory. In this case, the program causes hundreds of page faults per second and isn’t able to make much progress. Instead of touching hundreds of thousands of pages per second, it is lucky to touch a couple thousand per second because each touch is at least one I/O operation on that slow USB-to-SD bus.
We can monitor swapper’s effect on the system by using vmstat. I used vmstat 1, which dumps stats about memory, paging, and CPU usage on a single line, once per second, until I stop it. When my Pi is idle, here’s what the vmstat 1 output looks like:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 23 827 2 38 0 0 0 0 13011 17579 22 14 64 0 0
0 0 23 827 2 38 0 0 0 0 338 75 0 0 100 0 0
0 0 23 827 2 38 0 0 0 0 441 408 1 1 99 0 0
0 0 23 827 2 38 0 0 8 0 425 242 0 0 99 0 0
0 0 23 827 2 38 0 0 0 0 384 148 0 0 100 0 0
0 0 23 827 2 38 0 0 0 0 461 405 0 1 99 0 0
0 0 23 827 2 38 0 0 128 0 379 154 0 0 100 0 0
0 0 23 827 2 38 0 0 0 0 353 92 0 0 100 0 0
0 0 23 827 2 38 0 0 0 0 413 295 0 1 99 0 0
The “swap” columns show that nothing is being swapped in or out. The “cpu” columns tell us that there’s usually 0% user time, just a percent or two of system time, and 99 or 100% idle time on this system.
If we start swapper and look again, we’ll see a very different story:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 6 102392 24152 116 26680 164 168 23248 248 2412 4824 0 5 20 75 0
3 6 102388 25440 116 26164 172 172 23760 192 2552 5006 0 5 26 70 0
0 7 102396 25444 116 25344 144 148 20384 164 2029 3982 0 4 26 70 0
0 6 102392 25696 124 25236 244 248 22616 356 2518 5254 0 4 26 70 0
1 7 102396 25444 124 25644 144 144 19484 144 1994 4122 0 6 1 94 0
0 5 102396 25192 128 25776 208 200 23908 352 2393 4942 0 4 8 88 0
1 5 102384 24716 128 26200 292 260 18808 260 2283 4739 0 9 24 67 0
The “swap” and “io” columns both show block counts going in and out. Blocks are one kilobyte, so we’re paging between 150 and 250 kilobytes per second for swap. The CPU columns show about 75% waiting time, 20% idle time, and 5% system time. That is, we’re only 5% busy on the CPU.
swapper doesn’t do much when it gets a page. It reads from it (causing the paging operation) and then writes to it; that’s all. In a database application, the access will be pretty quick, too – at most, a scan of the page to compute an aggregate or find a value. But it still won’t be much CPU work before the next page fault comes along and stops the CPU again.
We can also use top to get some idea of what’s going on. With swapper busy pounding away, here’s the output of top, sorted by CPU usage and showing just the first few entries:
top - 08:58:23 up 164 days, 16:46, 6 users, load average: 5.56, 4.57, 3.39
Tasks: 136 total, 1 running, 135 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 6.5 sy, 0.0 ni, 69.6 id, 23.4 wa, 0.0 hi, 0.4 si, 0.0 st
MiB Mem : 923.5 total, 22.0 free, 869.1 used, 32.5 buff/cache
MiB Swap: 100.0 total, 23.6 free, 76.3 used. 14.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24747 pi 20 0 887440 826788 92 D 8.6 87.4 0:27.37 swapper
24750 root 0 -20 0 0 0 I 8.3 0.0 0:15.38 kworker/0:1H-mmc_complete
68 root 20 0 0 0 0 S 2.6 0.0 1:04.33 kswapd0
24839 root 0 -20 0 0 0 I 2.6 0.0 0:00.41 kworker/3:0H-kblockd
24740 root 0 -20 0 0 0 I 2.3 0.0 0:08.88 kworker/1:1H-kblockd
Sure enough, swapper is credited with using more than 87% of memory but less than 9% of a CPU. This directly contradicts the idea that having too little memory causes high CPU usage.
We can note that the “load average” is pretty high: 5.56. That’s because on Linux, the load average counts processes blocked waiting on I/O as well as runnable ones. The poster on Reddit reported a high CPU usage percentage, though – and they did say that load average was high, too. Here, we’ve shown that swapping causes high load numbers but not high CPU utilization percentages.
What if there were some other work to do? Well, I wrote another program that just sits in a tight loop and completely consumes a single processor. spinner.c is short enough to show here directly:
int main()
{
    while (1)
    {
    }
    return 0;
}
Of course, this program never exits; it just spins in its loop. Running a couple of instances of spinner and one instance of swapper at the same time makes the vmstat output look like this:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
6 2 102320 42276 496 37884 264 200 17848 268 1927 4300 78 7 0 15 0
5 2 97020 34996 496 38404 6896 1520 7440 1524 4414 13029 72 9 0 19 0
3 5 101416 39828 500 38704 44 4540 1064 4564 2767 6790 75 3 0 22 0
7 0 100060 34060 508 43364 2348 996 8508 1100 3341 8858 93 4 0 3 0
5 0 98656 42400 352 33472 3856 2440 4272 2488 4516 12470 92 7 0 1 0
6 0 91456 35344 352 33340 7188 0 7188 0 4311 12673 92 7 0 1 0
We’re seeing now that we really are more than 90% busy on the CPU, because spinner is using it all, and that we’re still swapping a ton of memory around because swapper is poking at it.
You should replicate this experiment on your own hardware and see what you notice. It’ll take some time to find a good number for swapper’s page count argument: too many pages and the program will be shut down by the OOM killer; too few and it won’t cause swapping. One way to get the right number is to ask the OS how much physical memory is available:
$ grep "MemTotal" /proc/meminfo
MemTotal: 945676 kB
My Pi says it’s got 945676 kilobytes, so dividing by the 4 kB page size gives 236419 pages. I ran these experiments giving swapper a parameter of 230000 pages.
And to drive home the punchline: if memory swapping caused high CPU usage, swapper alone would show high CPU usage. It doesn’t, even though it causes plenty of paging activity – in fact, the system is hard to use from another session while swapper is banging away. But there’s plenty of CPU free; the system is I/O-bound while servicing the swaps.
When answering questions online, it’s really important to consider all the available facts. Someone who knows how the OS scheduler behaves under demand and what the CPU usage statistics mean is far more likely to give valuable insight into a problem than someone who has a misunderstanding about the system.
More importantly, we can see that diagnosing MySQL (or any app’s performance, really) should start with a coarse diagnosis of what is happening. If we know we’re using a lot of CPU, it means we’re doing a lot of computation and not that we’re doing a lot of I/O or paging. It’s possible that both conditions are true, and they might have related or independent causes. But already we’re in the territory of wishing we knew more about the configurations the original poster had used when they noticed the problem they encountered.