I was surprised last week by a Reddit thread about MySQL performance. My previous post explained one of the problems in the thread: too little context to give a solid answer. A few people tried to answer, but only one of them identified the question as unanswerable. The other problem was a bit more nefarious.
The people who did offer a guess centered on memory – that the additional memory on the upgraded machine was what caused the loaded server to drop from 100% CPU usage to 5% CPU usage. While we can’t tell exactly what caused the drop in CPU usage, we can be pretty certain that it was not the extra memory.
The guesses in the thread centered on paging: that with more memory available, the machine wouldn’t need to page to disk as much, and that would reduce CPU load. Thing is, it simply won’t.
Certainly, with more memory (and a solid configuration) we’d expect MySQL to keep more pages hot in memory. If it can, performance will be better, because getting data from memory is a couple of orders of magnitude faster than getting it from disk. Spinning storage is several orders of magnitude slower than memory, and even SSDs are a lot slower than having the data already available in memory.
But the symptom reported doesn’t match that diagnosis. It wasn’t clear if the contributors thought the system was paging inside of MySQL – that is, MySQL itself was evicting pages from its own cache and re-reading them from the database files – or if MySQL was allocating too much physical memory for itself, causing the operating system to page fault and swap memory around.
Let’s think of the case where MySQL is moving pages around directly. If it needs to load a page, it will ask the OS to read it. The OS will suspend the thread that made the read request and go process the read. When the read is satisfied, the thread can run again. During the time the thread is blocked on the I/O request, it’s not busy. That is, it isn’t contributing to the percentage of busy CPU reported by any of the management tools you’d use to find such a number.
It’s possible that the OS has more work to do, and would immediately schedule the blocked thread to do that work. But we know that’s not the case because the CPU usage went substantially lower on the faster machine. And in that case, the speedup is because more (and presumably faster) processors are available after the upgrade and not because more memory is available.
What if the OS was paging itself due to high memory load? The same reasoning applies: if a thread causes a page fault, then it is suspended until the OS satisfies the request. And the same caveat about remaining work applies, too. The thread might be scheduled if more CPU is available, but that’s a CPU-bound situation and not a memory-bound scenario.
It’s not too hard to write a program that demonstrates the effect of swapping. I used my Raspberry Pi to do some tests. I don’t have many slow spinning drives available, and it seems like none are attached to a Linux machine. The Raspberry Pi uses a pretty slow USB-based I/O channel to communicate with its flash card, so when it swaps it is pretty slow to satisfy the paging request … though not as slow as a spinning drive. Just the same, the effect is quite noticeable.
swapper.cpp is a program in C++ that takes a single command line argument – a number of pages to allocate. A page of memory is 4096 bytes, and the program allocates those pages using a regular calloc() call. calloc() has a bit of overhead – it’s minimal, maybe 8 or 16 bytes – but it does mean that we’re allocating more than 4K for each page. That doesn’t really matter for this test, but it’s something to consider.
Once all the pages are allocated, the program randomly starts touching them. It is, after all, the act of accessing the allocated memory that causes paging to happen. The program reads from and writes to the pages so that they’re dirty and can’t simply be replaced by zero-backed storage. The loop that randomly touches pages visits each page in random order until every page has been read and written; then it starts again. The order is the same each iteration.
If we run swapper with a small number of pages – say 1000, for 4 megabytes of memory – it will run very quickly and be CPU bound. The OS can easily find 1000 pages and keep them in memory, no paging happens, and the loop executes very quickly. Hundreds of thousands of page visits per second are possible because everything is in memory and no I/O is involved. In this case, swapper uses 100% of one CPU core.
My Raspberry Pi has about a gigabyte of memory, so I can run swapper and ask for about 220000 pages before the OS would shut it down for running out of memory. In this case, the program causes hundreds of page faults per second and isn’t able to make much progress. Instead of touching hundreds of thousands of pages per second, it is lucky to touch a couple thousand per second because each touch is at least one I/O operation on that slow USB-to-SD bus.
We can monitor swapper’s effect on the system by using vmstat. I used vmstat 1, which dumps stats about memory, paging, and CPU usage on a single line, once per second, until I stop it. When my Pi is idle, here’s what the vmstat 1 output looks like:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 23 827 2 38 0 0 0 0 13011 17579 22 14 64 0 0
0 0 23 827 2 38 0 0 0 0 338 75 0 0 100 0 0
0 0 23 827 2 38 0 0 0 0 441 408 1 1 99 0 0
0 0 23 827 2 38 0 0 8 0 425 242 0 0 99 0 0
0 0 23 827 2 38 0 0 0 0 384 148 0 0 100 0 0
0 0 23 827 2 38 0 0 0 0 461 405 0 1 99 0 0
0 0 23 827 2 38 0 0 128 0 379 154 0 0 100 0 0
0 0 23 827 2 38 0 0 0 0 353 92 0 0 100 0 0
0 0 23 827 2 38 0 0 0 0 413 295 0 1 99 0 0
The “swap” columns show that nothing is being swapped in or out. The “cpu” columns tell us that there’s usually 0% user time, just a percent or two of system time, and 99 or 100% idle time on this system.
If we start swapper and look again, we’ll see a very different story:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 6 102392 24152 116 26680 164 168 23248 248 2412 4824 0 5 20 75 0
3 6 102388 25440 116 26164 172 172 23760 192 2552 5006 0 5 26 70 0
0 7 102396 25444 116 25344 144 148 20384 164 2029 3982 0 4 26 70 0
0 6 102392 25696 124 25236 244 248 22616 356 2518 5254 0 4 26 70 0
1 7 102396 25444 124 25644 144 144 19484 144 1994 4122 0 6 1 94 0
0 5 102396 25192 128 25776 208 200 23908 352 2393 4942 0 4 8 88 0
1 5 102384 24716 128 26200 292 260 18808 260 2283 4739 0 9 24 67 0
The “swap” and “io” columns both show block counts going in and out. Blocks are one kilobyte, so we’re paging between 150 and 250 kilobytes per second for swap. The CPU columns show about 75% waiting time, 20% idle time, and 5% system time. That is, we’re only 5% busy on the CPU.
swapper doesn’t do much when it gets a page. It reads from it (causing the paging operation) and then writes to it; that’s all. In a database application, the access will be pretty quick, too – at most, a scan of the page to compute an aggregate or find a value. But it still won’t be much CPU work before the next page fault comes along and stops the CPU again.
We can also use top to get some idea of what’s going on. With swapper busy pounding away, here’s the output of top, sorted by CPU usage and showing just the first few entries:
top - 08:58:23 up 164 days, 16:46, 6 users, load average: 5.56, 4.57, 3.39
Tasks: 136 total, 1 running, 135 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 6.5 sy, 0.0 ni, 69.6 id, 23.4 wa, 0.0 hi, 0.4 si, 0.0 st
MiB Mem : 923.5 total, 22.0 free, 869.1 used, 32.5 buff/cache
MiB Swap: 100.0 total, 23.6 free, 76.3 used. 14.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24747 pi 20 0 887440 826788 92 D 8.6 87.4 0:27.37 swapper
24750 root 0 -20 0 0 0 I 8.3 0.0 0:15.38 kworker/0:1H-mmc_complete
68 root 20 0 0 0 0 S 2.6 0.0 1:04.33 kswapd0
24839 root 0 -20 0 0 0 I 2.6 0.0 0:00.41 kworker/3:0H-kblockd
24740 root 0 -20 0 0 0 I 2.3 0.0 0:08.88 kworker/1:1H-kblockd
Sure enough, swapper is credited with using more than 87% of memory but less than 9% of a CPU. This directly contradicts the idea that having too little memory causes high CPU usage.
We can note that the “load average” is pretty high: 5.56. That’s because on Linux, the load average counts processes blocked waiting on I/O as well as runnable ones. The poster on Reddit reported a high CPU usage percentage, though – and they did say that load average was high, too. Here, we’ve shown that swapping causes high load numbers but not high CPU utilization percentages.
What if there were some other work to do? Well, I wrote another program that just sits in a tight loop and completely consumes a single processor. spinner.c is short enough to show here directly:
int main()
{
    while (1)
    {
    }
    return 0;
}
Of course, this program never exits; it just spins in its loop. Running a couple of instances of spinner and one instance of swapper at the same time makes the vmstat output look like this:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
6 2 102320 42276 496 37884 264 200 17848 268 1927 4300 78 7 0 15 0
5 2 97020 34996 496 38404 6896 1520 7440 1524 4414 13029 72 9 0 19 0
3 5 101416 39828 500 38704 44 4540 1064 4564 2767 6790 75 3 0 22 0
7 0 100060 34060 508 43364 2348 996 8508 1100 3341 8858 93 4 0 3 0
5 0 98656 42400 352 33472 3856 2440 4272 2488 4516 12470 92 7 0 1 0
6 0 91456 35344 352 33340 7188 0 7188 0 4311 12673 92 7 0 1 0
We’re seeing now that we really are more than 90% busy on the CPU, because spinner is using it all, and that we’re still swapping a ton of memory around because swapper is poking at it.
You should replicate this experiment on your own hardware and see what you notice. It’ll take some time to find a good number for swapper’s page count argument: too many pages and the program will be shut down by the OOM killer; too few and it won’t cause swapping. One way to get the right number is to ask the OS how much physical memory is available:
$ grep "MemTotal" /proc/meminfo
MemTotal: 945676 kB
My Pi says it’s got 945676 kilobytes, so dividing by the 4 kB page size gives 236419 pages. I ran these experiments giving swapper a parameter of 230000 pages.
And to drive home the punchline: if memory swapping caused high CPU usage, swapper alone would show high CPU usage. It doesn’t, even though it causes plenty of paging activity – in fact, the system is hard to use from another session while swapper is banging away. But there’s plenty of CPU free; the system is I/O-bound while servicing the swaps.
When answering questions online, it’s really important to consider all the available facts. Someone who knows how the OS scheduler behaves under demand and what the CPU usage statistics mean is far more likely to give valuable insight into a problem than someone who has a misunderstanding about the system.
More importantly, we can see that diagnosing MySQL (or any app’s performance, really) should start with a coarse diagnosis of what is happening. If we know we’re using a lot of CPU, it means we’re doing a lot of computation and not that we’re doing a lot of I/O or paging. It’s possible that both conditions are true, and they might have related or independent causes. But already we’re in the territory of wishing we knew more about the configurations the original poster had used when they noticed the problem they encountered.