Archive for category Storage

Crystal Disk Mark and SQL Server

Glenn Berry’s SQL Server Performance Blog recommended some benchmarking tools for SQL Server users. Problem is, one of the tools is pretty badly flawed.

Berry recommended Crystal Disk Mark, which is a cute tool that ostensibly measures disk performance. Thing is, it does a pretty shoddy job. If you run the test, you’ll notice that the queue for the drive it’s testing is never deeper than one. That is, it doesn’t do multi-threaded or overlapped I/O requests. Instead, it issues a single blocking request at a time. It doesn’t take long to download the source code and confirm this fact.

For desktop machines, doing only one I/O at a time isn’t so bad because that matches the workload desktop machines typically show their drives during normal use.

It seems a bit irresponsible to recommend such a tool for servers, though–particularly machines which will be running SQL Server, where multiple outstanding I/O operations are key to achieving any sensible level of performance. I like to have at least a couple outstanding I/O operations per logical spindle on the servers I run. This lets the controller and the drive have a shot at elevator seeking across multiple requests, combining redundant requests in cache, and batching as best as possible.
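To make the distinction concrete, here’s a minimal sketch (not Crystal Disk Mark’s code, and the function name is my own invention) of a random-read benchmark where queue depth is a parameter. With a thread pool of one worker it behaves like Crystal Disk Mark’s single blocking request; with several workers, the controller has multiple outstanding requests to reorder and batch.

```python
# Sketch of a random-read benchmark with a configurable queue depth.
# depth=1 mimics a single blocking request; depth>1 keeps several
# requests outstanding so the controller can elevator-seek across them.
import os
import random
import time
from concurrent.futures import ThreadPoolExecutor

def random_read_bench(path, block_size=4096, reads=256, depth=1):
    """Issue `reads` random reads against `path`, keeping up to `depth`
    requests in flight at once. Returns (elapsed_seconds, bytes_read)."""
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    offsets = [random.randrange(0, max(1, size - block_size))
               for _ in range(reads)]
    try:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=depth) as pool:
            chunks = pool.map(lambda off: os.pread(fd, block_size, off),
                              offsets)
            total = sum(len(c) for c in chunks)
        return time.perf_counter() - start, total
    finally:
        os.close(fd)
```

On a real array you’d compare the elapsed time at depth 1 against depth 8 or 16; on a single desktop drive the difference is modest, which is exactly why the tool looks fine on desktops.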

Since none of Crystal Disk Mark’s test modes provide any I/O request depth, it won’t provide an accurate measure of performance for the disk arrays and storage subsystems that are typically used with SQL Server installations and it isn’t an appropriate tool for capacity planning or evaluating hardware for such installations.

Thinking about Large Data and Scale-Out

Adam Jacobs wrote a paper called The Pathologies of Big Data, which I believe was first published in ACM Queue. It looks at — you know — big data.

The title raises the question of what “big data” really is, and the author doesn’t ignore that. Without saying it explicitly, he points out that the definition of “big data” is relative to time: it depends on what storage technologies are available at the moment, what they cost, and how maintainable they are.

The author then tries an interesting experiment: he builds some sample data, carefully packing each record so it stores as much information as possible in as little space as possible. The list represents a set of people, who have ages. Since people don’t often live past 127, an age fits in just seven bits of a byte. Most database systems don’t allow this level of control over storage types and packing, so it should be no surprise that the data he stores in his commercial database package takes a lot more space than his hand-coded representation.
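The kind of packing he’s describing looks something like this sketch. The record layout here is my own invention in the spirit of the article: an age (0–127 fits in seven bits) and one boolean flag squeezed into a single byte, which is finer-grained than the column types most RDBMSes expose.

```python
# Hypothetical hand-packed record: 7-bit age plus a 1-bit flag in one byte.
import struct

def pack_person(age: int, flag: bool) -> bytes:
    """Pack an age (0-127) and a boolean into a single byte."""
    assert 0 <= age <= 127
    return struct.pack("B", (age & 0x7F) | (0x80 if flag else 0))

def unpack_person(b: bytes):
    """Recover (age, flag) from a packed byte."""
    (v,) = struct.unpack("B", b)
    return v & 0x7F, bool(v & 0x80)
```

A billion such records is a billion bytes, plus whatever index or header structure you choose to add yourself; nothing is spent on storage you didn’t ask for.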

What is surprising, however, is that when he stores one billion three-byte rows, the data ends up taking more than 40 gigs on disk. Ideally, the storage would take three gigs: 3 bytes times one billion rows. The author offers no explanation for why his database system imposes more than 1200% overhead in storing the data, which seems rather negligent, particularly since he calls this “sort of inflation … typical of a traditional RDBMS”.
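For what it’s worth, the arithmetic behind that overhead figure is simple to check:

```python
# Back-of-the-envelope check of the storage inflation the article reports.
rows = 10**9
ideal = 3 * rows            # 3 bytes per hand-packed row
observed = 40 * 10**9       # "more than 40 gigs" on disk
per_row = observed / rows               # about 40 bytes per 3-byte row
overhead = (observed - ideal) / ideal   # about 12.3x, i.e. over 1200%
```

Roughly 40 bytes on disk for every 3 bytes of payload; the difference is row headers, page structure, slack space, and logging overhead that the paper never itemizes.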

The author describes his eight-core Macintosh as “hefty hardware”, but ironically gives no description of the disk subsystem. He says that it has two terabytes of RAID-0 disk, but doesn’t mention whether he’s using two commodity one-terabyte drives (which are slow: 12 or 15 milliseconds of seek latency) or fourteen enterprise-class, 15,000 RPM, 147-gig SAS drives (which are about as fast as you can buy these days, if you’re shopping for mechanical storage).

In investigating the query performance, the author does examine some interesting facts, though he stops just short of directly indicting the PostgreSQL query optimizer or execution engine as the performance issue.

The experiment is a good idea, and I intend to perform a similar test on my own hardware when I have a few moments. What bothers me about the article, though, is that the author claims that this behaviour isn’t pathological, and that it happens all the time. Indeed, it does: what’s happening is that the author is using the wrong tool for the job. Scanning large tables and computing aggregate summaries over the contained data is a pattern that database systems do exercise when supporting data warehousing or decision-support systems. But it is, pretty plainly, a misapplication of a general-purpose transactional system, and it isn’t surprising that it’s slow.

I think the author is further misguided in stating that businesses don’t produce such large amounts of data. They do: website logs, transactions, network monitoring, and so on are all applications that aren’t uncommon in today’s businesses, and they do (or have the potential to) generate massive amounts of data. Oddly, the author cites several such applications, and the multipliers (like time) that make them big, later in the article.

Anyway, the author does acknowledge that data warehousing is a solution to the problem he’s working on, but he doesn’t entertain it. Indeed, “merely saying we will build a data warehouse” doesn’t get the job done. Just saying anything doesn’t get the job done; someone has to do actual work.

The article makes one very interesting point, though it then undercuts it. The author points out that randomly scanning memory is actually slower (36.7 million per second) than sequentially reading from disk (53.2 million per second). Reading from disk is probably the slowest thing your computer can do, and we know that cache misses on memory reads are painful, but it is a bit surprising to realize that cache misses add up to something slower than a sequential, physical disk read.
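The shape of that experiment is easy to reproduce, though a sketch in Python understates the effect because interpreter overhead swamps much of the cache behaviour; the point is the access pattern, not the absolute numbers. Walking the same array in sequential order and then in a shuffled order gives the same answer, but the shuffled walk gets no help from prefetching or cache locality.

```python
# Sketch: sum the same array sequentially and in shuffled order.
# The totals must agree; only the memory access pattern differs.
import random
import time

def walk(data, order):
    """Sum data[i] for i in order, returning (total, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    for i in order:
        total += data[i]
    return total, time.perf_counter() - start

def compare(n=1_000_000):
    data = list(range(n))
    shuffled = list(range(n))
    random.shuffle(shuffled)
    total_seq, t_seq = walk(data, range(n))
    total_rnd, t_rnd = walk(data, shuffled)
    return total_seq, total_rnd, t_seq, t_rnd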

The author makes the point that denormalizing tables can help. This is certainly true, particularly when it avoids the sort of joins the author describes. However, I think it’s debatable how often such scans are interesting. The suggested pattern, joining transaction data back to the user table to show who performed the transactions, isn’t typical because the aggregation isn’t common. When it is, it’s easy enough to cache or hash the lookups so they’re far more efficient than randomly probing the data. It’s also possible for the query optimizer to recognize this access pattern and fully scan both tables rather than randomly probing. That plan is a bit counter-intuitive, but it turns out the sequential read is so much faster that its benefit easily overwhelms the cost of reading data that isn’t actually required to answer the query.
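The hashed-lookup idea above can be sketched in a few lines. The table shapes here are my own invention for illustration: build a hash table over the smaller user table once, then stream the transactions sequentially, so each row costs one O(1) probe instead of a random seek into the user data.

```python
# Sketch of a hash join: one pass to build the lookup table,
# one sequential pass over the transactions.
def hash_join(users, transactions):
    """users: iterable of (user_id, name) pairs.
    transactions: iterable of (user_id, amount) pairs.
    Returns a list of (name, amount) pairs."""
    by_id = {uid: name for uid, name in users}      # single pass over users
    return [(by_id[uid], amount)                    # O(1) probe per row
            for uid, amount in transactions]
```

This is essentially what the database’s own hash-join operator does when the optimizer recognizes the pattern; both inputs are read sequentially and the random probing disappears.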

I like the author’s treatment of scale-out solutions for handling larger data sets. Scaling to multiple computers can be an inexpensive way to share the load, and it requires less specialty hardware. The problem is that it requires lots of special engineering. These days, engineering is far more expensive than hardware; it’s easier to spend the money on an exorbitant server just once than to pay a team of engineers to make cheaper, lesser hardware do the same job.

The usual solution of commoditization applies here, I think. That is, we have to rely on vendors to absorb the difficult engineering problems and make products which we can purchase at a fraction of the original engineering cost that address the problems we have with large data. No sane organization would ever build their own version of a product as complicated as SQL Server, for example; we shouldn’t expect those same organizations to build even more complicated software.

While I think some parts of this article are poorly written or poorly supported, I like the author’s definition of “big data”, which involves stepping past the “tried-and-true techniques” that we’re used to. I don’t think large data will be successfully utilized as an asset in most organizations until the tools are really there.

Why do Drives Fail?

Disk drive failures are terrifying. Drives that haven’t failed yet, arguably, are just as terrifying as failed drives—they’ll fail eventually!

Some companies use lots of disks; Google is one of them. They surveyed their drive population over time, analyzing failures and applications. The observations are recorded in a paper called Failure Trends in a Large Disk Drive Population. The study investigates almost 3500 drives that failed while being used at Google’s data centers.

There are several conclusions based on the observations and data that seem to contradict commonly held beliefs about disk drives. One is that there’s no strong link between drive activity, or the heat in its environment, and the failure rate of the drive. A cooler drive doesn’t necessarily enjoy more longevity, and a drive that’s used heavily doesn’t necessarily fail sooner.

Surprisingly, the paper also fails to conclude that a drive’s age determines its failure rate. That’s partially because drive technology changes so quickly: the older drives in the population are mostly drives of a different design or model. You’d expect either a linear relationship between age and failure rate, or an upswing as drives age and then fail more. There is an upswing in failure rates as drives age, but there’s also a hump, then a decline, among mid-life drives. I don’t think the paper has enough data to reach a hard conclusion, but it’s still notable that the conclusion can’t be reached even though it might have seemed quite apparent.

Further, the paper explains that SMART doesn’t help much in predicting drive failure. This is rather shocking, considering what the drive industry has invested in SMART and the support around it. The drives in Google’s servers are monitored frequently, and the SMART data is recorded; still, SMART gave no clear indication of impending failure.

Internet forums are full of people who proclaim that they’ve had a lot of a certain brand of drive fail, and that’s very dubious logic: almost all of these authors have only a small sample contributing to their experience, and the anecdotal result is nothing to rely upon. The fact remains that drives fail, and I don’t believe any particular model or line is much more or less reliable than another. I was prepared for that conclusion, but I wasn’t really expecting SMART to be so unhelpful for monitoring drive health.