SAN vs Local-disk :: innodb_flush_method performance benchmarks

If you’ve been tuning your MySQL database and have wondered what effect the innodb_flush_method settings have on write performance, then this information might help. I’ve recently been doing a lot of baseline load tests to show performance differences between local disk and the new SAN we’re deploying. Since we run InnoDB for everything in production, and writes are very heavy, I decided to run comparison tests between two identical servers to find the best setting for innodb_flush_method. The hardware specs are as follows:

  • Dell R610
  • 24-core Intel Xeon X5670 @ 2.93GHz
  • 72GB ECC RAM
  • Brocade 825 HBA
  • Local disk: RAID-10 15K SAS, Ext3 (ugh)
  • SAN: Oracle 7420 with four Intel Xeon X7550 @ 2.00GHz, 512GB RAM, 2TB read cache (MLC SSD), 36GB write cache (SLC SSD), 3 disk shelves populated with 60x2TB 7,200 RPM SATA drives set up in mirrored format with striped logs, dual 8Gb FC links to redundant fabric, connected to a Brocade DCX 8510-4.
  • The my.cnf file being used for the tests: click-click

I’m using the following sysbench command to run the tests. On each server the same commands are used. I ran a 1B row prepare prior to the 1B row test.

sysbench --db-driver=mysql --num-threads=64 --max-requests=1000000000 --max-time=3600 --test=oltp --verbosity=3 --validate=off --oltp-test-mode=complex --oltp-read-only=off --oltp-table-name=sbtest --oltp-table-size=1000000000 --oltp-dist-type=special --mysql-host=localhost --mysql-port=3306 --mysql-table-engine=innodb run

On the server that is utilizing SAN paths there are two LUNS presented for MySQL use. /db/data01 for InnoDB data files, /db/logs01 for InnoDB logs. These filesystems are both formatted as XFS. The server running local-disk tests is running Ext3. I might run some more tests later with the local-disk setup as XFS if time allows.
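For clarity, the only my.cnf variable changed between runs was the flush method. A minimal fragment looks like the following (the option values are the real InnoDB settings; the behavior notes in the comments paraphrase the MySQL documentation):

```ini
[mysqld]
# Exactly one of the following was active per test run:
innodb_flush_method = fdatasync   # default: buffered writes, flushed with fsync()
# innodb_flush_method = O_DSYNC   # logs opened/flushed with O_SYNC; data files still fsync()'d
# innodb_flush_method = O_DIRECT  # data files opened with O_DIRECT, bypassing the OS page cache
```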

Here are the results. Clearly a well designed SAN infrastructure is superior to even RAID-10 15K SAS drives. And of course you can see the different performance values from using O_DIRECT for the innodb_flush_method for the different data storage mediums.

1B Row Complex Transactional Test, 64 threads

  • SAN O_DIRECT: read/write requests: 31560140 (8766.61 per sec.)
  • SAN O_DSYNC: read/write requests: 5179457 (1438.52 per sec.)
  • SAN fdatasync: read/write requests: 9445774 (2623.66 per sec.)
  • Local-disk O_DIRECT: read/write requests: 3258595 (905.06 per sec.)
  • Local-disk O_DSYNC: read/write requests: 3494632 (970.65 per sec.)
  • Local-disk fdatasync: read/write requests: 4223757 (1173.04 per sec.)
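For readers curious what these flush strategies actually mean at the syscall level, here is a minimal Python sketch. This is not InnoDB code, and the file names are made up; it just demonstrates the buffered-write-plus-fdatasync pattern that the default setting maps to, and the O_DSYNC open flag. O_DIRECT is omitted because it requires aligned buffers and can’t be shown portably in a few lines.

```python
import os
import tempfile

def write_fdatasync(path, data):
    # Default strategy: buffered write, then fdatasync() forces the
    # data (but not all metadata) to stable storage.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.write(fd, data)
        os.fdatasync(fd)
    finally:
        os.close(fd)

def write_odsync(path, data):
    # O_DSYNC strategy: every write() returns only after the data
    # has reached the device, so no separate flush call is needed.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)

tmp = tempfile.mkdtemp()
write_fdatasync(os.path.join(tmp, "a.ibd"), b"page")
write_odsync(os.path.join(tmp, "b.ibd"), b"page")
```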

13 thoughts on “SAN vs Local-disk :: innodb_flush_method performance benchmarks”

  1. Partha Dutta says:

    When you ran the sysbench test against the local drives, what I/O scheduler did you use? If it was CFQ, then try using deadline or noop and re-run your tests. It’s not going to be an order-of-magnitude difference, but it should be noticeable.

  2. admin says:

    I’m using CFQ. I’ll check out the other schedulers and report any interesting stats.
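    (For anyone wanting to try the scheduler suggestion above, it can be inspected and switched at runtime via sysfs. The device name sda here is an assumption; substitute the block device backing your data volume.)

```shell
# Show the active scheduler for the device; the bracketed entry is current,
# e.g. "noop deadline [cfq]"
cat /sys/block/sda/queue/scheduler

# Switch to deadline for this boot only; persist via the elevator= kernel
# parameter or a udev rule if the result holds up under testing.
echo deadline > /sys/block/sda/queue/scheduler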

  3. Andy says:

    > Clearly a well designed SAN infrastructure is superior to even RAID-10 15K SAS drives.

    Your local disks are 15k SAS drives. Your SAN has 512GB of RAM plus over 2TB of SSD.

    What you really showed is that SSD is faster than 15k SAS. Not exactly a surprise.

  4. Jay Janssen says:

    It’s interesting to see O_DIRECT do so well on a SAN when the manual specifically states otherwise.

    How long was this test? With innodb_io_capacity=200 I’d expect dirty buffers to start slowing things down.

  5. admin says:

    The point was to show the difference in innodb_flush_method settings on both local disk and this particular SAN. But take what you will from the post; if you want to focus on SSD being faster than 15K SAS, that’s fine too, because it is, and it’s not a surprise. I agree.

  6. admin says:

    That surprised me as well, because on other SANs (EMC, 3par) I’ve generally seen better performance from O_DSYNC. The tests were each 3600 seconds long.

  7. James Day says:

    There’s been plenty of discussion of what we should put in the manual for this but the only true and accurate answer is to try each and see what works best for your particular situation.

    We routinely get feedback that one setting or another is best for a particular setup, along with suggestions that the manual should tell people to use that one. The trouble is that those suggestions are all for different settings. As you’ve seen, the optimal one differs based on the system and also the workload. We can say what we think might be best for many people as a starting point, and for those who won’t or can’t test, but it’s just not as reliable as testing and finding out the real answer.

    O_DIRECT benefits will vary by filesystem (XFS likely to be good) and the workload. Since O_DIRECT can serialise writes it may not offer great benefits to a workload that’s using a single table in a single file when there’s lots of concurrency. Partitioning may help. Using innodb_file_per_table is likely to help with many production workloads that spread the work around many tables.

    If you have an insert-heavy workload and suitable partitioning method available you might try adding a few, low single digits, partitions. I’ve seen that produce a five times speed improvement for tables undergoing heavy insert workloads.
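    (The partitioning suggestion above would look something like this against the sysbench table. This is a sketch only; the partitioning column and partition count are illustrative assumptions, not something the post tested.)

```sql
-- Hypothetical: spread a heavy insert workload across a few hash
-- partitions so concurrent writers contend on separate index trees.
ALTER TABLE sbtest
    PARTITION BY HASH (id)
    PARTITIONS 4;
```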

    Partha is right about CFQ, it’s likely to be a horribly bad choice in part because it’ll effectively serialise the work. Noop is likely to be best, possibly deadline for some workloads.

    For the local disks the stripe size and alignment can be very important. 16k stripes without perfect alignment with InnoDB pages will probably be terrible. 256k without perfect alignment is likely to be reasonable.

    You have a pretty large sort_buffer_size setting at 2M. If you’re not aware that this causes MySQL with InnoDB to initialise a large part of the buffer even when not required, you might want to try a much smaller size like 256k. We’ve just fixed this in 5.6, it’s covered by .

    When it comes to looking at the caching on the SAN, you might consider changing the post title from “SAN vs Local-disk :: innodb_flush_method performance benchmarks” to something that doesn’t invite people to compare SAN vs local disk and then notice the large amount of caching on the SAN. Your comparison of flush methods is a good thing; it’s just that the title points people in a direction other than the one you want.

    James Day, MySQL Senior Principal Support Engineer, Oracle. Views are my own; if you want an official company position, contact a PR person.

  8. Doams says:

    Well, O_DIRECT does not guarantee flushing; it just guarantees bypassing the cache layer. Are you sure your durability is the same?

  9. Domas says:

    Ergh, my name up there is Domas :)

  10. Steve says:

    Although this 7420 had SSDs in it, the 7420 uses the SSDs as CACHE, not as disk space. So the SAN still had to learn what data to cache as it worked, and then kept that data in the L1 and L2 ARC cache areas. This test shows it did that over FC better than a local RAID 10 array that didn’t have to travel at all, and that’s very cool. Thanks, Matt.

  11. Svar says:

    Very surprised as well; in most of the real-life scenarios I’ve faced, O_DSYNC showed superior results with SAN. Did you get the best performance with 64 threads? In most cases higher concurrency produces better results, because queries spend most of their time waiting on the IOPS over the FC. Would you retry with innodb_thread_concurrency=0, innodb_write_io_threads=512 and 1024 threads?

  12. Svar says:

    Are you sure you have a BBU in the local-disk test? On RAID-10 with 4 SAS 15K disks, full ACID and group commit, I can report 15K write-only transactions, but on a smaller table. So from your post: the table does not fit in the local server’s memory and starts hitting the disk for reads, while the SAN reads everything from SSD, and that would explain the extra performance here. We can probably reformulate the title into how much 512GB of RAM costs compared to the SAN :)

  13. Federated says:

    Having to deal with a SAN is a whole other beast than local disk when aiming for performance. I’d be interested to know how you’re going to allocate LUNs, what the RAID group is, and how you’re going to balance usage. In actual deployment, was the cache as effective as in testing?
