SAN vs Local-disk :: innodb_flush_method performance benchmarks

If you’ve been tuning your MySQL database and have wondered what effect the innodb_flush_method setting has on write performance, then this information might help. I’ve recently been doing a lot of baseline load tests to show performance differences between local disk and the new SAN we’re deploying. Since we run InnoDB for everything in production, and writes are very heavy, I decided to run comparison tests between two identical servers to find the best setting for innodb_flush_method. We have the following specs for the hardware:

  • Dell R610
  • 24 core Intel Xeon X5670 @ 2.93GHz
  • 72GB ECC RAM
  • Brocade 825 HBA
  • Local disk: RAID-10 15K SAS Ext3 (ugh)
  • SAN: Oracle 7420 with four Intel Xeon X7550 @ 2.00GHz, 512GB RAM, 2TB read cache (MLC-SSD), 36GB write cache (SLC-SSD), 3 disk shelves populated with 60x2TB 7200 RPM SATA drives set up in mirrored format with striped logs, dual 8Gb FC links to redundant fabric, connected to Brocade DCX 8510-4.
  • The my.cnf file being used for the tests: click-click

I’m using the following sysbench command to run the tests. On each server the same commands are used. I ran a 1B row prepare prior to the 1B row test.

sysbench --db-driver=mysql --num-threads=64 --max-requests=1000000000 --max-time=3600 --test=oltp --verbosity=3 --validate=off --oltp-test-mode=complex --oltp-read-only=off --oltp-table-name=sbtest --oltp-table-size=1000000000 --oltp-dist-type=special --mysql-host=localhost --mysql-port=3306 --mysql-table-engine=innodb run

On the server that is using SAN paths there are two LUNs presented for MySQL use: /db/data01 for InnoDB data files and /db/logs01 for InnoDB logs. These filesystems are both formatted as XFS. The server running local-disk tests is running Ext3. I might run some more tests later with the local-disk setup as XFS if time allows.

Here are the results. In this test the SAN outperformed the local RAID-10 15K SAS array for every flush method, by the widest margin with O_DIRECT. Note that the best innodb_flush_method setting differs by storage medium: O_DIRECT was fastest on the SAN, while fdatasync was fastest on local disk.

1B Row Complex Transactional Test, 64 threads

  • SAN O_DIRECT: read/write requests: 31560140 (8766.61 per sec.)
  • SAN O_DSYNC: read/write requests: 5179457 (1438.52 per sec.)
  • SAN fdatasync: read/write requests: 9445774 (2623.66 per sec.)
  • Local-disk O_DIRECT: read/write requests: 3258595 (905.06 per sec.)
  • Local-disk O_DSYNC: read/write requests: 3494632 (970.65 per sec.)
  • Local-disk fdatasync: read/write requests: 4223757 (1173.04 per sec.)
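
To put the numbers above side by side, here is a small Python sketch that computes how much faster the SAN was than local disk for each flush method. The per-second values are copied from the list; the dictionary layout and function name are mine, for illustration only.

```python
# Requests per second, copied from the results listed above.
results = {
    ("SAN", "O_DIRECT"): 8766.61,
    ("SAN", "O_DSYNC"): 1438.52,
    ("SAN", "fdatasync"): 2623.66,
    ("Local-disk", "O_DIRECT"): 905.06,
    ("Local-disk", "O_DSYNC"): 970.65,
    ("Local-disk", "fdatasync"): 1173.04,
}

def san_vs_local(results):
    """Return {flush_method: SAN rate / local-disk rate}."""
    ratios = {}
    for (storage, method), rate in results.items():
        if storage == "SAN":
            ratios[method] = rate / results[("Local-disk", method)]
    return ratios

if __name__ == "__main__":
    for method, ratio in sorted(san_vs_local(results).items()):
        print("%-10s SAN is %.1fx local disk" % (method, ratio))
```

On these figures the SAN with O_DIRECT comes out at roughly 9.7 times the local-disk O_DIRECT rate, while with O_DSYNC the gap is only about 1.5 times.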

Quick How-To for DRBD + MySQL + LVS

I wrote this up a while ago and decided that I didn’t want to lose it in a shuffle of documents during my transition to a new workstation. It’s the basics of setting up Heartbeat (LVS) + DRBD (block replication between active/passive master servers) + MySQL. This should give you the basics of a H/A system without the benefits of SAN but also without the associated cost. The validity of this setup for H/A purposes is highly dependent on your workload and environment. You should know the ins and outs of your H/A solution before deciding to blame the system for not performing as expected. As with all production systems you should test, test, test and test some more before going live.

When I get around to it later I’ll post my How-To for setting up RHCS + SAN + MySQL. You can download the DRBD document PDF here: DRBD_LVS_Install-Configure_HowTo


Quadrant Framework – rev7 update adds DyGraphs support

Quick update to the framework that was released yesterday; I’ve added automatic graph generation. I chose DyGraphs due to the quick ability to enable support – the HTML is very quick and simply loads the CSV data. It has the same zooming features of Highcharts without the JS overhead.

Now when you run a load test you will get (in the output directory) a mixture of files: the main cumulative CSV and HTML file for the hostname that was tested, and then one CSV and HTML per report variable that was tested. This means you don’t have to drag the main CSV file into an alternate program or spend time parsing out certain variables one at a time to generate specific graphs.  I’ve also added support for limiting output of SNMP variables (LOAD,CPU,MEM). Head over here and download the update: http://code.google.com/p/quadrant-framework/

To enable the graph generation, as it is not on by default, use this flag: --output-graphcode-enable


MySQL Community – what do you want in a load testing framework?

So I’ve been doing a fair number of automated load tests these past six months. Primarily with Sysbench, which is a fine, fine tool. First I started using some simple bash based loop controls to automate my overnight testing, but as usually happens with shell scripts they grew unwieldy and I rewrote them in python. Now I have some flexible and easily configurable code for sysbench based MySQL benchmarking to offer the community. I’ve always been a fan of giving back to such a helpful group of people – you’ll never hear me complain about “my time isn’t free”. So, let me know what you want in an ideal testing environment (from a load testing framework automation standpoint) and I’ll integrate it into my existing framework and then release it via the BSD license. The main goal here is to have a standardized modular framework, based on sysbench, that allows anyone to compare their server performance via repeatable tests. It’s fun to see other people’s benchmarks but it’s often difficult to repeat and compare since most tests aren’t fully documented in their blog posts – this could be a solution to that.

Currently I have the harness doing iterations based on:

  • incrementing system values (choose a global dynamic variable, e.g. sync_binlog=0-1000)
  • storage engine vs storage engine for the same workload
  • thread quantity increments for read-only or read+write
  • N-nodes in a cluster workloads with WRR traffic distribution (need to code WLC and others)
  • QPS testing for connection pool vs open/close connection
  • multi-table vs single-table workloads

Outputs available: CSV, XML, JSON for easy integration into any number of the various graphing frameworks available. I’ll probably code up a lightweight python http server preloaded with Highcharts and Sparklines so you can see your benchmarks easily without having to roll your own graphs.
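
As a rough illustration of the thread-increment iteration, the sketch below builds one sysbench command line per thread count. It is not the framework's code: the function name, the doubling schedule, and the reduced option set are assumptions, and it sticks to sysbench 0.4 style flags (--num-threads, --max-time, --test=oltp, --oltp-read-only).

```python
def thread_increment_commands(start=1, stop=64, read_only=False, max_time=60):
    """Build one sysbench command (as an argument list) per thread count.

    Thread counts double from `start` up to `stop`; this schedule is an
    assumption for illustration, not the framework's actual behaviour.
    """
    if start < 1:
        raise ValueError("start must be at least 1")
    commands = []
    threads = start
    while threads <= stop:
        commands.append([
            "sysbench",
            "--db-driver=mysql",
            "--test=oltp",
            "--num-threads=%d" % threads,
            "--max-time=%d" % max_time,
            "--oltp-read-only=%s" % ("on" if read_only else "off"),
            "run",
        ])
        threads *= 2
    return commands

if __name__ == "__main__":
    # Only prints the commands; running them is left to the caller.
    for cmd in thread_increment_commands(stop=8):
        print(" ".join(cmd))
```

Keeping each command as a list makes it easy to hand to a process runner later and to log exactly what was executed for each data point.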

Quick now, tell me what you’d like me to code for you!


MySQL and RAMdisk, or how to make tmpdir-usage queries faster

Have you ever looked at your processlist and seen queries in the state “copying to tmp table” and then run an explain on it and noticed that the tmp table is being created on disk? Happens a lot with some servers and some workloads. Of course disk is much slower than RAM so this becomes a slow process and makes queries execute slower than they could if they were allowed to use RAM. So, one way to get this process to speed up (aside from tuning your queries, which should be done first) is to create a tmpfs or ram-disk and let MySQL use that for its temp-table-on-disk creations. MySQL on Linux defaults to /tmp for the tmpdir location so this will need to be changed.

Here is how you get MySQL to use a 1G size tmpfs. How you size your tmpfs depends on how much RAM your system has and how much tmpdir space MySQL needs for your workload. If you need more tmpdir space than you can make in a tmpfs mount then MySQL allows multiple tmpdir locations to be set (see the manual: http://dev.mysql.com/doc/refman/5.1/en/server-options.html#option_mysqld_tmpdir).

1. mkdir -p /db/tmpfs01 && chown mysql:mysql /db/tmpfs01
2. mount tmpfs /db/tmpfs01 -t tmpfs -o size=1G
3. edit my.cnf file and add: tmpdir = /db/tmpfs01
4. restart MySQL

To have persistent tmpfs with reboot put the following into /etc/fstab. On this system the mysql user is uid/gid of 27, yours might vary so adjust accordingly.
tmpfs /db/tmpfs01 tmpfs rw,uid=27,gid=27,size=1G,nr_inodes=10k,mode=0755 0 0
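
Once MySQL is restarted it is worth confirming that the tmpdir really sits on tmpfs. The following is a minimal Python sketch that parses text in the /proc/mounts layout and reports the filesystem type for a path. The function name and the sample mount table are made up for illustration; on a real Linux host you would pass in the contents of /proc/mounts instead.

```python
import posixpath

def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path.

    mounts_text uses the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # >= so that a later entry for the same mount point wins.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fstype
    return best_type

# Hypothetical mount table, for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw 0 0
/dev/sdb1 /db/data01 xfs rw 0 0
tmpfs /db/tmpfs01 tmpfs rw,size=1048576k 0 0
"""

if __name__ == "__main__":
    print(fs_type_for("/db/tmpfs01", SAMPLE_MOUNTS))
```

If the reported type for your tmpdir is anything other than tmpfs, the mount or the my.cnf change did not take effect.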


MySQL Analytics: updated query for table engine data statistics

This is a follow up to my previous post titled “MySQL analytics: information_schema polling for table engine percentages”. Here’s an updated query with more output and quicker execution time. What you get: innodb table space utilization percentage, data+index usage total and per innodb/myisam engine, innodb data/index/percentage, myisam data/index/percentages, and overall percentage values. Rather useful for profiling your table engine usage.

Sample output:
innodb_tablespace_utilization_perc: 100
total_size_gb: 26.275011910126
index_size_gb: 2.994891166687
data_size_gb: 23.280120743439
innodb_total_size_gb: 6.751220703125
innodb_data_size_gb: 5.2576751708984
innodb_index_size_gb: 1.4935455322266
myisam_total_size_gb: 19.523791207001
myisam_data_size_gb: 18.02244557254
myisam_index_size_gb: 1.5013456344604
perc_index: 11.3982
perc_data: 88.6018
innodb_perc_index: 22.1226
innodb_perc_data: 77.8774
myisam_perc_index: 7.6898
myisam_perc_data: 92.3102
innodb_perc_total_index: 49.8698
innodb_perc_total_data: 22.5844
myisam_perc_total_index: 50.1302
myisam_perc_total_data: 77.4156

select round(sum(innodb_data_size + innodb_index_size) / (innodb_data_free + sum(innodb_data_size + innodb_index_size)) * 100) as 'innodb_tablespace_utilization_perc'
, (data_size + index_size) / gb as total_size_gb
, index_size / gb as index_size_gb
, data_size / gb as data_size_gb
, sum(innodb_index_size + innodb_data_size) / pow(1024,3) as innodb_total_size_gb
, innodb_data_size / pow(1024,3) as innodb_data_size_gb
, innodb_index_size / pow(1024,3) as innodb_index_size_gb
, sum(myisam_index_size + myisam_data_size) / pow(1024,3) as myisam_total_size_gb
, myisam_data_size / pow(1024,3) as myisam_data_size_gb
, myisam_index_size / pow(1024,3) as myisam_index_size_gb
, index_size / (data_size + index_size) * 100 as perc_index
, data_size / (data_size + index_size) * 100 as perc_data
, innodb_index_size / (innodb_data_size + innodb_index_size) * 100 as innodb_perc_index
, innodb_data_size / (innodb_data_size + innodb_index_size) * 100 as innodb_perc_data
, myisam_index_size / (myisam_data_size + myisam_index_size) * 100 as myisam_perc_index
, myisam_data_size / (myisam_data_size + myisam_index_size) * 100 as myisam_perc_data
, innodb_index_size / index_size * 100 as innodb_perc_total_index
, innodb_data_size / data_size * 100 as innodb_perc_total_data
, myisam_index_size / index_size * 100 as myisam_perc_total_index
, myisam_data_size / data_size * 100 as myisam_perc_total_data
from ( select sum(data_length) data_size,
	sum(index_length) index_size,
	sum(if(engine = 'innodb', data_length, 0)) as innodb_data_size,
	sum(if(engine = 'innodb', index_length, 0)) as innodb_index_size,
	sum(if(engine = 'myisam', data_length, 0)) as myisam_data_size,
	sum(if(engine = 'myisam', index_length, 0)) as myisam_index_size,
	sum(if(engine = 'innodb', data_free, 0)) as innodb_data_free,
	pow(1024, 3) gb from information_schema.tables )
a\G
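
The percentage columns are plain ratios, so they can be checked by hand against the sample output. Here is a small Python sketch that recomputes a few of them from the sizes printed above; the helper name is mine and the numbers are copied from the sample output.

```python
def percent(part, whole):
    """Return part as a percentage of whole, or None when whole is zero."""
    if whole == 0:
        return None
    return part / whole * 100.0

# Sizes in GB, copied from the sample output above.
index_size_gb = 2.994891166687
data_size_gb = 23.280120743439
innodb_index_size_gb = 1.4935455322266
innodb_data_size_gb = 5.2576751708984

total_size_gb = index_size_gb + data_size_gb

# Share of all storage that is index data.
perc_index = percent(index_size_gb, total_size_gb)
# Share of InnoDB storage that is index data.
innodb_perc_index = percent(
    innodb_index_size_gb, innodb_index_size_gb + innodb_data_size_gb)
# Share of all index data that belongs to InnoDB tables.
innodb_perc_total_index = percent(innodb_index_size_gb, index_size_gb)

if __name__ == "__main__":
    print("perc_index: %.4f" % perc_index)
    print("innodb_perc_index: %.4f" % innodb_perc_index)
    print("innodb_perc_total_index: %.4f" % innodb_perc_total_index)
```

To within rounding, these agree with the perc_index, innodb_perc_index and innodb_perc_total_index rows of the sample output.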

Win a free book at the February Python Book Contest

This month is a special month. It’s not because of Valentine’s Day or even the exciting day where we see groundhogs. No, this month is special because I’m holding a book contest where you, the reader, get to win something free for doing absolutely nothing more than posting a comment saying that you want one of the several books I have available in the contest.

So without getting into boring details I’ll keep this short. I’ve been reviewing a lot of books lately and I think it’s time to get some books into people’s hands to enjoy themselves. This month the giveaways are all Python oriented.

So, all you have to do is take a look at the following titles and post a comment here saying that you want one of them. At the end of the month two readers will be chosen via a random list sorting python script I’ve whipped up for just this purpose. You will then get an email from the publisher who will send a brand new e-copy of the book free of charge. I’ll also be reviewing these books at a later date for those that do not win the contest.



Python Text Processing with NLTK 2.0 Cookbook


Python 2.6 Text Processing: Beginners Guide


Python 2.6 Graphics Cookbook

Post a comment now and tell me which book you want!


Python for Automation: using pdsh for a menu-driven command execution environment

I’ve been playing around with some quick system automation scripts that are handy to use when you don’t want or need to set up a chef or puppet action. I like to keep all of my hostnames and login details in a MySQL database (a cmdb actually) but for this example we’ll just use a couple of nested lists. This script executes commands in parallel across the hosts you choose in the menu system via the “pdsh” command, so make sure you have that installed before running. Alternatively you can change the command call to use ssh instead of pdsh for a serialized execution, but that’s not as fun or fast. With some customizations here and there you can expand this to operate parallelized jobs for simplifying daily work in database administration, usage reporting, log file parsing, or other system automation as you see fit. Here’s the code. Comments welcome as always!

#!/usr/bin/env python
## NAME: menu_parallel_execution.py
## DATE: 2011-02-04
## AUTHOR: Matt Reid
## WEBSITE: http://kontrollsoft.com
## EMAIL: mreid@kontrollsoft.com
## LICENSE: BSD http://www.opensource.org/licenses/bsd-license.php

import commands
import sys
import os
import operator

ver = sys.version.split(' ')[0].split(".")
major=ver[:1]
minor=ver[1:2]
version="%s.%s"%(major[0],minor[0])
if version in ('2.4','2.3','2.2','2.1','2.0'):
	pyver = "old"
else:
	from subprocess import Popen, PIPE, STDOUT, call
	pyver = "new"

def main():
	#This section of host definitions could be replaced by a database
	#call if you are storing your hosts in that manner, but we'll keep
	#this script simple for the time being.
	#Each entry is [hostname, ip address, ssh username].
	dc0 = [
		["host0-0","10.0.0.1","ssh-username"],
		["host0-1","10.0.0.2","ssh-username"]]

	dc1 = [
		["host1-0","10.1.0.1","ssh-username"],
		["host1-1","10.1.0.2","ssh-username"]]

	dc2 = [
		["host2-0","10.2.0.1","ssh-username"],
		["host2-1","10.2.0.2","ssh-username"]]

	print '''[1] Datacenter-1
[2] Datacenter-2
[3] Datacenter-3
'''
	dc = int(raw_input("Datacenter ID: "))
	if dc == 1:
		hosts = dc0
	elif dc == 2:
		hosts = dc1
	elif dc == 3:
		hosts = dc2
	else:
		print "Unknown datacenter ID: %s"%(dc)
		sys.exit(1)

	nodes = []
	stay = True
	while stay == True:
		i = 0
		nest = sorted(hosts, key=operator.itemgetter(0))
		for x in range(len(nest)):
			print "[%i] %s | %s"%(i,nest[i][1],nest[i][0])
			i+=1

		print "\nSelect node to add to execution list"
		ch = int(raw_input("ID: "))
		xx = [nest[ch][1],nest[ch][2]]
		nodes.append(xx)
		s = str(raw_input("\nAdd another node? [Y/n] "))
		if s == "n" or s == "N":
			stay = False

	if(pyver == "new"):
		addrs = ""
		for node in nodes:
			address = node[1]+"@"+node[0]+","
			address = address.rstrip("\n")
			addrs = addrs+address

		addrs = addrs.strip(",")
		cmd = str(raw_input("\nEnter the command to execute: "))				

		try:
			c = "pdsh -w %s %s"%(addrs,cmd)
			print "Executing: %s"%(c)
			call(c,shell=True)
		except:
			print "Failed to execute pdsh command: %s"%(c)
			sys.exit(1)

		sys.exit(0)

	if(pyver == "old"):
		print "Please upgrade to Python 2.6+"
		sys.exit(1)

## START
if __name__ == "__main__":
	try:
		retval = main()
	except KeyboardInterrupt:
		#Only trap Ctrl-C here; SystemExit must propagate so that
		#the exit codes set inside main() are preserved.
		sys.exit(1)
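
One variation worth considering is to build the pdsh invocation as an argument list rather than a single shell string, so the host list is not re-parsed by a shell. The sketch below is written in Python 3 syntax, unlike the script above, and the function name is mine. It only assembles the arguments and leaves execution to the caller. It assumes pdsh is installed and accepts user@host entries in -w, which the script above already relies on.

```python
def build_pdsh_command(nodes, command):
    """Return the pdsh argument list for [address, username] pairs.

    nodes uses the same [ip address, ssh username] layout that the
    script above collects from the menu.
    """
    if not nodes:
        raise ValueError("no nodes selected")
    targets = ",".join("%s@%s" % (user, address) for address, user in nodes)
    return ["pdsh", "-w", targets, command]

if __name__ == "__main__":
    nodes = [["10.0.0.1", "ssh-username"], ["10.0.0.2", "ssh-username"]]
    argv = build_pdsh_command(nodes, "uptime")
    # Printing only; pass argv to subprocess.call() to actually run it.
    print("Executing: %s" % " ".join(argv))
```

Passing the resulting list to subprocess.call without shell=True avoids quoting surprises on the local side; the command is still interpreted by the shell on the remote hosts.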

Review: MySQL for Python by Albert Lukaszewski

Packt Publishing recently sent me a copy of MySQL for Python to review and after reading through the book I must say that I’m rather impressed at the variety of topics that the book covers.

It starts off with the basics of setting up MySQL for your testing/development needs by going over several of the common installation and configuration methods. After that it’s a quick intro for connection methods and simple error reporting for connections. The author gives a quick intro to CRUD and how it relates to databases and python before heading into the common tasks of simple queries. I was surprised to see some database profiling discussion, which is rather handy for a new coder or a person new to MySQL. Once the basics of Inserts/Selects/Updates/Deletes are covered, which is a rather quick read, there is a welcome discussion of transactions and commit methods – if you do not read this section and are new to MySQL then believe me, you’re missing a very important topic. Most people will gloss over the basics and head right to the more advanced chapters that feature exception handling, the all too common “the mysql server has gone away” error, date and time functions, aggregate functions, and metadata queries. These chapters were the most interesting to me as they covered some great code for python that I have not yet played around with. Previously I’ve done a lot of work on those topics with perl and php so seeing how they were done in python was a great treat. The code is concise, easy to read, and well explained.

A number of topics cover the time saving solutions that no one should be without. Namely, bulk data inserting, data formatting, row iteration, and CSV parsing. Logging methods for access and changes to the database are also covered, and in the end will save your development cycle a lot of time when you are troubleshooting app-to-db interaction.

Two chapters will be of interest to DBAs in particular, and possibly less so to pure developers: the Disaster Recovery and MySQL Administration chapters. The author covers offline backups as well as online hot backups, two sections that no DBA should be without. The code for this type of work gets a decent amount of discussion but, as in the other chapters in the book, the theory and background of the topic are also discussed, which gives the new reader an understanding of the “why” and not just the “how”. The administration section of the book covers user creation and permissions management, along with a bit of background on the security involved with that task, and also goes into quite a lot of coverage of web-based GUI administration and command line interaction for admin purposes.

Overall I enjoyed the contents of the book and would recommend taking a look if you are new to Python and MySQL or are even looking for a quick reference to the common tasks of database driven application development. This book does not cover the common ORM database interactions you’re likely to see in an app like Django or Pylons, but it will give you a solid foundation on how python and MySQL interact without an abstraction layer. If you are writing quick admin code or building your own database interaction layer, then this book would do well to be in your collection.

You can find the book at Amazon or directly from Packt.


A simple load test script in Python

Lately I’ve had to do some environment load testing so I wrote this quick script. It can be modified as needed but the basic idea is that it spawns $x threads (--threads=) and then sends two connections (or however many you want with --per-connection=) per thread to the URL (--url=). You can have it wait a configurable time between connections as well (--wait=).

The url is appended with a 32 character randomized string so that any database/caching on the backend of the site isn’t serving data from a warm cache. You can hunt down the string length for 32 and change it to whatever you want. Feel free to change and use as needed, just keep my info at top.

#!/usr/bin/python
################################################################################
## DATE: 2010-10-26
## AUTHOR: Matt Reid
## MAIL: mreid@kontrollsoft.com
## SITE: http://kontrollsoft.com
## LICENSE: BSD http://www.opensource.org/licenses/bsd-license.php
################################################################################

from __future__ import division
import threading
import sys
import urllib2
import select
import random
import string
import getopt
import time

class threader(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
    def run(self):
        # u, per and wait are module-level settings parsed from the command line.
        for i in range(per):
            if wait > 0:
                time.sleep(wait)
            suffix = randstr(32)
            # IMPORTANT: this is where we append the search string to the main URL
            # you might need to change this for your site.
            # url is kept local so that threads do not overwrite each other's value.
            url = "%s/search/%s"%(u,suffix)
            print "polling url: %s"%(url)
            urllib2.urlopen(url)

def randstr(length):
    # Build the string two letters at a time from a table of letter pairs.
    twoletters = [c+d for c in string.letters for d in string.letters]
    r = random.random
    n = len(twoletters)
    l2 = length//2
    lst = [None] * l2
    for i in xrange(l2):
        lst[i] = twoletters[int(r() * n)]
    # For an odd length, add one final single letter after the loop.
    if length & 1:
        lst.append(random.choice(string.letters))

    return "".join(lst)

def init_thread():
    backgrounds = []
    for thread in range(threads):
        print "Spawning thread: %s"%(thread)
        background = threader()
        background.start()
        backgrounds.append(background)
    for background in backgrounds:
        background.join()

def print_help():
    print '''loader.py - URL load test script
==================================================
Date: 2010-08-26
Website: http://themattreid.com
Author: Matt Reid
Email: themattreid@gmail.com
License: new BSD license
==================================================
Use the following flags to change default behavior

   Option                 Description
   --url=                 URL to test
   --per-connection=      Number of sequential requests per connection (default 2)
   --threads=             Number of threads for url connections (required)
   --wait=                Time to wait in-between requests
   --help                 Print this message

   -u                     Same as --url
   -p                     Same as --per-connection
   -t                     Same as --threads
   -w                     Same as --wait
   -h                     Same as --help
   '''

def main():
    init_thread()
    sys.exit(0)

if __name__ == "__main__":
    global threads #num threads/connections to open
    global u #url to hit
    global per #per connection url hits
    try:
        # A trailing colon marks a short option that takes an argument.
        options, remainder = getopt.getopt(
            sys.argv[1:], 'p:t:u:w:h', ['per-connection=',
                                   'threads=',
                                   'url=',
                                   'wait=',
                                   'help'])
    except getopt.GetoptError, err:
        print str(err)
        sys.exit(2)

    for opt, arg in options:
        if opt in ('-p', '--per-connection'):
            per = int(arg)
        elif opt in ('-t', '--threads'):
            threads = int(arg)
        elif opt in ('-u', '--url'):
            u = arg
        elif opt in ('-w', '--wait'):
            wait = int(arg)
        elif opt in ('-h', '--help'):
            print_help()
            sys.exit(2)

    try:
        threads
    except NameError:
        print "No thread quantity specified."
        print_help()
        sys.exit(2)
    try:
        per
    except NameError:
        per = 2
    try:
        u
    except NameError:
        print "No URL Specified"
        print_help()
        sys.exit(2)
    try:
        wait
    except NameError:
        wait=0

    main()
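
For comparison, a shorter way to build the random suffix is to draw each character independently. This is a sketch in Python 3 syntax (string.ascii_letters stands in for the Python 2 string.letters used above). The function names are mine, and the /search/ path mirrors the one in the script.

```python
import random
import string

def randstr(length, rng=random):
    """Return `length` random ASCII letters."""
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))

def build_url(base, length=32, rng=random):
    """Append a random search string so a request is unlikely to hit a warm cache."""
    return "%s/search/%s" % (base.rstrip("/"), randstr(length, rng))

if __name__ == "__main__":
    print(build_url("http://localhost:8080"))
```

Passing a seeded random.Random instance as rng makes a run repeatable, which can help when comparing two load tests.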