Filed under Programming

MySQL Load Testing Framework – initial release

It seems that everyone loves load testing these days. The problem is that everyone uses their own quick scripts, simple or complex, to drive their tests, with no way for other DBAs to duplicate them. Let’s say I write a great test and share my results and graphs on the blog, and you want to run the same tests to see how your new DB servers compare in performance: this framework allows you to do that without duplicating any work or writing code. This is a basic release that will get the ball rolling. I’ve included some sample tests in the README file, so give them a try.

This codebase offers a user-friendly framework for creating and visualizing MySQL database load test jobs. It is based around Sysbench, which is generally considered the industry-standard load testing application. The framework allows you to do the following:

    standardize your tests without writing one-off bash scripts to handle looping and iteration
    graph output via Highcharts and Sparklines (coming in the next release in a week)
    export CSV test results for easy import into other data visualization systems or a SQL database
    customize your iterations by watching an unlimited number of MySQL global status variables
    customize your iterations by altering MySQL global variables between iterations
    run arbitrary SQL commands between each iteration
    control the caches between iterations
    set custom default values and create configuration templates for easy repeatability and portability of your tests
    extend the code base as needed: written purely in Python 2.6
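The configuration templates mentioned above might look something like the following sketch. This is a hypothetical example (the section and key names are illustrative, not the framework’s actual format), shown in modern Python for brevity: the idea is that every value another DBA needs to reproduce a run lives in one shareable file.

```python
import configparser

# Hypothetical test template: thread steps, iteration count, and the
# MySQL status variables to watch, all in one portable file.
TEMPLATE = """
[test]
name = readonly-thread-scaling
iterations = 5
threads = 1,4,8,16,32

[mysql]
flush_caches = true
watch_status = Threads_running,Innodb_buffer_pool_reads
"""

config = configparser.ConfigParser()
config.read_string(TEMPLATE)

# the harness can iterate over the thread counts listed in the template
thread_steps = [int(t) for t in config.get("test", "threads").split(",")]
print(thread_steps)
print(config.getboolean("mysql", "flush_caches"))
```

Because the template is plain text, sharing a benchmark is just sharing this file alongside the results.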

Download it now:

MySQL Community – what do you want in a load testing framework?

So I’ve been doing a fair number of automated load tests these past six months, primarily with Sysbench, which is a fine, fine tool. I started with some simple bash-based loop controls to automate my overnight testing, but as usually happens with shell scripts, they grew unwieldy, so I rewrote them in Python. Now I have some flexible and easily configurable code for sysbench-based MySQL benchmarking to offer the community. I’ve always been a fan of giving back to such a helpful group of people – you’ll never hear me complain that “my time isn’t free”. So, let me know what you want in an ideal testing environment (from a load testing framework automation standpoint) and I’ll integrate it into my existing framework and release it under the BSD license. The main goal here is a standardized, modular framework, based on sysbench, that allows anyone to compare server performance via repeatable tests. It’s fun to see other people’s benchmarks, but they’re often difficult to repeat and compare since most tests aren’t fully documented in their blog posts – this could be a solution to that.

Currently I have the harness doing iterations based on:

  • incrementing system values (choose a global dynamic variable, e.g. sync_binlog=0-1000)
  • storage engine vs storage engine for the same workload
  • thread quantity increments for read-only or read+write
  • N-nodes in a cluster workloads with WRR traffic distribution (need to code WLC and others)
  • QPS testing for connection pool vs open/close connection
  • multi-table vs single-table workloads
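The first iteration type above can be sketched roughly as follows. This is an illustrative outline in modern Python, not the harness’s real API: the helper name and the sysbench flags are assumptions.

```python
# For each value of a chosen global dynamic variable, the harness would
# apply the setting and run one sysbench pass, recording the results.

def plan_iterations(variable, values, bench_cmd="sysbench --test=oltp run"):
    """Return (SET GLOBAL statement, benchmark command) pairs, one per step."""
    return [("SET GLOBAL %s = %d" % (variable, v), bench_cmd) for v in values]

steps = plan_iterations("sync_binlog", [0, 1, 10, 100, 1000])
for set_sql, cmd in steps:
    # a real harness executes set_sql against MySQL, then runs cmd via
    # subprocess and records the reported transactions per second
    print(set_sql, "->", cmd)
```

Separating the plan from the execution like this is what makes a run repeatable: the same plan can be handed to anyone with a comparable server.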

Outputs available: CSV, XML, and JSON, for easy integration into any number of the various graphing frameworks available. I’ll probably code up a lightweight Python HTTP server preloaded with Highcharts and Sparklines so you can see your benchmarks easily without having to roll your own graphs.
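Producing those outputs is straightforward with the standard library. A minimal sketch (the field names here are made up for illustration):

```python
import csv
import io
import json

# Hypothetical per-iteration results from a benchmark run.
results = [
    {"iteration": 1, "threads": 4, "tps": 812.4},
    {"iteration": 2, "threads": 8, "tps": 1490.7},
]

# CSV: easy to open in a spreadsheet or LOAD DATA into a SQL table.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["iteration", "threads", "tps"])
writer.writeheader()
writer.writerows(results)
csv_text = buf.getvalue()

# JSON: drops straight into Highcharts or any other JS graphing library.
json_text = json.dumps(results)

print(csv_text)
print(json_text)
```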

Quick now, tell me what you’d like me to code for you!

Fun with Bash: aliases make your life easier… share your favorites

I’ve always been a big fan of having a customized .bashrc file. The one I distribute to all of my servers has aliases for quick commands to save me time on the command line, functions that get work done when aliases are too simplistic, reporting for the server for each cli login, and of course a formatted and colored prompt (for terms that support colors). I also change certain aspects and commands based on the operating system since I’m not always on a redhat box or linux at all. Here’s my bashrc file – maybe you have some fun additions that you’d like to share. What saves you time on the command line?


Win a free book at the February Python Book Contest

This month is a special month. It’s not because of Valentine’s Day or even the exciting day when we see groundhogs. No, this month is special because I’m having a book contest where you, the reader, get to win something free for doing absolutely nothing more than posting a comment saying that you want one of the several books I have available in the contest.

So without getting into boring details I’ll keep this short. I’ve been reviewing a lot of books lately and I think it’s time to get some books into people’s hands to enjoy. This month the giveaways are all Python oriented.

So, all you have to do is take a look at the following titles and post a comment here saying that you want one of them. At the end of the month two readers will be chosen via a random list-sorting Python script I’ve whipped up for just this purpose. You will then get an email from the publisher, who will send a brand new e-copy of the book free of charge. I’ll also be reviewing these books at a later date for those that do not win the contest.
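The winner-picking script itself isn’t published here, but the idea is simple enough to sketch in a couple of lines (the names below are placeholders):

```python
import random

# Pick two distinct winners from the list of commenters.
commenters = ["alice", "bob", "carol", "dave", "erin"]
winners = random.sample(commenters, 2)  # sampling guarantees no duplicates
print(winners)
```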

Python Text Processing with NLTK 2.0 Cookbook

Python 2.6 Text Processing: Beginners Guide

Python 2.6 Graphics Cookbook

Post a comment now and tell me which book you want!

Python for Automation: using pdsh for a menu-driven command execution environment

I’ve been playing around with some quick system automation scripts that are handy when you don’t want or need to set up a Chef or Puppet action. I like to keep all of my hostnames and login details in a MySQL database (a CMDB, actually), but for this example we’ll just use a couple of nested lists. This script executes commands in parallel across the hosts you choose in the menu system via the “pdsh” command, so make sure you have that installed before running. Alternately, you can change the command call to use ssh instead of pdsh for serialized execution, but that’s not as fun or fast. With some customizations here and there you can expand this to run parallelized jobs for simplifying daily work in database administration, usage reporting, log file parsing, or other system automation as you see fit. Here’s the code. Comments welcome as always!

#!/usr/bin/env python
## NAME:
## DATE: 2011-02-04
## AUTHOR: Matt Reid

import sys
import operator

# figure out whether this Python is new enough for the subprocess module
ver = sys.version.split(' ')[0].split(".")
if ".".join(ver[:2]) in ('2.0', '2.1', '2.2', '2.3', '2.4'):
    pyver = "old"
else:
    from subprocess import Popen, PIPE, STDOUT, call
    pyver = "new"

def main():
    # This section of host definitions could be replaced by a database
    # call if you are storing your hosts in that manner, but we'll keep
    # this script simple for the time being. Each entry is
    # [sort-id, hostname, login-user]; replace these with your own hosts.
    dc0 = [[0, "db01.dc1.example.com", "admin"],
           [1, "db02.dc1.example.com", "admin"]]

    dc1 = [[0, "db01.dc2.example.com", "admin"]]

    dc2 = [[0, "db01.dc3.example.com", "admin"]]

    print '''[1] Datacenter-1
[2] Datacenter-2
[3] Datacenter-3'''
    dc = int(raw_input("Datacenter ID: "))
    if dc == 1:
        hosts = dc0
    elif dc == 2:
        hosts = dc1
    elif dc == 3:
        hosts = dc2
    else:
        print "Unknown datacenter ID: %i"%(dc)
        return 1

    nodes = []
    stay = True
    while stay == True:
        nest = sorted(hosts, key=operator.itemgetter(0))
        for i in range(len(nest)):
            print "[%i] %s | %s"%(i, nest[i][1], nest[i][0])

        print "\nSelect node to add to execution list"
        ch = int(raw_input("ID: "))
        nodes.append([nest[ch][1], nest[ch][2]])  # [hostname, user]
        s = str(raw_input("\nAdd another node? [Y/n] "))
        if s == "n" or s == "N":
            stay = False

    if pyver == "new":
        # build the pdsh host list: user@host,user@host,...
        addrs = ""
        for node in nodes:
            address = node[1]+"@"+node[0]+","
            addrs = addrs+address
        addrs = addrs.strip(",")
        cmd = str(raw_input("\nEnter the command to execute: "))
        c = "pdsh -w %s %s"%(addrs, cmd)
        print "Executing: %s"%(c)
        try:
            call(c, shell=True)
        except OSError:
            print "Failed to execute pdsh command: %s"%(c)

    if pyver == "old":
        print "Please upgrade to Python 2.6+"
    return 0

if __name__ == "__main__":
    try:
        retval = main()
    except (KeyboardInterrupt, SystemExit):
        sys.exit(1)

Reviewed: MySQL for Python by Albert Lukaszewski

Packt Publishing recently sent me a copy of MySQL for Python to review and after reading through the book I must say that I’m rather impressed at the variety of topics that the book covers.

It starts off with the basics of setting up MySQL for your testing/development needs by going over several of the common installation and configuration methods. After that it’s a quick intro to connection methods and simple error reporting for connections. The author gives a quick intro to CRUD and how it relates to databases and Python before heading into the common tasks of simple queries. I was surprised to see some database profiling discussion, which is rather handy for a new coder or a person new to MySQL. Once the basics of inserts/selects/updates/deletes are covered, which is a rather quick read, there is a welcome discussion of transactions and commit methods – if you are new to MySQL and do not read this section then, believe me, you’re missing a very important topic. Most people will gloss over the basics and head right to the more advanced chapters that feature exception handling, the all too common “MySQL server has gone away” error, date and time functions, aggregate functions, and metadata queries. These chapters were the most interesting to me, as they covered some great Python code that I have not yet played around with. Previously I’ve done a lot of work on those topics with Perl and PHP, so seeing how they were done in Python was a great treat. The code is concise, easy to read, and well explained.

A number of topics cover the time saving solutions that no one should be without. Namely, bulk data inserting, data formatting, row iteration, and CSV parsing. Logging methods for access and changes to the database are also covered, and in the end will save your development cycle a lot of time when you are troubleshooting app-to-db interaction.

Two chapters will be of interest to DBAs in particular, and possibly less so to pure developers: the Disaster Recovery and MySQL Administration topics. The author covers offline backups as well as online hot backups, two sections that no DBA should be without. The code for this type of work gets a decent amount of discussion but, as with the other chapters in the book, the theory and background of the topic is also discussed, which gives the new reader an understanding of “why” and not just the “how”. The administration section of the book covers user creation and permissions management, along with a bit of background on the security involved with that task, and also goes into quite a lot of coverage of web-based GUI administration and command-line interaction for admin purposes.

Overall I enjoyed the contents of the book and would recommend taking a look if you are new to Python and MySQL or are even looking for a quick reference to the common tasks of database driven application development. This book does not cover the common ORM database interactions you’re likely to see in an app like Django or Pylons, but it will give you a solid foundation on how python and MySQL interact without an abstraction layer. If you are writing quick admin code or building your own database interaction layer, then this book would do well to be in your collection.

You can find the book at Amazon or directly from Packt.

Simple Python: a job queue with threading

Every so often you need a queue to manage operations in an application. Python makes this very simple. Python also, as I’ve written about before, makes threading very easy to work with. So in this quick program I’ll describe, via comments, how to make a simple queue where each job is processed by a thread. Integrating this code to read jobs from a MySQL database would be trivial as well; simply replace the “jobs = […” code with a database call to a row select query.

#!/usr/bin/env python
## DATE: 2011-01-20
## FILE:
## AUTHOR: Matt Reid
from Queue import *
from threading import Thread, Lock

'''this function will process the items in the queue, in serial'''
def processor():
    '''pull items off the queue until it is drained'''
    while True:
        if queue.empty() == True:
            print "the Queue is empty!"
            break
        try:
            job = queue.get()
            print "I'm operating on job item: %s"%(job)
            queue.task_done()
        except Exception:
            print "Failed to operate on job"

'''set variables'''
queue = Queue()
threads = 4
'''a list of job items. you would want this to be more advanced,
like reading from a file or database'''
jobs = [ "job1", "job2", "job3" ]

'''iterate over jobs and put each into the queue in sequence'''
for job in jobs:
    print "inserting job into the queue: %s"%(job)
    queue.put(job)

'''start some threads, each one will process jobs from the queue'''
for i in range(threads):
    th = Thread(target=processor)
    th.daemon = True
    th.start()

'''wait until all jobs are processed before quitting'''
queue.join()

A simple load test script in Python

Lately I’ve had to do some environment load testing, so I wrote this quick script. It can be modified as needed, but the basic idea is that it spawns $x threads (–threads) and then sends two connections (or however many you want with –per-connection=) per thread to the URL (–url=). You can have it wait a configurable time between connections as well (–wait=).

The URL is appended with a 32-character randomized string so that any database/caching on the backend of the site isn’t serving data from a warm cache. You can hunt down the hard-coded string length of 32 and change it to whatever you want. Feel free to change and use as needed, just keep my info at top.

## DATE: 2010-10-26
## AUTHOR: Matt Reid
## MAIL:
## SITE:

from __future__ import division
import threading
import sys
import urllib2
import random
import string
import getopt
import time

class threader(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
    def run(self):
        global u
        global per
        global wait
        for i in range(per):
            if wait > 0:
                time.sleep(wait)
            rand = randstr(32)
            # IMPORTANT: this is where we append the search string to the main URL
            # you might need to change this for your site.
            url = "%s/search/%s"%(u, rand)
            print "polling url: %s"%(url)
            try:
                urllib2.urlopen(url)
            except urllib2.URLError:
                print "failed to poll url: %s"%(url)

def randstr(length):
    '''return a random string of letters, built two letters at a time'''
    twoletters = [c+d for c in string.letters for d in string.letters]
    r = random.random
    n = len(twoletters)
    l2 = length//2
    lst = [None] * l2
    for i in xrange(l2):
        lst[i] = twoletters[int(r() * n)]
    if length & 1:
        lst.append(random.choice(string.letters))
    return "".join(lst)

def init_thread():
    backgrounds = []
    for thread in range(threads):
        print "Spawning thread: %s"%(thread)
        background = threader()
        background.start()
        backgrounds.append(background)
    for background in backgrounds:
        background.join()

def print_help():
    print ''' - URL load test script
Date: 2010-08-26
Author: Matt Reid
License: new BSD license
Use the following flags to change default behavior
   Option                 Description
   --url=                 URL to test
   --per-connection=      Number of sequential requests per connection (default 2)
   --threads=             Number of threads for url connections (default 50)
   --wait=                Time to wait in-between requests
   --help                 Print this message

   -u                     Same as --url
   -p                     Same as --per-connection
   -t                     Same as --threads
   -w                     Same as --wait
   -h                     Same as --help'''
    sys.exit(0)

def main():
    init_thread()

if __name__ == "__main__":
    try:
        options, remainder = getopt.getopt(
            sys.argv[1:], 'p:t:u:w:h', ['per-connection=', 'threads=',
                                        'url=', 'wait=', 'help'])
    except getopt.GetoptError, err:
        print str(err)
        sys.exit(1)
    for opt, arg in options:
        if opt in ('-p', '--per-connection'):
            per = int(arg)
        elif opt in ('-t', '--threads'):
            threads = int(arg)
        elif opt in ('-u', '--url'):
            u = arg
        elif opt in ('-w', '--wait'):
            wait = int(arg)
        elif opt in ('-h', '--help'):
            print_help()

    try:
        threads #num threads/connections to open
    except NameError:
        print "No thread quantity specified; using default of 50"
        threads = 50
    try:
        per #per connection url hits
    except NameError:
        per = 2
    try:
        wait #seconds to wait between requests
    except NameError:
        wait = 0
    try:
        u #url to hit
    except NameError:
        print "No URL Specified"
        print_help()

    main()

Easy Python: multi-threading MySQL queries [updated]

There are many times when writing an application that single-threaded database operations are simply too slow. In these cases it’s a matter of course that you’ll use multi-threading or forking to spawn secondary processes to handle the database actions. In this simple example of Python multi-threading you’ll see how simple it is to improve the performance of your Python app.

#!/usr/bin/env python
## @author: Matt Reid
## @date: 2012-01-20
## @purpose: example to show thread-safe MySQL connection pooling with multi-threaded SQL execution
## @license: new BSD:
## @requires: Python 2.6, MySQL Server 5.0+, MySQL-python connector
## @SQL table create statement:
##   CREATE TABLE `test` (
##   `id` int unsigned NOT NULL AUTO_INCREMENT,
##   `col_a` varchar(255) DEFAULT NULL,
##   `col_b` varchar(255) DEFAULT NULL,
##   `col_c` varchar(255) DEFAULT NULL,
##   PRIMARY KEY (`id`)) ENGINE=InnoDB;

import threading
import sys
import MySQLdb

class threader(threading.Thread):
    def __init__(self, db):
        threading.Thread.__init__(self)
        self.db = db
    def run(self):
        run_insert(self.db)

def run_insert(db):
    sql = "INSERT INTO `test` (`id`,`col_a`,`col_b`,`col_c`) VALUES (NULL,'0','0','0');"
    print "thread executing sql:%s"%(sql)
    try:
        cursor = db.cursor()
        cursor.execute(sql)
        db.commit()
    except MySQLdb.Error:
        print "insert failed"

def init_thread():
    backgrounds = []
    for db in connections:
        print "connection: %s"%(db)
        background = threader(db)
        background.start()
        backgrounds.append(background)

    for background in backgrounds:
        background.join()

def main():
    try:
        init_thread()
    except Exception:
        print "failed to initiate threads"

if __name__ == "__main__":
    mysql_host = "localhost"
    mysql_user = "root"
    mysql_pass = "password"
    mysql_port = int(3306)
    mysql_db = "test"
    threads = int(4) #quantity of execution threads and size of connection pool

    connections = []
    for thread in range(threads):
        try:
            connections.append(MySQLdb.connect(host=mysql_host, user=mysql_user, passwd=mysql_pass, db=mysql_db, port=mysql_port))
        except MySQLdb.Error, e:
            print "Error %d: %s"%(e.args[0], e.args[1])
            sys.exit(1)

    main()



$> python ./ 
connection: <_mysql.connection open to 'localhost' at 2389850>
thread executing sql:INSERT INTO `test` (`id`,`col_a`,`col_b`,`col_c`) VALUES (NULL,'0','0','0');
connection: <_mysql.connection open to 'localhost' at 240da70>
thread executing sql:INSERT INTO `test` (`id`,`col_a`,`col_b`,`col_c`) VALUES (NULL,'0','0','0');
connection: <_mysql.connection open to 'localhost' at 24c6d70>
thread executing sql:INSERT INTO `test` (`id`,`col_a`,`col_b`,`col_c`) VALUES (NULL,'0','0','0');
connection: <_mysql.connection open to 'localhost' at 23c6290>
thread executing sql:INSERT INTO `test` (`id`,`col_a`,`col_b`,`col_c`) VALUES (NULL,'0','0','0');

* this script has been updated

Easy Python: display LVM details in XML

If you need to work with LVM in your scripts but haven’t found a good method to access details about logical volume groups, here’s a simple Python script that prints, as XML, the details of any volumes on your system. This could be useful for writing a partition check script for your MySQL data directory (if you’re not using a standard monitoring system like Nagios).

import os
import subprocess

def lvm():
    '''print an XML document describing the logical volumes on this system'''
    print "<volumes>"
    LVM_PATH = "/sbin"
    LVM_BIN = os.path.join(LVM_PATH, 'lvm')
    # ask lvm for the name, volume group, and size (in bytes) of each
    # logical volume; semicolon-separated with no headings for easy parsing
    argv = [LVM_BIN, 'lvs', '--noheadings', '--nosuffix', '--units', 'b',
            '--separator', ';', '--options', 'lv_name,vg_name,lv_size']

    process = subprocess.Popen(argv, stdout=subprocess.PIPE)
    output = process.communicate()[0]
    lines = output.splitlines()
    for line in lines:
        line = line.strip()
        words = line.split(";")

        lvname = words[0].strip()
        vgname = words[1].strip()
        lv_size = int(words[2])
        # element names here are illustrative; adjust to taste
        print '''  <volume>
    <name>%s</name><group>%s</group><size_bytes>%i</size_bytes>
  </volume>'''%(lvname, vgname, lv_size)

    print "</volumes>"

if __name__ == "__main__":
    lvm()