Update: OSS MySQL Monitoring Solution

Progress on the Open Source enterprise grade MySQL monitoring system;  the schema for Monolith version 2 has been designed. Due to the many suggestions for features and the interest it has received I’ve put this on the front burner. That said, here is some more info on the next steps I’ll be taking.

  • Monitored servers will use a command line agent (called remotely) to pull information from both MySQL and the OS.
  • Historical information will include all values from global status and global variables, as well as CPU/Memory/IO/Disk usage.
  • Standard graphing functions – the ones in the list from the previous post – will gather information from various view tables that contain historical data that is collected from the agent script. User defined graphing will allow you to look at historical values over time for any of the various global status or global variable settings (integer based ones anyway).
  • The cnf file for each server will be stored in the monitoring database and available for viewing for historical purposes. 
Read More

Request: What do you want in a OpenSource MySQL Monitoring solution?

What would you like to see in a free enterprise-grade monitoring system for your daily MySQL needs?I’m rewriting Monolith – MySQL DBA Console from the ground up. This will be version 2 and I would like to get some input from the global MySQL community.So far I am going with the following; comment with any improvements/additions.

  • Variable interval polling of server statistics
  • Over 50 different alerts (see list below)
  • Graphing of various server statistics (see list below)
  • Tuning recommendations with cnf file changes to apply to server
  • Change control documents for recommended performance/security tuning
  • Threshold based alerting with multiple alert groups: info,warn,critical
  • Sorting/ordering of servers via groups. ie: client -> dev,stage,prod
  • RSS feeds for each alert group
  • XML export with user defined fields for external applications (API of sorts)
  • Slave server alerts, IO thread / SQL thread, seconds behind master (with threshold)
  • Various general stats: version, system ram size, # of schema, default table type
  • Ability to view current cnf file from server
  • Ability to view current global variables as a list
  • Email / pager reporting for alerts (choose warn/crit alerts to report on)

Alerts - can be enabled or disabled for reporting on a per server basis

  • number of connection failures (threshold)
  • number of connections errors (threshold)
  • binary logging not enabled
  • sync_binlog not set
  • max_used_connections too high compared to max_connections (ratio over 85% utilization)
  • query cache size too small: improper utilization
  • query cache size too large: improper utilization
  • memory usage over 85% of system ram
  • table scans excessive
  • tmp table to disk ratio too high
  • innodb buffer too small
  • innodb buffer too large (resources can be used elsewhere)
  • key buffer too large (resources can be used elsewhere)
  • key buffer too small
  • sort buffer too small
  • sort buffer too large (resources can be used elsewhere)
  • join buffer too small
  • join buffer too large (resources can be used elsewhere)
  • open_files_limit too small compared to open_files usage ratio
  • table locks too high
  • table cache too small
  • table cache too large (resources can be used elsewhere)
  • thread cache too small
  • thread cache too large (resources can be used elsewhere)
  • binlog size too small
  • excessive disk tmp table usage
  • flush time not zero value
  • indexes not being utilized properly
  • innodb doublewrite buffer enabled
  • innodb flush method not properly set
  • innodb transaction isolation level not properly set for ACID
  • innodb log wait time too high
  • conncurrent insert not set correctly
  • query cache not enabled
  • table locks too high
  • thread cache not enabled
  • concurrent queries too high (threshold)
  • binary logs not set to auto-purge
  • binary logging not enabled
  • slave not active if server is set to slave
  • slave sql thread stopped
  • slave io thread stopped
  • slave behind master too long (threshold)
  • slave not set as read-only
  • slave relay logs not automatically purged
  • account has global privs
  • account set to old_password
  • accounts able to be added with empty password
  • account has grant option
  • root can login remotely
  • root has no password
  • blank username detected
  • test database exists
  • usage user exists
  • general query log enabled
  • CPU usage over threshold
  • Load Average over threshold

Graphs - default

  • Query cache utilization, in/out hit ratio
  • Temp tables usage
  • Temp files usage
  • Open tables usage
  • Thread cached / created
  • Connections
  • Query rate
  • Data size, Index size, overall size
  • Uptime
  • CPU usage
  • I/O usage
  • Load average
  • Data transfer in/out
  • Memory allocation
  • Swap usage
  • Slave delay (if configured as a slave)

As mentioned before, this is a OSS application that has no license or subscription fees and will be updated on a regular basis. It’s a mix of PHP for the web side and Perl for the system polling and reporting processes. The installer will be new, web based, and a simple process, much improved from the current method.Please add your comments now, as I am drawing up plans this week for version 2. :)

Read More