Dirty growth-over-time query

I’ve been messing around with the Kontrollbase schema for the last couple of days, writing various queries for the daily reporting scripts that will eventually be an automated pdf report. I’ll give you examples of two of the queries, the first being overall environment stats, and the second being single-host growth over time.

Overall environment stats
select ((((MAX(os_mem_used)) / 1024 ) / 1024) / 1024) max_os_mem_used, ((((MIN(os_mem_used)) / 1024 ) / 1024) / 1024) min_os_mem_used, ((((AVG(os_mem_used)) / 1024 ) / 1024) / 1024) avg_os_mem_used, ((((STDDEV_POP(os_mem_used)) / 1024 ) / 1024) / 1024) stdev_os_mem_used, ((((MAX(length_data + length_index)) / 1024 ) / 1024) / 1024) max_size, ((((MIN(length_data + length_index)) / 1024 ) / 1024) / 1024) min_size, ((((AVG(length_data + length_index)) / 1024 ) / 1024) / 1024) avg_size, ((((STDDEV_POP(length_data + length_index)) / 1024 ) / 1024) / 1024) stdev_size, MAX(num_connections) max_connections, MIN(num_connections) min_connections, AVG(num_connections) avg_connections, STDDEV_POP(num_connections) stdev_connections, MAX(queries_per_second) max_qps, MIN(queries_per_second) min_qps, AVG(queries_per_second) avg_qps, STDDEV_POP(queries_per_second) stdev_qps from server_statistics;
*************************** 1. row ***************************
max_os_mem_used: 50.743488311768
min_os_mem_used: 0.023044586182
avg_os_mem_used: 1.5759627057922952
stdev_os_mem_used: 2.4064208226596184
max_size: 283.815660957247
min_size: 0.000492287800
avg_size: 10.8213909777686940
stdev_size: 39.9443980717105447
max_connections: 435
min_connections: 0
avg_connections: 16.9734
stdev_connections: 37.3269
stdev_os_mem_used: 2.4064208226596184
max_qps: 9243.6533203125
min_qps: 0.00011409764556447
avg_qps: 216.421064774444
stdev_qps: 1071.72792986232
1 row in set (0.00 sec)

single-host growth over time
mysql> select (select (((length_data + length_index) / 1024) / 1024) as curr_size from server_statistics where server_list_id='44' and CURDATE() < = Creation_time order by Creation_time asc limit 1) as 0_day_size_mb, (select (((length_data + length_index) / 1024) / 1024) as curr_size from server_statistics where server_list_id='44' and DATE_SUB(CURDATE(),INTERVAL 30 DAY) <= Creation_time order by Creation_time asc limit 1) as 30_day_size_mb, ( (select (((length_data + length_index) / 1024) / 1024) as curr_size from server_statistics where server_list_id='44' and CURDATE() <= Creation_time order by Creation_time asc limit 1) - (select (((length_data + length_index) / 1024) / 1024) as curr_size from server_statistics where server_list_id='44' and DATE_SUB(CURDATE(),INTERVAL 30 DAY) <= Creation_time order by Creation_time asc limit 1)) as difference, ( select (select ((select (((length_data + length_index) / 1024) / 1024) as curr_size from server_statistics where server_list_id='44' and CURDATE() <= Creation_time order by Creation_time asc limit 1) - (select (((length_data + length_index) / 1024) / 1024) as curr_size from server_statistics where server_list_id='44' and DATE_SUB(CURDATE(),INTERVAL 30 DAY) <= Creation_time order by Creation_time asc limit 1)) as difference) / (select(select (((length_data + length_index) / 1024) / 1024) as curr_size from server_statistics where server_list_id='44' and CURDATE() <= Creation_time order by Creation_time asc limit 1) as 0_day_size_mb ) * 100) as percent_growth;
+---------------+----------------+--------------+----------------+
| 0_day_size_mb | 30_day_size_mb | difference | percent_growth |
+---------------+----------------+--------------+----------------+
| 9100.43986130 | 8199.39263916 | 901.04722214 | 9.901139240197 |
+---------------+----------------+--------------+----------------+
1 row in set (0.01 sec)

Read More

MySQL Data Type Q&A

Question: “When I use procedure analyse() on my schema it suggests TINYINT for the columns which have the data type VARCHAR. Based on the performance and data requirements, which one is better?”

Answer: TINYTEXT and TINYINT and VARCHAR are quite different. For reference I would refer you to the mysql manual page about data types.

However, procedure analyse() will read the values you have in your columns and if they consistently fit a pattern that would be better suited to another data type then it will suggest the correct one. As in, if your column is VARCHAR(1) and your data is similar to “1,4,7,5,2″ etc then TINYINT would be a better suited data type since you are dealing with numbers and not variable characters. Similarly, if you have the same varchar column, but your data is “a,b,t,h,o” etc then TINYTEXT or CHAR would be better than VARCHAR for this type of data. Mostly, CHAR would be better for my example here but your datasets might make mysql think TINYTEXT is better.

In regard to data storage, which procedure analyse() also looks at given the data you have, here are the following differences between the three data types.

TINYINT: 1 byte
TINYTEXT: 1 byte length prefix plus data
VARCHAR: 1 or 2 byte prefix plus data depending on string length.

So you have to choose the data type depending on the data it’s going to store. There’s no reason to store 2 byte prefixes and then the data if you can just store the 1 byte of data. In addition, the various TEXT data types have other attributes that are bigger conversation, see the documentation for more details there.

That said, if you are running procedure analyse() on your data and it recommends TINYTEXT when you have simple testing data like I mentioned above, but you choose VARCHAR because you know that your data will eventually *need* that type of storage requirement – say it will eventually store multi-byte usernames or passwords in excess of 255 characters (since VARCHAR can hold more data than TINYTEXT), then you should not change your column to TINYTEXT. This is because procedure analyse() is only recommending this because of the current data, not future data – so you need to involve your own interpretation for the optimization process.

Read More