Archive for category Data organization

In praise of Ohloh

I have some Google Alerts setup to notify me of certain tags that are found on a daily basis. One of them of course is Kontrollbase and the alert linked me over to http://ohloh.com and their pickup of the code commits from the Google subversion code repo. After looking at some stats I can say that I’m impressed with their automation of data gathering and code inspection. I like the reports and analytics. You can see for yourself here:
https://www.ohloh.net/p/kontrollbase/analyses/latest
https://www.ohloh.net/p/kontrollbase/contributors/1404918162307859

Tags: , ,

MySQL Data Type Q&A

Question: “When I use procedure analyse() on my schema it suggests TINYINT for the columns which have the data type VARCHAR. Based on the performance and data requirements, which one is better?”

Answer: TINYTEXT and TINYINT and VARCHAR are quite different. For reference I would refer you to the mysql manual page about data types.

However, procedure analyse() will read the values you have in your columns and if they consistently fit a pattern that would be better suited to another data type then it will suggest the correct one. As in, if your column is VARCHAR(1) and your data is similar to “1,4,7,5,2″ etc then TINYINT would be a better suited data type since you are dealing with numbers and not variable characters. Similarly, if you have the same varchar column, but your data is “a,b,t,h,o” etc then TINYTEXT or CHAR would be better than VARCHAR for this type of data. Mostly, CHAR would be better for my example here but your datasets might make mysql think TINYTEXT is better.

In regard to data storage, which procedure analyse() also looks at given the data you have, here are the following differences between the three data types.

TINYINT: 1 byte
TINYTEXT: 1 byte length prefix plus data
VARCHAR: 1 or 2 byte prefix plus data depending on string length.

So you have to choose the data type depending on the data it’s going to store. There’s no reason to store 2 byte prefixes and then the data if you can just store the 1 byte of data. In addition, the various TEXT data types have other attributes that are bigger conversation, see the documentation for more details there.

That said, if you are running procedure analyse() on your data and it recommends TINYTEXT when you have simple testing data like I mentioned above, but you choose VARCHAR because you know that your data will eventually *need* that type of storage requirement – say it will eventually store multi-byte usernames or passwords in excess of 255 characters (since VARCHAR can hold more data than TINYTEXT), then you should not change your column to TINYTEXT. This is because procedure analyse() is only recommending this because of the current data, not future data – so you need to involve your own interpretation for the optimization process.

Tags: , ,

To headline or to summarize 1000+ articles per week?

Ok, here’s your chance to make a recommendation on the front page layout for relationalnews.com , the new website that features news for all of the major relational database systems. Check out the three options below and leave a comment or email me with your own solution. If selected you will be credited one the site for your influence. 

The question is: we’re getting just over 1000 new articles per week from the feeds, and the current front page displays the title and summary of the latest 10 articles. This means lots of pagination clicking to see even the approximate 150 new articles for the current day. 

Option 1: display only the headlines, no article summaries, for the past 24 hours with links to the full article from the archive. Sorted by published date, limited to the past 24 hours regardless of number of posts. Paginate the rest.

Option 2: display blocks of headlines sorted by category tags, limit to 10 headlines. Paginate the rest.

Option 3: leave the layout how it currently is.

Option 4: make a suggestion?

Tags: , , ,