Battling XHTML :: Storing UTF-8 data in MySQL
In the XML parser that I’ve been writing for RSS/Atom feeds I’ve run into what many people have found: bizarre encoding issues when displaying data from the database on a web page. Since the searches I did on Google did not explain this well, I’ll explain it here.
Issue: you have UTF-8 data coming from a source, and you put it into a utf8_general_ci column of a MySQL database table. You read the data back from the database and display it as HTML/XHTML. Instead of getting things like curly double quotes or long dashes, you get euro signs or umlaut-type characters, usually strings of them, in place of the correct character.
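To see where the stray characters come from, here is a minimal sketch that reproduces the symptom. It assumes the mbstring extension is loaded and that something along the way misreads the bytes as Windows-1252, which is one common cause and not necessarily the one in this setup. The helper name `misread_as_cp1252` is made up for illustration.

```php
<?php
// Hypothetical helper: shows what UTF-8 bytes look like when a consumer
// wrongly assumes they are Windows-1252 (assumes mbstring is loaded).
function misread_as_cp1252(string $utf8Bytes): string
{
    // Treat each input byte as one Windows-1252 character and re-encode
    // the result as UTF-8 so it can be printed on a UTF-8 page or terminal.
    return mb_convert_encoding($utf8Bytes, 'UTF-8', 'Windows-1252');
}

$emDash  = "\xE2\x80\x94";                    // U+2014 EM DASH, three bytes in UTF-8
$garbled = misread_as_cp1252($emDash);

echo $garbled, PHP_EOL;                       // three characters, the second is a euro sign
echo mb_strlen($garbled, 'UTF-8'), PHP_EOL;   // 3
```

One long dash turning into three unrelated characters, with a euro sign in the middle, matches the pattern described above.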
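Potential solution: use utf8_encode and htmlentities in PHP to clean the data before it goes into the database. This does not work. Why? utf8_encode assumes its input is ISO-8859-1, so running it on data that is already UTF-8 encodes it a second time. htmlentities has a similar problem: in PHP versions before 5.4 it treats the string as ISO-8859-1 unless you pass it the charset explicitly, so each byte of a multi-byte character gets converted on its own. See here for the HTML code chart: http://www.ascii.cl/htmlcodes.htm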
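A small sketch of how this attempt backfires, assuming the incoming feed data is already valid UTF-8 and mbstring is loaded. utf8_encode() is deprecated as of PHP 8.2, so the sketch uses mb_convert_encoding() with an ISO-8859-1 source, which performs the same conversion. The variable names are illustrative only.

```php
<?php
$emDash = "\xE2\x80\x94"; // U+2014 EM DASH, already valid UTF-8 (3 bytes)

// Same conversion utf8_encode() performs: every input byte is treated as an
// ISO-8859-1 character, so already-encoded data gets encoded a second time.
$doubleEncoded = mb_convert_encoding($emDash, 'UTF-8', 'ISO-8859-1');

echo strlen($emDash), PHP_EOL;        // 3
echo strlen($doubleEncoded), PHP_EOL; // 6

// Telling htmlentities() the real encoding yields the named entity instead.
echo htmlentities($emDash, ENT_QUOTES, 'UTF-8'), PHP_EOL; // &mdash;
```

The byte count doubling from 3 to 6 is the double encoding; stored that way, the text comes back out as a string of accented characters.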
Solution: clean out the offending characters when the data is displayed in a browser, replacing every byte with a value of 127 or higher with a space. Similar to this post, but with changes: htmlentities
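// Replace every byte with a value of 127 or higher with a space.
// This works byte by byte, so one multi-byte UTF-8 character becomes several spaces.
function htmlfriendly2($txt) {
    $len = strlen($txt);
    $res = "";
    for ($i = 0; $i < $len; ++$i) {
        // Square brackets instead of $txt{$i}: the curly-brace form was removed in PHP 8.
        $ord = ord($txt[$i]);
        if ($ord >= 127) {
            $res .= " ";
        } else {
            $res .= $txt[$i];
        }
    }
    return $res;
}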
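For comparison, the same byte-level cleanup can be sketched as a single regular expression. This is only an illustrative variant, not the code used on this site, and the function name `htmlfriendly_regex` is made up for the example.

```php
<?php
// Hypothetical compact variant of htmlfriendly2(): replaces every byte with
// a value of 127 or higher by a single space and leaves bytes 0..126 untouched.
function htmlfriendly_regex(string $txt): string
{
    // No /u modifier, so the pattern matches raw bytes, not code points.
    return preg_replace('/[\x7F-\xFF]/', ' ', $txt);
}

// The accented letter (2 bytes) and the long dash (3 bytes) are replaced
// by two and three spaces respectively.
echo htmlfriendly_regex("caf\xC3\xA9 \xE2\x80\x94 ok"), PHP_EOL;
```

Either way the cleanup is lossy: the original characters are gone from the displayed text, which is why it only makes sense as a display-time workaround.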

mike503 Said,
September 11, 2008 @ 4:10 pm
If the meta tag is set to content-type utf-8, the form input will be utf-8 encoded and passed to MySQL as plain old binary data. MySQL doesn’t care about the encoding, even with the latin1 charset. (It would if you were doing MySQL string functions on it, though.)
However, for just inserting and fetching information you can dump utf-8 encoded content into any type of column and display it back out to the browser with the same utf-8 meta tag, and it works great.
I believe the only thing I do on output is htmlspecialchars() to encode any HTML/XSS attempts.
admin Said,
September 11, 2008 @ 4:23 pm
mike503, I would agree with you; however, I’m not using any form data in this case, since the XML parser that is inputting the data is driven from the command line with PHP.
I posted the good solution here: http://themattreid.com/wordpress/?p=66