A few days ago I upgraded/emerged MySQL installation on my Unixshell Gentoo box from 4.1.14 to 4.1.19. As it emerges, it also replaced the default configuration file and now uses utf8 as the default encoding to database and client libraries. I also re-emerged dev-lang/php-5.1.4, and it turned out all my unicode on PHP sites are broken! PHP pulls junks out from the database and can’t display them. Meanwhile, anything unicode you put in there becomes a series of question marks ???? in the rendered page.
D’oh
When I looked at the databases and tables, they all had default charset set to latin1, as they were created in MySQL 4.0.x. However I would like them to be ut8. These are the steps I took to fix up my databases.
-
Back up all databases!!! This is very important. Just in case anything goes wrong, at least we can still restore from the database.
$ mysqldump --opt --allow-keywords --hex-blob --all-databases -qc > backup.sql
Or you can back up on database by database level to make restoring easier.
-
Change the charset of your databases to utf8. If your databases were created prior to MySQL 4.1, the character set will be latin1. Use the command line MySQL client, and issue the following command to change charset:
mysql> ALTER DATABASE [database name] DEFAULT CHARACTER SET utf8 -> COLLATE utf8_general_ci;Or use the following shell script to change character set for all databases:
for database in `echo 'SHOW DATABASES;' | mysql -s` do echo "ALTER DATABASE \`$database\` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;" done | mysqlNow all database should have default charset as utf8, but tables and fields are still in latin1 format.
-
Dump database in latin1 and restore in utf8. This is the tricker part. As
mysqldumpis default to utf8, but we want to dump the tables in latin1 because that’s what the character sets were. You can do this on database by database level.$ mysqldump --opt --allow-keywords --hex-blob \ > --default-character-set=latin1 -qc [database name] > database.sql
Now, open
database.sqlup and changet all tables/fields declaration from latin1 to utf8! A global search and replace might do, or use some sed magic.After the edited SQL dump has been saved, just reload them back into the database. Update: Added
--default-character-setoption just in case MySQL client is not in utf8, which is default for Gentoo (but might not be in other distributions).$ mysql --default-character-set=utf8 [database name] < database.sql
And now you have a database that should be utf8 everywhere and works with MySQL 4.1.
Hopefully that works for you as well. I am pretty sure that there are easier ways to achieve it, and I would love to know them as well. If it doesn’t work out right… Hmm. You still got the backup?!
Update 2006-05-25: Robin wrote me his MySQL unicode fixes to his Zope-based application.
Luckily I found out what the main problem was. Setting “
default-character-set=latin1” in the [client] section of/etc/mysql/my.cnfmade sure all my legacy apps that don’t support unicode get latin1 data to them. Seems like a lot of PHP applications assume the data stream is in latin1. Thedefault-character-setsolution did the trick for b2Evolution, phpBB, Mambo and a lot of other applications. For Zope it wasn’t that easy, because MySQLdb supported unicode and the latest version of ZMySQLDA didn’t. Found a patch for ZMySQLDA in the mailmanager repository on SourceForge that added unicode support to it. After installing that, activating unicode support in all the connections in the Zope admin interface I was back on track.The good thing is now I can migrate my applications over to Unicode/UTF8 in an orderly fashion as soon as the applications get native support for Unicode directly with MySQL.
Thanks for the tip, Robin!

Delicious
Digg
Reddit
Post new comment