I have a MySQL DB with text that is supposedly encoded in UTF-8. This DB is
used by a web application written in PHP. The php.ini file has this setting:
default_charset = "utf-8"
and the text displays as it should in web browsers. All well and good.
However, it has to be transferred to another system which expects the
text to be in latin1. Changing the other system is not possible. The
data is exported from MySQL with
SELECT INTO $FILE and is converted to latin1 with recode. Or rather,
WAS converted. This worked last year, but this year, recode fails to
do the conversion.
I have created a simple test field in the database via the web UI
which contains just
abc Ä Ö Ü ä ö ü 123
and then exported that into a text file. Trying to convert it I get:
% recode UTF8..ISO_8859-15 < /tmp/umlaut
abc Ã" Ã- Ã½ Ãrecode: Invalid input in step `UTF-8..ISO-8859-15'
% iconv -f utf-8 -t latin1 /tmp/umlaut
abc Ãiconv: illegal input sequence at position 6
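The iconv error position can be checked by hand. This sketch assumes the first sixteen bytes of the file are exactly as shown in the hexdump -C output. It decodes them as UTF-8 and reports the byte offset of the first character that has no Latin-1 equivalent. The helper name first_unencodable_offset is hypothetical.

```python
# First 16 bytes of /tmp/umlaut, copied from the hexdump -C output.
DATA = bytes.fromhex("61 62 63 20 c3 83 e2 80 9e 20 c3 83 e2 80 93 20")


def first_unencodable_offset(raw: bytes, target: str = "latin-1"):
    """Hypothetical helper: return (byte offset, character) for the first
    UTF-8 character in raw that the target encoding cannot represent,
    or None if every character fits."""
    offset = 0
    for ch in raw.decode("utf-8"):
        try:
            ch.encode(target)
        except UnicodeEncodeError:
            return offset, ch
        offset += len(ch.encode("utf-8"))  # advance by this char's UTF-8 length
    return None


if __name__ == "__main__":
    print(first_unencodable_offset(DATA))
```

On these bytes it reports offset 6. That is the sequence e2 80 9e, U+201E DOUBLE LOW-9 QUOTATION MARK, which matches where iconv gives up. The "Ã" at offsets 4-5 converts without complaint.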
Here is the output of hexdump -C on the file:
00000000  61 62 63 20 c3 83 e2 80  9e 20 c3 83 e2 80 93 20  |abc ..... ..... |
00000010  c3 83 c5 93 20 c3 83 c2  a4 20 c3 83 c2 b6 20 c3  |.... .... .... .|
00000020  83 c2 bc 20 31 32 33 0a                           |... 123.|
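One hypothesis that fits this dump is that each umlaut has been encoded twice. For example, Ä is c3 84 in UTF-8. If those two bytes are read as Windows-1252 (MySQL's latin1 is essentially cp1252), they become "Ã" and "„". Encoding those as UTF-8 again gives c3 83 e2 80 9e, which is exactly what the file contains. This sketch assumes cp1252 was the intermediate charset and reverses one layer of encoding on the bytes from the dump. The function name is illustrative only.

```python
# Whole file, copied from the hexdump -C output (40 bytes).
DUMP = bytes.fromhex(
    "61 62 63 20 c3 83 e2 80 9e 20 c3 83 e2 80 93 20"
    " c3 83 c5 93 20 c3 83 c2 a4 20 c3 83 c2 b6 20 c3"
    " 83 c2 bc 20 31 32 33 0a"
)


def undo_double_utf8(raw: bytes, middle: str = "cp1252") -> str:
    """Reverse one extra round of UTF-8 encoding, assuming the original
    UTF-8 bytes were misread as the `middle` charset before re-encoding."""
    once = raw.decode("utf-8")  # gives 'abc Ã„ Ã– Ãœ Ã¤ Ã¶ Ã¼ 123\n'
    return once.encode(middle).decode("utf-8")


if __name__ == "__main__":
    fixed = undo_double_utf8(DUMP)
    print(repr(fixed))
    print(fixed.encode("latin-1"))
```

Under that assumption the result is 'abc Ä Ö Ü ä ö ü 123\n', and its Latin-1 encoding is the single bytes c4 d6 dc e4 f6 fc for the umlauts. Passing middle="latin-1" instead fails with UnicodeEncodeError on "„", because strict ISO-8859-1 has no such character. That is consistent with the intermediate step having been cp1252 rather than pure ISO-8859-1. One caveat: Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so letters such as Á (c3 81) would not round-trip this way. The German umlauts are unaffected.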
Can anyone suggest what kind of encoding is being used here, and how I
can convert the files to latin1 / ISO-8859-1?
I have used the MySQL SET NAMES command with latin1, utf8, and binary;
in every case the output file is identical.
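If the export turns out to be UTF-8 that was encoded twice via Windows-1252, a small filter could undo that and emit Latin-1 for the other system. This is a hypothetical script, not an existing tool. It assumes each line of the export is either double-encoded or plain UTF-8, tries the double-encoded reading first, and falls back to a single decode. It deliberately raises an error instead of substituting "?" when a character has no Latin-1 equivalent.

```python
import sys
from typing import BinaryIO


def line_to_latin1(raw_line: bytes) -> bytes:
    """Convert one line of the export to Latin-1 bytes.

    Assumption: the line is either double-encoded UTF-8 (UTF-8 bytes that
    were misread as cp1252 and encoded to UTF-8 again) or plain UTF-8.
    """
    text = raw_line.decode("utf-8")
    try:
        # Try to strip the extra layer of encoding.
        text = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        pass  # Not double-encoded: keep the single UTF-8 decoding.
    # Strict on purpose: raises if a character has no Latin-1 equivalent.
    return text.encode("latin-1")


def convert(src: BinaryIO, dst: BinaryIO) -> None:
    for raw_line in src:  # binary file objects iterate line by line
        dst.write(line_to_latin1(raw_line))


# Only act as a script when given an input path and an output path.
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as src, open(sys.argv[2], "wb") as dst:
        convert(src, dst)
```

Saved under a made-up name such as fix_latin1.py, it would be run as
% python3 fix_latin1.py /tmp/umlaut /tmp/umlaut.latin1
The per-line fallback is a heuristic. It could misfire on text that genuinely contains sequences like "Ã„", which seems unlikely in this data. For files known to be entirely double-encoded, a plain iconv pipeline without the fallback would likely do the same job:
% iconv -f utf-8 -t cp1252 /tmp/umlaut | iconv -f utf-8 -t latin1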
This is an urgent and serious problem. There will be plentiful beer at
the next POTD for a solution.