LINUX.IE, website of the Irish Linux Users' Group
[ILUG] Problem with UTF-8 encoded data

Maciej Bliziński maciej.blizinski at gmail.com
Wed May 13 23:38:46 IST 2009


On Wed, May 13, 2009 at 9:41 PM, Niall O Broin <niall at linux.ie> wrote:
> hexdump -C of the file follows:
>
> 00000000  61 62 63 20 c3 83 e2 80  9e 20 c3 83 e2 80 93 20  |abc ..... ..... |
> 00000010  c3 83 c5 93 20 c3 83 c2  a4 20 c3 83 c2 b6 20 c3  |.... .... .... .|
> 00000020  83 c2 bc 20 31 32 33 0a                           |... 123.|

Based on that I wrote this Python program to spit out exactly the
content of your file:

maciej at clover ~ $ cat garbled.py
#!/usr/bin/python
# Write the exact bytes from the hexdump above to stdout.

import sys

data = [0x61, 0x62, 0x63, 0x20, 0xc3, 0x83, 0xe2, 0x80, 0x9e, 0x20, 0xc3, 0x83,
        0xe2, 0x80, 0x93, 0x20, 0xc3, 0x83, 0xc5, 0x93, 0x20, 0xc3, 0x83, 0xc2,
        0xa4, 0x20, 0xc3, 0x83, 0xc2, 0xb6, 0x20, 0xc3, 0x83, 0xc2, 0xbc, 0x20,
        0x31, 0x32, 0x33, 0x0a]

# Emit raw bytes: sys.stdout.buffer on Python 3, sys.stdout itself on Python 2.
out = getattr(sys.stdout, "buffer", sys.stdout)
out.write(bytes(bytearray(data)))

And then:

maciej at clover ~ $ python garbled.py | iconv -c -f utf-8 -t cp1252
abc Ä Ö Ü ä ö ü 123
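The same reversal can be sketched in Python 3 for anyone without iconv
to hand. This is only an illustration, not part of the original
exchange: the helper name undo_double_encoding is made up, and
errors="ignore" is a rough stand-in for iconv's -c flag (drop whatever
cannot be converted).

```python
def undo_double_encoding(raw: bytes) -> bytes:
    """Reverse a UTF-8 -> 'cp1252' -> UTF-8 mix-up (hypothetical helper)."""
    # Step 1: read the doubly encoded bytes as UTF-8 text, e.g. "Ã„".
    text = raw.decode("utf-8")
    # Step 2: write that text back out as cp1252; the resulting bytes
    # are the original UTF-8 data. Unconvertible characters are dropped.
    return text.encode("cp1252", errors="ignore")


# The 40 bytes from the hexdump, grouped word by word.
garbled = bytes.fromhex(
    "61626320" "c383e2809e20" "c383e2809320" "c383c59320"
    "c383c2a420" "c383c2b620" "c383c2bc20" "3132330a"
)

fixed = undo_double_encoding(garbled)
print(fixed.decode("utf-8"), end="")
```

For example, the bytes c3 83 e2 80 9e decode as U+00C3 followed by
U+201E; in cp1252 those two characters are the bytes c3 84, which is
the UTF-8 encoding of Ä.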

This means your application took UTF-8 data for cp1252 and then
re-encoded that "cp1252" text as UTF-8. The shell line above reverses
the process. To fix it in MySQL, you need to convert your columns (do a
backup first! :-) ) in three steps: from utf-8 to cp1252 with
conversion, then to binary without conversion, and finally to utf-8,
again without conversion.
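As a rough sketch of those three steps, the small Python helper below
only builds the ALTER TABLE statements as strings; it does not touch a
database. Everything specific in it is an assumption: the function
name, the table and column names, and the TEXT/BLOB column types are
placeholders, and it relies on MySQL calling its cp1252 character set
"latin1". Adjust the types to match your real schema, and try it on a
copy of the table first.

```python
def repair_statements(table, column):
    """Build the three conversion steps as SQL strings (hypothetical helper)."""
    target = "`%s` MODIFY `%s`" % (table, column)
    return [
        # 1. utf8 -> latin1 (MySQL's cp1252): the data IS converted,
        #    which undoes the second, mistaken encoding step.
        "ALTER TABLE %s TEXT CHARACTER SET latin1;" % target,
        # 2. latin1 -> binary: the bytes are kept exactly as they are.
        "ALTER TABLE %s BLOB;" % target,
        # 3. binary -> utf8: the bytes are only relabelled, not converted.
        "ALTER TABLE %s TEXT CHARACTER SET utf8;" % target,
    ]


# "mytable" and "mycolumn" are placeholder names.
for statement in repair_statements("mytable", "mycolumn"):
    print(statement)
```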

I wrote one blog post on a similar topic some time ago:
http://automatthias.wordpress.com/2008/12/26/fixing-character-sets-in-mysql/

Does that help?

Maciej
