LINUX.IE, website of the Irish Linux Users' Group

[ILUG] Problem with UTF-8 encoded data

Maciej Bliziński maciej.blizinski at gmail.com
Thu May 14 11:27:22 IST 2009


2009/5/14 Niall O Broin <niall at linux.ie>:
> If I send the output of your iconv line through hexdump -C again, this is
> what I get:
>
> 00000000  61 62 63 20 c3 84 20 c3  96 20 c3 9c 20 c3 a4 20  |abc ?. ?. ?. ä |
> 00000010  c3 b6 20 c3 bc 20 31 32  33 0a                    |ö ü 123.|
>
> which looks remarkably like - UTF-8 !

Yes, because that's what your terminal uses. When you say "I see
exactly what I should", it essentially means that you have the text
correctly encoded in utf-8; otherwise it wouldn't display correctly.
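The terminal only tells you whether the text *looks* right. For a rough byte-level check, here is a minimal sketch (my illustration, not from the thread) that relies on iconv rejecting malformed input even when source and target encodings are both utf-8. That holds for glibc's iconv and is an assumption for other implementations. The function name `is_utf8` is hypothetical:

```shell
#!/bin/sh
# Sketch: exit 0 if stdin is well-formed UTF-8, non-zero otherwise.
# Assumes glibc's iconv, which validates input even for utf-8 -> utf-8.
is_utf8() {
    iconv -f utf-8 -t utf-8 >/dev/null 2>&1
}

# "Ä" encoded as UTF-8 (bytes c3 84) passes the check.
printf 'abc \303\204\n' | is_utf8 && echo "valid UTF-8"

# "Ä" encoded as ISO-8859-1 (single byte c4) is not valid UTF-8.
printf 'abc \304\n' | is_utf8 || echo "not valid UTF-8"
```

Note that this cannot detect double-encoded text: the garbled form is itself perfectly valid UTF-8, which is exactly why it displays as junk instead of failing.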
The command I've shown in my first e-mail converts your garbled text
to the encoding that your system/terminal displays. If you wanted the
conversion to be complete and explicit, you could write:

iconv -f utf-8 -t cp1252 | iconv -f utf-8 -t <your-terminal's-encoding>

If <your-terminal's-encoding> is utf-8, the second conversion is an
identity and you can safely skip it. The crucial step is converting
"to cp1252" and then interpreting the resulting bytes as utf-8.
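A minimal sketch of the round trip, assuming an iconv that knows CP1252 (glibc's does) and a UTF-8 script file and terminal. The helper names `garble` and `repair` are hypothetical. `garble` reproduces the damage (UTF-8 bytes misread as cp1252 and re-encoded as UTF-8); `repair` is the first stage of the pipeline above:

```shell
#!/bin/sh
# Sketch: reproduce and undo double-encoded UTF-8.
# Assumption: iconv on PATH supports CP1252 (true for glibc's iconv).

# Simulate the damage: each UTF-8 byte is read as one cp1252 character
# and re-encoded, so every accented letter turns into two characters.
garble() {
    iconv -f cp1252 -t utf-8
}

# Undo it: converting "to cp1252" maps each of those characters back to
# the single byte it came from, restoring the original UTF-8 bytes.
repair() {
    iconv -f utf-8 -t cp1252
}

sample='abc Ä Ö Ü ä ö ü 123'

printf '%s\n' "$sample" | garble            # abc Ã„ Ã– Ãœ Ã¤ Ã¶ Ã¼ 123
printf '%s\n' "$sample" | garble | repair   # abc Ä Ö Ü ä ö ü 123

# Byte-level view of the repaired text; it should match the hexdump
# quoted earlier in this thread (61 62 63 20 c3 84 ...).
printf '%s\n' "$sample" | garble | repair | od -An -tx1
```

To spell out the second stage without hard-coding the terminal's encoding, one option (assuming the terminal matches the current locale) is `iconv -f utf-8 -t cp1252 | iconv -f utf-8 -t "$(locale charmap)"`.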

As a side note, try not to use the -c option with iconv: it silently
drops characters that can't be converted, which hides a lossy
conversion. Having iconv fail with "illegal input sequence" is a good
indicator that the conversion would lose data.
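A small sketch of the difference (my illustration, not from the thread), converting "π", which has no ISO-8859-1 equivalent, with and without -c. It assumes glibc's iconv; the helper names `sample`, `strict` and `lossy` are hypothetical. With -c, glibc's iconv may still exit non-zero, so only the output is inspected in that case:

```shell
#!/bin/sh
# Sketch: iconv with and without -c on a character ISO-8859-1 cannot
# represent ("π", U+03C0, bytes cf 80 in UTF-8). Assumes glibc's iconv.

sample() { printf 'pi = \317\200\n'; }

strict() { iconv -f utf-8 -t iso-8859-1; }     # fails on unconvertible input
lossy()  { iconv -c -f utf-8 -t iso-8859-1; }  # omits it instead

# Without -c, iconv reports the problem and exits non-zero.
if sample | strict >/dev/null; then
    echo "strict: converted"
else
    echo "strict: conversion failed, so nothing was silently lost"
fi

# With -c, the "π" is simply missing from the output; look at the bytes
# rather than relying on the exit status.
sample | lossy 2>/dev/null | od -c
```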

Maciej


