2009/5/14 Niall O Broin <niall at linux.ie>:
> If I send the output of your iconv line through hexdump -C again, this is
> what I get:
>
> 00000000  61 62 63 20 c3 84 20 c3  96 20 c3 9c 20 c3 a4 20  |abc .. .. .. .. |
> 00000010  c3 b6 20 c3 bc 20 31 32  33 0a                    |.. .. 123.|
>
> which looks remarkably like - UTF-8 !
Yes, because that's what your terminal uses. When you say "I see
exactly what I should", it essentially means that you have the text
correctly encoded in utf-8, otherwise it wouldn't display correctly.
The command I've shown in my first e-mail converts your garbled text
to the encoding that your system/terminal displays. If you wanted the
conversion to be complete and explicit, you could write:
iconv -f utf-8 -t cp1252 | iconv -f utf-8 -t <your-terminal's-encoding>
If <your-terminal's-encoding> is utf-8, the second iconv is an identity
conversion, which you can safely skip. The crucial point is that you
convert "to cp1252" and then interpret the result as utf-8.
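To make the round trip concrete, here is a minimal Python sketch of the same two steps. It assumes the text was garbled by having its UTF-8 bytes misread as cp1252 and then written out again as UTF-8; the variable names are only illustrative, and the sample string is the one visible in the hexdump quoted above.

```python
# Sample text reconstructed from the bytes in the quoted hexdump.
original = "abc Ä Ö Ü ä ö ü 123\n"

# Assumed cause of the garbling: the UTF-8 bytes were misread as cp1252
# and the resulting characters were saved as UTF-8 again.
garbled = original.encode("utf-8").decode("cp1252").encode("utf-8")

# Step 1, the counterpart of `iconv -f utf-8 -t cp1252`:
# decode the garbled bytes as UTF-8 and re-encode them as cp1252.
step1 = garbled.decode("utf-8").encode("cp1252")

# Step 2, the crucial point: interpret those very bytes as UTF-8.
repaired = step1.decode("utf-8")

print(step1.hex(" "))        # same byte sequence as in the hexdump
print(repaired == original)  # whether the round trip restored the text
```

Note that nothing is converted in step 2; the bytes from step 1 are simply read as UTF-8, which is why the second iconv is optional on a UTF-8 terminal.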
As a side note, try not to use the -c option with iconv -- it will hide
lossy conversions. Having iconv fail with "illegal input sequence"
is a good indicator of data loss during conversion.
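The same trade-off can be shown with Python's codecs, as a rough analogy rather than a description of iconv itself: errors="ignore" silently drops characters the target encoding cannot represent, much like -c, while the default strict mode raises an error. The sample text is hypothetical.

```python
# Hypothetical sample: "café" followed by two CJK characters,
# which cp1252 cannot represent.
text = "caf\u00e9 \u4e2d\u6587"

# Lossy, comparable to `iconv -c`: unconvertible characters vanish
# without any warning.
lossy = text.encode("cp1252", errors="ignore")

# Strict (the default): the conversion fails loudly instead.
strict_failed = False
try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    strict_failed = True
    print("conversion failed:", exc.reason)

print(lossy)
print(strict_failed)
```

The strict failure is the useful signal here: it tells you the data does not fit the target encoding before anything is lost.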
Maciej