On 14 May 2009, at 11:27, Maciej Bliziński wrote:
> 2009/5/14 Niall O Broin <niall at linux.ie>:
>> If I send the output of your iconv line through hexdump -C again, this is
>> what I get:
>>
>> 00000000  61 62 63 20 c3 84 20 c3  96 20 c3 9c 20 c3 a4 20  |abc ?. ?. ?. ä |
>> 00000010  c3 b6 20 c3 bc 20 31 32  33 0a                    |ö ü 123.|
>>
>> which looks remarkably like - UTF-8 !
>
> Yes, because that's what your terminal uses. When you say "I see
> exactly what I should", it essentially means that you have the text
> correctly encoded in utf-8, otherwise it wouldn't display correctly.
> The command I've shown in my first e-mail converts your garbled text
> to the encoding that your system/terminal displays. If you wanted the
> conversion to be complete and explicit, you could write:
>
> iconv -f utf-8 -t cp1252 | iconv -f utf-8 -t <your-terminal's-encoding>
And indeed, that's what I finally had to do, with
<your-terminal's-encoding> replaced by latin1, because the final
destination of the text required that encoding.
> If <your-terminal's-encoding> is utf-8, the second iconv is an identity
> conversion, which you can safely skip. The crucial point is where you
> convert "to cp1252" and then interpret the result as utf-8.
Yes, and this is the bit which flabbergasts me - conversion from utf-8
to cp1252 produces valid utf-8. It's like the original file was in
(utf-8)^2 :-)
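
The "(utf-8)^2" idea can be sketched in Python, assuming the damage came from UTF-8 bytes being read as cp1252 and then saved as UTF-8 a second time. The sample string is taken from the hexdump above; the helper names double_encode and undo_double_utf8 are made up for illustration and are not part of any tool mentioned in this thread.

```python
# Sample text corresponding to the bytes in the hexdump quoted above.
ORIGINAL = "abc Ä Ö Ü ä ö ü 123\n"


def double_encode(text: str) -> bytes:
    """Simulate the assumed damage: UTF-8 bytes misread as cp1252, then saved as UTF-8."""
    misread = text.encode("utf-8").decode("cp1252")
    return misread.encode("utf-8")


def undo_double_utf8(data: bytes) -> bytes:
    """Rough equivalent of `iconv -f utf-8 -t cp1252`: the result is UTF-8 again."""
    return data.decode("utf-8").encode("cp1252")


garbled = double_encode(ORIGINAL)
repaired = undo_double_utf8(garbled)

# The repaired bytes are ordinary single-encoded UTF-8.
print(repaired.hex(" "))
print(repaired.decode("utf-8"), end="")

# Second step of the pipeline: re-encode for a latin1 destination.
latin1_bytes = repaired.decode("utf-8").encode("latin-1")
```

Going "to cp1252" yields valid UTF-8 here only because each byte of the original UTF-8 had earlier been treated as a cp1252 character; the first step simply reverses that misreading.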
As the American politician reputedly said - if English was good enough
for Jesus Christ, it's good enough for me.
> As a side note, try not to use the -c option with iconv -- it will hide
> lossy conversion. Having iconv fail with "illegal input sequence"
> is a good indicator of data loss during conversion.
Yes - and if I DON'T use -c, I do get 'illegal input sequence'. But
for what I need to do, some data loss during conversion is preferable
to no conversion at all - or rather having the conversion halt at the
first thing it can't handle, which is what happens.
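
The same trade-off can be sketched with Python's codec error handlers, which are only a loose analogy for iconv's behaviour and not a description of how iconv works internally. The sample string and the helper name to_latin1 are assumptions for illustration: "strict" stops at the first unconvertible character, "ignore" drops it silently (roughly what -c does), and "replace" leaves a visible "?" so the loss can at least be spotted afterwards.

```python
def to_latin1(text: str, errors: str = "strict") -> bytes:
    """Encode text as Latin-1 using the given codec error handler."""
    return text.encode("latin-1", errors=errors)


# The euro sign has no Latin-1 code point, so this cannot convert losslessly.
sample = "Ä costs 5 €\n"

try:
    to_latin1(sample)
    strict_failed = False
except UnicodeEncodeError as exc:
    # Comparable to iconv halting with "illegal input sequence".
    strict_failed = True
    print(f"strict conversion stopped at index {exc.start}: {exc.reason}")

# Comparable to `iconv -c`: the unconvertible character vanishes silently.
dropped = to_latin1(sample, errors="ignore")

# A middle ground: the loss is marked with "?" instead of hidden.
marked = to_latin1(sample, errors="replace")

print(dropped)
print(marked)
```

Comparing the lengths of the input and the output is one cheap way to notice how much a lossy run actually dropped.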
Thanks once again for your assistance, and to Pádraig too (I gather
you two were chewing it over in the pub :-) )
Niall