On 17/02/07, Aine Douglas <aine.douglas at gmail.com> wrote:
> On 2/17/07, Brian Foster <blf at blf.utvinternet.ie> wrote:
> The files in question are HTML files, created on a Windows workstation
> and edited with Windows Notepad to include Chinese script, and saved
> as "Unicode" as opposed to ASCII, uploaded in a binary transfer, and
> display beautifully online, and display beautifully in Windows
> Notepad.
> Beyond that, I know zero about the encoding.
I suspect your files may be in a utf-16 encoding rather than utf-8.
The main observable difference to you will be that, although "cat"
will show the content sanely, "vi" won't because every second octet is
a NUL, typically displayed as ^@ or \x00.
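To make the "every second octet is a NUL" point concrete, here is a minimal Python sketch. It assumes the Notepad "unicode" format is utf-16 le with a leading FF FE mark, and it models "cat on a terminal" crudely as dropping NULs; the sample string is made up.

```python
import codecs

# Hypothetical sample: ASCII-only markup as Notepad's "unicode" save
# would (by assumption) write it: FF FE mark, then utf-16 le.
sample = "<html>"
data = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")

# Every second octet after the mark is NUL for ASCII-range characters.
payload = data[2:]
high_octets = payload[1::2]

# Crude model of a terminal that ignores NUL: drop them and the
# text looks like plain ASCII again.
as_seen_on_terminal = payload.replace(b"\x00", b"")

print(data)
print(high_octets)
print(as_seen_on_terminal)
```

An editor that assumes one octet per character has no such luck and shows each NUL as ^@.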
I suppose they might even be utf-32, in which case 3 out of 4 octets
of otherwise-ascii content will be NUL; use "od" on the start of the
file to see the first four octets, and you'll find out what it is -
encoding and endianness. From "vim tips" at
http://vim.sourceforge.net/tips/tip.php?tip_id=246, the early octets
should be
utf-16 le: FF FE
utf-16 be: FE FF
utf-32 le: FF FE 00 00
utf-32 be: 00 00 FE FF
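If "od" isn't to hand, the same check can be sketched in Python. This is only an illustration of the table above; the names sniff_bom and sniff_file are made up here, and a file with no mark at all simply comes back as None.

```python
import codecs

# Check the 4-octet marks first: the utf-32 le mark begins with FF FE,
# which is also the complete utf-16 le mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32 le"),
    (codecs.BOM_UTF32_BE, "utf-32 be"),
    (codecs.BOM_UTF16_LE, "utf-16 le"),
    (codecs.BOM_UTF16_BE, "utf-16 be"),
]


def sniff_bom(head):
    """Return the encoding named by the leading octets, or None."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Read the first four octets of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(4))


print(sniff_bom(b"\xff\xfe<\x00"))
print(sniff_bom(b"\x00\x00\xfe\xff"))
```

One caveat: a utf-16 le file whose first character is itself NUL would be misread as utf-32 le, which shouldn't matter for an HTML file.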
Once you know what you're dealing with, you can try to convince your
editor to read and write in the same encoding. If it is utf-16,
http://joe-editor.sourceforge.net/hints.html says you're out of luck;
but vim appears able to handle it (its docs mostly call it ucs-2, which
is strictly the fixed-width subset of utf-16).
See :help fileencodings; the ucs-bom entry in particular might do what you want.
> > Second, how will the editor be used without downloading
> > the files in question?
> I like to SSH into the linux webserver and edit right off the
> commandline, esp since an ascii download / upload appears to corrupt
> the files.
Yes, it would. An ftp text-mode transfer converts between CR LF and LF
depending on the direction, but a file that isn't one octet per character
would contain something like NUL CR NUL LF, which could become
NUL CR NUL CR LF (or pretty much anything else) and so no longer lines up
with the rest of the file. Use a binary transfer if you're going to do
that: ftp binary, http, scp, or whatever is available.
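As a rough illustration of that corruption, the Python sketch below models a text-mode transfer as a blind LF to CR LF substitution. That is a simplification of what a real ftp client does, and ascii_mode_upload and decodes are names invented for the example.

```python
def ascii_mode_upload(data):
    # Crude model of a text-mode transfer towards a CR LF system:
    # every LF octet gains a CR in front, whatever surrounds it.
    return data.replace(b"\n", b"\r\n")


def decodes(data):
    """True if the octets are still well-formed utf-16 be."""
    try:
        data.decode("utf-16-be")
        return True
    except UnicodeDecodeError:
        return False


# "A", CR, LF, "B" at two octets per character: the NUL CR NUL LF case.
original = "A\r\nB".encode("utf-16-be")
mangled = ascii_mode_upload(original)

print(original)
print(mangled)
print(decodes(original), decodes(mangled))
```

The single extra CR shifts everything after it by one octet, so the rest of the file pairs up wrongly from that point on.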
> It can't be such an arbitrary encoding if cat can handle the files.
cat passes the NULs straight through and your terminal probably just
ignores them, so your "plain text" looks okay there.
What is $LANG when you start vim? The docs seem to suggest that you
can use that to induce vim to treat the file as ucs-2.
> I've never had a problem with vim until now. The very first job
> interview I had when I left uni asked me to describe what happens
> if I start vi and type my name, maybe I've kept using it just for the
> next time I have to answer that question!
Nice question. Could be interesting if an early letter included an accent...
Good luck,
f