LINUX.IE, website of the Irish Linux Users' Group

[ILUG] Editing unicode text files.

Brian Foster blf at blf.utvinternet.ie
Sat Feb 17 17:12:58 GMT 2007


  | Date: Sat, 17 Feb 2007 15:22:40 +0000
  | From: Kae Verens <kae at verens.com>
  | 
  | Aine Douglas wrote:
  | >[ Brian Foster wrote ]:
  | >>  Editors that can handle the full UCS/Unicode in a variety
  | >>  of encodings include vim(1), mined, and yudit.  Some other
  | >>  editors, such as joe(1), handle UTF-8 but not necessarily
  | >>  an arbitrary encoding.
  | >
  | > On my shell account, I have vim and joe, both render garbage.
  | > Will get mined and yudit later and test.
  | 
  | vim is usually quite good about that - as long as the console can
  | display the characters, it should work okay.  I just opened up the
  | Russian language file for KFM in vi and vim, and both worked fine.
  | This was in Konsole; KDE's terminal emulator.  I have had trouble
  | with charsets in xterm and many other terms, so make sure that's
  | not a problem first.

 The first trick to using any X terminal is to ensure
 the font is adequate; broadly, this (seems to) mean
 an ISO-10646 font.

 And then, for xterm(1) specifically, ensure it is in
 UTF-8 mode.

 In addition to KDE konsole and xterm (both work fine
 for me), there is also mlterm(8), and I believe recent
 versions of rxvt(1) are also UTF-8 capable.

 I concur with Kae's point here:  Until you can simply
 cat(1) the file and see what you _should_ see, things
 are not set up correctly.
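
 One quick way to run that check is to generate a small, known
 UTF-8 sample and cat(1) it.  A minimal sketch, assuming a POSIX
 shell; the file name utf8-sample.txt is made up for illustration:

```shell
# Hypothetical scratch file for the check.
sample="${TMPDIR:-/tmp}/utf8-sample.txt"

# Write known UTF-8 bytes using octal escapes so the bytes are unambiguous:
# "cafe" with an acute accent on the e, a euro sign, and the Russian
# word for "hello" in Cyrillic.
printf 'caf\303\251 \342\202\254 \320\237\321\200\320\270\320\262\320\265\321\202\n' > "$sample"

# In a UTF-8 setup the locale should normally report UTF-8 here.
locale charmap 2>/dev/null || echo "locale command not available"

# With terminal and font set up correctly this shows the three items
# described above; garbage or boxes point at the terminal or the font,
# not at the editor.
cat "$sample"
```

 If cat(1) shows the sample correctly but the editor does not, the
 problem is more likely the editor's settings than the terminal.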

  | Also, UTF-8 files, which I presume you're talking about, usually
  | start with a single marker character to distinguish them from
  | otherwise-plain-text files. If that marker character is missing,
  | vim may not be figuring out the charset correctly.

 NO (and yes):  Micro$oft UTF-8 files do tend to start
 with a BOMb (byte-order mark), but no-one else's do.
 The BOMb is never needed, not even on Windross (for UTF-8).
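
 For anyone wanting to check a file, the UTF-8 BOM is the three
 bytes EF BB BF at the very start.  A small sketch, assuming a
 shell with head(1), od(1) and tail(1); the function name
 has_utf8_bom and the file names are invented here:

```shell
# has_utf8_bom FILE: succeed if FILE begins with the bytes EF BB BF.
has_utf8_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Two hypothetical sample files: one with a BOM, one without.
tmp="${TMPDIR:-/tmp}"
printf '\357\273\277hello\n' > "$tmp/with-bom.txt"
printf 'hello\n'             > "$tmp/no-bom.txt"

for f in "$tmp/with-bom.txt" "$tmp/no-bom.txt"; do
    if has_utf8_bom "$f"; then
        echo "$f: starts with a UTF-8 BOM"
    else
        echo "$f: no BOM"
    fi
done

# Strip the BOM by copying everything from the 4th byte onwards.
tail -c +4 "$tmp/with-bom.txt" > "$tmp/stripped.txt"
```

 Only strip the first three bytes after confirming the BOM is
 really there; otherwise real content is lost.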

 In any case, if vim(1) is confused, simply set the
 fileencoding (`:help fileencoding' for details).

 Having said that, I understand the (HTML) files in
 question were written by an M$ thingie on Windross
 as “Unicode” — which very probably means they are
 encoded as UTF-16LE.

 I was, just now, able to edit a UTF-16LE version of
 this reply using:

   vim --cmd 'set fileencodings=utf-16le' ...
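
 Another route, not the one used above, is to convert the file to
 UTF-8 with iconv(1) first and then edit the result in any UTF-8
 capable editor.  A sketch, assuming iconv is installed and the
 file really is UTF-16LE without a BOM; the file names are
 placeholders:

```shell
tmp="${TMPDIR:-/tmp}"
src="$tmp/page-utf16le.html"   # placeholder input name
dst="$tmp/page-utf8.html"      # placeholder output name

# Stand-in for the real file: "H", e-acute, newline as UTF-16LE
# (two bytes per character, low byte first).
printf 'H\000\351\000\n\000' > "$src"

# Convert to UTF-8; iconv exits non-zero if the input is not valid.
iconv -f UTF-16LE -t UTF-8 "$src" > "$dst"

# Dump the result as hex; the UTF-8 bytes are 48 c3 a9 0a.
od -An -tx1 "$dst"
```

 If the file does start with a BOM (FF FE), -f UTF-16 is usually
 the better choice, since iconv then takes the byte order from
 the BOM.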

cheers!
	-blf-
-- 
Experienced (>25 yrs) kernel/software Eng: | Brian Foster   Montpellier,
 • Unix, embedded, &tc;  • Linux;  • doc;  | blf at utvinternet.ie   FRANCE
 • IDL, automated testing, process, &tc.   |  Stop E$$o (ExxonMobile)!
Résumé (CV) http://www.blf.utvinternet.ie  |     http://www.stopesso.com


