LINUX.IE, website of the Irish Linux Users' Group

[ILUG] Editing unicode text files.

Brian Foster blf at blf.utvinternet.ie
Sat Feb 17 10:42:04 GMT 2007


  | Date: Fri, 16 Feb 2007 16:31:43 +0000
  | From: "Aine Douglas" <aine.douglas at gmail.com>
  | 
  | Can anyone recommend a commandline text editor that is capable of
  | editing unicode text files?
  | 
  | I've got some webpages to edit which contain Chinese script, and when
  | I open them in vi I get long strings of @@@@@^^^???@@ etc., and it's a
  | pain downloading them for really small edits.

 I don't quite grok what it is you want to do.
 First, “Unicode” is ambiguous to the point of being meaningless;
 what matters is the encoding, not what is encoded.
 ( Briefly:  Every character is in the UCS (Universal
  Character Set, ISO-10646, also called “Unicode”†).
  A character's binary representation is an encoding.
  US-ASCII, e.g., is the first 128 characters of the UCS;
  ISO-8859-1 is the first 256; ISO-8859-15 is a slightly
  different set of 256; UTF-8 can encode the entire UCS;
  and there are many other encodings. )
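To make the character-versus-encoding point concrete, here is a minimal Python sketch (Python is only my choice for illustration; nothing in the thread uses it). It encodes the same four characters with the encodings named above, using Python's standard codecs, and prints the bytes in hex, or `-` where that encoding cannot represent the character. The helper names are hypothetical.

```python
# The character (a UCS code point) stays the same; only its bytes differ
# from one encoding to the next.
SAMPLES = ["A", "é", "€", "中"]
ENCODINGS = ["ascii", "iso-8859-1", "iso-8859-15", "utf-8"]


def encode_or_none(ch: str, enc: str):
    """Return the bytes for `ch` in `enc`, or None if `enc` cannot represent it."""
    try:
        return ch.encode(enc)
    except UnicodeEncodeError:
        return None


def show_table() -> None:
    """Print one row per character: its code point, then its bytes per encoding."""
    for ch in SAMPLES:
        cells = []
        for enc in ENCODINGS:
            b = encode_or_none(ch, enc)
            cells.append(f"{enc}={b.hex() if b is not None else '-'}")
        print(f"U+{ord(ch):04X} {ch}: " + "  ".join(cells))


if __name__ == "__main__":
    show_table()
```

In the resulting table 'A' is the single byte 41 in every encoding, 'é' is e9 in both ISO-8859 variants but c3a9 in UTF-8, '€' exists only in ISO-8859-15 (a4) and UTF-8 (e282ac), and '中' exists only in UTF-8 (e4b8ad).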

 Second, how will the editor be used without downloading
 the files in question?

 And third, by “command line” do you mean something like
 sed(1), or just an editor you can launch from the shell
 (like the vi(1) you mentioned)?

 Editors that can handle the full UCS/Unicode in a variety
 of encodings include vim(1), mined, and yudit.  Some other
 editors, such as joe(1), handle UTF-8 but not necessarily
 an arbitrary encoding.
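Conceptually, an editor that copes with an arbitrary encoding decodes the file's bytes into UCS characters, edits those, and encodes them back into the same encoding. A minimal Python sketch of that round trip for a small scripted edit, assuming the page is stored as GB2312; that encoding, the sample markup, and the helper name `edit_in_encoding` are all hypothetical (the real pages may well be UTF-8):

```python
def edit_in_encoding(data: bytes, encoding: str, old: str, new: str) -> bytes:
    """Decode `data` from `encoding`, replace `old` with `new`, re-encode.

    The result keeps the original encoding; a UnicodeEncodeError is raised
    if `new` contains a character that `encoding` cannot represent.
    """
    text = data.decode(encoding)      # bytes -> UCS characters
    text = text.replace(old, new)     # the small edit itself
    return text.encode(encoding)      # UCS characters -> original encoding


if __name__ == "__main__":
    # Hypothetical page fragment stored as GB2312.
    page = "<p>价格: 100元</p>".encode("gb2312")
    edited = edit_in_encoding(page, "gb2312", "100", "120")
    print(edited.decode("gb2312"))    # prints <p>价格: 120元</p>
```

Because the edit happens on decoded characters rather than raw bytes, the same function works unchanged for UTF-8, Shift_JIS, or any other codec Python knows, provided the caller names the right encoding.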

 I've only used `vim' in anger (in several senses! ;-) ):
 `vim', at least, will autodetect the file's encoding and
 map it to yer locale's, and hence you can use `vim' to
 edit a SJIS file on a UTF-8 system.  The file is saved
 in its original encoding.  Almost needless to say, this
 mapping works best if the system/locale uses UTF-8 (on
 Linux), since UTF-8 round-trips the full UCS.  The result is
 that, provided you are displaying UTF-8 correctly (mostly a
 matter of fonts), `vim' works quite well (albeit keying
 in non-keyboard characters can be a pain:  I tend to use
 gucharmap(1) and copy-and-paste).
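The autodetection described above can be approximated by trying a list of candidate encodings in order and keeping the first one that decodes cleanly, which is roughly the idea behind vim's 'fileencodings' option. The Python sketch below is not vim's actual algorithm; the candidate list and the helpers `detect_encoding`, `load` and `save` are hypothetical. It also shows the "saved in its original encoding" step: the detected encoding is remembered and reused on save.

```python
import codecs

# Hypothetical candidate list, tried in order (strictest first). Loosely
# modelled on vim's 'fileencodings' option, not a copy of vim's logic.
CANDIDATES = ("utf-8", "shift_jis", "latin-1")


def detect_encoding(data: bytes, candidates=CANDIDATES) -> str:
    """Return the first candidate encoding that decodes `data` without error."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"            # a UTF-8 byte-order mark settles it
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue                  # not valid in this encoding; try the next
    raise ValueError("no candidate encoding could decode the data")


def load(data: bytes) -> tuple[str, str]:
    """Decode file contents, remembering which encoding was detected."""
    enc = detect_encoding(data)
    return data.decode(enc), enc


def save(text: str, enc: str) -> bytes:
    """Re-encode (possibly edited) text in the file's original encoding."""
    return text.encode(enc)


if __name__ == "__main__":
    original = "日本語のページ".encode("shift_jis")
    text, enc = load(original)
    print(enc)                          # prints shift_jis
    assert save(text, enc) == original  # unedited text round-trips byte for byte
```

The order matters: UTF-8 goes first because legacy-encoded text containing non-ASCII bytes rarely happens to be valid UTF-8, and latin-1 goes last because it accepts any byte sequence. Detection of this kind is a heuristic, so a short file can still be guessed wrong.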

cheers!
	-blf-
 
  †  Pedantically, “Unicode” means three different things,
    and is not a synonym for the UCS.

-- 
Experienced (>25 yrs) kernel/software Eng: | Brian Foster   Montpellier,
 • Unix, embedded, &tc;  • Linux;  • doc;  | blf at utvinternet.ie   FRANCE
 • IDL, automated testing, process, &tc.   |  Stop E$$o (ExxonMobile)!
Résumé (CV) http://www.blf.utvinternet.ie  |     http://www.stopesso.com


