LINUX.IE, website of the Irish Linux Users' Group

[ILUG] sed question


Brian Foster blf at blf.utvinternet.co.uk
Mon Aug 9 23:53:46 IST 2004


  | Date: Mon, 09 Aug 2004 14:04:54 +0100
  | From: Ciaran Mac Lochlainn <ciaran17 at eircom.net>
  | 
  | First of all, I'm not sure if sed is even the right tool for this job, 
  | but here goes-

 I realize the problem has been solved.
 just one comment ...

  | I need to send a stream of output from a .net app on a windows box (or 
  | boxen) to a serial port on a Linux machine.  Each line has to contain a 
  | preamble which includes unprintable characters.  The .net app writes 
  | these in Unicode (e.g. \340 is written as \303\240) which the hardware 
  | device on the serial port can't interpret.

 “Unicode” is not an encoding (or charset) in modern
 usage; that is an _obsolete_ term for an _obsolete_
 16-bit encoding of the UCS (Universal Character Set).
 at first glance, what is meant here is UTF-16, the
 16+ bit encoding du jour of the UCS (and
 used by Windross in its LE (Little Endian) form).

 vim(1) can be set to read/write UTF-16 (as well as
 many other encodings) using the `set fileencoding=X'
 `:'-command:

    :set fileencoding=utf-16

 I actually composed this e-mail reply in UTF-16BE
 (Windross is UTF-16LE), using the above to set the
 encoding --- albeit I'll send it in UTF-8.

 anyways, since you could be dealing with UTF-16,
 this is another trick you might have been able to
 use .... albeit there are several gotchas here,
 such as UTF-16 text normally containing embedded
 nul (\0) bytes, which are liable to confuse many
 programs on *ix systems.
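
 to make the nul-byte gotcha concrete, here is a minimal
 Python sketch (Python and the sample string "abc" are my
 own choices for illustration, nothing from the original
 problem).  it encodes plain ASCII text as UTF-16LE and
 UTF-16BE and shows that every other byte is \0, whereas
 the UTF-8 form of the same text has none:

```python
# Illustrative only: "abc" is an arbitrary sample string.
text = "abc"

utf16_le = text.encode("utf-16-le")  # low byte first, no byte-order mark
utf16_be = text.encode("utf-16-be")  # high byte first, no byte-order mark
utf8 = text.encode("utf-8")

# Each ASCII character becomes two bytes in UTF-16, one of which is nul.
print(utf16_le)  # b'a\x00b\x00c\x00'
print(utf16_be)  # b'\x00a\x00b\x00c'

# Count the embedded nul bytes in each encoding.
print(utf16_le.count(0), utf16_be.count(0), utf8.count(0))  # 3 3 0
```

 any tool that treats \0 as end-of-string is liable to
 stop at the first character of such data.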

 _however_, looking at yer example (\340 is \303\240),
 I suspect you are really dealing with UTF-8.  (I have
 no idea what .NET specifies, if anything.)  if I have
 decoded it correctly in my head(!), \303\240 is the
 correct UTF-8 for U+00E0 (à, “LATIN SMALL LETTER A
 WITH GRAVE”).  I have no idea if that makes sense in
 context or not, or is just a coincidence?
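
 the in-my-head decoding above can be checked with a
 short Python sketch.  the helper name
 decode_octal_escapes is hypothetical (mine, purely for
 illustration), and I am assuming the device wants
 one byte per character, Latin-1 style, which the
 original question suggests but does not state:

```python
import unicodedata


def decode_octal_escapes(text: str) -> bytes:
    """Turn a string such as r'\\303\\240' into the raw bytes it denotes.

    Hypothetical helper: assumes the input consists only of
    backslash-prefixed octal byte values.
    """
    return bytes(int(token, 8) for token in text.split("\\") if token)


utf8_bytes = decode_octal_escapes(r"\303\240")
char = utf8_bytes.decode("utf-8")          # interpret the two bytes as UTF-8

print(f"U+{ord(char):04X}", unicodedata.name(char))
# U+00E0 LATIN SMALL LETTER A WITH GRAVE

# Assumption: the serial device expects a single byte per character.
single_byte = char.encode("latin-1")
print(oct(single_byte[0]))                 # 0o340
```

 so the two bytes \303\240 decode to U+00E0, and
 re-encoding that as one byte gives back \340, which
 matches the example in the question.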

 this probably confuses more than it helps --- sorry!
cheers,
	-blf-
-- 
«How many surrealists does it take to    |  Brian Foster      Montpellier,
 change a lightbulb?  Three.  One calms  |  blf at utvinternet.ie      FRANCE
 the warthog, and two fill the bathtub   |    Stop E$$o (ExxonMobile)!
 with brightly-colored machine tools.»   |        http://www.stopesso.com


