LINUX.IE, website of the Irish Linux Users' Group
[ILUG] sed answer, unicode, block buffering


Ciaran Mac Lochlainn ciaran17 at eircom.net
Tue Aug 10 12:26:28 IST 2004


Brian Foster wrote:

> “Unicode” is not an encoding (or charset) in modern
> usage; that is an _obsolete_ term for an _obsolete_
> 16-bit encoding of the UCS (Universal Character Set).
> at first glance, what is meant here is UTF-16, the
> current du jour 16+ bit encoding of the UCS (and
> used by Windross in its LE (Little Endian) form).
>  
>
I don't know what encoding is being used, only that it's not 8-bit.  The 
output has to be sent to a dumb hardware device which needs the specific 
8-bit codes \027a\340\300\004\340\300\005\340\300\002# before every 
line.  We tried hardcoding these into the Windows program, but it 
converts them on the fly to a multibyte format.  I guess from what you 
say below that this is UTF-8.

> _however_, looking at yer example (\340 is \303\240),
> I suspect you are really dealing with UTF-8.  (I have
> no idea what .NET specifies, if anything.)  if I have
> decoded it correctly in my head(!), \303\240 is the
> correct UTF-8 for U+00E0 (à, “LATIN SMALL LETTER A
> WITH GRAVE”).  I have no idea if that makes sense in
> context or not, or is just a coincidence?
>  
>
It helps flesh out my understanding of what was going on... \340 in 
octal is 00E0 in hex, which was being translated into \303\240.  It 
makes sense in that when I cat the file, I see à signs.
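
The byte-level translation described above can be reproduced from a shell. This is only a minimal sketch: it assumes iconv, od and xargs are installed, assumes the 8-bit side is ISO-8859-1, and the variable names are made up for illustration.

```shell
# Assumption: the 8-bit codes are ISO-8859-1 (Latin-1); variable names are illustrative.

# The single Latin-1 byte \340 (hex E0) re-encoded as UTF-8 becomes two bytes.
as_utf8=$(printf '\340' | iconv -f ISO-8859-1 -t UTF-8 | od -An -b | xargs)
echo "UTF-8 bytes (octal): $as_utf8"

# Going the other way recovers the single 8-bit code.
as_latin1=$(printf '\303\240' | iconv -f UTF-8 -t ISO-8859-1 | od -An -b | xargs)
echo "Latin-1 byte (octal): $as_latin1"
```

od -An -b prints each byte in octal without the offset column, and xargs just trims the surrounding whitespace.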

So... seeing as we're using a shell script to copy this data from a 
samba share to a serial port anyway, I thought of integrating sed into 
the script.  It worked (Thanks P at draig) but then I ran into another problem.

(solution follows)

To test all this, I tried

tail -f /home/pos/trace | sed 's/^#/blah#/' > /dev/trace

and on another tty, I did

tail -f /dev/trace

(/dev/trace is a regular file at the moment, for testing purposes.  In 
practice it will be a character device.  /home/pos/trace is a regular 
file on a Samba share, which is written to by the Windoze application)

...but no output appeared.

I tried dropping the -f:

tail /home/pos/trace | sed 's/^#/blah#/' > /dev/trace

and this produced output for me.

"tail -f /home/pos/trace | less" also worked.

I was stumped again...  but Google (and an archive of comp.unix.misc) 
came to the rescue!

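The issue here was that sed block-buffers its output when it is not 
writing to a terminal, so it was waiting for a full block of a few KB 
before it would produce any output.  Without -f, tail exits, sed sees 
end-of-file and flushes, which is why that test worked.  tail -f does 
not block buffer; it produces the timely output I want.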
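
For reference, GNU sed can also be told not to block buffer. This is a sketch of an alternative, not what was used here: it assumes GNU sed, since -u (--unbuffered) is a GNU extension, and the sample input is invented.

```shell
# Assumption: GNU sed is installed; -u / --unbuffered is not in POSIX sed.
# With -u, sed writes each output line as soon as it is produced, so the
# long-running pipeline could in principle be written as:
#
#   tail -f /home/pos/trace | sed -u 's/^#/blah#/' > /dev/trace
#
# A finite stand-in for the same filter, using made-up input lines:
filtered=$(printf '#first\nplain\n#second\n' | sed -u 's/^#/blah#/')
printf '%s\n' "$filtered"
```

Only lines starting with # get the blah prefix; other lines pass through unchanged.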

The solution was to pass each line through sed as it appeared.

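tail -f /home/pos/trace \
 | while IFS= read -r line
do  # read one line at a time so that sed outputs each line as it is written
  # IFS= and -r keep whitespace and backslashes intact; "$line" avoids word splitting
  printf '%s\n' "$line" | sed 's/^#/blah#/' >> /dev/trace
done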

Now this parses and outputs each line as it is written, giving real time 
output in the correct format.

The only downside is that it wouldn't scale up too well because it kicks 
off a sed process for each line of output, but that's unlikely to be an 
issue for our customers.
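
If the per-line sed process ever did become a problem, the same prefixing can be done inside the shell loop itself. This is a minimal sketch under that assumption; prefix_hash_lines is a hypothetical helper name, and the sample input is invented.

```shell
# Hypothetical helper: prefix lines that start with '#' using only shell
# builtins, so no extra process is started per line.
prefix_hash_lines() {
  while IFS= read -r line; do
    case $line in
      '#'*) printf 'blah%s\n' "$line" ;;   # same effect as s/^#/blah#/
      *)    printf '%s\n' "$line" ;;       # pass other lines through unchanged
    esac
  done
}

# Finite demonstration with made-up input:
printf '#first\nplain\n#second\n' | prefix_hash_lines

# In the real script this would replace the sed call, e.g.:
#   tail -f /home/pos/trace | prefix_hash_lines >> /dev/trace
```

Each printf writes its line straight away, so the output stays as timely as the loop that calls sed.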





