Brian Foster wrote:
> “Unicode” is not an encoding (or charset) in modern
> usage; that is an _obsolete_ term for an _obsolete_
> 16-bit encoding of the UCS (Universal Character Set).
> at first glance, what is meant here is UTF-16, the
> current du jour 16+ bit encoding of the UCS (and
> used by Windross in its LE (Little Endian) form).
>>I don't know what encoding is being used, only that it's not 8-bit. The
output has to be sent to a dumb hardware device which needs the specific
8-bit codes \027a\340\300\004\340\300\005\340\300\002# before every
line. We tried hardcoding these into the Windows program, but it
converts them on the fly to a multibyte format. I guess from what you
say below that this is UTF-8.
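
For reference, a minimal sketch of writing those prefix bytes unmodified from a shell, assuming a POSIX printf that interprets octal escapes in its format string. The function name device_prefix is made up for illustration; only the byte sequence comes from the text above.

```shell
# Hypothetical helper: write the device's prefix as raw 8-bit bytes.
# printf turns each \ooo octal escape into exactly one byte, so nothing
# is re-encoded as UTF-8 on the way out.
device_prefix() {
    printf '\027a\340\300\004\340\300\005\340\300\002#'
}

# Count the bytes produced: 12 if every escape stayed a single byte.
device_prefix | wc -c
```

If the count comes out higher than 12, something between the script and the output is widening the high bytes, which is the symptom seen from the Windows side.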
> _however_, looking at yer example (\340 is \303\240),
> I suspect you are really dealing with UTF-8. (I have
> no idea what .NET specifies, if anything.) if I have
> decoded it correctly in my head(!), \303\240 is the
> correct UTF-8 for U+00E0 (à, “LATIN SMALL LETTER A
> WITH GRAVE”). I have no idea if that makes sense in
> context or not, or is just a coincidence?
>>It helps flesh out my understanding of what was going on... \340 in
octal is 00E0 in hex, which was being translated into \303\240. It
makes sense in that when I cat the file, I see à signs.
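
The translation of \340 into \303\240 can be checked with shell arithmetic. This is only a sketch of the two-byte UTF-8 rule (code points U+0080 to U+07FF); the function name utf8_two_bytes is invented for the example.

```shell
# Hypothetical helper: print the two UTF-8 bytes, in octal, for a code
# point in the range U+0080..U+07FF.
utf8_two_bytes() {
    cp=$(( $1 ))                   # accepts decimal, 0x hex or leading-0 octal
    b1=$(( 0xC0 | (cp >> 6) ))     # 110xxxxx: the top five bits
    b2=$(( 0x80 | (cp & 0x3F) ))   # 10xxxxxx: the low six bits
    printf '\\%o\\%o\n' "$b1" "$b2"
}

utf8_two_bytes 0340    # prints \303\240
```

Octal 340 is 224 decimal: 224 >> 6 is 3, giving 0xC3 (\303), and the low six bits are 32, giving 0xA0 (\240).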
So... seeing as we're using a shell script to copy this data from a
samba share to a serial port anyway, I thought of integrating sed into
the script. It worked (Thanks P at draig) but then I ran into another problem.
To test all this, I tried
tail -f /home/pos/trace | sed 's/^#/blah#/' > /dev/trace
and on another tty, I did
tail -f /dev/trace
(/dev/trace is a regular file at the moment, for testing purposes. In
practice it will be a character device. /home/pos/trace is a regular
file on a Samba share, which is written to by the Windoze application)
...but no output appeared.
I tried dropping the -f:
tail /home/pos/trace | sed 's/^#/blah#/' > /dev/trace
and this produced output for me.
"tail -f /home/pos/trace | less" also worked.
I was stumped again... but Google (and an archive of comp.unix.misc)
came to the rescue!
The issue here was that sed block-buffers its output when it is not
writing to a terminal, so it was waiting for a full block of a few kB
before it would produce any output. tail -f does not block-buffer; it
produces the timely output I want.
The solution was to pass each line through sed as it appeared.
tail -f /home/pos/trace \
| while IFS= read -r line
do # use read line so that sed outputs each line as it is written
    # IFS= and -r keep leading whitespace and backslashes intact
    printf '%s\n' "$line" | sed 's/^#/blah#/' >> /dev/trace
done
Now this parses and outputs each line as it is written, giving real time
output in the correct format.
The only downside is that it wouldn't scale up too well because it kicks
off a sed process for each line of output, but that's unlikely to be an
issue for our customers.
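
If the per-line sed ever did become a problem, one possible way around it is to do the substitution in the shell itself. The following is a sketch only, assuming the edit stays as simple as the 's/^#/blah#/' test above; the function name prefix_hash_lines is hypothetical. If the sed in use is GNU sed, its -u (--unbuffered) option is another thing to try with a single long-running sed.

```shell
# Hypothetical helper: same edit as sed 's/^#/blah#/', done by the shell
# itself so no extra process is started per line.
prefix_hash_lines() {
    while IFS= read -r line
    do
        case $line in
            '#'*) printf 'blah%s\n' "$line" ;;   # line starts with '#'
            *)    printf '%s\n' "$line" ;;       # pass everything else through
        esac
    done
}

# Intended use, with the paths from this post:
#   tail -f /home/pos/trace | prefix_hash_lines >> /dev/trace
printf '#one\ntwo\n' | prefix_hash_lines
```

The demo line at the end prints "blah#one" followed by "two", which matches what the sed expression does to the same input.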