| From: kevin <kevin at cybercolloids.net>
| Date: Wed, 18 Aug 2004 11:21:50 +0100
|[ ... ]
| Cornish uses some accents including t-cedilla in words such as
|
| conveţhaz - Verb, to understand
|
| I can write this using codes in UTF-8 like conve&#x163;haz [ ... ]
uh, not exactly. “&#x163;” does not (cannot)
represent literal UTF-8 per se. (it _is_ the
UCS codepoint value for U+0163, which is
“LATIN SMALL LETTER T WITH CEDILLA”, which
apparently is the character you want.)
I cannot recall if the “&#<dec>;” and “&#X<hex>;”
HTML/XML entities specify UCS codepoints (i.e.,
independent of the document's charset/encoding),
or character values specific to the document's
encoding.
I presume yer document effectively specifies
its encoding is UTF-8, in which case my bad
memory matters less than usual: the 163 hex
(355 decimal) UCS value is turned into the
correct UTF-8 byte sequence (which is the
two hex bytes C5 A3).
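
The arithmetic above is easy to check. The following is only a sketch, assuming a Python 3 interpreter is to hand; the variable names are made up for illustration. As far as I know, HTML 4 and XML both define numeric character references in terms of UCS codepoints, independent of the document's encoding, and Python's html.unescape() decodes them the same way, so it makes a reasonable stand-in here.

```python
import html

# Hex and decimal forms of the same numeric character reference.
ref_hex = "&#x163;"
ref_dec = "&#355;"

# Decode the reference to the character it names (a UCS codepoint,
# not a byte sequence in any particular encoding).
ch = html.unescape(ref_hex)
same = html.unescape(ref_dec)

codepoint = ord(ch)            # 0x163 hex, 355 decimal
utf8 = ch.encode("utf-8")      # the bytes a UTF-8 document would carry
utf8_hex = " ".join(f"{b:02X}" for b in utf8)

print(ch == same, hex(codepoint), codepoint, utf8_hex)
```

The reference names the codepoint; the C5 A3 pair only appears once that character is serialised as UTF-8.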
pedantically cheers!
-blf-
--
«How many surrealists does it take to | Brian Foster Montpellier,
change a lightbulb? Three. One calms | blf at utvinternet.ie FRANCE
the warthog, and two fill the bathtub | Stop E$$o (ExxonMobile)!
with brightly-colored machine tools.» | http://www.stopesso.com