From: Matthew French (mfrench42 at domain yahoo.co.uk)
Date: Thu 25 Apr 2002 - 15:01:16 IST
David Neary had a migraine:
> > You are correct - 0xC3A9 is the UTF-8 encoded version of the Unicode
> > 0xE9.
> My head hurts.
> OK - so there exists a bijective mapping from Unicode to UTF-8,
> but they're not the same thing. I can live with that. I wasn't
> aware there was a difference.
Think of Unicode as the standard and UTF-8 as one implementation of it. In
other words, Unicode assigns every character a unique number (its code
point), whereas UTF-8 is just one way of encoding that number as bytes.
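
To make the distinction concrete, here is a minimal Python sketch (my own illustration, using only built-ins) that looks at the character from the quoted message, first as a code point and then as UTF-8 bytes:

```python
# Code point (the number Unicode assigns) versus UTF-8 (the bytes on disk).
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

code_point = ord(ch)              # the Unicode number for the character
utf8_bytes = ch.encode("utf-8")   # that number written out as UTF-8 bytes

print(hex(code_point))            # 0xe9
print(utf8_bytes.hex())           # c3a9

# The mapping is reversible: decoding the bytes gives the character back.
round_trip = utf8_bytes.decode("utf-8")
```
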
Unicode code points fit easily in 32 bits (the code space runs from 0 to
0x10FFFF), but encoding every character using four bytes would be a complete
waste of space. So UTF-8 encoding ensures that the most commonly used
characters (7-bit ASCII) occupy just 1 byte, less common characters occupy
2 bytes, and so on.
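
For the curious, the variable-length scheme can be sketched by hand. The helper below (`utf8_encode` is a name made up for this example, not a library function) follows the bit layout of UTF-8 as currently defined, where a code point takes at most four bytes. In real code you would simply call `str.encode("utf-8")`.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 (illustrative helper)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if cp < 0x80:
        # Up to 7 bits: 0xxxxxxx (plain ASCII, one byte)
        return bytes([cp])
    if cp < 0x800:
        # Up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # Up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # Up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


print(utf8_encode(0xE9).hex())  # c3a9
```

So 0xE9 (binary 11101001) is split into 00011 and 101001, giving 110_00011 = 0xC3 and 10_101001 = 0xA9, which is where the 0xC3A9 above comes from.
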
There is also a UTF-16 encoding format, which uses two bytes for most
characters, if speed is more important than size.
There are also many other alternatives if one gets bored.
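
A quick way to compare the size trade-off is to encode a few characters with each scheme and count the bytes. This is only a sketch: `encoded_sizes` is a hypothetical helper, and the little-endian codec names are used so that Python does not prepend a byte order mark to the output.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes one character takes in each encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}


# ASCII letter, accented letter, euro sign, and a character outside the
# 16-bit range (which UTF-16 has to represent as a surrogate pair).
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```
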
See the following link for some more information:
This archive was generated by hypermail 2.1.6 : Thu 06 Feb 2003 - 13:16:22 GMT