Lee Hosty wrote:
> I'm using HTMLFilter (http://linux.duke.edu/projects/mini/htmlfilter/) to
> safely allow certain HTML tags as user input to be displayed later. I use
> htmlspecialchars() to display this text in a textarea during editing, so
> everything displays as valid XHTML at this stage, without any user
> confusion.
>
> However, if the user inputs an HTML special character (&, " or ' for
> example), it gets saved as is to the DB - and displays fine as XHTML in a
> textarea - but when output to a browser at the viewing stage (as opposed
> to the editing stage) - these bare characters are not valid XHTML and
> need converting. I
> can do this either at the saving to DB stage or just before outputting to
> browser.
>
> However, I can't blindly convert all the special characters found to their
> respective entities using htmlspecialchars() anymore, as some of them may
> be inside the tags that the user has input, and I don't want these converted.
>
> i.e. user inputs <a href="whatever.html" target='new_target'>"my amazin'
> links & stuff"</a>
>
> needs to be converted to <a href="whatever.html"
> target='new_target'>&quot;my amazin&#039; links &amp; stuff&quot;</a>
>
> Any ideas? I'm new to PHP and would rather not re-invent any wheels.
The method we (my company) use is to not allow the user to enter HTML at
all - we convert /all/ special characters to HTML entities. Besides - how
many ordinary users do you know who can write HTML?
So - we convert all characters to their entities, then when outputting,
we reconvert, using agreed formatting tags. Some of them are:
*bold*
/italic/
_underscore_
[http://alink.com/|link's title]
I'm afraid we /did/ re-invent the wheel in that case, but only because I
started writing that converter well before I heard of similar scripts
such as Textile (http://www.textism.com/tools/textile/).
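A minimal sketch of how formatting tags like these might be converted on output. This is not the convertor described above: the function name format_tags(), the patterns, and the choice of XHTML elements are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: escapes everything first, then turns the agreed
// formatting tags into XHTML. The rules are assumptions, not a real spec.
function format_tags($text)
{
    // Escape first, so nothing the user typed reaches the browser as markup.
    $out = htmlspecialchars($text, ENT_QUOTES);

    // Marker character => opening and closing XHTML tag.
    $markers = array(
        '*' => array('<strong>', '</strong>'),
        '/' => array('<em>', '</em>'),
        '_' => array('<span style="text-decoration:underline">', '</span>'),
    );

    foreach ($markers as $mark => $tags) {
        $m = preg_quote($mark, '~');
        // The opening marker must sit at the start of the text or after
        // whitespace, so slashes inside URLs or closing tags are left alone.
        $pattern = '~(?<!\S)' . $m . '(\S(?:[^' . $m . ']*\S)?)' . $m . '(?!\w)~';
        $out = preg_replace($pattern, $tags[0] . '$1' . $tags[1], $out);
    }

    // [url|title] links; only http(s) URLs are accepted in this sketch.
    $out = preg_replace('~\[(https?://[^|\]\s]+)\|([^\]]+)\]~', '<a href="$1">$2</a>', $out);

    return $out;
}

echo format_tags("*bold* and [http://alink.com/|link's title]"), "\n";
```

Because the escaping happens before any tag is generated, the only markup in the result is markup this function wrote itself.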
What you could do is convert all quotes and ampersands, then reconvert
the ones surrounded by '<' and '>'.
In PHP (not tested):
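// htmlspecialchars() would also escape '<' and '>', so convert only quotes and ampersands
$txt=str_replace(array('&','"',"'"),array('&amp;','&quot;','&#039;'),$original);
// helper: builds a callback that reconverts one entity inside a matched tag
$in_tag=function($from,$to){return function($m)use($from,$to){return str_replace($from,$to,$m[0]);};};
$txt=preg_replace_callback('/<[^>]*>/',$in_tag('&quot;','"'),$txt);
$txt=preg_replace_callback('/<[^>]*>/',$in_tag('&#039;',"'"),$txt);
$txt=preg_replace_callback('/<[^>]*>/',$in_tag('&amp;','&'),$txt);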
The last line (reconverting ampersands) should be reconsidered - plain
ampersands are illegal in XHTML, and should only appear on their own
when contained in a CDATA block.
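One way to take the ampersand concern into account is to leave quotes inside tags alone but still escape bare ampersands there. The following is only a sketch under assumptions: the function name escape_for_xhtml() is made up, and anything matching <...> is treated as a tag that the filter has already vetted.

```php
<?php
// Hypothetical helper: the name and the simple tag pattern are assumptions.
function escape_for_xhtml($html)
{
    // Split into text and tags; the capture group keeps the tags in the
    // result, so even indexes hold text and odd indexes hold <...> tags.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Text between tags: convert quotes, ampersands and angle brackets.
            $parts[$i] = htmlspecialchars($part, ENT_QUOTES);
        } else {
            // Inside a tag: keep the quotes, but escape any ampersand that
            // does not already start an entity such as &amp; or &#039;.
            $parts[$i] = preg_replace(
                '/&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/',
                '&amp;',
                $part
            );
        }
    }

    return implode('', $parts);
}

$in = '<a href="whatever.html?a=1&b=2" target=\'new_target\'>"my amazin\' links & stuff"</a>';
echo escape_for_xhtml($in), "\n";
```

With this input the attribute quotes survive, the ampersand in the href becomes &amp;, and the quotes and ampersand in the link text are converted to entities.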
Kae