LINUX.IE, website of the Irish Linux Users' Group

[ILUG-Webdev] PHP - converting HTML entities outside tags

Kae Verens kae at verens.com
Wed Aug 25 07:14:22 IST 2004


Lee Hosty wrote:
> I'm using HTMLFilter (http://linux.duke.edu/projects/mini/htmlfilter/) to
> safely allow certain HTML tags as user input to be displayed later. I use
> htmlspecialchars() to display this text in a textarea during editing, so
> everything displays as valid XHTML at this stage, without any user
> confusion.
> 
> However if the user inputs a special character (&, " or ' for example), it
> gets saved as is to the DB - and displays fine as XHTML in a textarea - but
> when output to a browser at the viewing stage (as opposed to the editing
> stage) - these bare characters are not valid XHTML and need converting. I
> can do this either at the saving to DB stage or just before outputting to
> browser.
> 
> However I can't blindly convert all special characters found to their
> entities anymore using htmlspecialchars(), as some of them may be
> inside the tags that the user has input, and I don't want these converted.
> 
> ie. user inputs <a href="whatever.html" target='new_target'>"my amazin'
> links & stuff"</a>
> 
> needs to be converted to <a href="whatever.html"
> target='new_target'>&quot;my amazin&#039; links &amp; stuff&quot;</a>
> 
> Any ideas? I'm new to PHP and would rather not re-invent any wheels.

The method we (my company) use is to not allow the user to enter HTML at 
all - convert /all/ special characters to entities. Besides - how many 
ordinary users do you know that can write HTML?

So - we convert all special characters to their entities, then when 
outputting, we expand a set of agreed formatting tags into HTML. Some of 
them are:
  *bold*
  /italic/
  _underscore_
  [http://alink.com/|link's title]
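
A minimal sketch of how tags like these could be expanded, assuming the text is escaped first and the markers are then turned into XHTML. This is not the converter described in this post: the function name format_markup(), the choice of <strong>, <em> and <u>, and the restriction to http(s) links are all assumptions made for illustration. It does not handle nesting or stray delimiters.

```php
<?php
// Hypothetical helper: escape everything, then expand a few lightweight tags.
function format_markup($text)
{
    // Escape &, <, >, " and ' so that no user-supplied HTML survives.
    $escaped = htmlspecialchars($text, ENT_QUOTES);

    // One pass with two alternatives, so that slashes inside a link URL
    // are consumed by the link rule before the /italic/ rule can see them.
    $pattern = '~\[(https?://[^|\]\s]+)\|([^\]]+)\]'                 // [url|title]
             . '|(?<![\w/])([*/_])(\S(?:.*?\S)?)\3(?![\w/])~';       // *b* /i/ _u_

    return preg_replace_callback($pattern, function ($m) {
        if ($m[1] !== '') {
            // Link: group 1 is the URL, group 2 the (already escaped) title.
            return '<a href="' . $m[1] . '">' . $m[2] . '</a>';
        }
        // Inline style: group 3 is the delimiter, group 4 the content.
        $tags = array('*' => 'strong', '/' => 'em', '_' => 'u');
        $tag = $tags[$m[3]];
        return "<$tag>" . $m[4] . "</$tag>";
    }, $escaped);
}
```

Because the escaping happens before the expansion, the only markup in the result is what the function itself emits.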

I'm afraid we /did/ re-invent the wheel in that case, but only because I 
started writing that converter well before I heard of similar scripts 
such as Textile (http://www.textism.com/tools/textile/).

What you could do is convert all quotes and ampersands, then reconvert 
the ones surrounded by '<' and '>'.

In PHP (not tested):
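  // Escape only &, " and ' (htmlspecialchars() would also escape < and >).
  $txt=str_replace(array('&','"',"'"),array('&amp;','&quot;','&#039;'),$original);
  // Undo the quote escaping inside each tag.
  $txt=preg_replace_callback('/<[^>]*>/',function($m){return str_replace(array('&quot;','&#039;'),array('"',"'"),$m[0]);},$txt);
  // Undo the ampersand escaping inside each tag.
  $txt=preg_replace_callback('/<[^>]*>/',function($m){return str_replace('&amp;','&',$m[0]);},$txt);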

The last line (reconverting ampersands) should be reconsidered - plain 
ampersands are illegal in XHTML, and should only appear on their own 
when contained in a CDATA block.
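
One way to act on that caveat, sketched under assumptions: split the input into tags and text, escape the text fully, and inside tags leave quotes alone while escaping only those ampersands that do not already begin an entity. The function name escape_outside_tags() is hypothetical, and the '<[^>]*>' pattern is only a rough notion of a tag (an attribute value containing '>' would confuse it).

```php
<?php
// Hypothetical helper: escape text between tags and keep the tags usable,
// without leaving a bare ampersand anywhere in the output.
function escape_outside_tags($original)
{
    // With PREG_SPLIT_DELIM_CAPTURE the pieces alternate:
    // even indexes are text, odd indexes are the captured tags.
    $parts = preg_split('/(<[^>]*>)/', $original, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Text: escape &, " and ' (plus any stray < or >).
            $parts[$i] = htmlspecialchars($part, ENT_QUOTES);
        } else {
            // Tag: keep the quotes, and escape only ampersands that are
            // not already the start of an entity such as &amp; or &#039;.
            $parts[$i] = preg_replace(
                '/&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/',
                '&amp;',
                $part
            );
        }
    }
    return implode('', $parts);
}
```

Applied to the example from the original question, this gives the requested output, and a URL such as a.php?x=1&y=2 inside an href comes out with &amp; rather than a bare ampersand.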

Kae


