LINUX.IE, website of the Irish Linux Users' Group

[ILUG] Perl guru advice needed

Kevin Philp kevin at cybercolloids.net
Mon Mar 27 16:51:43 IST 2006


A quick question for any Perl gurus out there.

We have a small programme that downloads data from a website and dumps the 
data in a MySQL database. The website contains a lot of hex coded entities 
such as:

&#xAE; in place of ® and others

We use a programme that is based on the LWP module, and when it downloads the 
text with get_text it automatically decodes all entities. Unfortunately 
it makes a complete pig's ear of the whole thing, so:

using decode_entities: &#xAE; = ®

using get_trimmed_text: &#xAE; = î

I would prefer not to decode the entities at all, but looking at the module 
reference, you can't switch it off for get_text.

Has anyone any suggestions to get around this, or an algorithm to convert the 
corrupt text back to something sensible?

Thanks

Kevin.
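
One possible shape for the "convert the corrupt text back" algorithm, offered only as a rough sketch. It assumes the corruption comes from the UTF-8 bytes of a decoded character being read back as Latin-1, which the message above does not confirm. The helper name repair_mojibake and the sample string are hypothetical; only the core Encode module is used.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: undo one round of "UTF-8 bytes read as Latin-1".
# Assumption: that is how the text was corrupted in the first place.
sub repair_mojibake {
    my ($text) = @_;
    # Turn each character back into the single byte it was misread from.
    # FB_CROAK makes this die if a character does not fit in Latin-1.
    my $bytes = encode('iso-8859-1', $text, Encode::FB_CROAK);
    # Reinterpret those bytes as the UTF-8 sequence they originally were.
    # FB_CROAK makes this die if the bytes are not valid UTF-8.
    return decode('UTF-8', $bytes, Encode::FB_CROAK);
}

# The two UTF-8 bytes of U+00AE (registered sign), misread as two characters.
my $corrupt  = "\x{C2}\x{AE}";
my $repaired = repair_mojibake($corrupt);
printf "U+%04X\n", ord $repaired;   # prints U+00AE
```

Because both conversions croak on bad input, wrapping the call in eval would let text that was never corrupted this way pass through unchanged. If the damage has a different cause, this round trip will not help.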






