Try http://textmining.org/ if you like Java.
John
Peter McEvoy wrote:
> Hi,
> I've got over 2000 Word documents I need to convert to HTML. I've been
> trying with wvHtml from wvWare with varying results - some of the docs
> contain a little text box with the received date inside; wvHtml has
> problems with this and gives me this line:
>
> <i>(subject to editorial corrections)</i><b><img alt="0x08 graphic"
> src="StrangeNoGraphicData"><br></b><i><img alt="0x08 graphic"
> src="StrangeNoGraphicData"><br></i>
>
> Googling on this gives a few hits, most of them saying to make sure
> wvWare is compiled with libwmf support, which I'm pretty sure the Debian
> sid package I'm using is.
>
> So, failing getting this to work, does anyone know any other apps that
> can do what I require? A lot of the other apps I've looked at seem to
> use wv as their backend (abiword, etc.) or just don't bother working with
> anything like an image (catdoc). oowriter from OpenOffice does a
> marvellous job when I open the doc and manually save it as HTML, but it
> would seem far from trivial to script.
>
> TIA
>
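On the point above that the office suite converts well by hand but looks hard to script: a minimal sketch of one way to batch it, assuming a LibreOffice `soffice` binary (a descendant of OpenOffice) is on the PATH and supports the `--headless --convert-to html --outdir` options. The function names and directory layout are made up for illustration, and nothing here has been tried against documents with the text boxes described.

```python
import subprocess
from pathlib import Path


def build_command(doc, out_dir, soffice="soffice"):
    """Return the argv list that converts one Word file to HTML.

    Assumes a LibreOffice-style command line; adjust for other builds.
    """
    return [soffice, "--headless", "--convert-to", "html",
            "--outdir", str(out_dir), str(doc)]


def find_docs(src_dir):
    """Return the .doc files directly inside src_dir (any letter case)."""
    return sorted(p for p in Path(src_dir).iterdir()
                  if p.is_file() and p.suffix.lower() == ".doc")


def convert_all(src_dir, out_dir, run=subprocess.run):
    """Convert every .doc in src_dir; return the files that failed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for doc in find_docs(src_dir):
        # One process per document, so a single bad file cannot
        # abort the rest of the batch.
        result = run(build_command(doc, out_dir))
        if result.returncode != 0:
            failed.append(doc)
    return failed
```

Calling `convert_all("docs", "html")` (both directory names are placeholders) would return the list of files whose conversion exited non-zero, which is useful for rerunning or hand-checking the stragglers in a 2000-file batch.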