Re: [ILUG] Stripping html in mutt

From: Scott Wunsch (ilug at domain tracking.wunsch.org)
Date: Wed 05 Sep 2001 - 16:23:23 IST


On Tue, 04-Sep-2001 at 16:42:54 -0700, Rick Moen wrote:

> Incidentally, one good resource for converters, including the in-line,
> MSWord-to-something-reasonable kind, is this site:
> http://wvware.sourceforge.net/

Very handy for those pesky Word document senders.

In case anybody's interested, I have the following in my mailcap:

 text/html; html2text; copiousoutput
 text/rtf; rtf2text %s; copiousoutput
 application/rtf; rtf2text %s; copiousoutput
 application/msword; word2text %s; copiousoutput

The script html2text contains the following:

 #!/bin/sh
 echo
 /usr/bin/w3m -dump -T text/html | perl -0777 -pe 's/\n\s*\n/\n\n/gs; s/\xa0/ /gs;'

I find that w3m produces nicer text output than Lynx does, especially when
dealing with tables.

The script rtf2text is from the Perl package RTF::Parser. The script
word2text contains the following:

 #!/bin/sh
 wvWare -x /usr/local/share/wv/wvHtml.xml "$1" 2>/dev/null | perl -0777 -p \
   -e 's|<img .*?>||gs;' | html2text

It uses the wvWare package referenced above, and my simple html2text
script.

The end result of all this stuff is that I can generally read just about
anything people send me, without having to leave the comfort of the Mutt
pager.

-- 
Take care,
Scott \\'unsch
... A conclusion is simply the place where you got tired of thinking.

