On Sat 20 Aug 2005 16:27, greg wm wrote:
> hi folks,
>
> feels rather like i've ventured into uncharted territory, but somebody
> out there somewhere must know the way..
>
> i used wget to copy the entire http://nonviolentpeaceforce.org site to
> http://nvpf.org/np. the former is asp pages, the latter captured as html.
>
> for example, http://nonviolentpeaceforce.org/spanish/welcome.asp was
> captured to http://nvpf.org/np/spanish/welcome.asp.html
>
> as you can see, the capture is mostly fine, including spanish characters
> in the text (eg año), however the spanish characters in the menus didn't
> do quite so well (eg Misi?n)
>
> in the file año appears as a&ntilde;o which is apparently "good", but
> Misi?n appears as Misión, which is apparently "bad".
>
> first question: why is that bad?
>
> if i tell galeon, instead of automatic encoding, use western iso-8859-1,
> or any of many others, presto, the page appears nicely. but i don't
> have to do that to see the original, nor do i have to do that for
> anybody else's pages, and of course i can't expect our audience to go
> and fiddle with that in their browsers.
>
> but really now, why isn't an ó an ó? right after the title the file
> says <meta http-equiv="Content-Type" content="text/html;
> charset=iso-8859-1">. why isn't that good enough? do i need to change
> some directive or setting in apache?
In Firefox the page is being displayed as UTF-8; when you set the encoding
to ISO-8859-1, the accents are displayed correctly.
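
One thing worth checking, offered as a guess rather than a diagnosis: a
charset given in the HTTP Content-Type header takes precedence over the
<meta> tag inside the page, so if the server sends charset=utf-8 the
browser will ignore the iso-8859-1 declaration in the file. Here is a
minimal Python sketch of that precedence rule. The function names and the
sample header values are invented for illustration and were not captured
from nvpf.org.

```python
from email.message import Message
from typing import Optional


def header_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, if any."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, or None when absent


def effective_charset(http_content_type: str, meta_charset: str) -> str:
    """The HTTP header charset wins; the <meta> value is only a fallback."""
    return header_charset(http_content_type) or meta_charset.lower()


if __name__ == "__main__":
    # Hypothetical header values, not taken from the real server.
    print(effective_charset("text/html; charset=UTF-8", "iso-8859-1"))
    print(effective_charset("text/html", "iso-8859-1"))
```

Running wget with -S prints the headers the server really sends. If a
charset does turn up there, Apache's AddDefaultCharset directive is one
place to look.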
To solve the problem:
1. Find out why the page is being displayed as UTF-8.
2. Change the accented characters to &ntilde; format.
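
Step 2 can be scripted. A rough Python sketch, assuming the captured files
really are ISO-8859-1 (the function name is my own invention): it reads
raw bytes, leaves plain ASCII alone, and writes every other character as
a named HTML entity where one exists, or as a numeric one otherwise.

```python
from html.entities import codepoint2name


def latin1_to_entities(raw: bytes) -> str:
    """Decode ISO-8859-1 bytes and replace non-ASCII characters with entities."""
    out = []
    for ch in raw.decode("iso-8859-1"):
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # ASCII, including existing &ntilde; text, is kept
        elif cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")  # e.g. &oacute;
        else:
            out.append("&#" + str(cp) + ";")  # numeric fallback
    return "".join(out)


if __name__ == "__main__":
    print(latin1_to_entities(b"Misi\xf3n"))
```

The result is pure ASCII, so it should display the same whichever
encoding the browser ends up choosing.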
>
> second question: it looks like wget was inconsistent! why?
>
> likely hint: the menus were rendered out of some .asp database or
> whatever, differently than the rest of the text of the page.
>
> but so what? why didn't wget capture something identical to what my
> browser shows? the command i ran was
> wget -ENKkrl19 -nH -w2 -owget.log http://nonviolentpeaceforce.org
>
> so anyway i sez hey no problem, i'll just find and replace. well ha.
> couldn't get either egrep nor sed to find an ó that was right under
> their noses.
>
> third question: what's the trick to find and replace these buggers?
> vim can find them, in interactive mode, so.. should i be trying to
> figger out how to use vim as a grep replacement.. uhh.. ..?
I use kwrite. Highlight the ñ and away you go.
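
If you would rather script the hunt: my guess, and it is only a guess, is
that egrep and sed were running in a UTF-8 locale, where a lone 0xF3 byte
is not a valid character and so never matches a typed ó. Working on raw
bytes avoids the question altogether. A small Python sketch (the function
name and the sample data are made up) that reports where the non-ASCII
bytes sit:

```python
from typing import List, Tuple


def find_high_bytes(data: bytes) -> List[Tuple[int, int, int]]:
    """Return (line, column, byte) for every byte >= 0x80, positions 1-based."""
    hits = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte >= 0x80:
                hits.append((lineno, col, byte))
    return hits


if __name__ == "__main__":
    # Made-up sample: one raw ISO-8859-1 byte and one HTML entity.
    sample = b"<a>Misi\xf3n</a>\n<p>a&ntilde;o</p>\n"
    for lineno, col, byte in find_high_bytes(sample):
        print(f"line {lineno}, column {col}: 0x{byte:02x}")
```

For a real file, read it with open(path, "rb").read() and pass the
result in.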
>
> fourth question: where should i be asking these questions, or, where do
> i look for the mysterical solution, and will i recognize it when i see it?
When you find out, tell the list!
At least in Irish I only have to worry about 10 accented characters. It just
goes to show there is always someone worse off than yourself!
Lord and master of WWW.IONAD.ORG