Re: [ILUG] Squid vs Browser cache

From: Fergal Daly (fergal at domain esatclear.ie)
Date: Sat 19 Feb 2000 - 14:16:06 GMT


At 08:38 19/02/00 -0500, Subba Rao wrote:
>The squid access.log shows the following:
>
>950966673.946 501 127.0.0.1 TCP_MISS/301 530 GET http://pws.prserv.net/truemax root DIRECT/pws.prserv.net text/html
>950966675.095 1098 127.0.0.1 TCP_MISS/200 4680 GET http://32.97.166.75/truemax/ root DIRECT/32.97.166.75 text/html
>
>The squid store.log shows the following:
>
>950966673.946 RELEASE FFFFFFFF 301 950966342 -1 -1 text/html -1/300 GET http://pws.prserv.net/truemax
>950966675.095 RELEASE FFFFFFFF 200 950966239 -1 -1 text/html -1/4505 GET http://32.97.166.75/truemax/
>
>The above pages are static. But it appears that Squid is downloading them
>again on revisits.
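
The store.log entries quoted above can be picked apart with a few lines of
Python. This is only a rough sketch: it assumes the field order is time,
action, file number, HTTP status, then the Date, Last-Modified and Expires
values as Unix timestamps (with -1 meaning the header was absent), followed by
type, sizes, method and URL. The function names are made up for illustration.

```python
# Sketch: pull the freshness fields out of squid store.log lines.
# Assumed field order: time action fileno status date lastmod expires
#                      type expected-len/real-len method url

def parse_store_line(line):
    """Split one store.log line into a dict of named fields."""
    f = line.split()
    return {
        "action": f[1],
        "fileno": f[2],
        "status": int(f[3]),
        "date": int(f[4]),
        "lastmod": int(f[5]),   # -1 if the reply had no Last-Modified
        "expires": int(f[6]),   # -1 if the reply had no Expires
        "url": f[10],
    }

def has_freshness_info(entry):
    """True if the reply carried a Last-Modified or Expires value."""
    return entry["lastmod"] != -1 or entry["expires"] != -1

# The two entries from the store.log excerpt, one per line.
LINES = [
    "950966673.946 RELEASE FFFFFFFF 301 950966342 -1 -1 text/html -1/300 GET http://pws.prserv.net/truemax",
    "950966675.095 RELEASE FFFFFFFF 200 950966239 -1 -1 text/html -1/4505 GET http://32.97.166.75/truemax/",
]

for line in LINES:
    entry = parse_store_line(line)
    verdict = "has freshness info" if has_freshness_info(entry) else "no Last-Modified/Expires"
    print(entry["action"], entry["url"], verdict)
```

If that field order is right, both entries have -1 for Last-Modified and
Expires, so squid was given nothing to judge freshness by.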

I'm not 100% sure of what follows, but at least you can test my theory.
They may seem static, but the web server is not saying so. Have a look at
what is returned when the page is requested:

fergal at domain host:~> telnet pws.prserv.net 80
Trying 32.97.166.75...
Connected to pws.prserv.net.
Escape character is '^]'.
GET /truemax/ HTTP/1.0

HTTP/1.1 200 OK
Date: Sat, 19 Feb 2000 13:54:37 GMT
Server: Apache/1.3.9 (Unix)
Connection: close
Content-Type: text/html

fergal at domain host:~> telnet www 80
Trying 194.145.128.11...
Connected to oompa.esatclear.ie.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.1 200 OK
Date: Sat, 19 Feb 2000 14:04:07 GMT
Server: Apache/1.3.11 (Unix)
Last-Modified: Mon, 27 Sep 1999 15:54:36 GMT
ETag: "401b-3a5-37ef933c"
Accept-Ranges: bytes
Content-Length: 933
Connection: close
Content-Type: text/html

The difference is that the esatclear one returns a Last-Modified header, so
if squid is asked for the page it can check the change time instead of
downloading the whole page. The truemax page gives no mtime and no caching
instructions, so presumably squid feels obliged to get it every time. To see
if my theory is correct, point your browser at www.esatclear.ie a couple of
times and see what happens. Let me know how you get on.
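
To make the idea concrete, here is a small Python sketch of the decision a
cache could make from the two replies shown earlier. It is not squid's actual
logic, just an illustration: the helper names are invented, and it only looks
at Last-Modified, which a cache can send back as If-Modified-Since to ask the
server whether the page has changed.

```python
# Sketch: what can a cache revalidate with, given only the reply headers?
# parse_headers and revalidation_headers are illustrative names, not squid's.

def parse_headers(raw):
    """Turn a raw reply header block into a dict with lower-cased names."""
    headers = {}
    for line in raw.strip().splitlines()[1:]:   # skip the status line
        name, _, value = line.partition(":")    # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return headers

def revalidation_headers(headers):
    """Request headers a cache could send to ask 'has this changed?'."""
    cond = {}
    if "last-modified" in headers:
        cond["If-Modified-Since"] = headers["last-modified"]
    return cond

TRUEMAX = """HTTP/1.1 200 OK
Date: Sat, 19 Feb 2000 13:54:37 GMT
Server: Apache/1.3.9 (Unix)
Connection: close
Content-Type: text/html"""

ESATCLEAR = """HTTP/1.1 200 OK
Date: Sat, 19 Feb 2000 14:04:07 GMT
Server: Apache/1.3.11 (Unix)
Last-Modified: Mon, 27 Sep 1999 15:54:36 GMT
ETag: "401b-3a5-37ef933c"
Accept-Ranges: bytes
Content-Length: 933
Connection: close
Content-Type: text/html"""

for site, raw in (("truemax", TRUEMAX), ("esatclear", ESATCLEAR)):
    cond = revalidation_headers(parse_headers(raw))
    print(site, cond if cond else "nothing to revalidate with, full fetch needed")
```

With a Last-Modified value to send back, the server can answer with a short
304 Not Modified instead of the whole page; without one, the cache has
nothing to compare against.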

> > I'm not sure what DB you're talking about, do you mean the cache of stored
> > web pages? I haven't ever seen squid set to do updates at set intervals.
> > Whenever I have set up squid, it followed the standard HTTP caching rules,
> > i.e. look at the headers (and possibly meta tags, not sure) for details of
> > how long to keep the page before it must reload it.
> >
>
>I was talking from my observations in cache.log. It appears the stored pages
>are done on an hourly basis (approximately).
>
>2000/02/19 02:15:54| NETDB state saved; 134 entries, 4 msec
>2000/02/19 03:03:07| NETDB state saved; 134 entries, 344 msec
>2000/02/19 04:06:19| NETDB state saved; 134 entries, 754 msec
>2000/02/19 05:00:18| NETDB state saved; 134 entries, 12 msec
>2000/02/19 06:00:29| NETDB state saved; 134 entries, 314 msec
>2000/02/19 06:49:11| NETDB state saved; 134 entries, 404 msec
>
>If the pages are not stored, then would squid go out again and get them?
>That would be a waste of bandwidth.

I think those NETDB lines are a different thing. As far as I know, NETDB is
squid's network measurement database (round-trip times and hop counts to
servers), which it saves periodically, and cache.log is just squid's general
log of what the daemon is doing. Neither contains stored pages; those are in
files in the cache directories. What tells squid which file corresponds to
which web page, and lets it remember what it had saved the last time it was
running, is an index file kept in each cache directory (swap.state, I believe).

Fergal


