From: John Gay (johngay at domain eircom.net)
Date: Sat 27 Jul 2002 - 18:01:14 IST
On Sat 27 Jul 2002 01:57, you wrote:
> On Fri, Jul 26, 2002 at 11:24:30PM +0100, John Gay wrote:
> > A while ago I asked what other packages I should optimize for Pentium.
> > One person answered GlibC. This got me thinking about GCC itself, so I
> > asked on another list and got a few answers, most were "don't even think
> > about it" but a few suggested GCC and one pointed me to Linux From
> > Scratch.
> or more specifically, what do you mean? on one hand you can optimise
> how gcc is compiled. all that will do is make it generate the exact
> same code just a smidge faster. and since gcc is such a memory pig,
> you'd do better to buy more ram to up your fs cache hits and to keep
> gcc's heap out of swap.
To explain what I mean:
According to the PGCC site, GCC by itself is not very good at taking
advantage of the pipelining features introduced with the Pentium family. The
PGCC patches are supposed to make GCC generate tighter code, but your point
about compiler bugs is well taken. This is why I am taking things slow and
looking into these things. As I said, the PGCC site does not seem to have been
updated in a year or more?!? I am also looking into GCC itself. Now
that the 3.1 series is out, it might be better than when the PGCC patches
were written. The bottom line is, Pentiums have better instruction sets than
the original 386 instructions that they still support. The Pentium also
started introducing pipelining, so properly generated code can be up to 30%
faster than equivalent code that performs the same function! As for why
optimise GCC if it will only produce the same code slightly faster: the
speed-up is a percentage of the total compile time. The first time I
compiled the Qt libs, with only 16M and a LOT of swap, it took over 48 hours.
I've now got 128M in the box, but at 200MHz any increase, distributed over
such a long compile, is still considerable.
> on the other side you can look into patches to gcc that affect its
> code generation. um, ok, but keep in mind that compiler errors suck.
> i can't express that enough. compilers should just work. perfectly.
> always. doing anything that might affect that is, in my opinion, insane.
> they're hard to trace and you'd better have a deep knowledge of what's
> going on to either report bugs to the patch developers or to fix it
> yourself. plus my understanding is that gcc would need major changes
> to get large speed boosts on x86 chips.
My understanding, and I've followed the development of the Intel family since
the 8080, is that each generation since the 386 has introduced better and
faster instructions.
The 486 brought IEEE floating point instructions on chip by incorporating an
FPU on board. The first few generations were flaky, so Intel disabled the
dodgy ones and sold them as 486SX, i.e. without the FPU. Later generations
were better, which is why you only find slow 486SXs ;-) Therefore 486s with
working FPUs can calculate floats faster than 386s, but you must generate
the proper code to take advantage of this.
The Pentiums improved the FPU logic and introduced pipelining. The first
generations of Pentiums had faulty FPU logic programmed into them, the
Pentium Bug, but subsequent ones were fine. These added instructions are
faster again than the 486 equivalents. Also, the pipelining needs careful
instruction ordering to take full advantage of its speed improvements,
again something the compiler must know about to utilise to full effect.
According to the PGCC site GCC does this poorly, but that info seems to be
dated; GCC 3.1.x might be better. This is one of the areas I am researching
closely to get an answer.
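To show why instruction ordering matters on a two-pipe chip, here is a toy model in Python. It is only a sketch under loud assumptions: every instruction takes one cycle, and two adjacent instructions issue together only when the second does not touch the register the first one writes. The real Pentium pairing rules are stricter, and the function name `cycles` and the register names are made up for illustration.

```python
# Toy model of a two-pipe, in-order CPU, loosely inspired by the Pentium's
# U and V pipes. Assumption: one cycle per instruction, and a pair issues
# together only if the second instruction neither reads nor writes the
# register written by the first. This is NOT the real pairing rule set.

def cycles(program):
    """Count cycles for a list of (dest, sources) instructions."""
    count = 0
    i = 0
    while i < len(program):
        count += 1
        if i + 1 < len(program):
            dest, _ = program[i]
            next_dest, next_sources = program[i + 1]
            if dest not in next_sources and dest != next_dest:
                i += 1  # the second instruction rides along in the other pipe
        i += 1
    return count

# Two independent chains: a += b; a += c  and  d += e; d += f
chain_by_chain = [
    ("a", ("a", "b")),
    ("a", ("a", "c")),
    ("d", ("d", "e")),
    ("d", ("d", "f")),
]
# Same work, with the two chains interleaved.
interleaved = [
    ("a", ("a", "b")),
    ("d", ("d", "e")),
    ("a", ("a", "c")),
    ("d", ("d", "f")),
]

print(cycles(chain_by_chain), cycles(interleaved))  # 3 2
```

Same four instructions, same result, but the interleaved order finishes in 2 cycles instead of 3 in this model. That kind of reordering is what a Pentium-aware scheduler in the compiler would be expected to do.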
MMX added the ability to perform matrix calculations on several packed ints
with single instructions, using 64-bit registers shared with the FPU. Two
problems with this:
1) ints are not very useful for most matrix calculations, floats would be.
2) this is not something that can be optimised well by a compiler. It needs to
be identified and provided for in the sources.
I.e. not much use to anyone, but makes great ad copy ;-)
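For anyone wondering what "provided for in the sources" looks like, here is a rough emulation in plain Python of a single packed 16-bit add over four lanes, in the style of the MMX PADDW instruction. No real MMX is involved, and the helper names (`pack`, `unpack`, `packed_add`) are my own. The point is only that the data has to be laid out in lanes before one instruction can do four adds.

```python
# Emulates one packed 16-bit add across four lanes held in a 64-bit value,
# in the style of MMX's PADDW (wrap-around add). Plain Python, no real MMX.

LANES = 4
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1

def pack(values):
    """Pack four 16-bit ints into one 64-bit int, lane 0 in the low bits."""
    word = 0
    for lane, value in enumerate(values):
        word |= (value & LANE_MASK) << (lane * LANE_BITS)
    return word

def unpack(word):
    """Split a 64-bit int back into its four 16-bit lanes."""
    return [(word >> (lane * LANE_BITS)) & LANE_MASK for lane in range(LANES)]

def packed_add(x, y):
    """Add lane by lane with wrap-around; carries never cross lanes."""
    return pack([(a + b) & LANE_MASK for a, b in zip(unpack(x), unpack(y))])

a = pack([1, 2, 3, 65535])
b = pack([10, 20, 30, 1])
print(unpack(packed_add(a, b)))  # [11, 22, 33, 0]
```

Note the last lane wraps from 65535 to 0 without disturbing its neighbours, which an ordinary 64-bit add would not do.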
The PentiumPro improved the pipeline enormously. Again, a properly written
compiler should be able to optimise for this, once it can organise the code.
The PIII added MMX-type instructions for floats (SSE)! Now this IS useful!
Graphics-intensive programs can take great advantage of this, but it must be
provided for in the source code. Compilers cannot usually optimise for this
sort of thing. XFree86 and DRI are two prime examples that do provide for
this, so the PIII can run XFree86 and DRI quite a bit faster, IF it's
compiled for these SSE instructions!
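Before building anything with SSE switched on, it is worth checking what the CPU actually reports. On x86 Linux the kernel lists the features on the "flags" line of /proc/cpuinfo. A small sketch of parsing that follows; the function name `cpu_flags` is mine, and the sample text is made up in the usual layout, not taken from a real box.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            return set(value.split())
    return set()

# Made-up sample in the usual x86 Linux layout (an assumption for the demo).
sample = (
    "processor\t: 0\n"
    "model name\t: Pentium III (example)\n"
    "flags\t\t: fpu tsc cx8 mmx fxsr sse\n"
)

flags = cpu_flags(sample)
print("mmx" in flags, "sse" in flags, "sse2" in flags)  # True True False
```

On a real system the same function can be fed `open("/proc/cpuinfo").read()`.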
Not sure what improvements the P4 introduces. I think it's mostly just speed
improvements rather than any execution changes.
So, the difference between the 386 and the PentiumMMX 'should' yield a
significant speed boost if optimised correctly. There are faster floating
point instructions and pipelining that need optimising for. I'm not sure if
GCC can optimise properly for the pipelining; at least the PGCC group found
significant improvements to add to GCC 2.95.3 to gain speed improvements of
up to 30%. 30% of 48 hours is 14.4 hours. Of course none of this will have any
effect on I/O bound processes, but GUIs are mostly CPU bound. I am also
finding out about object pre-linking optimisations which should give even
better performance for Qt and KDE.
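The 14.4 hour figure depends on how "30%" is read, so here is the arithmetic both ways as a quick sketch. The function names are mine, and both readings assume the whole 48 hours is CPU bound, which a build with that much swapping probably was not.

```python
def hours_saved_time_cut(total_hours, fraction):
    """Reading 1: the build time itself drops by `fraction` of the total."""
    return total_hours * fraction

def hours_saved_speedup(total_hours, fraction):
    """Reading 2: the compiler runs (1 + fraction) times as fast."""
    return total_hours - total_hours / (1.0 + fraction)

print(round(hours_saved_time_cut(48, 0.30), 1))  # 14.4
print(round(hours_saved_speedup(48, 0.30), 1))   # 11.1
```

Either way it is a good ten hours or more off a 48 hour build, if the full 30% ever materialises.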
Now if I had another PIII for my box, I could take advantage of those SSE
instructions to optimise XFree86 as well!
And, of course, I've loads of time on my hands now and I need something to
keep me busy. At least I can say that I've successfully built a full Linux
system, including KDE3, from scratch when I'm done ;-)