LINUX.IE, website of the Irish Linux Users' Group
Tux rules!

   
Home
New Users
Articles
Download
Projects
Community
Vendors

  Print Version
Email to...
 
Archives:


planetILUG

Recent News

News Archive


Join the
ILUG
on FaceBook


Join the
ILUG
on LinkedIn


Join the
ILUG SETI
Group



















 
 :: Mailing Lists

[ILUG] a quick gcc microoptimisation question

[ILUG] a quick gcc microoptimisation question

Brian Foster blf at blf.utvinternet.co.uk
Fri Dec 17 10:17:34 GMT 2004


  | From: jm at jmason.org (Justin Mason)
  | Date: Thu, 16 Dec 2004 10:32:48 -0800
  | 
  | HiLUG,
  | 
  | quickie: when using gcc's -O switches, what is more efficient:
  |     if (condition) {
  |       /* more common case */
  |     } else {
  |       /* less common case */
  |     }
  | or
  |     if (condition) {
  |       /* less common case */
  |     } else {
  |       /* more common case */
  |     }
  | ??
  | 
  | I know there is/was a tool that using coverage/profile data to optimise
  | branching and cache locality of code, but is there a technique to do this
  | efficiently without requiring that and the profiling runs it'd require?
  | In other words, with gcc optimisation, is the branch instruction used for
  | the "if" case or the "else" case by default?

 *sigh*
 this is micro-pretend-optimising at its worst.
 and without sufficient details to actually answer
 the question:  what architecture?  what cpu-model?
 which version of gcc?  (which I presume means the
 language in question is C (this can matter).)  and
 are there any peculiarities of yer application(-mix)
 and platform which may (_do_, actually) affect the
 cache behaviour?  also, since you are talking about
 code-ordering, I _assume_ you mean I-cache and not
 D-cache.  in which case, does the «condition» begin
 on, end on, and/or straddle an I-cache-line?  ditto
 for the «more ...» and «less common case»s?

 finally, _if_ yer code really is (I-)cache sensitive,
 _why_ are you using C?  this is a situation when you
 should be using assembler (possibly in-lined).

 think some more.  in general yer code is of the form:

    B                  // before
    if (C)             // «condition»al-test
       T               // then (or true)
    else
       F               // else (or false)
    A                  // after

 now what happens if someone adds (or removes) some
 code to any of B, C, T, or (in some cases) F:  that
 will change the position of some/all of the sensitive
 code (at least), thus possibly changing just what is
 cached when.  at the level of yer question, you may
 have to keep reversing C and the order of T and F
 each time A changes!  good grief.

 and then, if yer unspecified cpu(-model) speculatively
 prefetches, it is quite possible you can rearrange the
 code all you want and not make any difference.

 finally, what about the other threads (including interrupt
 handlers) running on the cpu?  how are they going to affect
 the I-cache?  if yer code is so sensitive you have to play
 with code ordering, I _presume_ you are running in an
 environment where you've done something --- out of curiosity,
 what? --- to insure concurrent execution doesn't disturb
 the (I-)cache.

 my suggestion?  choose a good data structure and algorithm.
 and write clear (read: maintainable) code.   let the tools
 worry about instruction scheduling and optimal code order.

 one specific hint in this situation is to write the code
 like:

    if (C)             // «condition»al-test
       T               // the shorter (smaller) block of code
    else
       F               // the longer (larger) block of code

 this is _not_ an optimisation hint per se, but a code
 readability hint.  it is far easier to understand the
 flow-of-control when the «condition» is close to the
 start of the conditional code blocks.  consider the
 following example  (NOTE to people using MUAs that
 reformat e-mail:  Turn It Off!  this example depends
 on the literal formating; yer MUA may needlessly
 confuse the issue.):

    /* This is an example of BAD PRACTICE! */
    if (some complicated condition) {
       a very long
       and convoluted
       bunch of code.
       after reading it,
       many people tend
       to have forgotten
       exactly what the condition was.
       hence, understanding
       what the following
       clause does is much
       harder then it needs
       to be.
    }
    else {
       what was that condition again?
    }

 that is amazingly hard to understand in any listing
 of code, esp. pages and pages of code.

cheers!
	-blf-

p.s.  the cache behaviour tool you are thinking of is
     might be `helgrind', part of the `valgrind' tool
     suite.  it is only for code built for the ia32/x86
     architecture, albeit AFAICR there is a beta(?)
     PowerPC version.  I have never(?) used `helgrind'
     myself --- I've never been in a situation where I
     needed to --- but I do Strongly Recommend the main
     `valgrind' tool!

-- 
Experienced (20+ years) kernel engineer: |  Brian Foster      Montpellier,
 • Unix, embedded, &tc; • documentation; |  blf at utvinternet.ie      FRANCE
 • IDL, automated testing, process, &tc. |    Stop E$$o (ExxonMobile)!
Résumé:  http://www.blf.utvinternet.ie   |        http://www.stopesso.com



More information about the ILUG mailing list
Read this without the formatting.
                                                                                                    

 

Hosted by HEAnet


Maintained by the ILUG website team. The aim of Linux.ie is to support and help commercial and private users of Linux in Ireland. You can display ILUG news in your own webpages, read backend information to find out how. Networking services kindly provided by HEAnet, server kindly donated by Dell. Linux is a trademark of Linus Torvalds, used with permission. No penguins were harmed in the production or maintenance of this highly praised website. Looking for the Indian Linux Users' Group? Try here. If you've read all this and aren't a lawyer: you should be!
RSS Version
Powered by Dell