9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
* [9fans] when is a branch not a branch?
@ 2008-06-02 21:55 ron minnich
  2008-06-02 22:04 ` Charles Forsyth
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: ron minnich @ 2008-06-02 21:55 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 3532 bytes --]

Here's an interesting thing I'm learning about the kind of
optimizations that might be in the EU of the newer machines. This
started out as a pretty simple 'measure the branch penalty' exercise.

Given this program:

int b(int i){
        return i+1;
}

main(){
        int i;
        for(i = 0; i < 1000000000; i = b(i));
        if (i != 1000000000)
                printf("Fix me\n");
}

The assembly code on newer micros looks like this:
       .file   "sa.c"
        .text
.globl b
        .type   b, @function
b:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %eax
        addl    $1, %eax
        popl    %ebp
        ret
        .size   b, .-b
        .section        .rodata
.LC0:
        .string "Fix me"
        .text
.globl main
        .type   main, @function
main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        subl    $20, %esp
        movl    $0, -8(%ebp)
        jmp     .L4
.L5:
        movl    -8(%ebp), %eax
        movl    %eax, (%esp)
        call    b
        movl    %eax, -8(%ebp)
.L4:
        cmpl    $999999999, -8(%ebp)
        jle     .L5
        cmpl    $1000000000, -8(%ebp)
        je      .L10
        movl    $.LC0, (%esp)
        call    puts
.L10:
        addl    $20, %esp
        popl    %ecx
        popl    %ebp
        leal    -4(%ecx), %esp
        ret
        .size   main, .-main
        .ident  "GCC: (GNU) 4.1.2 20070925 (Red Hat 4.1.2-27)"
        .section        .note.GNU-stack,"",@progbits

OK, let's modify b: a little as follows;
b:
#if JMP > 0
        jmp     bb
#if JMP > 1
                call    b
#if JMP > 2
                call    b
#if JMP > 3
                call    b
#endif
#endif
#endif
#endif
bb:

So, simple: if JMP is 0, no change, if JMP is 1, we do (in essence) br
.+2, if JMP is 2, we do br .+7, etc.

now I time the run 10 times (I can run longer but it seems good enough
to establish behavior). I should get some rough idea of the cost of
the branch.

Attachment 1 shows this cpu running in 32-bit mode:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 3
model name      : Intel(R) Pentium(R) 4 CPU 3.40GHz
stepping        : 4
cpu MHz         : 3415.468
cache size      : 1024 KB

the short form: the penalty for the branch is zero. As I say, I can
run it until I get getter statistics, but the trend is clear. Note
this was an unloaded machine.

Well, how about my laptop? The penalty for the branch is negative. Our
guess: the very highly dynamic power management is playing tricks with
our minds. But we don't know. But the code with the branch is
consistently 25% faster.
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU         T7200  @ 2.00GHz
stepping        : 6
cpu MHz         : 1000.000
cache size      : 4096 KB

how about a 64-bit cpu?

cpu family      : 15
model           : 3
model name      :                   Intel(R) Xeon(TM) CPU 3.40GHz
stepping        : 4
cpu MHz         : 3400.283
cache size      : 1024 KB

third graph. Here you can see the penalty increased a bit as the size
of the jmp increased.

Don't try this with 8a. 8a is too damn smart -- it just optimizes the
branch out (unless there is a switch to avoid doing that). You need a
dumb assembler.
ron

[-- Attachment #2: prism.gif --]
[-- Type: image/gif, Size: 3490 bytes --]

[-- Attachment #3: results.gif --]
[-- Type: image/gif, Size: 3389 bytes --]

[-- Attachment #4: runtimes.64bit.gif --]
[-- Type: image/gif, Size: 3417 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2008-06-05 20:19 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-06-02 21:55 [9fans] when is a branch not a branch? ron minnich
2008-06-02 22:04 ` Charles Forsyth
2008-06-02 22:35   ` ron minnich
2008-06-02 22:50     ` Charles Forsyth
2008-06-02 22:55       ` Charles Forsyth
2008-06-02 23:29       ` ron minnich
2008-06-05 20:19         ` dave.l
2008-06-02 22:13 ` Brantley Coile
2008-06-02 22:37   ` ron minnich
2008-06-02 22:22 ` Roman Shaposhnik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).