9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
* [9fans] when is a branch not a branch?
@ 2008-06-02 21:55 ron minnich
  2008-06-02 22:04 ` Charles Forsyth
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: ron minnich @ 2008-06-02 21:55 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 3532 bytes --]

Here's an interesting thing I'm learning about the kind of
optimizations that might be in the EU of the newer machines. This
started out as a pretty simple 'measure the branch penalty' exercise.

Given this program:

int b(int i){
        return i+1;
}

main(){
        int i;
        for(i = 0; i < 1000000000; i = b(i));
        if (i != 1000000000)
                printf("Fix me\n");
}

The assembly code on newer micros looks like this:
       .file   "sa.c"
        .text
.globl b
        .type   b, @function
b:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %eax
        addl    $1, %eax
        popl    %ebp
        ret
        .size   b, .-b
        .section        .rodata
.LC0:
        .string "Fix me"
        .text
.globl main
        .type   main, @function
main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        subl    $20, %esp
        movl    $0, -8(%ebp)
        jmp     .L4
.L5:
        movl    -8(%ebp), %eax
        movl    %eax, (%esp)
        call    b
        movl    %eax, -8(%ebp)
.L4:
        cmpl    $999999999, -8(%ebp)
        jle     .L5
        cmpl    $1000000000, -8(%ebp)
        je      .L10
        movl    $.LC0, (%esp)
        call    puts
.L10:
        addl    $20, %esp
        popl    %ecx
        popl    %ebp
        leal    -4(%ecx), %esp
        ret
        .size   main, .-main
        .ident  "GCC: (GNU) 4.1.2 20070925 (Red Hat 4.1.2-27)"
        .section        .note.GNU-stack,"",@progbits

OK, let's modify b: a little as follows;
b:
#if JMP > 0
        jmp     bb
#if JMP > 1
                call    b
#if JMP > 2
                call    b
#if JMP > 3
                call    b
#endif
#endif
#endif
#endif
bb:

So, simple: if JMP is 0, no change, if JMP is 1, we do (in essence) br
.+2, if JMP is 2, we do br .+7, etc.

now I time the run 10 times (I can run longer but it seems good enough
to establish behavior). I should get some rough idea of the cost of
the branch.

Attachment 1 shows this cpu running in 32-bit mode:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 3
model name      : Intel(R) Pentium(R) 4 CPU 3.40GHz
stepping        : 4
cpu MHz         : 3415.468
cache size      : 1024 KB

the short form: the penalty for the branch is zero. As I say, I can
run it until I get getter statistics, but the trend is clear. Note
this was an unloaded machine.

Well, how about my laptop? The penalty for the branch is negative. Our
guess: the very highly dynamic power management is playing tricks with
our minds. But we don't know. But the code with the branch is
consistently 25% faster.
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU         T7200  @ 2.00GHz
stepping        : 6
cpu MHz         : 1000.000
cache size      : 4096 KB

how about a 64-bit cpu?

cpu family      : 15
model           : 3
model name      :                   Intel(R) Xeon(TM) CPU 3.40GHz
stepping        : 4
cpu MHz         : 3400.283
cache size      : 1024 KB

third graph. Here you can see the penalty increased a bit as the size
of the jmp increased.

Don't try this with 8a. 8a is too damn smart -- it just optimizes the
branch out (unless there is a switch to avoid doing that). You need a
dumb assembler.
ron

[-- Attachment #2: prism.gif --]
[-- Type: image/gif, Size: 3490 bytes --]

[-- Attachment #3: results.gif --]
[-- Type: image/gif, Size: 3389 bytes --]

[-- Attachment #4: runtimes.64bit.gif --]
[-- Type: image/gif, Size: 3417 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] when is a branch not a branch?
  2008-06-02 21:55 [9fans] when is a branch not a branch? ron minnich
@ 2008-06-02 22:04 ` Charles Forsyth
  2008-06-02 22:35   ` ron minnich
  2008-06-02 22:13 ` Brantley Coile
  2008-06-02 22:22 ` Roman Shaposhnik
  2 siblings, 1 reply; 10+ messages in thread
From: Charles Forsyth @ 2008-06-02 22:04 UTC (permalink / raw)
  To: 9fans

> Don't try this with 8a. 8a is too damn smart

no, it's simply following instructions.




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] when is a branch not a branch?
  2008-06-02 21:55 [9fans] when is a branch not a branch? ron minnich
  2008-06-02 22:04 ` Charles Forsyth
@ 2008-06-02 22:13 ` Brantley Coile
  2008-06-02 22:37   ` ron minnich
  2008-06-02 22:22 ` Roman Shaposhnik
  2 siblings, 1 reply; 10+ messages in thread
From: Brantley Coile @ 2008-06-02 22:13 UTC (permalink / raw)
  To: 9fans

> OK, let's modify b: a little as follows;
> b:
> #if JMP > 0
>         jmp     bb
> #if JMP > 1
>                 call    b
> #if JMP > 2
>                 call    b
> #if JMP > 3
>                 call    b
> #endif
> #endif
> #endif
> #endif
> bb:

I'm lost.  So the calls never get called--they just
make the jmp reach farther?

 Brantley




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] when is a branch not a branch?
  2008-06-02 21:55 [9fans] when is a branch not a branch? ron minnich
  2008-06-02 22:04 ` Charles Forsyth
  2008-06-02 22:13 ` Brantley Coile
@ 2008-06-02 22:22 ` Roman Shaposhnik
  2 siblings, 0 replies; 10+ messages in thread
From: Roman Shaposhnik @ 2008-06-02 22:22 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs; +Cc: Fans of the OS Plan 9 from Bell Labs

On Mon, 2008-06-02 at 14:55 -0700, ron minnich wrote:
> now I time the run 10 times (I can run longer but it seems good enough
> to establish behavior). I should get some rough idea of the cost of
> the branch.

I hate to say it: but these days you can't time anything in isolation.
The CPU is just too complex. It is no longer an x86, really. The
microcode is what you have to worry about. And how the microinstructions
get scheduled, and how they interact with each other, etc. etc.
You wouldn't believe the kind of crazy stuff that sometimes you see in
the x86 compiler optimization team here at Sun. So, unless your
actual code is exactly like the benchmark you quoted -- the results
could be completely off. I talk from experience. The experience of SPEC.

Thanks,
Roman.

P.S. With the modern CPUs even the most basic of questions could turn
out to be quite surprising. I'm still shocked that even by utilizing
some of the close ties to Intel/AMD I wasn't able to find an
*authoritative* source on information on a very basic thing: cache
architecture for modern x86 CPUs. Well, everybody assume that they
know:
   http://people.redhat.com/drepper/cpumemory.pdf
how the cache operates (e.g. what is the "hash function" for the
virtual/physical address, etc.) but none of the documentation
from Intel/AMD explicitly confirms that.




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] when is a branch not a branch?
  2008-06-02 22:04 ` Charles Forsyth
@ 2008-06-02 22:35   ` ron minnich
  2008-06-02 22:50     ` Charles Forsyth
  0 siblings, 1 reply; 10+ messages in thread
From: ron minnich @ 2008-06-02 22:35 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Mon, Jun 2, 2008 at 3:04 PM, Charles Forsyth <forsyth@terzarima.net> wrote:
>> Don't try this with 8a. 8a is too damn smart
>
> no, it's simply following instructions.
>
>

i meant that as praise for 8a if that came across wrong.

ron



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] when is a branch not a branch?
  2008-06-02 22:13 ` Brantley Coile
@ 2008-06-02 22:37   ` ron minnich
  0 siblings, 0 replies; 10+ messages in thread
From: ron minnich @ 2008-06-02 22:37 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Mon, Jun 2, 2008 at 3:13 PM, Brantley Coile <brantley@coraid.com> wrote:
>> OK, let's modify b: a little as follows;
>> b:
>> #if JMP > 0
>>         jmp     bb
>> #if JMP > 1
>>                 call    b
>> #if JMP > 2
>>                 call    b
>> #if JMP > 3
>>                 call    b
>> #endif
>> #endif
>> #endif
>> #endif
>> bb:
>
> I'm lost.  So the calls never get called--they just
> make the jmp reach farther?
>

yep. just a spacer.

Actually this lack of a delta in time is a good thing. It just
surprised me, I was getting ready to argue that the cost of time in
the branch was "not that bad" compared to the alternatives, and then
it goes negative on me :-)

ron



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] when is a branch not a branch?
  2008-06-02 22:35   ` ron minnich
@ 2008-06-02 22:50     ` Charles Forsyth
  2008-06-02 22:55       ` Charles Forsyth
  2008-06-02 23:29       ` ron minnich
  0 siblings, 2 replies; 10+ messages in thread
From: Charles Forsyth @ 2008-06-02 22:50 UTC (permalink / raw)
  To: 9fans

>>> Don't try this with 8a. 8a is too damn smart
>>
>> no, it's simply following instructions.
>>
>>
>
> i meant that as praise for 8a if that came across wrong.

not at all: i meant that as a description of 8a's algorithm to eliminate branches




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] when is a branch not a branch?
  2008-06-02 22:50     ` Charles Forsyth
@ 2008-06-02 22:55       ` Charles Forsyth
  2008-06-02 23:29       ` ron minnich
  1 sibling, 0 replies; 10+ messages in thread
From: Charles Forsyth @ 2008-06-02 22:55 UTC (permalink / raw)
  To: 9fans

> not at all: i meant that as a description of 8a's algorithm to eliminate branches

to be more accurate, it's not 8a at all, because 8a is stupid, but 8l does the work.
i missed that point the first time until i re-read the messages.




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] when is a branch not a branch?
  2008-06-02 22:50     ` Charles Forsyth
  2008-06-02 22:55       ` Charles Forsyth
@ 2008-06-02 23:29       ` ron minnich
  2008-06-05 20:19         ` dave.l
  1 sibling, 1 reply; 10+ messages in thread
From: ron minnich @ 2008-06-02 23:29 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Mon, Jun 2, 2008 at 3:50 PM, Charles Forsyth <forsyth@terzarima.net> wrote:
>>>> Don't try this with 8a. 8a is too damn smart
>>>
>>> no, it's simply following instructions.
>>>
>>>
>>
>> i meant that as praise for 8a if that came across wrong.
>
> not at all: i meant that as a description of 8a's algorithm to eliminate branches
>
>
>

and a damned fine joke, which only hit me later.

ron



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] when is a branch not a branch?
  2008-06-02 23:29       ` ron minnich
@ 2008-06-05 20:19         ` dave.l
  0 siblings, 0 replies; 10+ messages in thread
From: dave.l @ 2008-06-05 20:19 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 3 Jun 2008, at 00:29, ron minnich wrote:

> On Mon, Jun 2, 2008 at 3:50 PM, Charles Forsyth
> <forsyth@terzarima.net> wrote:
>>>>> Don't try this with 8a. 8a is too damn smart
>>>>
>>>> no, it's simply following instructions.
>>>>
>>>>
>>>
>>> i meant that as praise for 8a if that came across wrong.
>>
>> not at all: i meant that as a description of 8a's algorithm to
>> eliminate branches
>>
>>
>>
>
> and a damned fine joke, which only hit me later.

You're not the only one, Ron.
Cunning linguist.

DaveL


I'm jumping off the top of Guy's Hospital
to support the Myasthenia Gravis Association.

Please support me!

http://www.justgiving.com/davelukes-abseil




^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2008-06-05 20:19 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-06-02 21:55 [9fans] when is a branch not a branch? ron minnich
2008-06-02 22:04 ` Charles Forsyth
2008-06-02 22:35   ` ron minnich
2008-06-02 22:50     ` Charles Forsyth
2008-06-02 22:55       ` Charles Forsyth
2008-06-02 23:29       ` ron minnich
2008-06-05 20:19         ` dave.l
2008-06-02 22:13 ` Brantley Coile
2008-06-02 22:37   ` ron minnich
2008-06-02 22:22 ` Roman Shaposhnik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).