Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
327 views
in Technique[技术] by (71.8m points)

c - Fastest way to find out minimum of 3 numbers?

In a program I wrote, 20% of the time is being spent on finding out the minimum of 3 numbers in an inner loop, in this routine:

static inline unsigned int
min(unsigned int a, unsigned int b, unsigned int c)
{
    unsigned int m = a;
    if (m > b) m = b;
    if (m > c) m = c;
    return m;
}

Is there any way to speed this up? I am ok with assembly code too for x86/x86_64.

Edit: In reply to some of the comments:
* Compiler being used is gcc 4.3.3
* As far as assembly is concerned, I am only a beginner there. I asked for assembly here, to learn how to do this. :)
* I have a quad-core Intel 64 running, so MMX/SSE etc. are supported.
* It's hard to post the loop here, but I can tell you it's a heavily optimized implementation of the levenshtein algorithm.

This is what the compiler is giving me for the non-inlined version of min:

.globl min
    .type   min, @function
min:
    pushl   %ebp
    movl    %esp, %ebp
    movl    8(%ebp), %edx
    movl    12(%ebp), %eax
    movl    16(%ebp), %ecx
    cmpl    %edx, %eax
    jbe .L2
    movl    %edx, %eax
.L2:
    cmpl    %ecx, %eax
    jbe .L3
    movl    %ecx, %eax
.L3:
    popl    %ebp
    ret
    .size   min, .-min
    .ident  "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
    .section    .note.GNU-stack,"",@progbits

The inlined version is within -O2 optimized code (even my markers mrk = 0xfefefefe, before and after the call to min()) are getting optimized away by gcc, so I couldn't get hold of it.

Update: I tested the changes suggested by Nils, ephemient, however there's no perceptible performance boost I get by using the assembly versions of min(). However, I get a 12.5% boost by compiling the program with -march=i686, which I guess is because the whole program is getting the benefits of the new faster instructions that gcc is generating with this option. Thanks for your help guys.

P.S. - I used the ruby profiler to measure performance (my C program is a shared library loaded by a ruby program), so I could get time spent only for the top-level C function called by the ruby program, which ends up calling min() down the stack. Please see this question.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Make sure you are using an appropriate -march setting, first off. GCC defaults to not using any instructions that were not supported on the original i386 - allowing it to use newer instruction sets can make a BIG difference at times! On -march=core2 -O2 I get:

min:
    pushl   %ebp
    movl    %esp, %ebp
    movl    8(%ebp), %edx
    movl    12(%ebp), %ecx
    movl    16(%ebp), %eax
    cmpl    %edx, %ecx
    leave
    cmovbe  %ecx, %edx
    cmpl    %eax, %edx
    cmovbe  %edx, %eax
    ret

The use of cmov here may help you avoid branch delays - and you get it without any inline asm just by passing in -march. When inlined into a larger function this is likely to be even more efficient, possibly just four assembly operations. If you need something faster than this, see if you can get the SSE vector operations to work in the context of your overall algorithm.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...