Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
330 views
in Technique[技术] by (71.8m points)

c - Calling convention for function returning struct

For the following C code:

struct _AStruct {
    int a;
    int b;
    float c;
    float d;
    int e;
};

typedef struct _AStruct AStruct;

AStruct test_callee5();
void test_caller5();

void test_caller5() {
    AStruct g = test_callee5();
    AStruct h = test_callee5();    
}

I get the following disassembly for Win32:

_test_caller5:
  00000000: lea         eax,[esp-14h]
  00000004: sub         esp,14h
  00000007: push        eax
  00000008: call        _test_callee5
  0000000D: lea         ecx,[esp+4]
  00000011: push        ecx
  00000012: call        _test_callee5
  00000017: add         esp,1Ch
  0000001A: ret

And for Linux32:

00000000 <test_caller5>:
   0:  push   %ebp
   1:  mov    %esp,%ebp
   3:  sub    $0x38,%esp
   6:  lea    0xffffffec(%ebp),%eax
   9:  mov    %eax,(%esp)
   c:  call   d <test_caller5+0xd>
  11:  sub    $0x4,%esp  ;;;;;;;;;; Note this extra sub ;;;;;;;;;;;;
  14:  lea    0xffffffd8(%ebp),%eax
  17:  mov    %eax,(%esp)
  1a:  call   1b <test_caller5+0x1b>
  1f:  sub    $0x4,%esp   ;;;;;;;;;; Note this extra sub ;;;;;;;;;;;;
  22:  leave
  23:  ret

I am trying to understand the difference in how the caller behaves after the call. Why does the caller in Linux32 do these extra subs?

I would assume that both targets would follow the cdecl calling convention. Doesn't cdecl define the calling convention for a function returning a structure?!

EDIT:

I added an implementation of the callee. And sure enough, you can see that the Linux32 callee pops its argument, while the Win32 callee does not:

AStruct test_callee5()
{
    AStruct S={0};
    return S;
}

Win32 disassembly:

test_callee5:
  00000000: mov         eax,dword ptr [esp+4]
  00000004: xor         ecx,ecx
  00000006: mov         dword ptr [eax],0
  0000000C: mov         dword ptr [eax+4],ecx
  0000000F: mov         dword ptr [eax+8],ecx
  00000012: mov         dword ptr [eax+0Ch],ecx
  00000015: mov         dword ptr [eax+10h],ecx
  00000018: ret

Linux32 disassembly:

00000000 <test_callee5>:
   0:   push   %ebp
   1:   mov    %esp,%ebp
   3:   sub    $0x20,%esp
   6:   mov    0x8(%ebp),%edx
   9:   movl   $0x0,0xffffffec(%ebp)
  10:   movl   $0x0,0xfffffff0(%ebp)
  17:   movl   $0x0,0xfffffff4(%ebp)
  1e:   movl   $0x0,0xfffffff8(%ebp)
  25:   movl   $0x0,0xfffffffc(%ebp)
  2c:   mov    0xffffffec(%ebp),%eax
  2f:   mov    %eax,(%edx)
  31:   mov    0xfffffff0(%ebp),%eax
  34:   mov    %eax,0x4(%edx)
  37:   mov    0xfffffff4(%ebp),%eax
  3a:   mov    %eax,0x8(%edx)
  3d:   mov    0xfffffff8(%ebp),%eax
  40:   mov    %eax,0xc(%edx)
  43:   mov    0xfffffffc(%ebp),%eax
  46:   mov    %eax,0x10(%edx)
  49:   mov    %edx,%eax
  4b:   leave
  4c:   ret    $0x4  ;;;;;;;;;;;;;; Note this ;;;;;;;;;;;;;;
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Why does the caller in Linux32 do these extra subs?

The reason is the use of a hidden pointer (named return value optimization), injected by the compiler, for returning the struct by value. In SystemV's ABI, page 41, in the section about "Function Returning Structures or Unions", it says:

The called function must remove this address from the stack before returning.

That is why you get a ret $0x4 at the end of test_callee5(), it is for compliance with the ABI.

Now about the presence of sub $0x4, %esp just after each test_callee5() call sites, it is a side-effect of the above rule, combined with optimized code generated by the C compiler. As the local storage stack space is pre-reserved entirely by:

3:  sub    $0x38,%esp

there is no need to push/pop the hidden pointer, it is just written at bottom of the pre-reserved space (pointed at by esp), using mov %eax,(%esp) at lines 9 and 17. As the stack pointer is not decremented, the sub $0x4,%esp is there to negate the effect of ret $0x4, and keep the stack pointer unchanged.

On Win32 (using MSVC compiler I guess), there is no such ABI rule, a simple ret is used (as expected in cdecl), the hidden pointer is pushed on the stack at line 7 and 11. Though, those slots are not freed after the calls, as an optimization, but only before callee exits, using an add esp,1Ch, freeing the hidden pointer stack slots (2 * 0x4 bytes) and the local AStructstruct (0x14 bytes).

Doesn't cdecl define the calling convention for a function returning a structure?!

Unfortunately, it does not, it varies with C compilers and operating systems


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...