Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

macos - Basic OS X Assembly and the Mach-O format

I am interested in programming in x86-64 assembly on the Mac OS X platform. I came across this page about creating a 248B Mach-O program, which led me to Apple's own Mach-O format reference. After that I thought I'd make that same simple C program in Xcode and check out the generated assembly.

This was the code:

int main(int argc, const char * argv[])
{
    return 42;
}

But the assembly generated was 334 lines, containing (based on the 248B model) a lot of excess content.

Firstly, why is so much DWARF debug info included in the Release build of a C executable? Secondly, I notice the Mach-O header data is included 4 times (in different DWARF-related sections). Why is this necessary? Finally, the Xcode assembly includes:

.private_extern _main
.globl  _main
_main:
    .cfi_startproc

But in the 248B program, these are all nowhere to be seen - the program instead begins at _start. How is that possible if all programs by definition begin in main?


Full Xcode Assembly:

# Assembly output for main.c
# Generated at 4:04:08 PM on Sunday, January 20, 2013
# Using Release configuration, x86_64 architecture for Tiny target of Tiny project

    .section    __TEXT,__text,regular,pure_instructions
    .file   1 "/Users/####/Desktop/Tiny/Tiny/main.c"
    .section    __DWARF,__debug_info,regular,debug
Lsection_info:
    .section    __DWARF,__debug_abbrev,regular,debug
Lsection_abbrev:
    .section    __DWARF,__debug_aranges,regular,debug
    .section    __DWARF,__debug_macinfo,regular,debug
    .section    __DWARF,__debug_line,regular,debug
Lsection_line:
    .section    __DWARF,__debug_loc,regular,debug
    .section    __DWARF,__debug_pubtypes,regular,debug
    .section    __DWARF,__debug_str,regular,debug
Lsection_str:
    .section    __DWARF,__debug_ranges,regular,debug
Ldebug_range:
    .section    __DWARF,__debug_loc,regular,debug
Lsection_debug_loc:
    .section    __TEXT,__text,regular,pure_instructions
Ltext_begin:
    .section    __DATA,__data
    .section    __TEXT,__text,regular,pure_instructions
    .private_extern _main
    .globl  _main
_main:                                  ## @main
    .cfi_startproc
Lfunc_begin0:
    .loc    1 12 0                  ## /Users/####/Desktop/Tiny/Tiny/main.c:12:0
## BB#0:
    pushq   %rbp
Ltmp2:
    .cfi_def_cfa_offset 16
Ltmp3:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp4:
    .cfi_def_cfa_register %rbp
    ##DEBUG_VALUE: main:argc <- EDI+0
    ##DEBUG_VALUE: main:argv <- RSI+0
    movl    $42, %eax
    .loc    1 15 5 prologue_end     ## /Users/####/Desktop/Tiny/Tiny/main.c:15:5
Ltmp5:
    popq    %rbp
    ret
Ltmp6:
Lfunc_end0:
    .cfi_endproc

Ltext_end:
    .section    __DATA,__data
Ldata_end:
    .section    __TEXT,__text,regular,pure_instructions
Lsection_end1:
    .section    __DWARF,__debug_info,regular,debug
Linfo_begin1:
    .long   127                     ## Length of Compilation Unit Info
    .short  2                       ## DWARF version number
Lset0 = Labbrev_begin-Lsection_abbrev   ## Offset Into Abbrev. Section
    .long   Lset0
    .byte   8                       ## Address Size (in bytes)
    .byte   1                       ## Abbrev [1] 0xb:0x78 DW_TAG_compile_unit
Lset1 = Lstring0-Lsection_str           ## DW_AT_producer
    .long   Lset1
    .short  12                      ## DW_AT_language
Lset2 = Lstring1-Lsection_str           ## DW_AT_name
    .long   Lset2
    .quad   0                       ## DW_AT_entry_pc
    .long   0                       ## DW_AT_stmt_list
Lset3 = Lstring2-Lsection_str           ## DW_AT_comp_dir
    .long   Lset3
    .byte   1                       ## DW_AT_APPLE_optimized
    .byte   2                       ## Abbrev [2] 0x27:0x3e DW_TAG_subprogram
Lset4 = Lstring3-Lsection_str           ## DW_AT_name
    .long   Lset4
    .byte   1                       ## DW_AT_decl_file
    .byte   11                      ## DW_AT_decl_line
    .byte   1                       ## DW_AT_prototyped
    .long   101                     ## DW_AT_type
    .byte   1                       ## DW_AT_external
    .quad   Lfunc_begin0            ## DW_AT_low_pc
    .quad   Lfunc_end0              ## DW_AT_high_pc
    .byte   1                       ## DW_AT_frame_base
    .byte   86
    .byte   3                       ## Abbrev [3] 0x46:0xf DW_TAG_formal_parameter
Lset5 = Lstring5-Lsection_str           ## DW_AT_name
    .long   Lset5
    .byte   1                       ## DW_AT_decl_file
    .byte   11                      ## DW_AT_decl_line
    .long   101                     ## DW_AT_type
Lset6 = Ldebug_loc0-Lsection_debug_loc  ## DW_AT_location
    .long   Lset6
    .byte   3                       ## Abbrev [3] 0x55:0xf DW_TAG_formal_parameter
Lset7 = Lstring6-Lsection_str           ## DW_AT_name
    .long   Lset7
    .byte   1                       ## DW_AT_decl_file
    .byte   11                      ## DW_AT_decl_line
    .long   125                     ## DW_AT_type
Lset8 = Ldebug_loc2-Lsection_debug_loc  ## DW_AT_location
    .long   Lset8
    .byte   0                       ## End Of Children Mark
    .byte   4                       ## Abbrev [4] 0x65:0x7 DW_TAG_base_type
Lset9 = Lstring4-Lsection_str           ## DW_AT_name
    .long   Lset9
    .byte   5                       ## DW_AT_encoding
    .byte   4                       ## DW_AT_byte_size
    .byte   4                       ## Abbrev [4] 0x6c:0x7 DW_TAG_base_type
Lset10 = Lstring7-Lsection_str          ## DW_AT_name
    .long   Lset10
    .byte   6                       ## DW_AT_encoding
    .byte   1                       ## DW_AT_byte_size
    .byte   5                       ## Abbrev [5] 0x73:0x5 DW_TAG_const_type
    .long   108                     ## DW_AT_type
    .byte   6                       ## Abbrev [6] 0x78:0x5 DW_TAG_pointer_type
    .long   115                     ## DW_AT_type
    .byte   6                       ## Abbrev [6] 0x7d:0x5 DW_TAG_pointer_type
    .long   120                     ## DW_AT_type
    .byte   0                       ## End Of Children Mark
Linfo_end1:
    .section    __DWARF,__debug_abbrev,regular,debug
Labbrev_begin:
    .byte   1                       ## Abbreviation Code
    .byte   17                      ## DW_TAG_compile_unit
    .byte   1                       ## DW_CHILDREN_yes
    .byte   37                      ## DW_AT_producer
    .byte   14                      ## DW_FORM_strp
    .byte   19                      ## DW_AT_language
    .byte   5                       ## DW_FORM_data2
    .byte   3                       ## DW_AT_name
    .byte   14                      ## DW_FORM_strp
    .byte   82                      ## DW_AT_entry_pc
    .byte   1                       ## DW_FORM_addr
    .byte   16                      ## DW_AT_stmt_list
    .byte   6                       ## DW_FORM_data4
    .byte   27                      ## DW_AT_comp_dir
    .byte   14                      ## DW_FORM_strp
    .ascii   "\341\177"           ## DW_AT_APPLE_optimized
    .byte   12                      ## DW_FORM_flag
    .byte   0                       ## EOM(1)
    .byte   0                       ## EOM(2)
    .byte   2                       ## Abbreviation Code
    .byte   46                      ## DW_TAG_subprogram
    .byte   1                       ## DW_CHILDREN_yes
    .byte   3                       ## DW_AT_name
    .byte   14                      ## DW_FORM_strp
    .byte   58                      ## DW_AT_decl_file
    .byte   11                      ## DW_FORM_data1
    .byte   59                      ## DW_AT_decl_line
    .byte   11                      ## DW_FORM_data1
    .byte   39                      ## DW_AT_prototyped
    .byte   12                      ## DW_FORM_flag
    .byte   73                      ## DW_AT_type
    .byte   19                      ## DW_FORM_ref4
    .byte   63                      ## DW_AT_external
    .byte   12                      ## DW_FORM_flag
    .byte   17                      ## DW_AT_low_pc
    .byte   1                       ## DW_FORM_addr
    .byte   18                      ## DW_AT_high_pc
    .byte   1                       ## DW_FORM_addr
    .byte   64                      ## DW_AT_frame_base
    .byte   10                      ## DW_FORM_block1
    .byte   0                       ## EOM(1)
    .byte   0                       ## EOM(2)
    .byte   3                       ## Abbreviation Code
    .byte   5                       ## DW_TAG_formal_parameter
    .byte   0                       ## DW_CHILDREN_no
    .byte   3                       ## DW_AT_name
    .byte   14                      ## DW_FORM_strp
    .byte   58                      ## DW_AT_decl_file
    .byte   11                      ## DW_FORM_data1
    .byte   59                      ## DW_AT_decl_line
    .byte   11                      ## DW_FORM_data1
    .byte   73                      ## DW_AT_type
    .byte   19                      ## DW_FORM_ref4
    .byte   2                       ## DW_AT_location
    .byte   6                       ## DW_FORM_data4
    .byte   0                       ## EOM(1)
    .byte   0                       ## EOM(2)
    .byte   4                       ## Abbreviation Code
    .byte   36                      ## DW_TAG_base_type
    .byte   0                       ## DW_CHILDREN_no
    .byte   3                       ## DW_AT_name
    .byte   14                      ## DW_FORM_strp
    .byte   62                      ## DW_AT_encoding
    .byte   11                      ## DW_FORM_data1
    .byte   11                      ## DW_AT_byte_size
    .byte   11                      ## DW_FORM_data1
    .byte   0                       ## EOM(1)
    .byte   0                       ## EOM(2)
    .byte   5                       ## Abbreviation Code
    .byte   38                      ## DW_TAG_const_type
    .byte   0                       ## DW_CHILDREN_no
    .byte   73                      ## DW_AT_type
    .byte   19                      ## DW_FORM_ref4
    .byte   0                       ## EOM(1)
    .byte   0                       ## EOM(2)
    .byte   6                       ## Abbreviation Code
    .byte   15                      ## DW_TAG_pointer_type
    .byte   0                       ## DW_CHILDREN_no
    .byte   73                      ## DW_AT_type
    .byte   19                      ## DW_FORM_ref4
    .byte   0                       ## EOM(1)
    .byte   0                       ## EOM(2)
    .byte   0                       ## EOM(3)
Labbrev_end:
    .section    __DWARF,__apple_names,regular,debug
Lnames_begin:
    .long   1212240712              ## Header Magic
    .short  1                       ## Header Version
    .short  0                       ## Header Hash Function
    .long   1                       ## Header Bucket Count
    .long   1                       ## Header Hash Count
    .long   12                      ## Header Data Length
    .long   0                       ## HeaderData Die Offset Base
    .long   1                       ## HeaderData Atom Count
    .short  1                       ## eAtomTypeDIEOffset
    .short  6                       ## DW_FORM_data4
    .long   0                       ## Bucket 0
    .long   2090499946              ## Hash in Bucket 0
    .long   LNames0-Lnames_begin    ## Offset in Bucket 0
LNames0:


1 Reply

Firstly, why is so much DWARF debug info included in the Release build of a C executable?

Being able to debug optimized code is incredibly useful, and bugs that only show up in optimized builds are not rare. If you're hand-writing assembly you're unlikely to care about DWARF information, though, so I'd suggest building your comparison code without the -g argument.


Secondly, I notice the Mach-O header data is included 4 times (in different DWARF-related sections). Why is this necessary?

These aren't Mach-O headers that you're seeing. They're the headers for DWARF accelerator tables, an LLVM extension to DWARF that optimizes the test for whether a symbol is defined within a given compilation unit.
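
As a rough illustration, the sketch below decodes two values from the __apple_names listing in the question: the "Header Magic" 1212240712 and the "Hash in Bucket 0" 2090499946. It assumes that a "Header Hash Function" of 0 selects the DJB string hash; the helper names djb_hash and magic_to_ascii are made up for this example and are not part of any toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DJB string hash: start at 5381, then h = h * 33 + c for each byte.
   Assumption: this is the hash selected by "Header Hash Function" 0. */
uint32_t djb_hash(const char *s)
{
    uint32_t h = 5381;
    assert(s != NULL);
    for (; *s != '\0'; s++)
        h = h * 33u + (unsigned char)*s;
    return h;
}

/* Spell out a 32-bit magic number as four ASCII characters,
   most significant byte first. out must have room for 5 chars. */
void magic_to_ascii(uint32_t magic, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (char)((magic >> (24 - 8 * i)) & 0xffu);
    out[4] = '\0';
}
```

With these helpers, magic_to_ascii(1212240712, buf) spells "HASH" (0x48415348), whereas a 64-bit Mach-O header starts with 0xfeedfacf. Likewise, djb_hash("main") comes out as 2090499946, the hash stored in bucket 0, which fits a table that maps the name main to its debug-info entry.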


But in the 248B program, these are all nowhere to be seen - the program instead begins at _start. How is that possible if all programs by definition begin in main?

Historically on OS X all programs begin at start. However, this symbol typically comes from a system library rather than being defined by the program itself. The system implementation of start performs some initialization and then jumps to your program's "real" entry point.

The entry point of a Mach-O binary is defined by either the LC_UNIXTHREAD or the LC_MAIN load command. When LC_UNIXTHREAD, the convention for pre-10.8 versions of OS X, is used with a regular C or C++ program, the linker uses start as the entry point. This symbol typically comes from /usr/lib/crt1.o, and its address is written into the instruction pointer field of the LC_UNIXTHREAD load command. The 248B binary you link to includes an LC_UNIXTHREAD command with eip set to 0x000010e8, which is the address of the symbol _start. Because this small program is a static executable and the binary is generated directly, it can write whatever address it wishes into the instruction pointer field of the load command.

If you're building your executable targeting OS X 10.8+, the linker will generate an LC_MAIN load command instead of LC_UNIXTHREAD. The kernel knows that binaries using the LC_MAIN command should be executed by loading the dynamic linker and jumping to its entry point. The dynamic linker, dyld, initializes itself and then jumps to the address specified in the LC_MAIN command. In this brave new world no symbol named start is used at all.
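
To see which of the two load commands a given binary carries, you can walk its load commands yourself. The following is a minimal sketch, not a full parser: the constants and struct layouts mirror those in <mach-o/loader.h> but are redeclared so it compiles anywhere, and the function name find_entry_command is hypothetical. It assumes a thin (non-fat) 64-bit image already read into memory, in the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values and layouts as in <mach-o/loader.h>, redeclared for portability. */
#define MH_MAGIC_64   0xfeedfacfu
#define LC_UNIXTHREAD 0x5u
#define LC_MAIN       0x80000028u   /* 0x28 | LC_REQ_DYLD */

struct mach_header_64 {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
};

struct load_command {
    uint32_t cmd;
    uint32_t cmdsize;
};

struct entry_point_command {
    uint32_t cmd;
    uint32_t cmdsize;
    uint64_t entryoff;   /* file offset of main() */
    uint64_t stacksize;
};

/* Hypothetical helper: returns LC_MAIN or LC_UNIXTHREAD, whichever load
   command appears first, or 0 if neither is found or the image is malformed.
   For LC_MAIN, *entryoff (if non-NULL) receives the entry file offset. */
uint32_t find_entry_command(const uint8_t *image, size_t size, uint64_t *entryoff)
{
    struct mach_header_64 hdr;
    assert(image != NULL);
    if (size < sizeof hdr) return 0;
    memcpy(&hdr, image, sizeof hdr);
    if (hdr.magic != MH_MAGIC_64) return 0;

    size_t off = sizeof hdr;              /* load commands follow the header */
    for (uint32_t i = 0; i < hdr.ncmds; i++) {
        struct load_command lc;
        if (size - off < sizeof lc) return 0;
        memcpy(&lc, image + off, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > size - off) return 0;

        if (lc.cmd == LC_UNIXTHREAD) return lc.cmd;
        if (lc.cmd == LC_MAIN) {
            struct entry_point_command ep;
            if (lc.cmdsize < sizeof ep) return 0;
            memcpy(&ep, image + off, sizeof ep);
            if (entryoff != NULL) *entryoff = ep.entryoff;
            return lc.cmd;
        }
        off += lc.cmdsize;                /* advance to the next command */
    }
    return 0;
}
```

On OS X itself, otool -l prints the same load commands, so a helper like this is mainly useful for understanding the layout. For LC_UNIXTHREAD the sketch only reports that the command is present; reading the saved instruction pointer would additionally require the thread-state layout for the binary's architecture.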

