Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
503 views
in Technique[技术] by (71.8m points)

c - How to make GCC generate bswap instruction for big endian store without builtins?

Update: This was fixed in GCC 8.1.

I'm working on a function that stores a 64-bit value into memory in big endian format. I was hoping that I could write portable C99 code that works on both little and big endian platforms and have modern x86 compilers generate a bswap instruction automatically without any builtins or intrinsics. So I started with the following function:

#include <stdint.h>

void
encode_bigend_u64(uint64_t value, void *vdest) {
    uint8_t *bytes = (uint8_t *)vdest;
    bytes[0] = value >> 56;
    bytes[1] = value >> 48;
    bytes[2] = value >> 40;
    bytes[3] = value >> 32;
    bytes[4] = value >> 24;
    bytes[5] = value >> 16;
    bytes[6] = value >> 8;
    bytes[7] = value;
}

This works fine for clang which compiles this function to:

bswapq  %rdi
movq    %rdi, (%rsi)
retq

But GCC fails to detect the byte swap. I tried a couple of different approaches but they only made things worse. I know that GCC can detect byte swaps using bitwise-and, shift, and bitwise-or, but why doesn't it work when writing bytes?

Edit: I found the corresponding GCC bug.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This seems to do the trick:

void encode_bigend_u64(uint64_t value, void* dest)
{
  value =
      ((value & 0xFF00000000000000u) >> 56u) |
      ((value & 0x00FF000000000000u) >> 40u) |
      ((value & 0x0000FF0000000000u) >> 24u) |
      ((value & 0x000000FF00000000u) >>  8u) |
      ((value & 0x00000000FF000000u) <<  8u) |      
      ((value & 0x0000000000FF0000u) << 24u) |
      ((value & 0x000000000000FF00u) << 40u) |
      ((value & 0x00000000000000FFu) << 56u);
  memcpy(dest, &value, sizeof(uint64_t));
}

clang with -O3

encode_bigend_u64(unsigned long, void*):
        bswapq  %rdi
        movq    %rdi, (%rsi)
        retq

clang with -O3 -march=native

encode_bigend_u64(unsigned long, void*):
        movbeq  %rdi, (%rsi)
        retq

gcc with -O3

encode_bigend_u64(unsigned long, void*):
        bswap   %rdi
        movq    %rdi, (%rsi)
        ret

gcc with -O3 -march=native

encode_bigend_u64(unsigned long, void*):
        movbe   %rdi, (%rsi)
        ret

Tested with clang 3.8.0 and gcc 5.3.0 on http://gcc.godbolt.org/ (so I don't know exactly what processor is underneath (for the -march=native) but I strongly suspect a recent x86_64 processor)


If you want a function which works for big endian architectures too, you can use the answers from here to detect the endianness of the system and add an if. Both the union and the pointer casts versions work and are optimized by both gcc and clang resulting in the exact same assembly (no branches). Full code on godebolt:

int is_big_endian(void)
{
    union {
        uint32_t i;
        char c[4];
    } bint = {0x01020304};

    return bint.c[0] == 1;
}

void encode_bigend_u64_union(uint64_t value, void* dest)
{
  if (!is_big_endian())
    //...
  memcpy(dest, &value, sizeof(uint64_t));
}

Intel? 64 and IA-32 Architectures Instruction Set Reference (3-542 Vol. 2A):

MOVBE—Move Data After Swapping Bytes

Performs a byte swap operation on the data copied from the second operand (source operand) and store the result in the first operand (destination operand). [...]

The MOVBE instruction is provided for swapping the bytes on a read from memory or on a write to memory; thus providing support for converting little-endian values to big-endian format and vice versa.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...