c - vectorized strlen getting away with reading unallocated memory

Posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

While studying OSX 10.9.4's implementation of strlen, I noticed that it always compares a 16-byte chunk and then skips ahead to the following 16 bytes until it encounters a '\0'. The relevant part:

3de0:   48 83 c7 10             add    $0x10,%rdi
3de4:   66 0f ef c0             pxor   %xmm0,%xmm0
3de8:   66 0f 74 07             pcmpeqb (%rdi),%xmm0
3dec:   66 0f d7 f0             pmovmskb %xmm0,%esi
3df0:   85 f6                   test   %esi,%esi
3df2:   74 ec                   je     3de0 <__platform_strlen+0x40>

0x10 is 16 bytes in hex.
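
For readers who prefer C to disassembly, the loop above can be approximated with SSE2 intrinsics. This is only a sketch: the function name `strlen_aligned16_sketch` is made up, the handling of the first (possibly partial) block is an assumption about what a prologue would have to do, and it is not Apple's code. Reading past the end of an object is still undefined behavior as far as the C standard is concerned, so treat it as an illustration of the instruction sequence rather than something to ship. On targets without SSE2 it falls back to a plain byte loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2: _mm_load_si128, _mm_cmpeq_epi8, _mm_movemask_epi8 */
#endif

/* Hypothetical name; a sketch of the aligned-scan idea, not the libc routine. */
size_t strlen_aligned16_sketch(const char *s)
{
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();            /* pxor %xmm0,%xmm0 */
    uintptr_t addr  = (uintptr_t)s;
    uintptr_t block = addr & ~(uintptr_t)15;             /* round down to 16 */
    unsigned offset = (unsigned)(addr & 15);

    /* First block: compare all 16 bytes, then discard the match bits that
       belong to bytes located before s. */
    unsigned mask = (unsigned)_mm_movemask_epi8(
        _mm_cmpeq_epi8(_mm_load_si128((const __m128i *)block), zero));
    mask &= 0xFFFFu << offset;

    /* Main loop: same shape as the add / pcmpeqb / pmovmskb / test / je
       sequence in the disassembly. */
    while (mask == 0) {
        block += 16;
        mask = (unsigned)_mm_movemask_epi8(
            _mm_cmpeq_epi8(_mm_load_si128((const __m128i *)block), zero));
    }

    /* The index of the lowest set bit is the position of the first NUL. */
    unsigned idx = 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        idx++;
    }
    return (size_t)(block + idx - addr);
#else
    /* Portable fallback so the sketch still compiles without SSE2. */
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
#endif
}
```

Every load in the SSE2 path is 16-byte aligned, which is the property the rest of this thread relies on.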

When I saw that, I wondered: this memory could just as well not be allocated. If I had allocated a C string of 20 bytes and passed it to strlen, it would read 36 bytes of memory. Why is it allowed to do that? I started looking and found "How dangerous is it to access an array out of bounds?"

That confirmed that it's definitely not always a good thing: unallocated memory might be unmapped, for example. Yet there must be something that makes this work. Some of my hypotheses:

OSX not only guarantees that its allocations are 16-byte aligned, but also that the "quantum" of an allocation is a 16-byte chunk. Said another way, allocating 5 bytes will actually allocate 16 bytes. Allocating 20 bytes will actually allocate 32 bytes.
It's not harmful per se to read off the end of an array when you're writing asm, as it's not undefined behaviour, as long as it's within bounds (within a page?).

What's the actual reason?

EDIT: just found "Why I'm getting read and write permission on unallocated memory?", which seems to indicate my first guess was right.

EDIT 2: Stupidly enough, I had forgotten that even though Apple seems to have removed the source of most of its asm implementations (see "Where did OSX's x86-64 assembly libc routines go?"), it left strlen: http://www.opensource.apple.com/source/Libc/Libc-997.90.3/x86_64/string/strlen.s

In the comments we find:

//  returns the length of the string s (i.e. the distance in bytes from
//  s to the first NUL byte following s).  We look for NUL bytes using
//  pcmpeqb on 16-byte aligned blocks.  Although this may read past the
//  end of the string, because all access is aligned, it will never
//  read past the end of the string across a page boundary, or even
//  accross a cacheline.
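
To put a number on "may read past the end of the string", here is a small arithmetic sketch. The helper `aligned_scan_bytes` is hypothetical (it is not part of any libc) and assumes the scan works exactly as the comment describes: it loads whole 16-byte aligned blocks, starting with the block that contains the first character and stopping with the block that contains the NUL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: total bytes covered by the aligned 16-byte blocks
   that an aligned scan loads for a string starting at address `addr`
   whose size, including the terminating NUL, is `size_with_nul` (>= 1). */
size_t aligned_scan_bytes(uintptr_t addr, size_t size_with_nul)
{
    /* Start of the block containing the first character. */
    uintptr_t first = addr & ~(uintptr_t)15;
    /* Start of the block containing the terminating NUL. */
    uintptr_t last = (addr + size_with_nul - 1) & ~(uintptr_t)15;
    return (size_t)(last - first) + 16;
}
```

Under these assumptions, a 20-byte string that starts on a 16-byte boundary is covered by two blocks, so `aligned_scan_bytes(0x1000, 20)` gives 32 rather than 36. The over-read never extends past the end of the block that holds the NUL.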

EDIT: I honestly think all answerers deserved an accepted answer, and basically all contained the information necessary to understand the issue. So I went for the answer of the person that had the least reputation.

1 Reply

深蓝 · Answer 1 · 2021-10-23T20:03:27+0000

I'm the author of the routine in question.

As some others have said, the key thing is that the reads are all aligned. While reading outside the bounds of an array is undefined behavior in C, we're not writing C; we know lots of details of the x86 architecture beyond what the C abstract machine defines.

In particular, reads beyond the end of a buffer are safe (meaning they cannot produce a trap or other observable side effect) so long as they do not cross a page boundary (because memory attributes and mappings are tracked at page granularity). Since the smallest supported page size is 4096 bytes, which is a multiple of 16, an aligned 16-byte load cannot cross a page boundary.
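
The page-boundary argument can be checked with a few lines of integer arithmetic. The sketch below is illustrative only; `crosses_boundary` and `aligned16_never_crosses` are made-up names, and the code works on plain integers rather than touching real memory. Because only an address's offset within a region matters, checking every 16-byte aligned offset inside one region covers all cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the byte range [addr, addr + len) spans more than one
   `boundary`-sized region. `boundary` must be a power of two, len >= 1. */
bool crosses_boundary(uintptr_t addr, size_t len, uintptr_t boundary)
{
    assert(len >= 1 && boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (addr / boundary) != ((addr + len - 1) / boundary);
}

/* Checks every 16-byte aligned offset within one region of size
   `boundary` and reports whether none of the 16-byte loads crosses it. */
bool aligned16_never_crosses(uintptr_t boundary)
{
    for (uintptr_t off = 0; off < boundary; off += 16) {
        if (crosses_boundary(off, 16, boundary))
            return false;
    }
    return true;
}
```

With a 4096-byte page, `aligned16_never_crosses(4096)` holds, and the same is true for a 64-byte cache line. An unaligned 16-byte load is a different story: `crosses_boundary(4090, 16, 4096)` is true, which is why the alignment matters.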
