Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
323 views
in Technique[技术] by (71.8m points)

c - Does C99 guarantee that arrays are contiguous?

Following an hot comment thread in another question, I came to debate of what is and what is not defined in C99 standard about C arrays.

Basically when I define a 2D array like int a[5][5], does the standard C99 garantee or not that it will be a contiguous block of ints, can I cast it to (int *)a and be sure I will have a valid 1D array of 25 ints.

As I understand the standard the above property is implicit in the sizeof definition and in pointer arithmetic, but others seems to disagree and says casting to (int*) the above structure give an undefined behavior (even if they agree that all existing implementations actually allocate contiguous values).

More specifically, if we think an implementation that would instrument arrays to check array boundaries for all dimensions and return some kind of error when accessing 1D array, or does not give correct access to elements above 1st row. Could such implementation be standard compilant ? And in this case what parts of the C99 standard are relevant.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

We should begin with inspecting what int a[5][5] really is. The types involved are:

  • int
  • array[5] of ints
  • array[5] of arrays

There is no array[25] of ints involved.

It is correct that the sizeof semantics imply that the array as a whole is contiguous. The array[5] of ints must have 5*sizeof(int), and recursively applied, a[5][5] must have 5*5*sizeof(int). There is no room for additional padding.

Additionally, the array as a whole must be working when given to memset, memmove or memcpy with the sizeof. It must also be possible to iterate over the whole array with a (char *). So a valid iteration is:

int  a[5][5], i, *pi;
char *pc;

pc = (char *)(&a[0][0]);
for (i = 0; i < 25; i++)
{
    pi = (int *)pc;
    DoSomething(pi);
    pc += sizeof(int);
}

Doing the same with an (int *) would be undefined behaviour, because, as said, there is no array[25] of int involved. Using a union as in Christoph's answer should be valid, too. But there is another point complicating this further, the equality operator:

6.5.9.6 Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space. 91)

91) Two objects may be adjacent in memory because they are adjacent elements of a larger array or adjacent members of a structure with no padding between them, or because the implementation chose to place them so, even though they are unrelated. If prior invalid pointer operations (such as accesses outside array bounds) produced undefined behavior, subsequent comparisons also produce undefined behavior.

This means for this:

int a[5][5], *i1, *i2;

i1 = &a[0][0] + 5;
i2 = &a[1][0];

i1 compares as equal to i2. But when iterating over the array with an (int *), it is still undefined behaviour, because it is originally derived from the first subarray. It doesn't magically convert to a pointer into the second subarray.

Even when doing this

char *c = (char *)(&a[0][0]) + 5*sizeof(int);
int  *i3 = (int *)c;

won't help. It compares equal to i1 and i2, but it isn't derived from any of the subarrays; it is a pointer to a single int or an array[1] of int at best.

I don't consider this a bug in the standard. It is the other way around: Allowing this would introduce a special case that violates either the type system for arrays or the rules for pointer arithmetic or both. It may be considered a missing definition, but not a bug.

So even if the memory layout for a[5][5] is identical to the layout of a[25], and the very same loop using a (char *) can be used to iterate over both, an implementation is allowed to blow up if one is used as the other. I don't know why it should or know any implementation that would, and maybe there is a single fact in the Standard not mentioned till now that makes it well defined behaviour. Until then, I would consider it to be undefined and stay on the safe side.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...