Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c - how do compilers assign memory addresses to variables?

I teach a course where students get to ask questions about programming (!), and I got this question:

Why does the machine choose where variables go in memory? Can we tell it where to store a variable?

I don't really know what to say. Here's my first attempt:

The compiler (not the machine) chooses where to store the variables in the process address space automatically. Using C, we cannot tell the machine where to store variables.

But that "automatically" is somewhat anticlimactic and begs the question... and I've realized I don't even know if it's the compiler or the runtime or the OS or who does the assignment. Maybe someone can answer the student's question better than me.


1 Reply


The answer to this question is quite complex since there are various approaches to memory allocation depending on variable scope, size and programming environment.

Stack allocated variables

Typically, local variables are put on the "stack". The compiler assigns each variable a fixed offset relative to the "stack pointer", whose value can differ from one invocation of the function to the next. In other words, the compiler assumes that memory locations like Stack-Pointer+4, Stack-Pointer+8, etc. are accessible and usable by the program. After the function returns, these memory locations are not guaranteed to retain their values.

This is mapped into assembly instructions similar to the following. esp is the stack pointer, esp + N refers to a memory location relative to esp:

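mov eax, DWORD PTR SS:[esp]      ; load the 32-bit value at the address held in esp
mov eax, DWORD PTR SS:[esp + 4]  ; load the value 4 bytes above that address
mov eax, DWORD PTR SS:[esp + 8]  ; load the value 8 bytes above that address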
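The point that the same local variable lands at a different address in each invocation can be observed from C itself. The following is a minimal sketch, not a definitive demonstration: the names seen, record_frames and frames_are_distinct are made up for illustration, and the addresses printed will vary by compiler, platform and run.

```c
#include <inttypes.h>
#include <stdio.h>

/* Addresses of `local`, recorded at recursion depth 0 and depth 1.
   Stored as integers so they stay usable after the frames are gone. */
static uintptr_t seen[2];

/* Hypothetical helper: records where `local` lives in this invocation. */
static int record_frames(int depth)
{
    int local = depth;                       /* one copy per invocation */
    seen[depth] = (uintptr_t)(void *)&local;
    if (depth == 0)
        return record_frames(1) + local;     /* outer `local` is still alive here */
    return local;
}

/* Returns 1 if the two invocations placed `local` at different addresses. */
int frames_are_distinct(void)
{
    (void)record_frames(0);
    printf("depth 0: %#" PRIxPTR "\ndepth 1: %#" PRIxPTR "\n", seen[0], seen[1]);
    return seen[0] != seen[1];
}
```

Calling frames_are_distinct() from main should print two different addresses, because both copies of local exist at the same time while the inner call runs. The compiler fixed the offset within a frame; the stack pointer at runtime determined the rest.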

Heap

Then there are variables that are heap-allocated. This means that there is a call to the standard library to request memory (malloc in C or new in C++). This memory stays reserved until it is explicitly released or the program ends. malloc and new return pointers to memory in a region called the heap. The allocating functions have to make sure that the memory they hand out is not already reserved, which can make heap allocation slow at times. Also, if you don't want to run out of memory, you should free (or delete) memory that is no longer used. Heap allocation is quite complicated internally, since the standard library has to keep track of used and unused ranges of memory as well as freed ranges. Therefore even freeing a heap-allocated variable can sometimes be more time-consuming than allocating it. For more information see "How is malloc() implemented internally?"
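As a small illustration of requesting and releasing heap memory, here is a sketch using malloc and free from the C standard library. The helper names make_sequence and sum_and_release are hypothetical and exist only for this example.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a heap-allocated array holding 0..n-1,
   or NULL if the allocation fails. The caller owns the memory. */
int *make_sequence(size_t n)
{
    int *values = malloc(n * sizeof *values);
    if (values == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        values[i] = (int)i;
    return values;            /* the memory outlives this function call */
}

/* Sums the n elements, then hands the block back to the allocator. */
long sum_and_release(int *values, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += values[i];
    free(values);             /* after this, `values` must not be used */
    return total;
}
```

Unlike a local variable, the array returned by make_sequence remains valid after the function returns, and it stays reserved until free is called on it.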

Understanding the difference between stack and heap is quite fundamental to learning how to program in C and C++.

Arbitrary Pointers

Naively, one might assume that setting a pointer to an arbitrary address, e.g. int *a = (int *)0x123, makes it possible to address arbitrary locations in the computer's memory. This does not exactly hold true, since (depending on the CPU and system) programs are heavily restricted in which memory they may address.
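To make the distinction concrete, the sketch below forms a pointer from an integer, which C permits with a cast, but only dereferences it when the address belongs to an object the program actually owns. The function names are hypothetical, and the integer-to-pointer conversion is implementation-defined, so treat this as an illustration under typical desktop assumptions rather than portable practice.

```c
#include <stdint.h>

/* Forms a pointer from an integer address without dereferencing it.
   The result of this conversion is implementation-defined. */
int *pointer_from_address(uintptr_t address)
{
    return (int *)(void *)address;
}

/* Round trip through an integer for an address the program really owns. */
int read_through_address(void)
{
    int owned = 42;
    uintptr_t address = (uintptr_t)(void *)&owned;  /* a valid, live address */
    int *p = pointer_from_address(address);
    return *p;                                      /* safe: `owned` is alive */
}
```

Calling pointer_from_address(0x123) also yields a pointer, but reading through it is undefined behavior; on a system with memory protection the process would typically be terminated rather than being handed whatever sits at that address.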

Getting a feel for memory

In a guided classroom experience, it might be beneficial to explore some simple C code by compiling the source to assembly (gcc can do this with the -S option, for example). A simple function such as int foo(int a, int b) { return a+b; } should suffice (compiled without optimizations). Then look at something like int bar(int *a, int *b) { return (*a) + (*b); }

When invoking bar, allocate the arguments once on the stack and once with malloc.
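The exercise could be set up roughly as follows. foo and bar are the functions from the text; bar_with_stack and bar_with_heap are hypothetical wrappers added here, and returning -1 on allocation failure is just a convention chosen for this sketch.

```c
#include <stdlib.h>

int foo(int a, int b) { return a + b; }
int bar(int *a, int *b) { return (*a) + (*b); }

/* The arguments live in this function's stack frame. */
int bar_with_stack(void)
{
    int a = 2, b = 3;
    return bar(&a, &b);
}

/* The arguments live on the heap; returns -1 if an allocation fails. */
int bar_with_heap(void)
{
    int *a = malloc(sizeof *a);
    int *b = malloc(sizeof *b);
    int result = -1;
    if (a != NULL && b != NULL) {
        *a = 2;
        *b = 3;
        result = bar(a, b);
    }
    free(a);                  /* free(NULL) is a no-op */
    free(b);
    return result;
}
```

Compiling this with gcc -S -O0 and reading the output side by side shows the difference: the stack version addresses a and b relative to the stack or frame pointer, while the heap version calls malloc and works with the pointers it returns.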

Conclusion

The compiler does perform some variable placement and alignment, relative to base addresses which the program or standard library obtains at runtime.
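The split between compile-time offsets and runtime base addresses is easiest to see with a struct, used here as an analogy for how variables are laid out. This is only a sketch: struct sample and both function names are invented for illustration, and the actual offsets depend on the compiler and target.

```c
#include <stddef.h>

/* Hypothetical record used only for illustration. */
struct sample {
    char   tag;
    int    count;
    double weight;
};

/* Offset the compiler chose for `count`; fixed at compile time and
   usually padded so that `count` is suitably aligned. */
size_t count_offset(void)
{
    return offsetof(struct sample, count);
}

/* Returns 1 if the member's runtime address equals the runtime base
   address plus the compile-time offset. */
int address_is_base_plus_offset(const struct sample *s)
{
    const char *base = (const char *)s;
    return (const char *)&s->count == base + offsetof(struct sample, count);
}
```

Whether the struct itself sits on the stack or on the heap, the check holds: the base address is only known when the program runs, while the offset was decided by the compiler.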

For a deep understanding of memory related questions see Ulrich Drepper's "What every programmer should know about memory" http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.91.957

Sidenote: Apart from C-ish Country

Then there is also garbage collection, which is popular among many scripting languages (Python, Perl, JavaScript, Lisp) and device-independent environments (Java, C#). It is related to heap allocation but slightly more complicated.

Some language implementations are entirely heap-based (Stackless Python) or entirely stack-based (Forth).

