
memory management - Difference between local allocatable and automatic arrays

I am interested in the difference between alloc_array and automatic_array in the following extract:

subroutine mysub(n)
integer, intent(in)  :: n
integer              :: automatic_array(n)
integer, allocatable :: alloc_array(:)

allocate(alloc_array(n))
...[code]...

I am familiar enough with the basics of allocation (though not with advanced techniques) to know that it lets you change the size of the array in the middle of the code (as pointed out in this question). But here I am interested in the case where you don't need to change the size: the arrays might be passed to other subroutines to be operated on, but the only purpose of both variables, in this code and in any subroutine, is to hold the data of an array of dimension n (the data may change, but not the size).

(1) Is there any difference in memory usage? I am not an expert in low-level details, but I know just enough to see that they matter and can affect higher-level programming. (The kind of experience I'm talking about: once, trying to run a big Fortran code, I was getting an error I didn't understand, and the sysadmin told me "oh, yeah, you are probably saturating the stack; try adding this line to your run script". Anything that gives me insight into how to consider these things while actually coding, rather than having to patch them later, is welcome.) I've been told it might depend on many other things, like the compiler or the architecture, but I gathered from those responses that the people saying so were not completely sure exactly how. Is it really so dependent on a multitude of factors, or is there a default/intended behavior in the code that may then be overridden by compiler flags or system settings?
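
For concreteness, here is a minimal sketch of the kind of failure I mean (the names, the array size, and the 8 MB default stack limit are illustrative assumptions, not taken from our actual code):

subroutine fill_automatic(n)
    implicit none
    integer, intent(in) :: n
    real :: work(n)            ! automatic array: typically placed on the stack
    work = 1.0
    print *, 'automatic array ok:', work(1) + work(n)
end subroutine

subroutine fill_allocatable(n)
    implicit none
    integer, intent(in) :: n
    real, allocatable :: work(:)
    allocate(work(n))          ! allocatable array: placed on the heap
    work = 1.0
    print *, 'allocatable array ok:', work(1) + work(n)
end subroutine

program stack_demo
    implicit none
    ! ~400 MB of default reals: far beyond a typical 8 MB default stack
    ! limit on Linux, so fill_automatic() will likely crash with a
    ! segmentation fault unless the limit is raised (e.g. with
    ! "ulimit -s unlimited" in the shell), while fill_allocatable()
    ! should succeed.
    integer, parameter :: n = 100000000
    call fill_allocatable(n)
    call fill_automatic(n)
end program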

(2) Would the subroutines have different interface requirements? Again, I'm no expert, but it has happened to me before that, because of the way I declared the dummy arguments of a subroutine, I ended up having to put the subroutine in a module. I've been given to understand this may depend on whether I use features that are specific to allocatable variables. I am thinking of the case in which everything I do with the variables could be done with either allocatables or automatics, without intentionally using anything specific to allocatables (other than the allocation before use, that is).
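
To illustrate what I mean (all names here are made up for this sketch): a plain array dummy argument accepts both kinds of actual argument with no special interface, while a dummy argument declared allocatable, which is only needed if the callee itself allocates or resizes, requires an explicit interface, e.g. by putting the subroutine in a module:

module resize_mod
    implicit none
contains
    subroutine grow(a, n)                ! allocatable dummy argument:
        integer, allocatable, intent(inout) :: a(:)  ! explicit interface required
        integer, intent(in) :: n
        if (allocated(a)) deallocate(a)
        allocate(a(n))
    end subroutine
end module

subroutine fill(a, n)                    ! plain explicit-shape dummy:
    implicit none                        ! an implicit interface is fine
    integer, intent(in)  :: n
    integer, intent(out) :: a(n)
    integer :: i
    do i = 1, n
        a(i) = i
    end do
end subroutine

program iface_demo
    use resize_mod
    implicit none
    integer :: fixed_array(5)            ! stands in for an automatic array
    integer, allocatable :: alloc_array(:)

    allocate(alloc_array(5))
    call fill(fixed_array, 5)            ! both kinds of array are accepted
    call fill(alloc_array, 5)            ! by the same plain dummy argument
    call grow(alloc_array, 10)           ! this one needs the explicit interface
    print *, size(alloc_array)
end program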

Finally, in case it is of use: the reason I am asking is that we are developing as a group, and we recently noticed that different people use these two declarations in different ways. We need to determine whether this can be left to personal preference or whether there are reasons to set a clear criterion (and how to set it). I don't need extremely detailed answers; I am trying to determine whether this is something I should research carefully, and if so, which aspects the research should focus on.

Though I would be interested to know of "interesting tricks" that can be done with allocation but are not directly related to size variability, I am leaving those for a possible follow-up question and focusing here on the strictly functional differences (meaning: what I am explicitly telling compilers to do with my code). The two items above are the ones I could come up with from previous experience, but if there is any other important one I am missing, please do mention it.



1 Reply


Because gfortran and ifort on Linux (x86_64) are among the most popular combinations used for HPC, I made a performance comparison between local allocatable and automatic arrays for both compilers. The CPU is a Xeon E5-2650 v2 @ 2.60 GHz, and the compilers are gfortran 4.8.2 and ifort 14.0. The test program is the following.

In test.f90:

!------------------------------------------------------------------------           
subroutine use_automatic( n )
    integer :: n

    integer :: a( n )   !! local automatic array (with unknown size at compile-time)
    integer :: i

    do i = 1, n
        a( i ) = i
    enddo

    call sub( a )
end

!------------------------------------------------------------------------           
subroutine use_alloc( n )
    integer :: n

    integer, allocatable :: a( : )  !! local allocatable array                      
    integer :: i

    allocate( a( n ) )

    do i = 1, n
        a( i ) = i
    enddo

    call sub( a )

    deallocate( a )  !! not strictly necessary (a is deallocated automatically on return); kept for clarity
end

!------------------------------------------------------------------------           
program main
    implicit none
    integer :: i, nsizemax, nsize, nloop, foo
    common /dummy/ foo

    nloop = 10**7
    nsizemax = 10

    do i = 1, nloop
        nsize = mod( i, nsizemax ) + 1

        call use_automatic( nsize )
        ! call use_alloc( nsize )                                                   
    enddo

    print *, "foo = ", foo   !! to check if sub() is really called
end

In sub.f90:

!------------------------------------------------------------------------
subroutine sub( a )
    integer a( * )
    integer foo
    common /dummy/ foo

    foo = a( 1 )   !! touch the array so the compiler cannot optimize it away
end

In the above program, I tried to prevent the compiler from optimizing a(:) away entirely (i.e., reducing the whole loop to a no-op) by placing sub() in a different file so that its interface is implicit. First, I compiled the program with gfortran as

gfortran -O3 test.f90 sub.f90

and tested different values of nsizemax while keeping nloop = 10**7. The result is in the following table (time in seconds, averaged over several runs measured with the time command).

nsizemax    use_automatic()    use_alloc()
10          0.30               0.31
50          0.48               0.47
500         1.0                0.90
5000        4.3                4.2
100000      75.6               75.7

So the overall timing seems almost the same for the two routines when -O3 is used (but see the Edit below for other options). Next, I compiled with ifort as

[O3]  ifort -O3 test.f90 sub.f90
or
[O3h] ifort -O3 -heap-arrays test.f90 sub.f90

In the former case the automatic array is placed on the stack, while with -heap-arrays attached it is placed on the heap. The result is

nsizemax use_automatic()    use_alloc()
         [O3]    [O3h]      [O3]    [O3h]
10       0.064   0.39       0.48    0.48
50       0.094   0.56       0.65    0.66
500      0.45    1.03       1.12    1.12
5000     3.8     4.4        4.4     4.4
100000   74.5    75.3       76.5    75.5

So for ifort, the use of automatic arrays seems beneficial when relatively small arrays are mainly used. On the other hand, gfortran -O3 shows no difference because both arrays are treated the same way (see Edit for more details).

Additional comparison:

Below is the result for the Oracle Fortran compiler 12.4 on Linux (invoked as f90 -O3). The overall trend is similar: automatic arrays are faster for small n, suggesting that the stack is used internally.

nsizemax    use_automatic()    use_alloc()
10          0.16               0.45
50          0.17               0.62
500         0.37               0.97
5000        2.04               2.67
100000      65.6               65.7

Edit

Thanks to Vladimir's comment, it turns out that gfortran -O3 puts automatic arrays (whose size is unknown at compile time) on the heap. This explains why use_automatic() and use_alloc() showed no difference above. So I made another comparison between different options:

[O3]  gfortran -O3
[O5]  gfortran -O5
[O3s] gfortran -O3 -fstack-arrays
[Of]  gfortran -Ofast                   # this includes -fstack-arrays

Here, -fstack-arrays means that the compiler puts all local arrays with unknown size on the stack. Note that this flag is enabled by default with -Ofast. The obtained result is

nsizemax    use_automatic()               use_alloc()
            [Of]   [O3s]  [O5]  [O3]     [Of]  [O3s]  [O5]  [O3]
10          0.087  0.087  0.29  0.29     0.29  0.29   0.29  0.29
50          0.15   0.15   0.43  0.43     0.45  0.44   0.44  0.45
500         0.57   0.56   0.84  0.84     0.92  0.92   0.92  0.92
5000        3.9    3.9    4.1   4.1      4.2   4.2    4.2   4.2
100000      75.1   75.0   75.6  75.6     75.6  75.3   75.7  76.0

where the average of ten measurements is shown. This table demonstrates that when -fstack-arrays is included, the execution time for small n becomes shorter. This trend is consistent with the ifort results above.

It should be mentioned, however, that the above comparison probably corresponds to a "best-case" scenario that highlights the difference between the two approaches, so the timing difference may be much smaller in practice. For example, when I compared the same options using a different program (involving both small and large arrays), the results were not much affected by the stack options. And of course the results will depend on the machine architecture as well as the compiler. So your mileage may vary.

