Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

OpenMP in C array reduction / parallelize the code

I have a problem with my code, which should print the number of occurrences of each number.

I want to parallelize this code with OpenMP. I tried to use a reduction on the array, but it obviously isn't working as I wanted.

The error is a segmentation fault. Should some variables be private, or is the problem in the way I'm trying to use the reduction?

I think each thread should count some part of the array, and then the partial counts should be merged somehow.

#pragma omp parallel for reduction (+: reasult[:i])
    for (i = 0; i < M; i++) {   
      for(j = 0; j < N; j++) {
         if ( numbers[j] == i){
            result[i]++;
         }
      }
  }

Here N is a large number giving how many numbers I have, numbers is the array of all the numbers, and result is the array holding the count for each number.

question from:https://stackoverflow.com/questions/65871611/openmp-in-c-array-reduction-parallelize-the-code


1 Reply

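First, you have a typo in the array name:

#pragma omp parallel for reduction (+: reasult[:i])

It should be "result", not "reasult".

Beyond that, why are you sectioning the array with result[:i]? The section length is taken from i when the parallel region is entered, before the loop has assigned anything to i, so the section does not cover M elements, which could explain the segmentation fault. Based on your code, it seems that you want to reduce the entire array, namely: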

// j is declared outside the loop, so it must be made private to avoid a data race
#pragma omp parallel for private(j) reduction (+: result)
    for (i = 0; i < M; i++)   
      for(j = 0; j < N; j++)
         if ( numbers[j] == i)
            result[i]++;
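
If result is a pointer rather than a fixed-size array, the reduction clause needs an array section with an explicit length, such as result[:m]. Below is a minimal sketch of how that might look as a complete function. It is a variant, not the code above: it parallelizes a single pass over numbers, which is the situation where several threads really do update the same counters and a reduction is needed. The function name count_with_reduction and its parameters are hypothetical.

```c
/* Hypothetical helper (not from the original post): counts how often each
 * value in [0, m) occurs in numbers[0..n-1] and stores the counts in result. */
void count_with_reduction(const int *numbers, long n, int *result, int m)
{
    if (m <= 0)
        return;

    /* The reduction adds the per-thread counts to whatever result already
     * holds, so start from zero. */
    for (int i = 0; i < m; i++)
        result[i] = 0;

    /* result is a pointer here, so the clause needs an array section with an
     * explicit length: result[:m] covers elements 0 .. m-1. */
    #pragma omp parallel for reduction(+ : result[:m])
    for (long j = 0; j < n; j++) {
        int v = numbers[j];
        if (v >= 0 && v < m)   /* ignore values outside the counted range */
            result[v]++;
    }
}
```

A call such as count_with_reduction(numbers, N, result, M) would then replace the nested loops. The code has to be compiled with OpenMP enabled (for example -fopenmp with GCC or Clang). Each thread gets its own private copy of the m counters, so a very large m makes those copies correspondingly large.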

If your compiler does not support the OpenMP 4.5 array reduction feature, you can instead implement the reduction explicitly (check this SO thread to see how).
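
An explicit reduction could be sketched roughly as follows. This is not necessarily the approach from the linked thread. It follows the idea from the question, where each thread counts its part of numbers into a private array and the partial counts are merged afterwards. The function name count_manual_reduction, its parameters, and the return convention are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper (not from the original post): counts occurrences of
 * each value in [0, m) without using an array reduction clause.
 * Returns 0 on success, -1 if a per-thread buffer could not be allocated. */
int count_manual_reduction(const int *numbers, long n, int *result, int m)
{
    int failed = 0;

    if (m <= 0)
        return 0;

    for (int i = 0; i < m; i++)
        result[i] = 0;

    #pragma omp parallel
    {
        /* Each thread counts into its own zero-initialized buffer. */
        int *local = calloc((size_t)m, sizeof *local);

        /* Every thread must reach the worksharing loop, even if its
         * allocation failed, so the NULL check sits inside the loop. */
        #pragma omp for nowait
        for (long j = 0; j < n; j++) {
            int v = numbers[j];
            if (local != NULL && v >= 0 && v < m)
                local[v]++;
        }

        /* Merge step: one thread at a time adds its counts to result. */
        #pragma omp critical
        {
            if (local == NULL) {
                failed = 1;
            } else {
                for (int i = 0; i < m; i++)
                    result[i] += local[i];
            }
        }

        free(local);   /* free(NULL) is a no-op */
    }

    return failed ? -1 : 0;
}
```

The private buffers live on the heap here, which is a design choice for this sketch: it keeps large counter arrays off the thread stacks, at the cost of one allocation per thread.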

As pointed out by @Hristo Iliev in the comments:

Provided that M * sizeof(result[0]) / #threads is a multiple of the cache line size, and even if it isn't when the value of M is large enough, there is absolutely no need to involve reduction in the process. Unless the program is running on a NUMA system, that is.

Assuming that the aforementioned conditions are met, look carefully at how the work is divided: the iterations of the outermost loop (variable i) are distributed among the threads, and since i is also the index used to access the result array, each thread updates different positions of result. No two threads ever write to the same element, so you can simplify your code to:

// j is declared outside the loop, so it must be made private to avoid a data race
#pragma omp parallel for private(j)
for (i = 0; i < M; i++)   
   for(j = 0; j < N; j++)
      if ( numbers[j] == i)
         result[i]++;
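
For completeness, here is a minimal sketch of the reduction-free version as a self-contained function. The function name count_per_value and its parameters are hypothetical. Two details differ from the fragment above and are choices made for this sketch: the inner index j is declared inside the loop, which makes it private to each thread automatically, and each iteration accumulates into a local variable and writes result[i] only once.

```c
/* Hypothetical helper (not from the original post): for every value i in
 * [0, m), counts how many elements of numbers[0..n-1] equal i. */
void count_per_value(const int *numbers, long n, int *result, int m)
{
    /* Iterations over i are split among the threads; each thread writes
     * only to its own result[i], so no reduction is required. */
    #pragma omp parallel for
    for (int i = 0; i < m; i++) {
        int count = 0;                  /* local to this iteration */
        for (long j = 0; j < n; j++) {  /* j declared here is private */
            if (numbers[j] == i)
                count++;
        }
        result[i] = count;              /* single write per element */
    }
}
```

Writing result[i] once per iteration, instead of incrementing it inside the inner loop, also limits how often neighbouring threads touch the same cache line. Note that this version still scans numbers once for every value of i, so it does M * N comparisons in total.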
