
c - Parallelizing matrix times a vector by columns and by rows with OpenMP

For some homework I have, I need to implement the multiplication of a matrix by a vector, parallelizing it by rows and by columns. I understand the row version, but I am a little confused about the column version.

Let's say we have the following data:

[Figure: matrix times vector]
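
For reference, the serial computation being parallelized is the usual row-times-vector loop; a minimal sketch, assuming a square tam x tam matrix and a zero-initialized v2, as in the code below:

// Serial reference: v2 = matrix * v1 (square tam x tam matrix, v2 assumed
// zero-initialized, matching the parallel versions below).
for (int i = 0; i < tam; i++)
    for (int j = 0; j < tam; j++)
        v2[i] += matrix[i][j] * v1[j];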

And the code for the row version:

#pragma omp parallel default(none) shared(i,v2,v1,matrix,tam) private(j)
  {
#pragma omp for
    for (i = 0; i < tam; i++)
      for (j = 0; j < tam; j++){
//        printf("Hebra %d hizo %d,%d
", omp_get_thread_num(), i, j);
        v2[i] += matrix[i][j] * v1[j];
      }
  }

Here the calculations are done correctly and the result is right: each thread works on its own set of rows i, so no two threads ever write to the same v2[i].

The column version:

#pragma omp parallel default(none) shared(j,v2,v1,matrix,tam) private(i)
  {
    for (i = 0; i < tam; i++)
#pragma omp for
      for (j = 0; j < tam; j++) {
//            printf("Hebra %d hizo %d,%d
", omp_get_thread_num(), i, j);
        v2[i] += matrix[i][j] * v1[j];
      }
  }

Here, due to how the parallelization is done, the result varies from one execution to the next depending on which thread handles each column. But something interesting happens (and I would guess it is because of compiler optimizations): if I uncomment the printf, the results are all the same as in the row version, and therefore correct. For example:

Thread 0 did 0,0
Thread 2 did 0,2
Thread 1 did 0,1
Thread 2 did 1,2
Thread 1 did 1,1
Thread 0 did 1,0
Thread 2 did 2,2
Thread 1 did 2,1
Thread 0 did 2,0

 2.000000  3.000000  4.000000 
 3.000000  4.000000  5.000000 
 4.000000  5.000000  6.000000 


V2:
20.000000, 26.000000, 32.000000,

This is right, but if I remove the printf:

V2:
18.000000, 11.000000, 28.000000,

What kind of mechanism should I use to get the column version right?

Note: I care more about the explanation than about any code you may post as an answer, because what I really want is to understand what is going wrong in the column version.

EDIT

I've found a way to get rid of the private vector proposed by Z boson in his answer. I've replaced that vector with a single scalar variable; here is the code:

#pragma omp parallel
  {
    double sLocal = 0;          // per-thread partial sum for the current row
    int i, j;
    for (i = 0; i < tam; i++) {
#pragma omp for
      for (j = 0; j < tam; j++) {
        sLocal += matrix[i][j] * v1[j];
      }
#pragma omp critical
      {
        v2[i] += sLocal;        // merge this thread's contribution to row i
        sLocal = 0;
      }
    }
  }
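
Another way to express the same idea, shown here only as a sketch under the same assumptions about tam, matrix, v1 and v2, is to let OpenMP build the per-thread partial sums with a reduction clause on the inner loop (it opens one parallel region per row, which is fine for illustration):

int i;
for (i = 0; i < tam; i++) {
    double sum = 0.0;
// Each thread accumulates its share of the columns into a private copy of
// sum; OpenMP combines the copies when the loop finishes.
#pragma omp parallel for reduction(+:sum)
    for (int j = 0; j < tam; j++)
        sum += matrix[i][j] * v1[j];
    v2[i] += sum;
}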
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply


I don't know exactly what your homework means by parallelizing along rows and along columns, but I know why your code is not working: you have a race condition when you write to v2[i]. In the column version the inner loop over j is shared among the threads, so several threads accumulate into the same v2[i] concurrently; the printf most likely just slows and serializes the threads enough that the race rarely shows. You can fix it by making private versions of v2, filling them in parallel, and then merging them into v2 with a critical section.

#pragma omp parallel
{
    float v2_private[tam];      // per-thread copy of the result vector
    int i, j;
    // A VLA cannot be initialized with "= {}", so zero it explicitly.
    for (i = 0; i < tam; i++) v2_private[i] = 0.0f;
    for (i = 0; i < tam; i++) {
        #pragma omp for
        for (j = 0; j < tam; j++) {
            v2_private[i] += matrix[i][j] * v1[j];
        }
    }
    #pragma omp critical
    {
        for(i=0; i<tam; i++) v2[i] += v2_private[i];
    }
}

I tested this. You can see the results here http://coliru.stacked-crooked.com/a/5ad4153f9579304d

Note that I did not explicitly declare anything shared or private; it is not necessary to do so. Some people think you should explicitly declare everything. I personally think the opposite. By declaring i and j (and v2_private) inside the parallel region, they are made private, and everything declared outside the region stays shared by default.
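
As an illustration, the row version from the question can be written without any explicit data-sharing clauses at all; a minimal sketch relying on the same rules (i and j are private because they are declared inside the region, while tam, matrix, v1 and v2 stay shared):

#pragma omp parallel
{
    int i, j;                 // declared inside the region, therefore private
#pragma omp for               // rows are distributed among the threads
    for (i = 0; i < tam; i++)
        for (j = 0; j < tam; j++)
            v2[i] += matrix[i][j] * v1[j];   // each v2[i] is written by only one thread
}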

