I need to convert the following code from C++ with OpenMP to C++ with CUDA. According to the answer to this question: CUDA access matrix stored in RAM and possibility of being implemented, it should be possible to rewrite the OpenMP portion in CUDA. My first problem is that I don't know what to do with the two sums inside the kernel function.
Legacy Code:
/* definition of variables */
for (int l = 0; l < N_mesh_points_x; l++){
    for (int m = 0; m < N_mesh_points_y; m++){
        for (int p = 0; p < N_mesh_points_z; p++){
            sum_1 = 0;
            sum_2 = 0;
            #pragma omp parallel for schedule(dynamic) private(phir) reduction(+: sum_1,sum_2)
            for (int i = 0; i < N_mesh_points_x; i++){
                for (int j = 0; j < N_mesh_points_y; j++){
                    for (int k = 0; k < N_mesh_points_z; k++){
                        if (!(i==l) || !(j==m) || !(k==p)){
                            phir = weights_x[i]*weights_y[j]*weights_z[k]*kern_1(i,j,k,l,m,p);
                            sum_1 += phir * matrix1[position(i,j,k)];
                            sum_2 += phir;
                        }
                    }
                }
            }
            (*K2)[position(l,m,p)] = sum_1 + (5 - 2*sum_2) * matrix1[position(l,m,p)];
        }
    }
}
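
To make the question concrete, this is roughly the kernel skeleton I have in mind (a sketch only, not tested): the d_* pointers are assumed device copies of my host arrays, and kern_1 and position are assumed to be available as __device__ functions. Each block handles one output point (l,m,p) and its threads stride over the flattened (i,j,k) range. What I don't know is whether accumulating sum_1 and sum_2 like this is the right approach:

// Sketch: one block per output point (l,m,p); the block's threads
// cooperate over the inner (i,j,k) loops and accumulate into shared sums.
__global__ void compute_K2(const double* d_weights_x,
                           const double* d_weights_y,
                           const double* d_weights_z,
                           const double* d_matrix1,
                           double* d_K2,
                           int Nx, int Ny, int Nz)
{
    const int l = blockIdx.x, m = blockIdx.y, p = blockIdx.z;

    __shared__ double sum_1, sum_2;
    if (threadIdx.x == 0) { sum_1 = 0.0; sum_2 = 0.0; }
    __syncthreads();

    const int total = Nx * Ny * Nz;
    // Each thread walks a strided subset of the flattened (i,j,k) range.
    for (int idx = threadIdx.x; idx < total; idx += blockDim.x) {
        const int i = idx / (Ny * Nz);
        const int j = (idx / Nz) % Ny;
        const int k = idx % Nz;
        if (i != l || j != m || k != p) {
            double phir = d_weights_x[i] * d_weights_y[j] * d_weights_z[k]
                        * kern_1(i, j, k, l, m, p);   // assumed __device__ function
            // Is a pair of atomicAdds the right replacement for the OpenMP
            // reduction? (atomicAdd on double needs compute capability >= 6.0)
            atomicAdd(&sum_1, phir * d_matrix1[position(i, j, k)]);
            atomicAdd(&sum_2, phir);
        }
    }
    __syncthreads();

    if (threadIdx.x == 0)
        d_K2[position(l, m, p)] = sum_1 + (5.0 - 2.0 * sum_2)
                                  * d_matrix1[position(l, m, p)];
}

// I would launch it with one block per mesh point, e.g.:
// dim3 grid(N_mesh_points_x, N_mesh_points_y, N_mesh_points_z);
// compute_K2<<<grid, 256>>>(d_wx, d_wy, d_wz, d_matrix1, d_K2,
//                           N_mesh_points_x, N_mesh_points_y, N_mesh_points_z);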
I read some articles about reduction, but I don't have an array; it is only a pair of scalar sums. Should I create an array to store the values of phir
and then run a reduction on that array? Is there an already implemented function that does this?
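
For example, I came across thrust::transform_reduce, which seems to perform the reduction without me writing one by hand. Below is a sketch of what I mean for sum_1 only (sum_2 would need a second, similar call); the pointer names are mine, and kern_1 and position are assumed to be marked __host__ __device__. I am not sure whether launching a separate reduction for every output point (l,m,p) is a reasonable idea:

#include <thrust/transform_reduce.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/functional.h>
#include <thrust/execution_policy.h>

// Returns one term of sum_1 for a flattened index idx, or 0 for the
// excluded point (l,m,p). The pointers are assumed to be device pointers.
struct sum1_term
{
    const double *wx, *wy, *wz, *mat;
    int Ny, Nz, l, m, p;

    __host__ __device__
    double operator()(int idx) const
    {
        const int i = idx / (Ny * Nz);
        const int j = (idx / Nz) % Ny;
        const int k = idx % Nz;
        if (i == l && j == m && k == p) return 0.0;
        const double phir = wx[i] * wy[j] * wz[k] * kern_1(i, j, k, l, m, p);
        return phir * mat[position(i, j, k)];
    }
};

// Inside the host loops over (l,m,p):
// double sum_1 = thrust::transform_reduce(
//     thrust::device,
//     thrust::counting_iterator<int>(0),
//     thrust::counting_iterator<int>(Nx * Ny * Nz),
//     sum1_term{d_wx, d_wy, d_wz, d_matrix1, Ny, Nz, l, m, p},
//     0.0,
//     thrust::plus<double>());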