How can I parallelize this code with OpenMP? xp, yp, zp, gpx, gpy, and gpz are known one-dimensional vectors.
for (ies = 0; ies < 1000000; ies++) {
    for (jes = ies + 1; jes < 1000000; jes++) {
        double dxp = xp[ies] - xp[jes];
        double dyp = yp[ies] - yp[jes];
        double dzp = zp[ies] - zp[jes];
        double distance = sqrt(dxp * dxp + dyp * dyp + dzp * dzp);
        double gpspec = gpx[ies] * gpx[jes] + gpy[ies] * gpy[jes] + gpz[ies] * gpz[jes];
        #pragma omp parallel for
        for (kes = 1; kes <= 100; kes++) {
            double distan = kes * distance;
            E1[kes] = E1[kes] + gpspec * sin(distan) / distan;
        }
    }
}
Posted on 2020-10-14 21:43:38
Here is one possibility (untested):
#pragma omp parallel for reduction(+: E1) private(jes, kes) schedule(dynamic)
for (ies = 0; ies < 1000000; ies++) {
    for (jes = ies + 1; jes < 1000000; jes++) {
        double dxp = xp[ies] - xp[jes];
        double dyp = yp[ies] - yp[jes];
        double dzp = zp[ies] - zp[jes];
        double distance = sqrt(dxp * dxp + dyp * dyp + dzp * dzp);
        double gpspec = gpx[ies] * gpx[jes] + gpy[ies] * gpy[jes] + gpz[ies] * gpz[jes];
        for (kes = 1; kes <= 100; kes++) {
            double distan = kes * distance;
            E1[kes] = E1[kes] + gpspec * sin(distan) / distan;
        }
    }
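}
I added schedule(dynamic) to try to compensate for the load imbalance between threads caused by the triangular shape of the ies × jes index domain covered by the loops: early values of ies have far more jes iterations than late ones. Whether the compiler accepts reduction(+: E1) depends on how E1 is defined. Reductions over C/C++ arrays require OpenMP 4.5 or later, and if E1 is a pointer rather than a declared array, an array section such as E1[0:101] has to be given instead. In any case, if reduction(+: E1) is not accepted, the reduction can always be done by hand using a critical construct.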
Posted on 2020-10-13 22:36:53
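You already have an omp parallel for pragma on the innermost loop. For it to take effect, you probably need to enable OpenMP support in your compiler by setting a compiler flag (for the GCC compiler suite, that is the -fopenmp flag). You may also need to #include the omp.h header, which is required if you call OpenMP runtime functions.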
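That said, I doubt you will gain much from this parallelization, because the loop you are parallelizing does very little work: it has only 100 iterations, and the parallel region is entered again for every (ies, jes) pair. Parallelization carries a runtime overhead that offsets the gain from running several loop iterations at once, so I don't think you will see much of a speedup.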
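To check whether OpenMP support is actually switched on, as discussed above for the -fopenmp flag, a small helper along these lines may be useful. It is only a sketch: the function name threads_in_parallel_region is made up for illustration. It relies on the _OPENMP macro, which the OpenMP specification requires compilers to define when OpenMP is enabled, and on the runtime function omp_get_num_threads() from omp.h.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>  /* only needed when OpenMP runtime functions are called */
#endif

/* Returns the number of threads that ran a parallel region,
   or 1 if the file was compiled without OpenMP support. */
int threads_in_parallel_region(void)
{
    int count = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        /* One thread records the team size; the others wait at the
           implicit barrier that ends the single construct. */
        #pragma omp single
        count = omp_get_num_threads();
    }
#endif
    assert(count >= 1);
    return count;
}
```

With GCC, compiling with -fopenmp should make this return the size of the thread team (typically the number of available cores unless OMP_NUM_THREADS says otherwise). Without the flag, the pragma on the question's inner loop is silently ignored and this helper returns 1.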
https://stackoverflow.com/questions/64336192