Figure 4
Scheme of the OpenMP reduction algorithm for six threads. The object array is subdivided into submatrices (blue) and distributed among the threads, shown as object layers 1–6. Each submatrix is further subdivided into smaller blocks of 64 cache lines, and each block is assigned a consecutive sequence number (e.g. 1–28 for the first thread). On each addition, a thread fetches the current sequence counter and increments it atomically. A reservation lock prevents multiple threads from working on the same array block. Blocks that are not fully covered (orange) are treated separately.
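The core idea of handing out blocks through an atomically incremented sequence counter can be illustrated with a minimal sketch. This is a simplified variant under stated assumptions, not the implementation shown in the figure: a single global counter replaces the per-thread sequence ranges, and no reservation lock is needed because each sequence number is handed to exactly one thread. The names `block_reduce`, `layers`, `next_block`, and `block_len` are hypothetical. The last block may be shorter than `block_len`, which loosely corresponds to the partially covered (orange) blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch only: adds n_layers thread-private copies
 * (layers[t][0..n-1]) onto result[0..n-1], block by block.
 * block_len is the number of elements per block (an assumption;
 * 64 cache lines of 64 bytes would be 512 doubles). */
void block_reduce(double *result, double *const *layers, int n_layers,
                  size_t n, size_t block_len)
{
    assert(block_len > 0);

    /* Round up so that a trailing partial block is also processed. */
    size_t n_blocks = (n + block_len - 1) / block_len;
    size_t next_block = 0;  /* shared sequence counter */

    #pragma omp parallel shared(next_block)
    {
        for (;;) {
            size_t b;

            /* Fetch the current sequence number and increment it atomically. */
            #pragma omp atomic capture
            b = next_block++;

            if (b >= n_blocks)
                break;  /* no blocks left for this thread */

            size_t begin = b * block_len;
            size_t end = begin + block_len;
            if (end > n)
                end = n;  /* partially covered last block */

            /* This thread owns block b exclusively, so no lock is required. */
            for (int t = 0; t < n_layers; ++t)
                for (size_t i = begin; i < end; ++i)
                    result[i] += layers[t][i];
        }
    }
}
```

Compiled with OpenMP enabled (for example `-fopenmp` with GCC or Clang), the loop runs on all threads of the team. Without that flag the pragmas are ignored and the same code runs serially with identical results.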