[OpenMP] Fix num_iters in __kmpc_*_loop DeviceRTL functions (#133435)

This patch removes the addition of 1 to the number of iterations when
calling the following DeviceRTL functions:
- `__kmpc_distribute_for_static_loop*`
- `__kmpc_distribute_static_loop*`
- `__kmpc_for_static_loop*`

Calls to these functions are currently only produced by the OMPIRBuilder
from flang, which already passes the correct number of iterations to
these functions. Because the runtime adds 1 to the received `num_iters`
value, it treats the loop as having one more iteration than it does, so
worksharing can produce incorrect results. This impacts flang OpenMP
offloading of `do`, `distribute` and `distribute parallel do`
constructs.

Requiring the application to pass `tripcount - 1` as the argument would
be surprising as well, so rather than updating flang I think it makes
more sense to update the runtime.
Sergio Afonso
2025-04-01 10:29:08 +01:00
committed by GitHub
parent e17d864f55
commit 66fca0674d

@@ -911,19 +911,19 @@ public:
IdentTy *loc, void (*fn)(TY, void *), void *arg, TY num_iters, \
TY num_threads, TY block_chunk, TY thread_chunk) { \
ompx::StaticLoopChunker<TY>::DistributeFor( \
-loc, fn, arg, num_iters + 1, num_threads, block_chunk, thread_chunk); \
+loc, fn, arg, num_iters, num_threads, block_chunk, thread_chunk); \
} \
[[gnu::flatten, clang::always_inline]] void \
__kmpc_distribute_static_loop##BW(IdentTy *loc, void (*fn)(TY, void *), \
void *arg, TY num_iters, \
TY block_chunk) { \
-ompx::StaticLoopChunker<TY>::Distribute(loc, fn, arg, num_iters + 1, \
+ompx::StaticLoopChunker<TY>::Distribute(loc, fn, arg, num_iters, \
block_chunk); \
} \
[[gnu::flatten, clang::always_inline]] void __kmpc_for_static_loop##BW( \
IdentTy *loc, void (*fn)(TY, void *), void *arg, TY num_iters, \
TY num_threads, TY thread_chunk) { \
-ompx::StaticLoopChunker<TY>::For(loc, fn, arg, num_iters + 1, num_threads, \
+ompx::StaticLoopChunker<TY>::For(loc, fn, arg, num_iters, num_threads, \
thread_chunk); \
}