While lowering (#pragma omp target update from), clang's generated
.omp_task_entry. is setting up 9 arguments while calling
__tgt_target_data_update_nowait_mapper.
At the same time, in __tgt_target_data_update_nowait_mapper, call to
targetData<TaskAsyncInfoWrapperTy>() is converted to a sibcall assuming
it has the argument count listed in the signature.
AARCH64 asm sequence for this is as follows (removed unrelated insns):
`
.omp_task_entry..108:
sub sp, sp, #32
stp x29, x30, sp, #16 // 16-byte Folded Spill
add x29, sp, #16
str x8, sp, #8. // stack canary
str xzr, [sp]
bl __tgt_target_data_update_nowait_mapper
__tgt_target_data_update_nowait_mapper:
sub sp, sp, #32
stp x29, x30, sp, #16 // 16-byte Folded Spill
add x29, sp, #16
str x8, sp, #8 // stack canary
// Sibcall argument setup
adrp x8,
:got:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb
ldr x8, [x8,
:got_lo12:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb]
stp x9, x8, x29, #16
adrp x8, .L.str.8
add x8, x8, :lo12:.L.str.8
str x8, x29, #32. <==. This is the insn that erases $fp
ldp x29, x30, sp, #16 // 16-byte Folded Reload
add sp, sp, #32
// Sibcall
b
ZL10targetDataI22TaskAsyncInfoWrapperTyEvP7ident_tliPPvS4_PlS5_S4_S4_PFiS2_R8DeviceTyiS4_S4_S5_S5_S4_S4_R11AsyncInfoTybEPKcSD
`
On AArch64, call to __tgt_target_data_update_nowait_mapper in
.omp_task_entry. sets up only single space on stack and this results in
ovewriting $fp and subsequent stack corruption. This issue can be
credited to discrepancy of __tgt_target_data_update_nowait_mapper
signature in openmp/libomptarget/include/omptarget.h taking 13 arguments
while clang/lib/CodeGen/CGOpenMPRuntime.cpp and
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def taking only 9 arguments.
This patch modifies __tgt_target_data_update_nowait_mapper signature to
match .omp_task_entry usage(and other 2 files mentioned above).
Co-authored-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>