clang-p2996/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp at dda73336ad22bd0b5ecda17040c50fb10fcbe5fb


Krzysztof Drewniak b05c15259b [mlir][AMDGPU] Improve amdgpu.lds_barrier, add warnings (#77942)

On some architectures (currently gfx90a, gfx94*, and gfx10**), we can
implement an LDS barrier using compiler intrinsics instead of inline
assembly, improving optimization possibilities and decreasing the
fragility of the underlying code.

Other AMDGPU chipsets continue to require inline assembly to implement
this barrier because, by default, the LLVM backend inserts waits on
global memory (s_waitcnt vmcnt(0)) before barriers so that memory
watchpoints set by debuggers work correctly.
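The chipset split described above can be sketched as a predicate over target names. This is a minimal illustration, not the pass's actual code. The function `canUseIntrinsicLdsBarrier` is hypothetical. It assumes chipsets arrive as plain `gfx…` strings and reads the wildcards literally: `gfx94*` matches any name starting with `gfx94`, and `gfx10**` matches any seven-character `gfx10xx` name.

```cpp
#include <string_view>

// Hypothetical predicate mirroring the chipset list in the commit message:
// gfx90a, gfx94*, and gfx10** may use compiler intrinsics for the LDS
// barrier. Every other chipset falls back to inline assembly.
// ASSUMPTION: chipsets are given as plain "gfxNNNN"-style strings.
constexpr bool canUseIntrinsicLdsBarrier(std::string_view chip) {
  if (chip == "gfx90a")
    return true;
  // gfx94*: any chipset name starting with "gfx94".
  if (chip.substr(0, 5) == "gfx94")
    return true;
  // gfx10**: "gfx10" followed by exactly two more characters.
  if (chip.size() == 7 && chip.substr(0, 5) == "gfx10")
    return true;
  return false;
}

// Usage: these checks are evaluated at compile time.
static_assert(canUseIntrinsicLdsBarrier("gfx90a"), "gfx90a uses intrinsics");
static_assert(!canUseIntrinsicLdsBarrier("gfx908"), "gfx908 needs inline asm");
```

A `constexpr` `std::string_view` keeps the sketch allocation-free and lets the examples be checked with `static_assert`.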

Use of amdgpu.lds_barrier on these architectures imposes a tradeoff
between debuggability and performance. The documentation, as well as the
generated inline assembly, has been updated to explicitly call
attention to this fact.

For chipsets that do not require the inline assembly hack, we move to
the s.waitcnt and s.barrier intrinsics, which have been added to the
ROCDL dialect. The magic constants used as arguments to the waitcnt
intrinsic can be derived from
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp.
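One way to derive such a constant: start from an all-ones 16-bit s_waitcnt immediate, where every counter is at its maximum and so nothing is waited on. Then clear only the lgkmcnt field, which makes the instruction wait for lgkmcnt(0). This is a hedged sketch, and the helpers `lgkmcntField` and `ldsOnlyWaitcnt` are hypothetical. The field positions are our reading of AMDGPUBaseInfo.cpp and should be verified there:

- gfx6 through gfx9: bits [11:8]
- gfx10: bits [13:8]
- gfx11: bits [9:4]

Later generations are not modeled.

```cpp
#include <cstdint>

// Position of the lgkmcnt field inside the 16-bit s_waitcnt immediate.
// ASSUMPTION: layout based on our reading of AMDGPUBaseInfo.cpp.
struct CounterField {
  unsigned shift;
  unsigned width;
};

// Hypothetical helper: lgkmcnt field for gfx6 through gfx11 only.
constexpr CounterField lgkmcntField(unsigned gfxMajor) {
  if (gfxMajor == 11)
    return {4, 6}; // gfx11: lgkmcnt occupies bits [9:4]
  if (gfxMajor == 10)
    return {8, 6}; // gfx10: lgkmcnt occupies bits [13:8]
  return {8, 4};   // gfx6-gfx9 (assumed): lgkmcnt occupies bits [11:8]
}

// Hypothetical helper: immediate that waits only for lgkmcnt == 0.
// All other counters stay at their maximum encodable value (no wait).
constexpr std::uint32_t ldsOnlyWaitcnt(unsigned gfxMajor) {
  const CounterField f = lgkmcntField(gfxMajor);
  const std::uint32_t field = ((1u << f.width) - 1u) << f.shift;
  return 0xFFFFu & ~field; // the s_waitcnt immediate is 16 bits wide
}

// Usage: the resulting constants, checked at compile time.
static_assert(ldsOnlyWaitcnt(9) == 0xF0FFu, "gfx9: lgkmcnt(0) only");
static_assert(ldsOnlyWaitcnt(10) == 0xC0FFu, "gfx10: lgkmcnt(0) only");
static_assert(ldsOnlyWaitcnt(11) == 0xFC0Fu, "gfx11: lgkmcnt(0) only");
```

Clearing a field instead of building the immediate from scratch keeps the other counters "don't care" without naming each one. Under this sketch, the lowering would emit s.waitcnt with this constant, followed by s.barrier.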

2024-03-11 10:06:49 -05:00

38 KiB
