clang-p2996/llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp at ef2cdfe393d01cd4935c806387ac912b5a2c8ced

Jay Foad 9d08f276d7 [AMDGPU] Use reductions instead of scans in the atomic optimizer

If the result of an atomic operation is not used then it can be more
efficient to build a reduction across all lanes instead of a scan. Do
this for GFX10, where the permlanex16 instruction makes it viable. For
wave64 this saves a couple of dpp operations. For wave32 it saves one
readlane (readlanes are generally bad for performance) and one dpp
operation.
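
To illustrate the difference between the two strategies, here is a minimal
host-side sketch in plain C++. It is not the optimizer's code: the real pass
emits DPP and permlanex16 operations in LLVM IR, whereas this model uses a
std::vector with one element per lane. The names Wave, inclusiveScan and
butterflyReduce are hypothetical and chosen only for this example. A scan
gives every lane its own prefix value, which is needed when each lane
consumes the atomic's result. A reduction only produces the wave-wide total,
which is all that is needed when the result is unused.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of a wavefront: one element per lane.
using Wave = std::vector<std::uint32_t>;

// Inclusive scan (Hillis-Steele style): lane i ends up holding
// v[0] + ... + v[i]. Each lane gets a distinct prefix value.
Wave inclusiveScan(Wave v) {
  for (std::size_t stride = 1; stride < v.size(); stride *= 2) {
    Wave prev = v; // snapshot so all lanes read the previous step's values
    for (std::size_t i = stride; i < v.size(); ++i)
      v[i] = prev[i] + prev[i - stride];
  }
  return v;
}

// Butterfly reduction: every lane ends up holding the wave-wide total.
// Assumes the lane count is a power of two (as with wave32 and wave64).
Wave butterflyReduce(Wave v) {
  assert(!v.empty() && (v.size() & (v.size() - 1)) == 0);
  for (std::size_t stride = 1; stride < v.size(); stride *= 2) {
    Wave prev = v; // snapshot so all lanes read the previous step's values
    for (std::size_t i = 0; i < v.size(); ++i)
      v[i] = prev[i] + prev[i ^ stride]; // exchange with the partner lane
  }
  return v;
}
```

In this model both loops take the same number of steps, so it does not
reproduce the instruction savings described above. Those come from how each
pattern maps onto the available cross-lane instructions on the hardware.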

Differential Revision: https://reviews.llvm.org/D98953

2021-03-26 15:38:14 +00:00

25 KiB
