clang-p2996/llvm/test/Transforms/ScalarizeMaskedMemIntrin/AMDGPU
Krzysztof Drewniak 25d976b45c [ScalarizeMaskedMemIntr] Don't use a scalar mask on GPUs (#104842)
ScalarizeMaskedMemIntrin contains an optimization where the <N x i1> mask
is bitcast into an iN and then bit-tests with powers of two are used to
determine whether to load/store/... or not.

However, on machines with branch divergence (mainly GPUs), this is a
mis-optimization, since each i1 in the mask will be stored in a
condition register - that is, each of these "i1"s is likely to be a word
or two wide, making these bit operations counterproductive.

Therefore, amend this pass to skip the optimization on targets that it
pessimizes.
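
As a rough illustration of the two strategies described above, the sketch
below models a 4-lane masked load in plain C++. It is not the pass's code:
the names packMask, maskedLoadScalarMask and maskedLoadPerLane are
hypothetical, and the fixed lane count is an assumption made for brevity.
The first variant packs the mask into one integer and bit-tests it with
powers of two; the second tests each lane's predicate directly, which is
the shape this change keeps on targets with branch divergence.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Illustrative lane count (an assumption; the real pass handles any N).
constexpr std::size_t N = 4;

// Models `bitcast <N x i1> to iN`: pack the lane predicates into one integer.
std::uint8_t packMask(const std::array<bool, N> &mask) {
  std::uint8_t bits = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      bits |= static_cast<std::uint8_t>(1u << i);
  return bits;
}

// Scalar-mask form: lane i is active when (bits & 2^i) != 0.
std::array<int, N> maskedLoadScalarMask(const int *ptr,
                                        const std::array<bool, N> &mask,
                                        const std::array<int, N> &passthru) {
  std::array<int, N> result = passthru;
  const std::uint8_t bits = packMask(mask);
  for (std::size_t i = 0; i < N; ++i)
    if ((bits & (1u << i)) != 0)
      result[i] = ptr[i]; // load only the active lanes
  return result;
}

// Per-lane form: test each predicate on its own, with no packing step.
std::array<int, N> maskedLoadPerLane(const int *ptr,
                                     const std::array<bool, N> &mask,
                                     const std::array<int, N> &passthru) {
  std::array<int, N> result = passthru;
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      result[i] = ptr[i]; // load only the active lanes
  return result;
}
```

Both functions compute the same result; they differ only in how the
per-lane condition is obtained. When each predicate already lives in its
own condition register, the packing step in the first variant is extra
work rather than a saving.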

Pre-commit tests #104645
2024-08-22 19:02:45 -05:00
..