clang-p2996/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp at 8fcb1263f42657ecbc355beff12500dfbcddee17

Krzysztof Drewniak 25d976b45c [ScalarizeMaskedMemIntr] Don't use a scalar mask on GPUs (#104842)

ScalarizeMaskedMemIntrin contains an optimization where the <N x i1> mask
is bitcast into an iN, and bit-tests against powers of two are then used
to determine whether or not to load/store/... each element.
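The bit-test scheme described above can be illustrated with a small standalone C++ model. This is only a sketch of the idea, not the pass's actual code: `packMask` and `maskedLoadBitTest` are hypothetical names, the vector is modeled as a `std::array`, and lane i is assumed to map to bit i of the packed integer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of bitcasting an <N x i1> mask to an iN:
// lane i becomes bit i of the result (assumed bit order).
template <std::size_t N>
std::uint32_t packMask(const std::array<bool, N> &mask) {
  static_assert(N <= 32, "this sketch only handles up to 32 lanes");
  std::uint32_t bits = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      bits |= (std::uint32_t{1} << i);
  return bits;
}

// Hypothetical model of a scalarized masked load: each lane is guarded
// by a bit-test of the packed mask against the power of two 1 << i.
template <std::size_t N>
std::array<int, N> maskedLoadBitTest(const int *ptr,
                                     const std::array<bool, N> &mask,
                                     const std::array<int, N> &passthru) {
  assert(ptr != nullptr);
  const std::uint32_t bits = packMask(mask);
  std::array<int, N> result = passthru;  // inactive lanes keep passthru
  for (std::size_t i = 0; i < N; ++i) {
    if ((bits & (std::uint32_t{1} << i)) != 0)
      result[i] = ptr[i];                // active lane: load from memory
  }
  return result;
}
```

On a CPU the packed mask usually fits in one scalar register, so each guard is a single AND plus a compare.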

However, on machines with branch divergence (mainly GPUs), this is a
mis-optimization, since each i1 in the mask will be stored in a
condition register - that is, each of these "i1"s is likely to be a word
or two wide, making these bit operations counterproductive.

Therefore, amend this pass to skip the optimization on targets that it
pessimizes.
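A rough standalone C++ model of that choice follows. It is a sketch under stated assumptions rather than the code in this file: `MaskTest`, `chooseMaskTest`, `maskedStore`, and the `hasBranchDivergence` parameter are all hypothetical names used for illustration, and the target query is modeled as a plain bool.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names for the two ways of testing a lane of the mask.
enum class MaskTest { PackedBits, PerLane };

// Assumption: the target reports whether it has branch divergence.
// Divergent targets (mainly GPUs) skip the packed-integer form.
constexpr MaskTest chooseMaskTest(bool hasBranchDivergence) {
  return hasBranchDivergence ? MaskTest::PerLane : MaskTest::PackedBits;
}

// Hypothetical model of a scalarized masked store using either strategy.
template <std::size_t N>
void maskedStore(int *ptr, const std::array<int, N> &value,
                 const std::array<bool, N> &mask, bool hasBranchDivergence) {
  static_assert(N <= 32, "this sketch only handles up to 32 lanes");
  assert(ptr != nullptr);

  if (chooseMaskTest(hasBranchDivergence) == MaskTest::PerLane) {
    // Test each i1 directly; nothing is packed into an integer.
    for (std::size_t i = 0; i < N; ++i)
      if (mask[i])
        ptr[i] = value[i];
    return;
  }

  // Pack the mask into one integer, then bit-test with powers of two.
  std::uint32_t bits = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      bits |= (std::uint32_t{1} << i);
  for (std::size_t i = 0; i < N; ++i)
    if ((bits & (std::uint32_t{1} << i)) != 0)
      ptr[i] = value[i];
}
```

Both strategies write the same lanes; they differ only in how each lane's condition is computed, which is why the choice can be made per target without changing behavior.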

Pre-commit tests #104645

2024-08-22 19:02:45 -05:00

43 KiB
