clang-p2996

Files

Krzysztof Drewniak 25d976b45c [ScalarizeMaskedMemIntr] Don't use a scalar mask on GPUs (#104842)

ScalarizeMaskedMemIntr contains an optimization where the <N x i1> mask
is bitcast into an iN and then bit-tests with powers of two are used to
determine whether to load/store/... or not.
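As a rough illustration of the two strategies, here is a minimal C++ sketch
that models a 4-lane masked load. It is not the pass's code: the function
names, the fixed lane count N, and the use of std::array<bool, N> as a
stand-in for <N x i1> are all assumptions made for this example. The first
function packs the mask into one integer and tests bit i against 1 << i; the
second reads each lane's predicate directly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lane count for this sketch; the real pass handles any N.
constexpr std::size_t N = 4;

// Scalar-mask strategy: pack the lane predicates into a single integer
// (standing in for "bitcast <N x i1> to iN"), then decide each lane with
// a bit-test against a power of two.
std::array<int, N> maskedLoadScalarMask(const int *ptr,
                                        const std::array<bool, N> &mask,
                                        const std::array<int, N> &passthru) {
  assert(ptr != nullptr);
  std::uint32_t bits = 0;
  for (std::size_t i = 0; i < N; ++i)
    bits |= static_cast<std::uint32_t>(mask[i]) << i;

  std::array<int, N> result = passthru;
  for (std::size_t i = 0; i < N; ++i)
    if (bits & (std::uint32_t{1} << i)) // test bit i of the packed mask
      result[i] = ptr[i];
  return result;
}

// Per-lane strategy: read each predicate on its own (standing in for an
// extractelement on the mask vector) and branch on it directly.
std::array<int, N> maskedLoadPerLane(const int *ptr,
                                     const std::array<bool, N> &mask,
                                     const std::array<int, N> &passthru) {
  assert(ptr != nullptr);
  std::array<int, N> result = passthru;
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      result[i] = ptr[i];
  return result;
}
```

Both functions return the same values; they differ only in how the per-lane
condition is obtained, which is the part whose cost depends on the target.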

However, on machines with branch divergence (mainly GPUs), this is a
mis-optimization, since each i1 in the mask will be stored in a
condition register - that is, each of these "i1"s is likely to be a word
or two wide, making these bit operations counterproductive.

Therefore, amend this pass to skip the optimization on targets that it
pessimizes.

Pre-commit tests #104645

2024-08-22 19:02:45 -05:00

expand-masked-load.ll

…

expand-masked-gather.ll

…

expand-masked-scatter.ll

…

expand-masked-store.ll

…

lit.local.cfg

…