If we have a shifted mask, we may be able to reduce the load width
to the width of the non-zero part of the mask and use an offset
to the base address to remove the srl. The offset is given by
C+trailingzeros(ShiftedMask).
Then we add a final shl to restore the trailing zero bits.
I've use the ARM test because that's where the existing (and (srl
(load))) tests were.
The X86 test was modified to keep the H register.
42 KiB
42 KiB