befa925aca introduced preliminary hoisting of COPY
instructions when the user of the COPY is inside the same loop. That
optimization appeared to be too aggressive and hoisted too many COPY's
greatly increasing register pressure causing performance regressions for
AMDGPU target.
This is intended to fix the regression by hoisting COPY instruction only
if either:
- User of COPY can be hoisted (other args are invariant)
or
- Hoisting COPY doesn't bring high register pressure
60 KiB
60 KiB