Previously, `getRegUsageForType` was implemented using `getTypeLegalizationCost`. `getRegUsageForType` is used by the loop vectorizer to estimate the register pressure caused by using a vector type. However, `getTypeLegalizationCost` currently only appears to understand splitting and not scalarization, so significantly underestimates the register requirements. Instead, use `getNumRegisters`, which understands when scalarization can occur (via computeRegisterProperties). This was discovered while investigating D118979 (Set maximum VF with shouldMaximizeVectorBandwidth), where under fixed-length 512-bit SVE the loop vectorizer previously ends up costing an v128i1 as 2 v64i* registers where it actually occupies 128 i32 registers. I'm sending this patch early for comment, I'm still doing some sanity checking with LNT. I note that getRegisterClassForType appears to return VectorRC even though the type in question (large vNi1 types) end up occupying scalar registers. That might be worth fixing too. Differential Revision: https://reviews.llvm.org/D125918
16 KiB
16 KiB