When storing a scalable vector and then loading a fixed-size vector from the same location, where the scalable vector is known to be at least as large as the fixed-size one based on vscale_range, perform store-to-load forwarding through temporary @llvm.vector.extract calls. InstCombine then folds the insert/extract pair away.

The use case is shown in https://godbolt.org/z/KT3sMrMbd, which shows that clang generates IR matching this pattern when the "arm_sve_vector_bits" attribute is used:

```cpp
#include <arm_sve.h>

// Fixed-length (512-bit) view of the scalable svfloat32_t type.
typedef svfloat32_t svfloat32_fixed_t __attribute__((arm_sve_vector_bits(512)));

struct svfloat32_wrapped_t {
  svfloat32_fixed_t v;
};

static inline svfloat32_wrapped_t add(svfloat32_wrapped_t a, svfloat32_wrapped_t b) {
  return {svadd_f32_x(svptrue_b32(), a.v, b.v)};
}

svfloat32_wrapped_t foo(svfloat32_wrapped_t a, svfloat32_wrapped_t b) {
  // The IR pattern this patch matches is generated for this return:
  return add(a, b);
}
```
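The size condition behind "known to be larger based on vscale_range" can be sketched as follows. This is only an illustration, not the code in the patch: `fixedLoadIsCovered` and its parameters are hypothetical names, and the sketch assumes both vectors share the same element type so that comparing lane counts is enough. It also assumes the 512-bit example is compiled so that the function carries `vscale_range(4,4)`.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): returns true when a stored
// <vscale x scalableMinElts x T> value is guaranteed to contain at least
// fixedElts lanes, given the lower bound of vscale_range.
constexpr bool fixedLoadIsCovered(std::uint64_t minVscale,
                                  std::uint64_t scalableMinElts,
                                  std::uint64_t fixedElts) {
  // A scalable vector holds vscale * scalableMinElts lanes, so the lower
  // bound of vscale gives the guaranteed minimum lane count.
  return minVscale * scalableMinElts >= fixedElts;
}

// Assumed setting for the example above: with vscale >= 4,
// <vscale x 4 x float> has at least 16 lanes and covers <16 x float>.
static_assert(fixedLoadIsCovered(4, 4, 16), "512-bit case is covered");

// Without a useful lower bound (vscale >= 1) only 4 lanes are guaranteed,
// so the fixed-size load cannot be forwarded from the store.
static_assert(!fixedLoadIsCovered(1, 4, 16), "unbounded case is not covered");
```

When the check holds, the loaded value corresponds to the low lanes of the stored value, which is what extracting at index 0 expresses.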