When storing a scalable vector and loading a fixed-size vector, where
the
scalable vector is known to be larger based on vscale_range, perform
store-to-load forwarding through temporary @llvm.vector.extract calls.
InstCombine then folds the insert/extract pair away.
The usecase is shown in https://godbolt.org/z/KT3sMrMbd, which shows
that clang generates IR that matches this pattern when the
"arm_sve_vector_bits" attribute is used:
```c
typedef svfloat32_t svfloat32_fixed_t
__attribute__((arm_sve_vector_bits(512)));
struct svfloat32_wrapped_t {
svfloat32_fixed_t v;
};
static inline svfloat32_wrapped_t
add(svfloat32_wrapped_t a, svfloat32_wrapped_t b) {
return {svadd_f32_x(svptrue_b32(), a.v, b.v)};
}
svfloat32_wrapped_t
foo(svfloat32_wrapped_t a, svfloat32_wrapped_t b) {
// The IR pattern this patch matches is generated for this return:
return add(a, b);
}
```