Currently we model subvector inserts and extracts as shuffles,
potentially going as far as scalarizing. If the types are legal then
they can just be simple zip/unzip operations, or possible even no-ops.
Change the cost to a relatively small one to ensure that simple loops
featuring such operations between fixed and scalable vector types that
are effectively the same at a given sve width can be unrolled and
further optimized.