When we have an actual shuffle, we can impose the additional restriction that the mask replicates the elements of the first operand, so we know the replication factor as a ratio of output and op0 vector sizes.
175 KiB
175 KiB