clang-p2996/llvm/test/CodeGen/WebAssembly
Sam Parker 103119a435 [WebAssembly] Lower wide SIMD i8 muls (#130785)
Currently, a 'wide' i32 SIMD multiplication whose operands are extended
from i8 elements performs the multiplication in i32. So, for IR like the following:
```
  %wide.a = sext <8 x i8> %a to <8 x i32>
  %wide.b = sext <8 x i8> %a to <8 x i32>
  %mul = mul <8 x i32> %wide.a, %wide.b
  ret <8 x i32> %mul
```

We would generate the following sequence:
```
  i16x8.extend_low_i8x16_s $push6=, $1
  local.tee $push5=, $3=, $pop6
  i32x4.extmul_low_i16x8_s $push0=, $pop5, $3
  v128.store 0($0), $pop0
  i8x16.shuffle $push1=, $1, $1, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
  i16x8.extend_low_i8x16_s $push4=, $pop1
  local.tee $push3=, $1=, $pop4
  i32x4.extmul_low_i16x8_s $push2=, $pop3, $1
  v128.store 16($0), $pop2
  return
```

But now we perform the multiplication in i16 and extend the result to i32, resulting in:
```
  i16x8.extmul_low_i8x16_s $push3=, $1, $1
  local.tee $push2=, $1=, $pop3
  i32x4.extend_high_i16x8_s $push0=, $pop2
  v128.store 16($0), $pop0
  i32x4.extend_low_i16x8_s $push1=, $1
  v128.store 0($0), $pop1
  return
```
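
The narrower multiply is safe because the product of two sign-extended i8
values always fits in i16 (the extremes are -128 * 127 = -16256 and
-128 * -128 = 16384). The following is a minimal Python sketch that checks
this exhaustively; the helper names are hypothetical and only model the
arithmetic, they are not part of LLVM or the WebAssembly backend:
```python
def sext(value, bits):
    """Interpret the low `bits` bits of `value` as a signed two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def mul_in_i32(a, b):
    # Models the old lowering: extend both i8 lanes, then multiply in 32 bits.
    return sext(sext(a, 8) * sext(b, 8), 32)


def mul_in_i16_then_extend(a, b):
    # Models the new lowering: the product is produced as a (wrapping) 16-bit
    # lane, which is then sign-extended to 32 bits.
    product16 = sext(sext(a, 8) * sext(b, 8), 16)
    return sext(product16, 32)


def lowerings_agree():
    # Exhaustive check over every pair of i8 lane values.
    return all(
        mul_in_i32(a, b) == mul_in_i16_then_extend(a, b)
        for a in range(-128, 128)
        for b in range(-128, 128)
    )
```

Calling `lowerings_agree()` walks all 65536 input pairs, so it covers the full
i8 range rather than sampling it.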
2025-03-21 06:57:57 +00:00