clang-p2996/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp at 6a21dfaac66ffa39dc7faaec1cd7932099c052d4

Harrison Hao 1a7f5f5833 [AMDGPU] Promote nestedGEP allocas to vectors (#141199)

Supports the `nestedGEP` pattern that appears when an alloca is first
indexed as an array element and then shifted with a byte-offset GEP:

```llvm
  %SortedFragments = alloca [10 x <2 x i32>], addrspace(5), align 8
  %row  = getelementptr [10 x <2 x i32>], ptr addrspace(5) %SortedFragments, i32 0, i32 %j
  %elt1 = getelementptr i8, ptr addrspace(5) %row, i32 4
  %val  = load i32, ptr addrspace(5) %elt1
```

The pass folds the two levels of addressing into a single vector lane
index and keeps the whole object in VGPRs:

```llvm
  %vec  = freeze <20 x i32> poison              ; alloca promoted to <20 x i32>
  %idx0 = mul i32 %j, 2                         ; j * 2
  %idx  = add i32 %idx0, 1                      ; j * 2 + 1
  %val  = extractelement <20 x i32> %vec, i32 %idx
```

This eliminates the scratch read: `%val` now comes from an
`extractelement` on the vector instead of a load from `addrspace(5)`.
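
The lane-index arithmetic in the example can be sketched in plain C++. This is
an illustration only, not code from the pass: the helper name
`flattenedLaneIndex` and its parameters are hypothetical, and the sketch
assumes the row size and the constant byte offset are both multiples of the
lane size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not taken from AMDGPUPromoteAlloca.cpp).
// Maps "array index + constant byte offset" to a lane of the flattened vector.
//   arrayIndex : the variable index %j from the outer GEP
//   rowBytes   : size in bytes of one array element (8 for <2 x i32>)
//   byteOffset : constant offset from the inner i8 GEP (4 in the example)
//   laneBytes  : size in bytes of one vector lane (4 for i32)
inline std::uint32_t flattenedLaneIndex(std::uint32_t arrayIndex,
                                        std::uint32_t rowBytes,
                                        std::uint32_t byteOffset,
                                        std::uint32_t laneBytes) {
  // Assumption of this sketch: both sizes divide evenly into lanes.
  assert(laneBytes != 0);
  assert(rowBytes % laneBytes == 0);
  assert(byteOffset % laneBytes == 0);

  const std::uint32_t lanesPerRow = rowBytes / laneBytes;  // 2 in the example
  const std::uint32_t laneInRow = byteOffset / laneBytes;  // 1 in the example
  return arrayIndex * lanesPerRow + laneInRow;             // j * 2 + 1
}
```

With the values from the example (`rowBytes = 8`, `byteOffset = 4`,
`laneBytes = 4`), an index of `j = 3` maps to lane `3 * 2 + 1 = 7` of the
`<20 x i32>` vector.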

2025-06-02 16:20:14 +08:00

61 KiB
