clang-p2996

Files

Drew Kersnar a1e1a84d2c [NVPTX] Vectorize and lower 256-bit global loads/stores for sm_100+/ptx88+ (#139292)

PTX 8.8 introduces 256-bit-wide vector loads/stores that are available only
under certain conditions (per the title, global memory on sm_100+). This
change extends the NVPTX backend to lower these loads/stores. It also
overrides getLoadStoreVecRegBitWidth for NVPTX, allowing the
LoadStoreVectorizer to create these wider vector operations.

See the spec for the three relevant PTX instructions here:
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-ld
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-ld-global-nc
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st
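
The width selection described above can be sketched as a standalone model. This is not the code from the patch: `SubtargetModel`, `loadStoreVecRegBitWidth`, and `maxVectorElements` are hypothetical names used only for illustration, the 128-bit fallback is an assumption about the default width, and the real hook is a method on the NVPTX target-transform-info class inside LLVM. The only facts taken from the commit are the sm_100+/PTX 8.8+ requirement and the restriction to global memory.

```cpp
// Hypothetical stand-in for the subtarget query; not LLVM's NVPTXSubtarget.
struct SubtargetModel {
  unsigned SmVersion;  // e.g. 100 for sm_100
  unsigned PtxVersion; // e.g. 88 for PTX 8.8
};

// NVPTX numbers the global address space as 1.
constexpr unsigned kGlobalAddrSpace = 1;

// Model of the per-address-space width the vectorizer may use.
// Assumption: every other case falls back to 128 bits.
constexpr unsigned loadStoreVecRegBitWidth(const SubtargetModel &ST,
                                           unsigned AddrSpace) {
  const bool Has256Bit = ST.SmVersion >= 100 && ST.PtxVersion >= 88;
  return (Has256Bit && AddrSpace == kGlobalAddrSpace) ? 256u : 128u;
}

// Number of elements of a given bit size that fit in one vector access.
constexpr unsigned maxVectorElements(unsigned VecBits, unsigned ElemBits) {
  return VecBits / ElemBits;
}

// Compile-time checks of the model.
static_assert(loadStoreVecRegBitWidth(SubtargetModel{100, 88},
                                      kGlobalAddrSpace) == 256,
              "sm_100 + PTX 8.8 global accesses may be 256 bits wide");
static_assert(loadStoreVecRegBitWidth(SubtargetModel{90, 88},
                                      kGlobalAddrSpace) == 128,
              "older SM versions keep the narrower width");
static_assert(maxVectorElements(256, 32) == 8,
              "eight 32-bit elements fit in a 256-bit access");
```

Under this model, a 256-bit width lets the LoadStoreVectorizer merge up to eight adjacent 32-bit global accesses into one vector operation, where a 128-bit width would stop at four.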

2025-05-13 13:36:09 -07:00

AArch64

…

AMDGPU

[NFC] Precommit tests for an LSV patch (#138167)

2025-05-01 12:50:31 -04:00

NVPTX

[NVPTX] Vectorize and lower 256-bit global loads/stores for sm_100+/ptx88+ (#139292)

2025-05-13 13:36:09 -07:00

X86

[LoadStoreVectorizer] Postprocess and merge equivalence classes (#121861)

2025-01-07 17:17:26 -08:00

int_sideeffect.ll

…