The `test-lower-to-nvvm` pipeline serves as the standard pipeline for
NVVM + host compilation and is used across our CUDA integration tests.
This PR renames the `test-lower-to-nvvm` pipeline to `gpu-lower-to-nvvm`
and moves its registration into `InitAllPasses.h`. The aim is to make
the pipeline callable from Python and to provide a standardized
compilation process for NVVM.
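A minimal sketch of how an integration test might invoke the renamed pipeline, assuming it is registered under the new name as an mlir-opt option (the runner invocation is illustrative; real tests launch GPU kernels and pass additional runner libraries):

// RUN: mlir-opt %s --gpu-lower-to-nvvm | mlir-cpu-runner --entry-point-result=void
func.func @main() {
  return
}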
Note that tensor.empty may feed into a SPARSE output (meaning it truly
has no values yet), but for a DENSE output it should always have an
initial value. We ran a verifier over all our tests, and this is the
only remaining omission.
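To illustrate the distinction (the #SV alias is just for this example):

#SV = #sparse_tensor.encoding<{ map = (i) -> (i : compressed) }>
// Sparse output: fine to start without any values.
%a = tensor.empty() : tensor<1024xf64, #SV>
// Dense output: should be given an initial value instead, e.g.:
%b = arith.constant dense<0.0> : tensor<1024xf64>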
The "Dim" prefix is a legacy left-over that no longer makes sense, since
we have a very strict "Dimension" vs. "Level" definition for sparse
tensor types and their storage.
Note that this is a redo of https://github.com/llvm/llvm-project/pull/72712,
which was reverted due to timeouts on the bot. I have timed the tests
under various settings, and this one does not even hit the top 20 of
integration tests. To be safe, I removed the SIMD version of the tests,
keeping only the libgen/direct IR paths (which are the most important
ones for us to test). I will also keep an eye on
https://lab.llvm.org/buildbot/#/builders/264/builds after submitting, to
make sure there is no repeat.
I always enjoy a good stress test. This end-to-end integration test
verifies that the major ordering of both the blocks and the elements
within each block is handled correctly (giving row-row, row-col,
col-row, and col-col as options).
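For instance, with illustrative 2x2 blocks, the block-major and in-block orderings are selected independently by the order of the level expressions in the dimToLvl mapping (aliases are hypothetical):

// Row-major among blocks, row-major within each block.
#RowRow = #sparse_tensor.encoding<{
  map = (i, j) -> (i floordiv 2 : dense, j floordiv 2 : compressed,
                   i mod 2 : dense, j mod 2 : dense)
}>
// Column-major among blocks, row-major within each block.
#ColRow = #sparse_tensor.encoding<{
  map = (i, j) -> (j floordiv 2 : dense, i floordiv 2 : compressed,
                   i mod 2 : dense, j mod 2 : dense)
}>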
This change implements the correct *level* size setup for the direct
IR codegen fields in the sparse storage scheme. This brings libgen and
codegen together again.
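For example, under a hypothetical 2x8 blocked encoding, a tensor<6x16xf64> has level sizes 3, 2, 2, and 8 (derived through the dimToLvl mapping), not the dimension sizes 6 and 16:

#BSR = #sparse_tensor.encoding<{
  map = (i, j) -> (i floordiv 2 : dense, j floordiv 8 : compressed,
                   i mod 2 : dense, j mod 8 : dense)
}>
// level sizes: 6 floordiv 2 = 3, 16 floordiv 8 = 2, then 2 and 8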
This is step 3 out of 3 in making sparse_tensor.new work for BSR.
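With that in place, reading a BSR tensor from an external source follows the same pattern as for other formats, e.g. (using a #BSR alias like the one sketched above; the opaque pointer source type and dynamic sizes are illustrative):

%bsr = sparse_tensor.new %source : !llvm.ptr to tensor<?x?xf64, #BSR>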
The previous change no longer properly used the GPU libgen pass (even
though most tests still passed by falling back to the CPU). This
revision puts the proper pass order into place, together with a bit of
cleanup of the CPU codegen vs. libgen setup.
Note that the (dis)assemble operations still make some simplifying
assumptions (e.g. trailing 2-D COO in AoS format), but now at least
the direct IR and support library paths behave exactly the same.
Generalizing the ops is still TBD.
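For reference, the trailing 2-D COO (AoS) assumption corresponds to an encoding whose last two levels look like this (alias illustrative):

#COO = #sparse_tensor.encoding<{
  map = (i, j) -> (i : compressed(nonunique), j : singleton)
}>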
The flag seems to do practically the same thing for zero-cost and
pinned DMA. In addition, host registration is not truly the right
zero-cost mechanism, according to Thomas. So we are simplifying the
setup for now, until we have a better definition of what to implement
and test.
https://github.com/llvm/llvm-project/issues/64316
When the Powers That Be decided that the name "sparse compiler" should
be changed to "sparsifier", we neglected to change some of the comments
in the code; this pull request completes the name change.
This adds COO and loose compressed formats to the output testing. It
also prepares BSR for output testing, although that needs the
conversion to work first. In addition, it cleans up some stale TODOs.
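For example, a loose compressed row format now covered by the output tests could be written as (alias illustrative):

#LooseCSR = #sparse_tensor.encoding<{
  map = (i, j) -> (i : dense, j : loose_compressed)
}>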
Updates:
1. Infer lvlToDim from dimToLvl.
2. Add more tests for block sparsity.
3. Finish the TODOs related to lvlToDim, including adding lvlToDim to
the Python bindings.
Verification of a user-provided lvlToDim will be implemented in the
next PR.
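For example, for a block-sparse dimToLvl mapping (block sizes illustrative), the inverse lvlToDim is now inferred automatically, since i = 2 * (i floordiv 2) + (i mod 2) and likewise for j:

#BSR = #sparse_tensor.encoding<{
  map = (i, j) -> (i floordiv 2 : dense, j floordiv 3 : compressed,
                   i mod 2 : dense, j mod 3 : dense)
}>
// inferred lvlToDim: (l0, l1, l2, l3) -> (2 * l0 + l2, 3 * l1 + l3)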
The vector.extract assembly format currently only contains the source
type, for example:
%1 = vector.extract %0[1] : vector<3x7x8xf32>
where it is not immediately obvious whether this is the source or the
result type. This patch improves the assembly format to make this
clearer, so the above becomes:
%1 = vector.extract %0[1] : vector<7x8xf32> from vector<3x7x8xf32>