This change adds (runtime) bounds checks for `memref` ops using the
existing `RuntimeVerifiableOpInterface`. For `memref.load` and
`memref.store`, we check that the indices are in-bounds of the memref's
index space. For `memref.reinterpret_cast` and `memref.subview`, we check
that the resulting address space is in-bounds of the input memref's
address space.
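For illustration, the check for a 1-D load is conceptually equivalent to the following hand-written IR (sketch only; the exact ops and error message emitted by the pass may differ):
```mlir
// Hand-written equivalent of an in-bounds check for:
//   %v = memref.load %m[%i] : memref<?xf32>
%c0 = arith.constant 0 : index
%dim = memref.dim %m, %c0 : memref<?xf32>
%ge = arith.cmpi sge, %i, %c0 : index
%lt = arith.cmpi slt, %i, %dim : index
%ok = arith.andi %ge, %lt : i1
cf.assert %ok, "memref.load index out of bounds"
%v = memref.load %m[%i] : memref<?xf32>
```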
This is to avoid confusion when dealing with reduction/combining kinds.
For example, see a recent PR comment:
https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175.
Previously, the names were picked to mostly mirror those of the LLVM
vector reduction intrinsics:
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In
isolation, it was not clear whether `<maxf>` had `arith.maxnumf` or
`arith.maximumf` semantics. The new reduction kind names map 1:1 to
arith ops, which makes it easier to tell or look up their semantics.
Because both the vector and the gpu dialect depend on the arith dialect,
it's more natural to align names with those in arith than with the
lowering to llvm intrinsics.
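For example, with the new spelling of the combining kinds (illustrative snippet):
```mlir
// The kind now names the matching arith op, so the semantics are unambiguous:
%a = vector.reduction <maximumf>, %v : vector<4xf32> into f32  // arith.maximumf semantics
%b = vector.reduction <maxnumf>, %v : vector<4xf32> into f32   // arith.maxnumf semantics
```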
Issue: https://github.com/llvm/llvm-project/issues/72354
The `test-lower-to-nvvm` pipeline serves as the common and proper
pipeline for nvvm+host compilation, and it's used across our CUDA
integration tests.
This PR updates the `test-lower-to-nvvm` pipeline to `gpu-lower-to-nvvm`
and moves it into `InitAllPasses.h`. The aim is to make it callable from
Python while also providing a standardized compilation process for NVVM.
This moves the fix out of the IR and into the pass description, which
seems nicer. It also works as an integration test for the
`only-if-required-by-ops` flag :)
Note that tensor.empty may feed into a SPARSE output (meaning it truly has
no values yet), but for a DENSE output, it should always have an initial
value. We ran a verifier over all our tests and this is the only
remaining omission.
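For illustration (the `#SparseVector` encoding below is just an example):
```mlir
#SparseVector = #sparse_tensor.encoding<{ map = (d0) -> (d0 : compressed) }>

// SPARSE output: tensor.empty is fine, the tensor truly has no values yet.
%s = tensor.empty() : tensor<64xf32, #SparseVector>

// DENSE output: should be given an initial value, e.g. via linalg.fill.
%cst = arith.constant 0.0 : f32
%e = tensor.empty() : tensor<64xf32>
%d = linalg.fill ins(%cst : f32) outs(%e : tensor<64xf32>) -> tensor<64xf32>
```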
At the moment the logic to tile and vectorize `linalg.matmul` is
duplicated in multiple test files:
* matmul.mlir
* matmul_mixed_ty.mlir
Instead, this patch uses `transform.foreach` to apply the same sequence
to multiple functions within the same test file (e.g. `matmul_f32` and
`matmul_mixed_ty` as defined in the original files). This allows us to
merge relevant test files.
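The shared sequence has roughly the following shape (sketch only; the actual tiling and vectorization transforms are the ones already used in the original test files):
```mlir
transform.sequence failures(propagate) {
^bb0(%module: !transform.any_op):
  %funcs = transform.structured.match ops{["func.func"]} in %module
    : (!transform.any_op) -> !transform.any_op
  transform.foreach %funcs : !transform.any_op {
  ^bb1(%func: !transform.any_op):
    %matmul = transform.structured.match ops{["linalg.matmul"]} in %func
      : (!transform.any_op) -> !transform.any_op
    // ... tile and vectorize %matmul, same sequence for every function ...
    transform.yield
  }
  transform.yield
}
```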
The current lowering of tosa.fully_connected produces a linalg.matmul
followed by a linalg.generic to add the bias. The IR looks like the
following:
```mlir
%init = tensor.empty()
%zero = linalg.fill ins(0 : f32) outs(%init)
%prod = linalg.matmul ins(%A, %B) outs(%zero)
// Add the bias
%initB = tensor.empty()
%result = linalg.generic ins(%prod, %bias) outs(%initB) {
  // add bias and product
}
```
This has two downsides:
1. The tensor.empty operations typically result in additional
allocations after bufferization
2. There is a redundant traversal of the data to add the bias to the
matrix product.
This extra work can be avoided by leveraging the out-param of
linalg.matmul. The new IR sequence is:
```mlir
%init = tensor.empty()
%broadcast = linalg.broadcast ins(%bias) outs(%init)
%prod = linalg.matmul ins(%A, %B) outs(%broadcast)
```
In my experiments, this eliminates one loop and one allocation (post
bufferization) from the generated code.
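For concreteness, a fully-typed version of the new sequence might look roughly like this (shapes chosen arbitrarily for illustration):
```mlir
%init = tensor.empty() : tensor<2x4xf32>
%broadcast = linalg.broadcast
    ins(%bias : tensor<4xf32>)
    outs(%init : tensor<2x4xf32>)
    dimensions = [0]
%prod = linalg.matmul
    ins(%A, %B : tensor<2x3xf32>, tensor<3x4xf32>)
    outs(%broadcast : tensor<2x4xf32>) -> tensor<2x4xf32>
```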
This reworks the ArmSME dialect to use attributes for tile allocation.
This has a number of advantages and corrects some issues with the
previous approach:
* Tile allocation can now be done ASAP (i.e. immediately after
`-convert-vector-to-arm-sme`)
* SSA form for control flow is now supported (e.g.`scf.for` loops that
yield tiles)
* ArmSME ops can be converted to intrinsics very late (i.e. after
lowering to control flow)
* Tests are simplified by removing constants and casts
* Avoids correctness issues with representing LLVM `immargs` as MLIR
values
- The tile ID on the SME intrinsics is an `immarg` (so is required to be
a compile-time constant); `immargs` should be mapped to MLIR attributes
(as is already the case for intrinsics in the LLVM dialect)
- Using MLIR values for `immargs` can lead to invalid LLVM IR being
generated (and to passes such as `-cse` performing incorrect optimizations)
As part of this patch we bid farewell to the following operations:
```mlir
arm_sme.get_tile_id : i32
arm_sme.cast_tile_to_vector : i32 to vector<[4]x[4]xi32>
arm_sme.cast_vector_to_tile : vector<[4]x[4]xi32> to i32
```
These are now replaced with:
```mlir
// Allocates a new tile with (indeterminate) state:
arm_sme.get_tile : vector<[4]x[4]xi32>
// A placeholder operation for lowering ArmSME ops to intrinsics:
arm_sme.materialize_ssa_tile : vector<[4]x[4]xi32>
```
The new tile allocation works via operations implementing the
`ArmSMETileOpInterface`. This interface indicates that an operation needs
to be assigned a tile ID, and may conditionally allocate a new SME tile.
Operations allocate a new tile by implementing...
```c++
std::optional<arm_sme::ArmSMETileType> getAllocatedTileType()
```
...and returning what type of tile the op allocates (ZAB, ZAH, etc.).
Operations that don't allocate a tile return `std::nullopt` (which is
the default behaviour).
Currently the following ops are defined as allocating:
```mlir
arm_sme.get_tile
arm_sme.zero
arm_sme.tile_load
arm_sme.outerproduct // (if no accumulator is specified)
```
Allocating operations become the roots for the tile allocation pass,
which currently just (naively) assigns all transitive uses of a root
operation the same tile ID. However, this is enough to handle current
use cases.
Once tile IDs have been allocated, subsequent rewrites can forward them
to any newly created operations.
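As a rough illustration (the exact attribute spelling is an implementation detail and may differ):
```mlir
// After tile allocation, each ArmSME tile op carries its assigned tile ID:
%tile = arm_sme.get_tile {tile_id = 0 : i32} : vector<[4]x[4]xi32>
%zero = arm_sme.zero {tile_id = 1 : i32} : vector<[4]x[4]xi32>
```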
Apart from the test itself, this patch also updates a few patterns to
fix how new VectorType(s) are created. Namely, it makes sure that
"scalability" is correctly propagated.
Regression tests will be updated separately while auditing Vector
dialect tests in the context of scalable vectors:
* https://github.com/orgs/llvm/projects/23
The "Dim" prefix is a legacy left-over that no longer makes sense, since
we have a very strict "Dimension" vs. "Level" definition for sparse
tensor types and their storage.
This patch adds an integration test lowering a linalg.matmul to SME via
vector.outerproduct.
It's similar to the linalg.matmul_transpose_a e2e test added recently,
and relies on TransferReadDropUnitDimsPattern, as well as vector
transpose canonicalizations, to lower the following sequence (taken from
the inner loop):
```mlir
%subview = memref.subview %arg0[%arg3, %arg5] [%2, 1] [1, 1] :
memref<?x?xf32, strided<[?, ?], offset: ?>> to memref<?x1xf32, strided<[?, ?], offset: ?>>
%mask = vector.create_mask %2, %c1 : vector<[4]x1xi1>
%0 = vector.transfer_read %subview[%c0, %c0], %pad, %mask {in_bounds = [true, true]} :
memref<?x1xf32, strided<[?, ?], offset: ?>>, vector<[4]x1xf32>
%1 = vector.transpose %0, [1, 0] : vector<[4]x1xf32> to vector<1x[4]xf32>
%3 = vector.extract %1[0] : vector<[4]xf32> from vector<1x[4]xf32>
```
Rank-2 vectors with a leading scalable dim can't be type converted to an
array. TransferReadDropUnitDimsPattern drops the unit dim on the
vector.transfer_read so it can be lowered via the generic path (to SVE).
The transpose canonicalizations lower the transpose to a shape_cast
which folds away.
This gives more flexibility over when these lowerings are performed,
without also lowering unrelated vector ops.
This is an NFC (other than adding the new `-convert-arm-sme-to-llvm` pass).
Note, this is a redo of https://github.com/llvm/llvm-project/pull/72712
which was reverted due to timeouts on the bot. I have timed the tests
with various settings, and they do not even hit the top 20 of integration
tests. To be safe, I removed the SIMD version of the tests, just keeping
the libgen/direct IR paths (which are the most important for us to test).
I will also keep an eye on
https://lab.llvm.org/buildbot/#/builders/264/builds after submitting to
make sure there is no repeat.
I always enjoy a good stress test. This end-to-end integration test
ensures that the major ordering of both the blocks and the elements
within each block is correctly handled (giving row-row, row-col,
col-row, and col-col as options).
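For reference, two of the four combinations could be expressed with encodings along these lines (2x2 blocks assumed; the test's exact encodings may differ):
```mlir
#BSR_row_row = #sparse_tensor.encoding<{
  map = (i, j) -> (i floordiv 2 : dense, j floordiv 2 : compressed,
                   i mod 2 : dense, j mod 2 : dense)
}>
#BSR_row_col = #sparse_tensor.encoding<{
  map = (i, j) -> (i floordiv 2 : dense, j floordiv 2 : compressed,
                   j mod 2 : dense, i mod 2 : dense)
}>
```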
This change implements the correct *level* size setup for the direct IR
codegen fields in the sparse storage scheme. This brings libgen and
codegen together again.
This is step 3 out of 3 in making sparse_tensor.new work for BSR.
The previous change no longer properly used the GPU libgen pass (even
though most tests still passed by falling back to the CPU). This revision
puts the proper pass order into place. It also includes a bit of cleanup
of the CPU codegen vs. libgen setup.
Previously, we were inserting za.enable/disable intrinsics for functions
with the "arm_za" attribute (at the MLIR level), rather than using the
backend attributes. This was done to avoid a dependency on the SME ABI
functions from compiler-rt (which have only recently been implemented).
Doing things this way did have correctness issues, for example, calling
a streaming-mode function from another streaming-mode function (both
with ZA enabled) would lead to ZA being disabled after returning to the
caller (where it should still be enabled). Fixing issues like this would
require re-doing the ABI work already done in the backend within MLIR.
Instead, this patch switches to using the "arm_new_za" (backend) attribute
for enabling ZA for an MLIR function. For the integration tests, this
requires some way of linking the SME ABI functions. This is done via the
`%arm_sme_abi_shlib` lit substitution. By default, this expands to a
stub implementation of the SME ABI functions, but this can be overridden
by providing the `ARM_SME_ABI_ROUTINES_SHLIB` CMake cache variable
(pointing it at an alternative implementation). For now, the ArmSME
integration tests pass with just stubs, as we don't make use of nested
ZA-enabled calls.
A future patch may add an option to compiler-rt to build the SME
builtins into a standalone shared library to allow easily
building/testing with the actual implementation.
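As a hypothetical sketch of what this looks like at the function level (the exact attribute placement may differ from this spelling):
```mlir
// ZA is enabled via the backend attribute; no za.enable/disable
// intrinsics are inserted at the MLIR level.
func.func @za_kernel() attributes {arm_new_za} {
  // ... ArmSME ops using ZA ...
  return
}
```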
Note that the (dis)assemble operations still make some simplifying
assumptions (e.g. trailing 2-D COO in AoS format) but now at least both
the direct IR and support library path behave exactly the same.
Generalizing the ops is still TBD.
This patch adds an integration test demonstrating the first e2e example
lowering a linalg.matmul to SME via vector.outerproduct.
The test uses a 'linalg.matmul_transpose_a' rather than 'linalg.matmul',
since the latter emits a 'vector.transfer_read' with a vector type of
'vector<[4]x1xf32>' that currently can't be lowered via the generic (SVE)
path, as it has a leading scalable dim.
The flag seems to be doing practically the same thing for zero-cost and
pinned DMA. In addition, according to Thomas, the host-register path is
not truly the right zero-cost mechanism. So we are simplifying the setup
for now, until we have a better definition of what to implement and test.
https://github.com/llvm/llvm-project/issues/64316
This patch extends ArmSMEToSCF to support lowering of masked tile_load
ops. Only masks created by 'vector.create_mask' are currently supported.
There are two lowerings depending on the pad.
For a pad of constant zero, the tile is first zeroed, then only the
active rows are loaded.
For a non-zero pad, the scalar pad is broadcast to a 1-D vector and a
regular 'vector.masked_load' (which will be lowered to SVE, not SME)
loads each slice, with the padding specified as a passthru and the 2-D
mask combined into a 1-D mask. The resulting slice is then inserted into
the tile with 'arm_sme.move_vector_to_tile_slice'.
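For reference, the kind of masked load this handles looks roughly like (types and values illustrative):
```mlir
// Masked load of an SME tile, with %pad as the padding value:
%mask = vector.create_mask %rows, %cols : vector<[4]x[4]xi1>
%tile = arm_sme.tile_load %src[%c0, %c0], %pad, %mask
          : memref<?x?xf32>, vector<[4]x[4]xf32>
```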
When the Powers That Be decided that the name "sparse compiler" should
be changed to "sparsifier", we neglected to change some of the comments
in the code; this pull request completes the name change.
The string symbols were replaced with 'vector.print str' calls in
061d978043 (#68973) but the addressof ops weren't removed. This was
missed as the test is currently XFAIL'ed.
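For context, the leftover pattern being removed is roughly (names illustrative):
```mlir
// Leftover from before the switch (now removed):
%ptr = llvm.mlir.addressof @str_result : !llvm.ptr
// Strings are printed directly instead:
vector.print str "result: "
```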