This patch updates the syntax for the nvgpu_arrive Op
in matmulBuilder.py, which fixes the compilation
error for this test.
For the warp-specialized matmul_kernel implementation,
removing the WaitGroupSyncOp (after the mma main loop)
fixes the observed hang.
With these two fixes, the test compiles and
executes successfully on an sm90a machine.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
I am removing the recently added integration test for various Arith Ops.
These operations and their lowerings are effectively already verified by
the Arith-to-LLVM conversion tests in:
* "mlir/test/Conversion/ArithToLLVM/arith-to-llvm.mlir"
I've noticed that a few variants of `arith.cmpi` were missing in that
file - those are added here as well.
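For reference, a hedged sketch (operand names are hypothetical) of the kind of `arith.cmpi` variant such a conversion test exercises:
```mlir
// One of the comparison predicates covered by the Arith-to-LLVM tests.
%0 = arith.cmpi ult, %arg0, %arg1 : i32
```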
This is a follow-up for this discussion:
* https://github.com/llvm/llvm-project/pull/92272
See also the recent update to our guidelines on e2e tests in MLIR:
* https://github.com/llvm/mlir-www/pull/203
This patch fixes the sm90 cluster test by:
* Fixing a typo in LowerGpuOpsToNVVMOps where one of the ClusterDim Op
conversion patterns should actually be for the
ClusterDimBlocks Op. This addresses the compilation error for this test.
* Changing the grid size from (2,2,1) to (4,4,1). This passes the
scf-if check against the threshold of 3 below and actually
generates the required prints from the GPU.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
The memref.expand_shape Op now explicitly takes an output_shape argument.
This patch adds the argument to the Op in the failing test, which fixes it.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
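For illustration, a hedged sketch (shapes and names are hypothetical) of the updated syntax with an explicit output_shape:
```mlir
// The dynamic output dimension is now spelled out via output_shape.
%expanded = memref.expand_shape %src [[0, 1], [2]] output_shape [%sz0, 4, 32]
    : memref<?x32xf32> into memref<?x4x32xf32>
```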
NOTE: This is a follow-up for #97049 in which the `in_bounds` attribute
was made mandatory.
This PR updates the semantics of the `in_bounds` attribute so that
broadcast dimensions are no longer required to be "in bounds".
Specifically, these xfer_read/xfer_write Ops become valid after this
change:
```mlir
%read = vector.transfer_read %A[%base1, %base2], %pad
    {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
    : memref<?x?xf32>, vector<9xf32>

vector.transfer_write %vec, %A[%base1, %base2]
    {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
    : vector<9xf32>, memref<?x?xf32>
```
Note that the value `false` merely means "may run out-of-bounds", i.e.,
the corresponding access can still be "in bounds". In fact, the folder
for xfer Ops is also updated (*) and will update the attribute value
corresponding to broadcast dims to `true` if all non-broadcast dims
are marked as "in bounds".
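To illustrate the folder, a hedged before/after sketch (names are hypothetical); dim 0 is the broadcast dim and dim 1 is already in bounds, so the broadcast entry is promoted to `true`:
```mlir
// Before folding: the broadcast dim (dim 0) is conservatively out-of-bounds.
%read = vector.transfer_read %A[%base1, %base2], %pad
    {in_bounds = [false, true], permutation_map = affine_map<(d0, d1) -> (0, d1)>}
    : memref<?x?xf32>, vector<4x9xf32>

// After folding: all non-broadcast dims are in bounds, so the broadcast dim
// is marked in bounds as well.
%read = vector.transfer_read %A[%base1, %base2], %pad
    {in_bounds = [true, true], permutation_map = affine_map<(d0, d1) -> (0, d1)>}
    : memref<?x?xf32>, vector<4x9xf32>
```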
Note that this PR doesn't change any of the lowerings. The changes in
"SuperVectorize.cpp", "Vectorization.cpp" and "AffineMap.cpp" are simple
reverts of recent changes in #97049. Those were only meant to facilitate
making `in_bounds` mandatory and to work around the extra requirements
for broadcast dims (those requirements were removed in this PR). All
changes in tests are also reverts of changes from #97049.
For context, here's a PR in which "broadcast" dims were forced to
always be "in-bounds":
* https://reviews.llvm.org/D102566
(*) See `foldTransferInBoundsAttribute`.
When only all-dense "sparse" tensors occur in a function prototype, the
assembler would skip the method conversion purely based on input/output
counts. However, it should rewrite based on the presence of any annotation.
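For illustration, a hedged sketch (encoding and function are hypothetical) of a prototype where the only annotation is all-dense; the presence of the annotation, not the input/output counts, should drive the rewriting:
```mlir
// An "all-dense" sparse encoding: annotated, but with no compressed levels.
#AllDense = #sparse_tensor.encoding<{
  map = (d0, d1) -> (d0 : dense, d1 : dense)
}>

// Hypothetical prototype: should still get a converted method, despite
// looking like the plain dense case by argument/result counts alone.
func.func @foo(%arg0: tensor<8x8xf64, #AllDense>) -> tensor<8x8xf64, #AllDense> {
  return %arg0 : tensor<8x8xf64, #AllDense>
}
```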
This patch moves the stepvector intrinsic out of the experimental
namespace.
This intrinsic has existed in LLVM for several years now and is widely used.
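As a hedged sketch (assuming the MLIR LLVM-dialect op follows the same rename), the intrinsic is now spelled without the experimental prefix:
```mlir
// Assumed post-rename spelling; previously llvm.intr.experimental.stepvector.
%step = llvm.intr.stepvector : vector<[4]xi32>
```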
To run integration tests using qemu-aarch64 on an x64 host, the flags below
are added to the cmake command when building mlir/llvm:
-DMLIR_INCLUDE_INTEGRATION_TESTS=ON \
-DMLIR_RUN_ARM_SVE_TESTS=ON \
-DMLIR_RUN_ARM_SME_TESTS=ON \
-DARM_EMULATOR_EXECUTABLE="<...>/qemu-aarch64" \
-DARM_EMULATOR_OPTIONS="-L /usr/aarch64-linux-gnu" \
-DARM_EMULATOR_MLIR_CPU_RUNNER_EXECUTABLE="<llvm_arm64_build_top>/bin/mlir-cpu-runner-arm64" \
-DARM_EMULATOR_LLI_EXECUTABLE="<llvm_arm64_build_top>/bin/lli" \
-DARM_EMULATOR_UTILS_LIB_DIR="<llvm_arm64_build_top>/lib"
The last three above are prebuilt on, or cross-built for, an aarch64
host.
This patch introduces substitutions such as "%native_mlir_runner_utils" and uses
them in the SVE/SME integration tests. When configured to run using qemu-aarch64,
the mlir runtime utility libs will be loaded from ARM_EMULATOR_UTILS_LIB_DIR, if set.
Some tests marked with 'UNSUPPORTED: target=aarch64{{.*}}' are still run
when configured with ARM_EMULATOR_EXECUTABLE and the default target is
not aarch64.
A lit config feature 'mlir_arm_emulator' is added in
mlir/test/lit.site.cfg.py.in and to the UNSUPPORTED list of such tests.
This PR renames the `MemRef` integration test directory and the
`DecomposeMemref.s.cpp` file so that they can be found when doing a
case-sensitive search on file paths.
- Use vector.interleave rather than the LLVM intrinsic
- Remove dependency on LLVM dialect
- Remove manual outerproduct erases (these are now trivially dead)
- Remove comment explaining issues with previous tile allocator
- Update pipeline in `multi-tile-matmul-mixed-types.mlir`
Recent changes: #90448, #80965
Allow scalable vectorization of linalg::reduce and linalg::generic ops that
have reduction iterator(s), with two restrictions:
1. The reduction dim is the last (innermost) dim of the op; and
2. Only the reduction dim is requested for scalable vectorization.
One exception is that scalable vectorization of the reduction dim in
Matmul-like ops is not supported even when the above restrictions are met.
(A sketch follows the list of allowed combinations below.)
Allowed combinations of scalable flags and iterator types:
Matmul:
Iterators: ["parallel", "parallel", "reduction"]
Scalable Flags: ["true", "true", "false"]
["false", "true", "false"]
Matvec:
Iterators: ["parallel", "reduction"]
Scalable Flags: ["false", "true"]
["true", "false"]
At the moment, the in_bounds attribute has two confusing/contradicting
properties:
1. It is both optional _and_ has an effective default-value.
2. The default value is "out-of-bounds" for non-broadcast dims, and
"in-bounds" for broadcast dims.
(see the `isDimInBounds` vector interface method for an example of this
"default" behaviour [1]).
This PR aims to clarify the logic surrounding the `in_bounds` attribute
by:
* making the attribute mandatory (i.e. it is always present),
* always setting the default value to "out of bounds" (that's
consistent with the current behaviour for the most common cases).
#### Broadcast dimensions in tests
As per [2], a broadcast dimension requires the corresponding
`in_bounds` attribute entry to be `true`:
```
vector.transfer_read op requires broadcast dimensions to be in-bounds
```
The changes in this PR mean that we can no longer rely on the
default value in cases like the following (dim 0 is a broadcast dim):
```mlir
%read = vector.transfer_read %A[%base1, %base2], %f, %mask
{permutation_map = affine_map<(d0, d1) -> (0, d1)>} :
memref<?x?xf32>, vector<4x9xf32>
```
Instead, the broadcast dimension has to be explicitly marked as "in
bounds":
```mlir
%read = vector.transfer_read %A[%base1, %base2], %f, %mask
{in_bounds = [true, false], permutation_map = affine_map<(d0, d1) -> (0, d1)>} :
memref<?x?xf32>, vector<4x9xf32>
```
All tests with broadcast dims are updated accordingly.
#### Changes in "SuperVectorize.cpp" and "Vectorization.cpp"
The following patterns in "Vectorization.cpp" are updated to explicitly
set the `in_bounds` attribute to `false`:
* `LinalgCopyVTRForwardingPattern` and `LinalgCopyVTWForwardingPattern`
Also, `vectorizeAffineLoad` (from "SuperVectorize.cpp") and
`vectorizeAsLinalgGeneric` (from "Vectorization.cpp") are updated to
make sure that xfer Ops created by these hooks set the dimension
corresponding to broadcast dims as "in bounds". Otherwise, the Op
verifier would complain.
Note that there is no mechanism to verify whether the corresponding
memory accesses are indeed in bounds. Still, this is consistent with the
current behaviour where the broadcast dim would be implicitly assumed
to be "in bounds".
[1]
4145ad2bac/mlir/include/mlir/Interfaces/VectorInterfaces.td (L243-L246)
[2]
https://mlir.llvm.org/docs/Dialects/Vector/#vectortransfer_read-vectortransferreadop
This adds a new pattern that can legalize a multi-tile transfer_write as
a single store loop. This is done as part of type decomposition, since at
this level we know each tile write is disjoint, but that information is
lost after decomposition (without an analysis to reconstruct it).
Example (pseudo-MLIR):
```
vector.transfer_write %vector, %dest[%y, %x], %mask
: vector<[16]x[8]xi16>, memref<?x?xi16>
```
Is rewritten to:
```
scf.for %slice_idx = %c0 to %c8_vscale step %c1 {
%upper_slice_mask = vector.extract %mask[%slice_idx] ─┐
: vector<[8]xi1> from vector<[16]x[8]xi1> |
%upper_slice = vector.extract %upper_tile[%slice_idx] |- Store upper tile
: vector<[8]xi16> from vector<[8]x[8]xi16> |
vector.transfer_write %upper_slice, |
%dest[%slice_idx + %y, %x], %upper_slice_mask |
: vector<[8]xi16>, memref<?x?xi16> ┘
%lower_slice_idx = %slice_idx + %c8_vscale ─┐
%lower_slice_mask = vector.extract %mask[%lower_slice_idx] |
: vector<[8]xi1> from vector<[16]x[8]xi1> |
%lower_slice = vector.extract %lower_tile[%slice_idx] |- Store lower
: vector<[8]xi16> from vector<[8]x[8]xi16> | tile
vector.transfer_write %lower_slice, |
%dest[%lower_slice_idx + %y, %x], %lower_slice_mask |
: vector<[8]xi16>, memref<?x?xi16> ┘
}
```
This commit adds support for the `gpu.cluster_dim_blocks` and
`gpu.cluster_block_id` Ops to represent the number of blocks per cluster and
the block id inside a cluster, respectively. It also fixes the description of
the `gpu.cluster_dim` Op and updates the `cga_cluster.mlir` test file to use
`gpu.cluster_dim_blocks`.
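A hedged sketch (values are illustrative) of how the new Ops might appear in a clustered kernel, mirroring the existing `gpu.block_id`/`gpu.grid_dim` style:
```mlir
// Number of blocks per cluster along x, and this block's id within its cluster.
%cluster_dim_x = gpu.cluster_dim_blocks x
%cluster_block_x = gpu.cluster_block_id x
```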
Co-authored-by: pradeepku <pradeepku@nvidia.com>
Co-authored-by: Guray Ozen <guray.ozen@gmail.com>
This adds a new option
`-enable-arm-streaming=if-contains-scalable-vectors`, which only applies
the selected streaming/ZA modes if the function contains scalable vector
types.
As an NFC, this patch also removes the `only-` prefix from the
`if-required-by-ops` mode.
The "Emulated" sub-directories under "ArmSVE" and
"ArmSME" have been removed. Associated tests
have been moved up a directory and now include
the "REQUIRES" constraint for the arm-emulator.
To keep the test filenames consistent, this patch:
* removes "test-" from file names (there used to be a mix of
"test-feature-1.mlir" and "feature-2.mlir"),
* replaces "_" with "-" (there used to be a mix of "feature-3.mlir"
and "feature_4.mlir").
Only files under test/Integration/Dialect/Vector/CPU are updated.
This patch implements the lowering of vector.deinterleave
for 1D vectors.
For fixed vector types, the operation is lowered to two
llvm shufflevector operations: one for the even indexed
elements and the other for the odd indexed elements. A poison
operation is used as the second vector operand of the
shufflevector operations.
For scalable vectors, the llvm vector.deinterleave2
intrinsic is used for lowering. The two results are then
extracted from the struct returned by the intrinsic to form
the results of the op.
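For reference, a minimal sketch (types are illustrative) of the op being lowered:
```mlir
// Fixed-width case: lowered to two llvm.shufflevector ops (even/odd indices).
%even, %odd = vector.deinterleave %src : vector<8xi32> -> vector<4xi32>

// Scalable case: lowered via the llvm.vector.deinterleave2 intrinsic.
%even_s, %odd_s = vector.deinterleave %src_s : vector<[8]xi32> -> vector<[4]xi32>
```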
These passes have been deprecated for a long time and replaced by
one-shot bufferization. These passes are also unsafe because they do not
check for read-after-write conflicts.
Relands https://github.com/llvm/llvm-project/pull/93488 which failed on
buildbot. Fixes the failure by updating integration tests to use
one-shot-bufferize instead.
This is to make it more obvious what the result type is, especially
in less trivial cases like 0-d inputs resulting in 1-d results or
interaction with scalable vector types. Note that `vector.deinterleave`
uses the same format with an explicit result type.
Also improve examples and clean up surrounding code.
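A hedged sketch (types are illustrative) of the updated assembly with the explicit result type:
```mlir
// The result type is now spelled out, which also clarifies the 0-d case.
%r = vector.interleave %a, %b : vector<4xf32> -> vector<8xf32>
%s = vector.interleave %c, %d : vector<f32> -> vector<2xf32>
```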
It did not make sense that this said "all tile operations will go
through memory". Only the operations where the warning is emitted will
go through memory. The message has been updated to reflect that.
This patch rewrites the ArmSME tile allocator to use liveness
information to make better tile allocation decisions and improve the
correctness of the ArmSME dialect. The algorithm used here is a linear
scan over live ranges, where live ranges are assigned to tiles as they
appear in the program (chronologically). Live ranges release their
assigned tile ID once the current program point is past their end.
This is a greedy algorithm, mainly to keep the implementation
relatively straightforward, and it seems to be sufficient for
most kernels (e.g. matmuls) that use ArmSME. The general steps of this
are roughly from
https://link.springer.com/content/pdf/10.1007/3-540-45937-5_17.pdf,
though there have been a few simplifications and assumptions made for
our use case.
Hopefully, the only changes needed for a user of the ArmSME dialect
are:
- `-allocate-arm-sme-tiles` will no longer be a standalone pass
- `-test-arm-sme-tile-allocation` is only for unit tests
- `-convert-arm-sme-to-llvm` must happen after `-convert-scf-to-cf`
- SME tile allocation is now part of the LLVM conversion
By integrating this into the `ArmSME -> LLVM` conversion we can allow
high-level (value-based) ArmSME operations to be side-effect-free, as we
can guarantee nothing will rearrange ArmSME operations before we emit
intrinsics (which could invalidate the tile allocation).
The hope is for ArmSME operations to have no hidden state/side effects
and allow easily lowering dialects such as `vector` and `arith` to SME,
without making assumptions about how the input IR looks, as the
semantics of the operations will be the same. That is, no (new) side
effects, and the IR follows the rules of SSA (a value will never change).
The aim is correctness, so we have a base for working on optimizations.
This patch is a first pass at making consistent syntax across the
`LinalgTransformOp`s that use dynamic index lists for size parameters.
Previously, there were two different forms: either inline the types in the
list, or place them in the functional-style tuple. This patch goes for the
latter.
In order to do this, the `printPackedOrDynamicIndexList`,
`printDynamicIndexList` and their `parse` counterparts were modified so
that the types can be optionally provided to the corresponding custom
directives.
All affected ops now use tablegen `assemblyFormat`, so custom
`parse`/`print` functions have been removed. There are a couple of ops that
will likely add dynamic size support, and once that happens, care should be
taken that their assembly remains consistent with the changes in this
patch.
The affected ops are as follows: `pack`, `pack_greedily`,
`tile_using_forall`. The `tile_using_for` and `vectorize` ops already
used this syntax, but their custom assembly was removed.
Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>
Codegen "vectors" for pos/crd/val use the capacity as memref size, not
the actual used size. Although the sparsifier itself always uses just
the defined pos/crd/val parts, printing these and passing them back to a
runtime environment could benefit from wrapping the basic pos/crd/val
getters into a proper memref view that sets the right size.
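A hedged sketch (names are hypothetical) of the kind of right-sized view this enables, e.g. a subview over just the used prefix of a pos buffer:
```mlir
// %pos has the full capacity; %used is the number of valid entries.
// The view exposes only the used prefix to printing / the runtime environment.
%view = memref.subview %pos[0] [%used] [1]
    : memref<?xindex> to memref<?xindex, strided<[1]>>
```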