clang-p2996

Author	SHA1	Message	Date
Chao Chen	2162723636	[MLIR][XeGPU] Updates XeGPU TensorDescAttr and Refine Gather/Scatter definition. (#109144 ) The PR makes the following refinements to the XeGPU dialect. 1. Separated the old `TensorDescAttr` into two independent attributes: `BlockTensorDescAttr` and `ScatterTensorDescAttr` 2. Renamed the `MemoryScopeAttr` to `MemorySpaceAttr` and updated the enumeration value for shared memory following OpenCL standard. 3. Introduced `transpose` UnitAttr to `StoreScatterOp` and `LoadGatherOp` 4. Added memory space check for `CreateNdDesc` and `CreateDesc` op, as well as valid and invalid test cases for them.	2024-09-23 09:00:26 -05:00
Matteo Franciolini	2f664f2bdf	[mlir][mesh] Fix empty `split_axes` sharding annotation (#108236 ) The `split_axes` attribute is defined as "array attribute of array attributes". Following the definition, empty `split_axes` values should not be allowed, since that would break the definition and would lead to invalid IR. In such a scenario, passes leveraging the mesh dialect can observe: * crashes in sharding-propagation; * creation of null MeshShardingAttrs in spmdization; * non-roundtrippable IR. The patch prevents `split_axes` from becoming empty by modifying `removeTrailingEmptySubArray` such that a minimum size of one is guaranteed when constructing the attribute, and adds a test that would crash without the change.	2024-09-21 08:43:12 -07:00
Daniel Hernandez-Juarez	b014265d99	[mlir][AMDGPU] New gfx12 barrier instructions and update lowering LDSBarrierOp (#109273 ) New gfx12 barrier instructions: s.barrier.signal, s.barrier.wait and s.wait.dscnt. And update lowering LDSBarrierOp accordingly. CC: @krzysz00 @manupak @giuseros	2024-09-20 17:41:36 -05:00
Daniil Fukalov	65bc259a97	[NFC] Add explicit #include llvm-config.h where its macros are used, last part. (#107615 ) (this is the part related to bolt, lld and mlir) Without these explicit includes, removing other headers that implicitly include llvm-config.h may have non-trivial side effects. For example, `clangd` may report even `llvm-config.h` as "not used" when it defines a macro that is explicitly used with #ifdef. This is amplified by different build configs, which use different sets of macros.	2024-09-20 19:59:39 +02:00
Umang Yadav	d0a7cb709e	[ROCDL] Pass `amd_code_object_version` when serializing ROCDL gpu module (#108874 ) This PR adds ability to pass non-default value to `.amdhsa_code_object_version` metadata when serializing ROCDL GPU modules. It also fixes typos in two places. --------- Co-authored-by: Fabian Mora <fmora.dev@gmail.com>	2024-09-20 09:53:09 -05:00
David Spickett	737c414e1d	Revert "[clang][flang][mlir] Support -frecord-command-line option (#102975 )" This reverts commit `b3533a156d`. It caused test failures in shared library builds: https://lab.llvm.org/buildbot/#/builders/80/builds/3854	2024-09-20 11:30:50 +00:00
Chuanqi Xu	e8a7390624	[mlir] [LLVM IR] Introduce VaArgOp (#109260 ) I find there is no LLVMOp corresponding to LLVM's [va_arg instruction](https://llvm.org/docs/LangRef.html#va-arg-instruction) so I tried to add one. This is helpful for clangir (https://github.com/llvm/clangir/pull/865). New to MLIR and not sure who the appropriate reviewers are. Thanks in advance for reviewing and triaging.	2024-09-20 13:19:50 +08:00
Tarun Prabhu	b3533a156d	[clang][flang][mlir] Support -frecord-command-line option (#102975 ) Add support for the -frecord-command-line option that will produce the llvm.commandline metadata which will eventually be saved in the object file. This behavior is also supported in clang. Some refactoring of the code in flang to handle these command line options was carried out. The corresponding -grecord-command-line option which saves the command line in the debug information has not yet been enabled for flang.	2024-09-19 18:28:50 -06:00
Andrzej Warzyński	1335a11176	[mlir][vector][nfc] Clean-up VectorOps.{h\|cpp} (#109316 )	2024-09-19 21:45:01 +01:00
Adam Siemieniuk	02d34d800b	[mlir][vector][xegpu] Vector to XeGPU conversion pass (#107419 ) Add pass for Vector to XeGPU dialect conversion and initial conversion patterns for vector.transfer_read\|write operations.	2024-09-19 15:16:23 -05:00
Ivan Butygin	96ac627238	[mlir][vector][nfc] Update vector load/store doc wrt unit strides. (#109267 ) Follow up to https://github.com/llvm/llvm-project/pull/108998. Non-contiguous strides are allowed now for 1-element vector load/stores.	2024-09-19 14:52:35 +03:00
Jianjian Guan	87dc3e89e7	[mlir][LLVMIR] Add more vector predication intrinsic ops (#107663 ) This revision adds vector predication smax, smin, umax and umin intrinsic ops.	2024-09-19 10:33:36 +08:00
Andrea Faulds	a800ffac41	[mlir][gpu] Disjoint patterns for lowering clustered subgroup reduce (#109158 ) Making the existing populateGpuLowerSubgroupReduceToShufflePatterns() function also cover the new "clustered" subgroup reductions is proving to be inconvenient, because certain backends may have more specific lowerings that only cover the non-clustered type, and this creates pass ordering constraints. This commit removes coverage of clustered reductions from this function in favour of a new separate function, which makes controlling the lowering much more straightforward.	2024-09-18 15:55:53 -04:00
Bimo	f8eceb45d0	[MLIR] [Python] align python ir printing with mlir-print-ir-after-all (#107522 ) When using the `enable_ir_printing` API from Python, it invokes IR printing with default args, printing the IR before each pass and printing IR after pass only if there have been changes. This PR attempts to align the `enable_ir_printing` API with the documentation	2024-09-18 11:54:16 +08:00
Andrea Faulds	fd26f8444a	[mlir][gpu] Rename two misspelled pattern population functions (#109015 )	2024-09-17 15:26:14 -04:00
Sergey Kozub	73d83f20c9	[MLIR] Add f6E2M3FN type (#107999 ) This PR adds `f6E2M3FN` type to mlir. `f6E2M3FN` type is proposed in [OpenCompute MX Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf). It defines a 6-bit floating point number with bit layout S1E2M3. Unlike IEEE-754 types, there are no infinity or NaN values. ```c f6E2M3FN - Exponent bias: 1 - Maximum stored exponent value: 3 (binary 11) - Maximum unbiased exponent value: 3 - 1 = 2 - Minimum stored exponent value: 1 (binary 01) - Minimum unbiased exponent value: 1 − 1 = 0 - Has Positive and Negative zero - Doesn't have infinity - Doesn't have NaNs Additional details: - Zeros (+/-): S.00.000 - Max normal number: S.11.111 = ±2^(2) x (1 + 0.875) = ±7.5 - Min normal number: S.01.000 = ±2^(0) = ±1.0 - Max subnormal number: S.00.111 = ±2^(0) x 0.875 = ±0.875 - Min subnormal number: S.00.001 = ±2^(0) x 0.125 = ±0.125 ``` Related PRs: - [PR-94735](https://github.com/llvm/llvm-project/pull/94735) [APFloat] Add APFloat support for FP6 data types - [PR-105573](https://github.com/llvm/llvm-project/pull/105573) [MLIR] Add f6E3M2FN type - was used as a template for this PR	2024-09-16 21:09:27 +02:00
Arteen Abrishami	00f239e48a	[MLIR][TOSA] Add --tosa-reduce-transposes pass (#108260 ) ---------- Motivation: ---------- Some legalization pathways introduce redundant tosa.TRANSPOSE operations that result in avoidable data movement. For example, PyTorch -> TOSA contains a lot of unnecessary transposes due to conversions between NCHW and NHWC. We wish to remove all the ones that we can, since in general it is possible to remove the overwhelming majority. ------------ Changes Made: ------------ - Add the --tosa-reduce-transposes pass - Add TosaElementwiseOperator trait. ------------------- High-Level Overview: ------------------- The pass works through the transpose operators in the program. It begins at some transpose operator with an associated permutations tensor. It traverses upwards through the dependencies of this transpose and verifies that we encounter only operators with the TosaElementwiseOperator trait and terminate in either constants, reshapes, or transposes. We then evaluate whether there are any additional restrictions (the transposes it terminates in must invert the one we began at, and the reshapes must be ones in which we can fold the transpose into), and then we hoist the transpose through the intervening operators, folding it at the constants, reshapes, and transposes. Finally, we ensure that we do not need both the transposed form (the form that had the transpose hoisted through it) and the untransposed form (which it was prior), by analyzing the usages of those dependent operators of a given transpose we are attempting to hoist and replace. If they are such that it would require both forms to be necessary, then we do not replace the hoisted transpose, causing the new chain to be dead. Otherwise, we do and the old chain (untransposed form) becomes dead. Only one chain will ever then be live, resulting in no duplication. We then perform a simple one-pass DCE, so no canonicalization is necessary. 
-------------- Impact of Pass: -------------- Patching the dense_resource artifacts (from PyTorch) with dense attributes to permit constant folding, we receive the following results. Note that data movement represents total transpose data movement, calculated by noting which dimensions moved during the transpose. /////////// MobilenetV3: /////////// BEFORE total data movement: 11798776 B (11.25 MiB) AFTER total data movement: 2998016 B (2.86 MiB) 74.6% of data movement removed. BEFORE transposes: 82 AFTER transposes: 20 75.6% of transposes removed. //////// ResNet18: //////// BEFORE total data movement: 20596556 B (19.64 MiB) AFTER total data movement: 1003520 B (0.96 MiB) 95.2% of data movement removed. BEFORE transposes: 56 AFTER transposes: 5 91.1% of transposes removed. //////// ResNet50: //////// BEFORE total data movement: 83236172 B (79.3 MiB) AFTER total data movement: 3010560 B (2.87 MiB) 96.4% of data movement removed BEFORE transposes: 120 AFTER transposes: 7 94.2% of transposes removed. ///////// ResNet101: ///////// BEFORE total data movement: 124336460 B (118.58 MiB) AFTER total data movement: 3010560 B (2.87 MiB) 97.6% of data movement removed BEFORE transposes: 239 AFTER transposes: 7 97.1% of transposes removed. ///////// ResNet152: ///////// BEFORE total data movement: 175052108 B (166.94 MiB) AFTER total data movement: 3010560 B (2.87 MiB) 98.3% of data movement removed BEFORE transposes: 358 AFTER transposes: 7 98.0% of transposes removed. //////// Overview: //////// We see that we remove up to 98% of transposes and eliminate up to 98.3% of redundant transpose data movement. In the context of ResNet50, with 120 inferences per second, we reduce dynamic transpose data bandwidth from 9.29 GiB/s to 344.4 MiB/s. ----------- Future Work: ----------- (1) Evaluate tradeoffs with permitting ConstOp to be duplicated across hoisted transposes with different permutation tensors. 
(2) Expand the class of foldable upstream ReshapeOp we permit beyond N -> 1x1x...x1xNx1x...x1x1. (3) Enhance the pass to permit folding arbitrary transpose pairs, beyond those that form the identity. (4) Add support for more instructions besides TosaElementwiseOperator as the intervening ones (for example, the reduce_* operators). (5) Support hoisting transposes up to an input parameter. Signed-off-by: Arteen Abrishami <arteen.abrishami@arm.com>	2024-09-13 19:16:55 -07:00
Krzysztof Drewniak	a953982cb7	[mlir][GPU] Plumb range information through the NVVM lowerings (#107659 ) Update the GPU to NVVM lowerings to correctly propagate range information on IDs and dimension queries, either from known_{block,grid}_size attributes or from `upperBound` annotations on the operations themselves.	2024-09-13 12:07:51 -05:00
Sergio Afonso	6568062ff1	[MLIR][OpenMP] Improve assemblyFormat handling for clause-based ops (#108023 ) This patch modifies the representation of `OpenMP_Clause` to allow definitions to incorporate both required and optional arguments while still allowing operations including them and overriding the `assemblyFormat` to take advantage of automatically-populated format strings. The proposed approach is to split the `assemblyFormat` clause property into `reqAssemblyFormat` and `optAssemblyFormat`, and remove the `isRequired` template and associated `required` property. The `OpenMP_Op` class, in turn, populates the new `clausesReqAssemblyFormat` and `clausesOptAssemblyFormat` properties in addition to `clausesAssemblyFormat`. These properties can be used by clause-based OpenMP operation definitions to reconstruct parts of the clause-inherited format string in a more flexible way when overriding it. Clause definitions are updated to follow this new approach and some operation definitions overriding the `assemblyFormat` are simplified by taking advantage of the improved flexibility, reducing code duplication. The `verify-openmp-ops` tablegen pass is updated for the new `OpenMP_Clause` representation. Some MLIR and Flang unit tests had to be updated due to changes to the default printing order of clauses on updated operations.	2024-09-13 12:57:41 +01:00
Krzysztof Drewniak	9596e83b2a	[mlir][AMDGPU] Enable emulating vector buffer_atomic_fadd on gfx11 (#108312 ) * Fix a bug introduced by the Chipset refactoring in #107720 where atomics emulation for adds was mistakenly applied to gfx11+ * Add the case needed for gfx11+ atomic emulation, namely that gfx11 doesn't support atomically adding a v2f16 or v2bf16, thus requiring MLIR-level legalization for buffer intrinsics that attempt to do such an addition * Add tests, including tests for gfx11 atomic emulation Co-authored-by: Manupa Karunaratne <manupa.karunaratne@amd.com>	2024-09-12 09:47:52 -05:00
Krzysztof Drewniak	90a0be9482	[mlir][LLVM] Refactor how range() annotations are handled for ROCDL intrinsics (#107658 ) This commit introduces a ConstantRange attribute to match the ConstantRange attribute type present in LLVM IR. It then refactors the LLVM_IntrOpBase so that the basic part of the intrinsic builder code can be re-used without needing to copy it or get rid of important context. This, along with adding code for handling an optional `range` attribute to that same base, allows us to make the support for range() annotations generic without adding another bit to IntrOpBase. This commit then updates the lowering of index intrinsic operations to use the new ConstantRange attribute and fixes a bug (where we'd be subtracting 1 from upper bounds instead of adding it on operations like gpu.block_dim) along the way. The point of these changes is to enable these range annotations to be used for the corresponding NVVM operations in a future commit.	2024-09-12 09:46:42 -05:00
MaheshRavishankar	d5f0969c96	[mlir][TilingInterface] Avoid looking at operands for getting slices to continue tile + fuse. (#107882 ) Current implementation of `scf::tileConsumerAndFuseProducerUsingSCF` looks at operands of tiled/tiled+fused operations to see if they are produced by `extract_slice` operations to populate the worklist used to continue fusion. This implicit assumption does not always work. Instead make the implementations of `getTiledImplementation` return the slices to use to continue fusion. This is a breaking change - To continue to get the same behavior of `scf::tileConsumerAndFuseProducerUsingSCF`, change all out-of-tree implementation of `TilingInterface::getTiledImplementation` to return the slices to continue fusion on. All in-tree implementations have been adapted to this. - This change touches parts that required a simplification to the `ControlFn` in `scf::SCFTileAndFuseOptions`. It now returns a `std::optional<scf::SCFTileAndFuseOptions::ControlFnResult>` object that should be `std::nullopt` if fusion is not to be performed. Signed-off-by: MaheshRavishankar <mahesh.revishankar@gmail.com>	2024-09-11 22:15:43 -07:00
Jie Fu	b7167c7844	[mlir] Fix incorrect comparison due to -Wtautological-constant-out-of-range-compare (NFC) /llvm-project/mlir/include/mlir/Analysis/Presburger/Utils.h:320:26: error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned int' is always true [-Werror,-Wtautological-constant-out-of-range-compare] preIndent = (preIndent != std::string::npos) ? preIndent + 1 : 0; ~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~ /llvm-project/mlir/include/mlir/Analysis/Presburger/Utils.h:335:28: error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned int' is always true [-Werror,-Wtautological-constant-out-of-range-compare] preIndent = (preIndent != std::string::npos) ? preIndent + 1 : 0; ~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~ 2 errors generated.	2024-09-12 11:57:29 +08:00
Amy Wang	2740273505	[MLIR][Presburger] Make printing aligned to assist in debugging (#107648 ) Hello Arjun! Please allow me to contribute this patch as it helps me debugging significantly! When the 1's and 0's don't line up when debugging farkas lemma of numerous polyhedrons using simplex lexmin solver, it is truly straining on the eyes. Hopefully this patch can help others! The unfortunate part is the lack of testcase as I'm not sure how to add testcase for debug dumps. :) However, you can add this testcase to the SimplexTest.cpp to witness the nice printing! ```c++ TEST(SimplexTest, DumpTest) { int COLUMNS = 2; int ROWS = 2; LexSimplex simplex(COLUMNS * 2); IntMatrix m1(ROWS, COLUMNS * 2 + 1); // Adding LHS columns. for (int i = 0; i < ROWS; i++) { // an arbitrary formula to test all kinds of integers for (int j = 0; j < COLUMNS; j++) m1(i, j) = i + (2 << (i % 3)) * (-1 * ((i + j) % 2)); } // Adding RHS columns. for (int i = 0; i < ROWS; i++) { for (int j = 0; j < COLUMNS; j++) m1(i, j + COLUMNS) = j - (3 << (j % 4)) * (-1 * ((i + j * 2) % 2)); } for (int i = 0; i < m1.getNumRows(); i++) { ArrayRef<DynamicAPInt> curRow = m1.getRow(i); simplex.addInequality(curRow); } IntegerRelation rel = parseRelationFromSet("(x, y, z)[] : (z - x - 17 * y == 0, x - 11 * z >= 1)",2); simplex.dump(); m1.dump(); rel.dump(); } ``` ``` rows = 2, columns = 7 var: c3, c4, c5, c6 con: r0 [>=0], r1 [>=0] r0: -1, r1: -2 c0: denom, c1: const, c2: 2147483647, c3: 0, c4: 1, c5: 2, c6: 3 1 0 1 0 -2 0 1 1 0 -8 -3 1 3 7 0 -2 0 1 0 -3 1 3 7 0 Domain: 2, Range: 1, Symbols: 0, Locals: 0 2 constraints -1 -17 1 0 = 0 1 0 -11 -1 >= 0 ```	2024-09-11 23:22:54 -04:00
Longsheng Mou	1a431bcea7	[mlir][Tosa] Fix attr type of out_shape for `tosa.transpose_conv2d` (#108041 ) This patch fixes attr type of out_shape, which is i64 dense array attribute with exactly 4 elements. - Fix description of DenseArrayMaxCt - Add DenseArrayMinCt and move it to CommonAttrConstraints.td - Change type of out_shape to Tosa_IntArrayAttr4 Fixes #107804.	2024-09-12 09:10:16 +08:00
Krzysztof Drewniak	aa60a3e4d0	[mlir][AMDGPU] Support vector<2xf16> inputs to buffer atomic fadd (#108286 ) Extend the lowering of atomic.fadd to support the v2f16 variant available on some AMDGPU chips. Re-lands #108238 (and addresses review comments from there) Co-authored-by: Giuseppe Rossini <giuseppe.rossini@amd.com>	2024-09-11 17:51:07 -05:00
Matteo Franciolini	aabb0121ee	[mlir][bufferization] Fix OpFilter::denyDialect (#108249 ) The implementation would crash with unloaded dialects.	2024-09-11 12:03:49 -07:00
Krzysztof Drewniak	cb031267bd	Revert "[mlir][AMDGPU] Support vector<2xf16> inputs to buffer atomic fadd (#108238 )" (#108256 ) This reverts commit `0d48d4d835`. Mistakenly landed without approval	2024-09-11 12:28:15 -05:00
Krzysztof Drewniak	0d48d4d835	[mlir][AMDGPU] Support vector<2xf16> inputs to buffer atomic fadd (#108238 ) Extend the lowering of atomic.fadd to support the v2f16 variant available on some AMDGPU chips. Co-authored-by: Giuseppe Rossini <giuseppe.rossini@amd.com>	2024-09-11 12:12:17 -05:00
Arteen Abrishami	a54efdbdc4	[MLIR][TOSA] add additional verification to TOSA (#108133 ) ---------- Motivation: ---------- Spec conformance. Allows assumptions to be made in TOSA code. ------------ Changes Made: ------------ Add full permutation tensor verification to tosa.TRANSPOSE. Previously, the verifier did not check that permuted values were between 0 and (rank - 1). Update tosa.TRANSPOSE perms data type to be strictly i32. Verify input/output shapes for tosa.TRANSPOSE. Add verifier to tosa.CONST, with consideration for quantization. Fix TOSA conformance of tensor type to disallow dimensions with size 0 for ranked tensors, per spec. This is not the same as rank 0 tensors. Here is an example of a disallowed tensor: tensor<3x0xi32>. Naturally, this means that the number of elements in a TOSA tensor will always be greater than 0. Signed-off-by: Arteen Abrishami <arteen.abrishami@arm.com>	2024-09-11 17:18:09 +01:00
Alex Rice	135bd31975	[mlir] [tblgen-to-irdl] Refactor tblgen-to-irdl script and support more types (#105505 ) Refactors the tblgen-to-irdl script slightly and adds support for - Various integer types - Various Float types - Confined types - Complex types (with fixed element type) Also doesn't add the operand and result ops if they are empty. I could potentially split this into smaller PRs if that'd be helpful (refactor + integer/float/complex, confined type, optional operand/result). @math-fehr	2024-09-11 14:02:44 +01:00
Amy Wang	334873fe2d	[MLIR][Python] Python binding support for IntegerSet attribute (#107640 ) Support IntegerSet attribute python binding.	2024-09-11 07:37:35 -04:00
Sergio Afonso	2f3d061918	[MLIR][OpenMP] Automate operand structure definition (#99508 ) This patch adds the "gen-openmp-clause-ops" `mlir-tblgen` generator to produce the structure definitions previously in OpenMPClauseOperands.h automatically from the information contained in OpenMPOps.td and OpenMPClauses.td. The original header is maintained to enable the definition of similar structures that are not directly related to any single `OpenMP_Clause` or `OpenMP_Op` tablegen definition.	2024-09-11 12:16:34 +01:00
Kunwar Grover	c9aa55da62	[mlir][Linalg] Add speculation for LinalgStructuredOps (#108032 ) This patch adds speculation behavior for linalg structured ops, allowing them to be hoisted out of loops using LICM.	2024-09-11 09:30:05 +01:00
Henrich Lauko	d1cad2290c	Reland [MLIR] Make resolveCallable customizable in CallOpInterface (#107989 ) Relands #100361 with fixed dependencies.	2024-09-10 15:33:13 +02:00
Sven van Haastregt	bda9474f57	Add missing newlines at EOF; NFC	2024-09-10 13:55:31 +01:00
Sergey Kozub	918222ba43	[MLIR] Add f6E3M2FN type (#105573 ) This PR adds `f6E3M2FN` type to mlir. `f6E3M2FN` type is proposed in [OpenCompute MX Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf). It defines a 6-bit floating point number with bit layout S1E3M2. Unlike IEEE-754 types, there are no infinity or NaN values. ```c f6E3M2FN - Exponent bias: 3 - Maximum stored exponent value: 7 (binary 111) - Maximum unbiased exponent value: 7 - 3 = 4 - Minimum stored exponent value: 1 (binary 001) - Minimum unbiased exponent value: 1 − 3 = −2 - Has Positive and Negative zero - Doesn't have infinity - Doesn't have NaNs Additional details: - Zeros (+/-): S.000.00 - Max normal number: S.111.11 = ±2^(4) x (1 + 0.75) = ±28 - Min normal number: S.001.00 = ±2^(-2) = ±0.25 - Max subnormal number: S.000.11 = ±2^(-2) x 0.75 = ±0.1875 - Min subnormal number: S.000.01 = ±2^(-2) x 0.25 = ±0.0625 ``` Related PRs: - [PR-94735](https://github.com/llvm/llvm-project/pull/94735) [APFloat] Add APFloat support for FP6 data types - [PR-97118](https://github.com/llvm/llvm-project/pull/97118) [MLIR] Add f8E4M3 type - was used as a template for this PR	2024-09-10 10:41:05 +02:00
Matthias Springer	7574042e2a	Revert "[MLIR] Make `resolveCallable` customizable in `CallOpInterface`" (#107984 ) Reverts llvm/llvm-project#100361 This commit caused some linker errors. (Missing `MLIRCallInterfaces` dependency.)	2024-09-10 10:24:05 +02:00
Pradeep Kumar	831236e78c	[MLIR][NVVM] Add support for nvvm.breakpoint Op (#107193 ) This commit adds support for `nvvm.breakpoint` Op which lowers to the PTX brkpt instruction. Also, added the respective tests in `nvvmir.mlir`	2024-09-10 10:14:25 +02:00
Henrich Lauko	958f59d90f	[MLIR] Make `resolveCallable` customizable in `CallOpInterface` (#100361 ) Allow customization of the `resolveCallable` method in the `CallOpInterface`. This change allows for operations implementing this interface to provide their own logic for resolving callables. - Introduce the `resolveCallable` method, which does not include the optional symbol table parameter. This method replaces the previously existing extra class declaration `resolveCallable`. - Introduce the `resolveCallableInTable` method, which incorporates the symbol table parameter. This method replaces the previous extra class declaration `resolveCallable` that used the optional symbol table parameter.	2024-09-10 10:08:41 +02:00
Jakub Kuderski	763bc9249c	[mlir][amdgpu] Align Chipset with TargetParser (#107720 ) Update the Chipset struct to follow the `IsaVersion` definition from llvm's `TargetParser`. This is a follow up to https://github.com/llvm/llvm-project/pull/106169#discussion_r1733955012. * Add the stepping version. Note: This may break downstream code that compares against the minor version directly. * Use comparisons with full Chipset version where possible. Note that we can't use the code in `TargetParser` directly because the chipset utility is outside of `mlir/Target` that re-exports llvm's target library.	2024-09-09 11:12:26 -04:00
Amy Wang	6634d44e5e	[MLIR][Transform] Allow stateInitializer and stateExporter for applyTransforms (#101186 ) This is discussed in RFC: https://discourse.llvm.org/t/rfc-making-the-constructor-of-the-transformstate-class-protected/80377	2024-09-09 10:57:13 -04:00
Artem Kroviakov	663e9cec9c	[Func][GPU] Use SymbolUserOpInterface in func::ConstantOp (#107748 ) This PR enables `func::ConstantOp` creation and usage for device functions inside GPU modules. The current main returns error for referencing device functions via `func::ConstantOp`, because during the `ConstantOp` verification it only checks symbols in `ModuleOp` symbol table, which, of course, does not contain device functions that are defined in `GPUModuleOp`. This PR proposes a more general solution. Co-authored-by: Artem Kroviakov <artem.kroviakov@tum.de>	2024-09-09 11:49:16 +02:00
Jerry-Ge	476b1a661f	[TOSA] Update input name for Sin and Cos operators (#107606 ) Update the dialect input names from input to input1 for Sin/Cos for consistency. Signed-off-by: Jerry Ge <jerry.ge@arm.com>	2024-09-09 10:26:39 +01:00
Rahul Joshi	b60c6cbc0b	[MLIR][TableGen] Migrate MLIR backends to use const RecordKeeper (#107505 ) - Migrate MLIR backends to use a const RecordKeeper reference.	2024-09-07 15:13:19 -07:00
Amr Hesham	a1e06f7674	[mlir][vector] Fix the enum type in vector::CombiningKind (#107681 ) Change the enum type of vector::CombiningKind from I32BitEnumAttrCaseBit to I32EnumAttrCase. Fixes #107448	2024-09-07 19:59:25 +02:00
anjenner	4af249fe6e	Add usub_cond and usub_sat operations to atomicrmw (#105568 ) These both perform conditional subtraction, returning the minuend and zero respectively, if the difference is negative.	2024-09-06 16:19:20 +01:00
Kazu Hirata	56b29074fe	[mlir] Avoid repeated hash lookups (NFC) (#107518 )	2024-09-06 07:41:52 -07:00
Johannes de Fine Licht	6ab5829ab7	[MLIR][LLVM][NFC] Remove dead interface and add namespace qualifiers (#107573 ) The `GetResultPtrElementType` interface is dead now that MLIR has fully moved to opaque pointers, and can be removed. Add namespace qualifiers to all argument types and return types of interface methods for when they're used outside of LLVM dialect.	2024-09-06 15:56:02 +02:00
Matthias Springer	c2e53b2d50	[mlir][Transforms][NFC] Dialect conversion: Fix typo and improve docs (#107539 )	2024-09-06 10:35:07 +02:00
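
The `f6E2M3FN` and `f6E3M2FN` entries in the log above list the extreme values of each 6-bit format. As a rough cross-check of that arithmetic, the following is a minimal Python sketch of a decoder for such bit patterns. It is not the APFloat or MLIR implementation; the function name `decode_fp6` and its parameters are hypothetical, and it assumes only what the commit messages state: one sign bit, the given exponent bias, subnormals when the stored exponent is zero, and no infinity or NaN encodings.

```python
def decode_fp6(bits: int, exp_bits: int, man_bits: int, bias: int) -> float:
    """Decode a 6-bit sign/exponent/mantissa pattern (hypothetical helper).

    Assumes the layout described in the commit messages: 1 sign bit,
    `exp_bits` exponent bits, `man_bits` mantissa bits, no Inf/NaN encodings.
    """
    assert exp_bits + man_bits == 5 and 0 <= bits < 64
    sign = -1.0 if (bits >> 5) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    frac = man / (1 << man_bits)
    if exp == 0:
        # Subnormal (or zero): exponent is fixed at 1 - bias, no implicit 1.
        return sign * 2.0 ** (1 - bias) * frac
    # Normal number: implicit leading 1, unbiased exponent is exp - bias.
    return sign * 2.0 ** (exp - bias) * (1.0 + frac)


# f6E2M3FN (S1E2M3, bias 1): values quoted in the commit message.
e2m3 = dict(exp_bits=2, man_bits=3, bias=1)
print(decode_fp6(0b011111, **e2m3))  # max normal
print(decode_fp6(0b001000, **e2m3))  # min normal
print(decode_fp6(0b000111, **e2m3))  # max subnormal
print(decode_fp6(0b000001, **e2m3))  # min subnormal

# f6E3M2FN (S1E3M2, bias 3): values quoted in the commit message.
e3m2 = dict(exp_bits=3, man_bits=2, bias=3)
print(decode_fp6(0b011111, **e3m2))  # max normal
print(decode_fp6(0b000100, **e3m2))  # min normal
print(decode_fp6(0b000011, **e3m2))  # max subnormal
print(decode_fp6(0b000001, **e3m2))  # min subnormal
```

Because every exponent pattern is treated as a finite value, the all-ones exponent decodes to the largest normal numbers rather than to infinity or NaN, which is the sense of the `FN` suffix as the commit messages describe it.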


10814 Commits