clang-p2996

Author	SHA1	Message	Date
Benjamin Maxwell	a4e15416b4	[mlir][ArmSME] Move creation of load/store intrinsics to helpers (NFC) (#76168 ) Also, for consistency make the ZeroOp lowering switch on the ArmSMETileType, rather than the element bit width.	2023-12-21 17:46:12 +00:00
Jakub Kuderski	72003adf6b	[mlir][gpu] Allow subgroup reductions over 1-d vector types (#76015 ) Each vector element is reduced independently, which is a form of multi-reduction. The plan is to allow for gradual lowering of multi-reduction that results in fewer `gpu.shuffle` ops at the end: 1d `vector.multi_reduction` --> 1d `gpu.subgroup_reduce` --> smaller 1d `gpu.subgroup_reduce` --> packed `gpu.shuffle` over i32 For example we can perform 2 independent f16 reductions with a series of `gpu.shuffles` over i32, reducing the final number of `gpu.shuffles` by 2x.	2023-12-21 11:55:43 -05:00
Paul C Fuqua	11141bc68a	Fix what seems to be a silly bug in gpu.set_default_device rewriting. Smoke test included. (#75756 )	2023-12-20 09:35:42 -06:00
Cullen Rhodes	4db0bd28e8	[mlir][vector][nfc] remove unused template parameter (#75931 )	2023-12-20 08:06:25 +00:00
Jakub Kuderski	560564f51c	[mlir][vector][gpu] Align minf/maxf reduction kind names with arith (#75901 ) This is to avoid confusion when dealing with reduction/combining kinds. For example, see a recent PR comment: https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175. Previously, they were picked to mostly mirror the names of the llvm vector reduction intrinsics: https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In isolation, it was not clear if `<maxf>` has `arith.maxnumf` or `arith.maximumf` semantics. The new reduction kind names map 1:1 to arith ops, which makes it easier to tell/look up their semantics. Because both the vector and the gpu dialect depend on the arith dialect, it's more natural to align names with those in arith than with the lowering to llvm intrinsics. Issue: https://github.com/llvm/llvm-project/issues/72354	2023-12-20 00:14:43 -05:00
Matthias Springer	10056c821a	[mlir][SCF] `scf.parallel`: Make reductions part of the terminator (#75314 ) This commit makes reductions part of the terminator. Instead of `scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops. `scf.reduce` may contain an arbitrary number of reductions, with one region per reduction. Example: ```mlir %init = arith.constant 0.0 : f32 %r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init) -> f32, f32 { %elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32> %elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32> scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) { ^bb0(%lhs : f32, %rhs: f32): %res = arith.addf %lhs, %rhs : f32 scf.reduce.return %res : f32 }, { ^bb0(%lhs : f32, %rhs: f32): %res = arith.mulf %lhs, %rhs : f32 scf.reduce.return %res : f32 } } ``` `scf.reduce` operations can no longer be interleaved with other ops in the body of `scf.parallel`. This simplifies the op and makes it possible to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was not possible before because the op was not a terminator, causing the op to be DCE'd.)	2023-12-20 11:06:27 +09:00
long.chen	227bfa1fb1	[mlir] fix a crash when lower parallel loop to gpu (#75811 ) (#75946 )	2023-12-20 09:13:15 +08:00
Rik Huijzer	9f5afc3de9	Revert "[mlir][vector] Fix invalid `LoadOp` indices being created (#75519 )" This reverts commit `3a1ae2f46d`.	2023-12-17 12:34:17 +01:00
Rik Huijzer	3a1ae2f46d	[mlir][vector] Fix invalid `LoadOp` indices being created (#75519 ) Fixes https://github.com/llvm/llvm-project/issues/71326. The cause of the issue was that a new `LoadOp` was created which looked something like: ```mlir %arg4 = func.func main(%arg1 : index, %arg2 : index) { %alloca_0 = memref.alloca() : memref<vector<1x32xi1>> %1 = vector.type_cast %alloca_0 : memref<vector<1x32xi1>> to memref<1xvector<32xi1>> %2 = memref.load %1[%arg1, %arg2] : memref<1xvector<32xi1>> return } ``` which crashed inside the `LoadOp::verify`. Note here that `%alloca_0` is 0 dimensional, `%1` has one dimension, but `memref.load` tries to index `%1` with two indices. This is now fixed by using the fact that `unpackOneDim` always unpacks one dim `1bce61e6b0/mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp (L897-L903)` and so the `loadOp` should just index only one dimension. --------- Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>	2023-12-17 11:42:35 +01:00
Rob Suderman	aa165edca8	[mlir][math] Added `math.sinh` with expansions to `math.exp` (#75517 ) Includes end-to-end tests for the cpu running, folders using `libm` and lowerings to the corresponding `libm` operations.	2023-12-15 11:35:40 -08:00
Cullen Rhodes	e7432babaf	[mlir][ArmSME] Fail instead of error in vector.outerproduct lowering (#75447 ) The 'vector.outerproduct' -> 'arm_sme.outerproduct' conversion currently errors on unsupported cases when it should return failure.	2023-12-15 07:30:32 +00:00
Pablo Antonio Martinez	7f4f75c144	[MLIR][SCFToOpenMP] Add num-threads option (#74854 ) Add `num-threads` option to the `-convert-scf-to-openmp` pass, allowing to set the number of threads to be used in the `omp.parallel` to a fixed value.	2023-12-14 09:07:17 +00:00
Sungsoon Cho	762964e97f	Add cosh op to the math dialect. (#75153 )	2023-12-13 12:25:37 +01:00
Benjamin Maxwell	01ac530a2e	[mlir][ArmSME] Remove `vector.print` legality from ArmSMEToSCF (NFC) (#74875 ) This was moved to VectorToArmSME in #74063, so this is no longer needed. VectorToArmSME uses a greedy rewriter, so a similar legality rule is not needed there. See: `bbb8a0df73/mlir/lib/Conversion/VectorToArmSME/VectorToArmSMEPass.cpp (L35)`	2023-12-11 11:25:43 +00:00
Frederik Harwath	f7250179e2	Implement acos operator in MLIR Math Dialect (#74584 ) Required for torch-mlir. Cf. llvm/torch-mlir#2604 "Implement torch.aten.acos".	2023-12-08 09:08:43 -08:00
Mehdi Amini	847d8457d1	Apply clang-tidy fixes for performance-unnecessary-value-param in VectorToGPU.cpp (NFC)	2023-12-07 21:39:25 -08:00
Mehdi Amini	b8a3f0fd3a	Apply clang-tidy fixes for llvm-qualified-auto in VectorToGPU.cpp (NFC)	2023-12-07 21:39:25 -08:00
Mehdi Amini	1cef577b90	Apply clang-tidy fixes for llvm-qualified-auto in PredicateTree.cpp (NFC)	2023-12-07 21:39:25 -08:00
Mehdi Amini	345d574b65	Apply clang-tidy fixes for llvm-prefer-isa-or-dyn-cast-in-conditionals in MapMemRefStorageClassPass.cpp (NFC)	2023-12-07 21:39:25 -08:00
Mehdi Amini	6ac80a7677	Apply clang-tidy fixes for readability-identifier-naming in GPUToLLVMConversion.cpp (NFC)	2023-12-07 21:39:25 -08:00
harsh-nod	42bba97fc2	[mlir] Extend CombineTransferReadOpTranspose pattern to handle extf ops. (#74754 ) This patch modifies the CombineTransferReadOpTranspose pattern to handle extf ops. Also adds a test which shows the transpose getting folded into the transfer_read.	2023-12-07 15:01:55 -08:00
Benjamin Maxwell	b0b69fd879	[mlir][ArmSME] More precisely model dataflow in ArmSME to SCF lowerings (#73922 ) Since #73253, loops over tiles in SSA form (i.e. loops that take `iter_args` and yield a new tile) are supported, so this patch updates ArmSME lowerings to this form. This is a NFC, as it still lowers to the same intrinsics, but this makes IR less 'surprising' at a higher-level, and may be recognised by more transforms. Example: IR before: ```mlir scf.for %tile_slice_index = %c0 to %num_tile_slices step %c1 { arm_sme.move_vector_to_tile_slice %broadcast_to_1d, %tile, %tile_slice_index : vector<[4]xi32> into vector<[4]x[4]xi32> } // ... later use %tile ``` IR now: ```mlir %broadcast_to_tile = scf.for %tile_slice_index = %c0 to %num_tile_slices step %c1 iter_args(%iter_tile = %init_tile) -> (vector<[4]x[4]xi32>) { %tile_update = arm_sme.move_vector_to_tile_slice %broadcast_to_1d, %iter_tile, %tile_slice_index : vector<[4]xi32> into vector<[4]x[4]xi32> scf.yield %tile_update : vector<[4]x[4]xi32> } // ... later use %broadcast_to_tile ```	2023-12-06 14:31:05 +00:00
Georgios Pinitas	3a772c3bfe	[mlir][tosa] Add fp16 support to `tosa.resize` (#73019 )	2023-12-06 12:48:44 +00:00
Tom Eccles	fcd06d774d	[mlir][flang] add fast math attribute to fcmp (#74315 ) `llvm.fcmp` does support fast math attributes therefore so should `arith.cmpf`. The heavy churn in flang tests are because flang sets `fastmath<contract>` by default on all operations that support the fast math interface. Downstream users of MLIR should not be so effected. This was requested in https://github.com/llvm/llvm-project/issues/74263	2023-12-06 10:19:48 +00:00
Guray Ozen	641e05decc	[mlir][gpu] Support dynamic_shared_memory Op with vector dialect (#74475 ) `gpu.dynamic_shared_memory` currently does not get lowered when it is used with vector dialect. The reason is that vector-to-llvm conversion is not included in gpu-to-nvvm. This PR includes that and adds a test.	2023-12-06 10:41:57 +01:00
Matthias Springer	8f9aac4427	[mlir][vector] Fix invalid IR in `vector.print` lowering (#74410 ) `DecomposePrintOpConversion` used to generate invalid op such as: ``` error: 'arith.extsi' op operand type 'vector<10xi32>' and result type 'vector<10xi32>' are cast incompatible vector.print %v9 : vector<10xi32> ``` This commit fixes tests such as `mlir/test/Integration/Dialect/Vector/CPU/test-reductions-i32.mlir` when verifying the IR after each pattern application (#74270).	2023-12-06 09:44:03 +09:00
Guray Ozen	391a7577e7	[mlir][gpu] Add lowering dynamic_shared_memory op for rocdl (#74473 ) This PR adds lowering of `gpu.dynamic_shared_memory` to rocdl target.	2023-12-05 19:56:43 +01:00
Benjamin Maxwell	01e40a8a3d	[mlir][ArmSME] Remove ArmSMETypeConverter (and configure LLVM one instead) (#73639 ) This patch removes the ArmSMETypeConverter, and instead updates `populateArmSMEToLLVMConversionPatterns()` to add an ArmSME vector type conversion to the existing LLVMTypeConverter. This makes it easier to add these patterns to an existing `-to-llvm` lowering pass.	2023-12-04 17:02:48 +00:00
Guray Ozen	3a03da37a3	[mlir][nvgpu] Add address space attribute converter in nvgpu-to-nvvm pass (#74075 ) GPU dialect has `#gpu.address_space<workgroup>` for shared memory of NVGPU (address space =3). Howeverm when IR combine NVGPU and GPU dialect, `nvgpu-to-nvvm` pass fails due to missing attribute conversion. This PR adds `populateGpuMemorySpaceAttributeConversions` to nvgou-to-nvvm lowering, so we can use `#gpu.address_space<workgroup>` `nvgpu-to-nvvm` pass	2023-12-04 16:48:39 +01:00
Benjamin Maxwell	10063c5a29	[mlir][ArmSME] Move vector.print -> ArmSME lowering to VectorToArmSME (#74063 ) This moves the SME tile vector.print lowering from `-convert-arm-sme-to-scf` to `-convert-vector-to-arm-sme`. This seems like a more logical place, as this is lowering a vector op to ArmSME, and it also prevents vector.print from blocking tile allocation.	2023-12-04 09:42:11 +00:00
Spenser Bauman	293c21db93	[mlir][tosa] Improve lowering of tosa.conv2d (#74143 ) The existing lowering of tosa.conv2d emits a separate linalg.generic operator to add the bias after computing the computation. This change eliminates that additional step by using the generated linalg.conv_2d_* operator by using the bias value as the input to the linalg.conv_2d operation. Rather than: %init = tensor.empty() %conv = linalg.conv_2d ins(%A, %B) %outs(%init) %init = tensor.empty() %bias = linalg.generic ins(%conv, %bias) outs(%init2) { // perform add operation } The lowering now produces: %init = tensor.empty() %bias_expanded = linalg.broadcast ins(%bias) outs(%init) %conv = linalg.conv_2d ins(%A, %B) %outs(%bias) This is the same strategy as https://github.com/llvm/llvm-project/pull/73049 applied to convolutions.	2023-12-02 12:29:10 +00:00
Spenser Bauman	f58fb8c209	[mlir][tosa] Fix lowering of tosa.conv2d (#73240 ) The lowering of tosa.conv2d produces an illegal tensor.empty operation where the number of inputs do not match the number of dynamic dimensions in the output type. The fix is to base the generation of tensor.dim operations off the result type of the conv2d operation, rather than the input type. The problem and fix are very similar to this fix https://github.com/llvm/llvm-project/pull/72724 but for convolution.	2023-12-01 15:33:14 +00:00
Spenser Bauman	0d87e25779	[mlir][tosa] Improve lowering to tosa.fully_connected (#73049 ) The current lowering of tosa.fully_connected produces a linalg.matmul followed by a linalg.generic to add the bias. The IR looks like the following: %init = tensor.empty() %zero = linalg.fill ins(0 : f32) outs(%init) %prod = linalg.matmul ins(%A, %B) outs(%zero) // Add the bias %initB = tensor.empty() %result = linalg.generic ins(%prod, %bias) outs(%initB) { // add bias and product } This has two down sides: 1. The tensor.empty operations typically result in additional allocations after bufferization 2. There is a redundant traversal of the data to add the bias to the matrix product. This extra work can be avoided by leveraging the out-param of linalg.matmul. The new IR sequence is: %init = tensor.empty() %broadcast = linalg.broadcast ins(%bias) outs(%init) %prod = linalg.matmul ins(%A, %B) outs(%broadcast) In my experiments, this eliminates one loop and one allocation (post bufferization) from the generated code.	2023-12-01 15:16:51 +00:00
Rik Huijzer	c84061fd34	[mlir][vector] Fix a `target-rank=0` unrolling (#73365 ) Fixes https://github.com/llvm/llvm-project/issues/64269. With this patch, calling `mlir-opt "-convert-vector-to-scf=full-unroll target-rank=0"` on ```mlir func.func @main(%vec : vector<2xi32>) { %alloc = memref.alloc() : memref<4xi32> %c0 = arith.constant 0 : index vector.transfer_write %vec, %alloc[%c0] : vector<2xi32>, memref<4xi32> return } ``` will result in ```mlir module { func.func @main(%arg0: vector<2xi32>) { %c0 = arith.constant 0 : index %c1 = arith.constant 1 : index %alloc = memref.alloc() : memref<4xi32> %0 = vector.extract %arg0[0] : i32 from vector<2xi32> %1 = vector.broadcast %0 : i32 to vector<i32> vector.transfer_write %1, %alloc[%c0] : vector<i32>, memref<4xi32> %2 = vector.extract %arg0[1] : i32 from vector<2xi32> %3 = vector.broadcast %2 : i32 to vector<i32> vector.transfer_write %3, %alloc[%c1] : vector<i32>, memref<4xi32> return } } ``` I've also tried to proactively find other `target-rank=0` bugs, but couldn't find any. `options.targetRank` is only used 8 times throughout the `mlir` folder, all inside `VectorToSCF.cpp`. None of the other uses look like they could cause a crash. I've also tried ```mlir func.func @main(%vec : vector<2xi32>) -> vector<2xi32> { %alloc = memref.alloc() : memref<4xindex> %c0 = arith.constant 0 : index %out = vector.transfer_read %alloc[%c0], %c0 : memref<4xindex>, vector<2xi32> return %out : vector<2xi32> } ``` with `"--convert-vector-to-scf=full-unroll target-rank=0"` and that also didn't crash. (Maybe obvious. I have to admit that I'm not very familiar with these ops.)	2023-11-30 13:29:09 +01:00
Benjamin Maxwell	eaff02f28e	[mlir][ArmSME] Switch to an attribute-based tile allocation scheme (#73253 ) This reworks the ArmSME dialect to use attributes for tile allocation. This has a number of advantages and corrects some issues with the previous approach: * Tile allocation can now be done ASAP (i.e. immediately after `-convert-vector-to-arm-sme`) * SSA form for control flow is now supported (e.g.`scf.for` loops that yield tiles) * ArmSME ops can be converted to intrinsics very late (i.e. after lowering to control flow) * Tests are simplified by removing constants and casts * Avoids correctness issues with representing LLVM `immargs` as MLIR values - The tile ID on the SME intrinsics is an `immarg` (so is required to be a compile-time constant), `immargs` should be mapped to MLIR attributes (this is already the case for intrinsics in the LLVM dialect) - Using MLIR values for `immargs` can lead to invalid LLVM IR being generated (and passes such as -cse making incorrect optimizations) As part of this patch we bid farewell to the following operations: ```mlir arm_sme.get_tile_id : i32 arm_sme.cast_tile_to_vector : i32 to vector<[4]x[4]xi32> arm_sme.cast_vector_to_tile : vector<[4]x[4]xi32> to i32 ``` These are now replaced with: ```mlir // Allocates a new tile with (indeterminate) state: arm_sme.get_tile : vector<[4]x[4]xi32> // A placeholder operation for lowering ArmSME ops to intrinsics: arm_sme.materialize_ssa_tile : vector<[4]x[4]xi32> ``` The new tile allocation works by operations implementing the `ArmSMETileOpInterface`. This interface says that an operation needs to be assigned a tile ID, and may conditionally allocate a new SME tile. Operations allocate a new tile by implementing... ```c++ std::optional<arm_sme::ArmSMETileType> getAllocatedTileType() ``` ...and returning what type of tile the op allocates (ZAB, ZAH, etc). Operations that don't allocate a tile return `std::nullopt` (which is the default behaviour). Currently the following ops are defined as allocating: ```mlir arm_sme.get_tile arm_sme.zero arm_sme.tile_load arm_sme.outerproduct // (if no accumulator is specified) ``` Allocating operations become the roots for the tile allocation pass, which currently just (naively) assigns all transitive uses of a root operation the same tile ID. However, this is enough to handle current use cases. Once tile IDs have been allocated subsequent rewrites can forward the tile IDs to any newly created operations.	2023-11-30 10:22:22 +00:00
Mehdi Amini	9415fca848	[mlir] Fix build with shared libs (missing cmake link dependency) (NFC)	2023-11-29 12:17:52 -08:00
Mehdi Amini	9e7b6f46ba	[mlir] Adopt `ConvertToLLVMPatternInterface` GpuToLLVMConversionPass to align with `convert-to-llvm` (#73761 ) This is a follow-up to the introduction of `convert-to-llvm`: it is supposed to be a unifying pass through the `ConvertToLLVMPatternInterface`, but some specific conversion (like the GPU target) aren't vanilla LLVM target. Instead they need extra customizations that are specific to LLVM-on-GPUs and our custom runtime wrappers. This change make the GpuToLLVMConversionPass just as pluggable as the `convert-to-llvm` by using the same mechanism.	2023-11-29 11:37:53 -08:00
Jakub Kuderski	1bdb2e8550	[mlir][spirv] Simplify gpu reduction to spirv logic (#73546 ) Check the type only once and then specialize op handlers based on it. Do not use boolean types in group arithmetic ops that expect integers.	2023-11-27 12:33:41 -05:00
Jakub Kuderski	7eccd52842	Reland "[mlir][gpu] Align reduction operations with vector combining kinds (#73423 )" This reverts commit `dd09221a29` and relands https://github.com/llvm/llvm-project/pull/73423. * Updated `gpu.all_reduce` `min`/`max` in CUDA integration tests.	2023-11-27 11:38:18 -05:00
Jakub Kuderski	dd09221a29	Revert "[mlir][gpu] Align reduction operations with vector combining kinds (#73423 )" This reverts commit `e0aac8c88d`. I'm seeing some nvidia integration test failures: https://lab.llvm.org/buildbot/#/builders/61/builds/52334.	2023-11-27 11:29:23 -05:00
Jakub Kuderski	e0aac8c88d	[mlir][gpu] Align reduction operations with vector combining kinds (#73423 ) The motivation for this change is explained in https://github.com/llvm/llvm-project/issues/72354. Before this change, we could not tell between signed/unsigned minimum/maximum and NaN treatment for floating point values. The mapping of old reduction operations to the new ones is as follows: * `min` --> `minsi` for ints, `minf` for floats * `max` --> `maxsi` for ints, `maxf` for floats New reduction kinds not represented in the old enum: `minui`, `maxui`, `minimumf`, `maximumf`. As a next step, I would like to have a common definition of combining kinds used by the `vector` and `gpu` dialects. Separately, the GPU to SPIR-V lowering does not yet properly handle zero and NaN values -- the behavior of floating point min/max group reductions is not specified by the SPIR-V spec, see https://github.com/llvm/llvm-project/issues/73459. Issue: https://github.com/llvm/llvm-project/issues/72354	2023-11-27 11:19:20 -05:00
Jakub Kuderski	6b9c186b2d	[mlir][spirv] Handle non-innerprod float vector add reductions (#73476 ) Instead of extracting all individual vector components and performing a scalar summation, use `spirv.Dot` with the original reduction operand and a vector constant of all ones.	2023-11-27 11:10:25 -05:00
Guray Ozen	edf5cae739	[mlir][gpu] Support Cluster of Thread Blocks in `gpu.launch_func` (#72871 ) NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA). It is a new level of parallelism, allowing clustering of Cooperative Thread Arrays (CTA) to synchronize and communicate through shared memory while running concurrently. This PR enables support for CGA within the `gpu.launch_func` in the GPU dialect. It extends `gpu.launch_func` to accommodate this functionality. The GPU dialect remains architecture-agnostic, so we've added CGA functionality as optional parameters. We want to leverage mechanisms that we have in the GPU dialects such as outlining and kernel launching, making it a practical and convenient choice. An example of this implementation can be seen below: ``` gpu.launch_func @kernel_module::@kernel clusters in (%1, %0, %0) // <-- Optional blocks in (%0, %0, %0) threads in (%0, %0, %0) ``` The PR also introduces index and dimensions Ops specific to clusters, binding them to NVVM Ops: ``` %cidX = gpu.cluster_id x %cidY = gpu.cluster_id y %cidZ = gpu.cluster_id z %cdimX = gpu.cluster_dim x %cdimY = gpu.cluster_dim y %cdimZ = gpu.cluster_dim z ``` We will introduce cluster support in `gpu.launch` Op in an upcoming PR. See [the documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays) provided by NVIDIA for details.	2023-11-27 11:05:07 +01:00
Hsiangkai Wang	477c0b67a3	[mlir][affine][gpu] Replace DivSIOp to CeilDivSIOp when lowering to GPU launch (#73328 ) When converting affine.for to GPU launch operator, we have to calculate the block dimension and thread dimension for the launch operator. The formula of the dimension size is (upper_bound - lower_bound) / step_size When the difference is indivisible by step_size, we use rounding-to-zero as the division result. However, the block dimension and thread dimension is right-open range, i.e., [0, block_dim) and [0, thread_dim). So, we will get the wrong result if we use DivSIOp. In this patch, we replace it with CeilDivSIOp to get the correct block and thread dimension values.	2023-11-27 08:05:54 +00:00
Jakub Kuderski	897141449e	[mlir][spirv] Add floating point dot product (#73466 ) Because `OpDot` does not require any extra capabilities or extensions, enable it by default in the vector to spirv conversion.	2023-11-26 19:27:16 -05:00
Jakub Kuderski	d625ea12c7	[mlir][spirv] Split codegen for float min/max reductions and others v2. [NFC] (#73363 ) This is https://github.com/llvm/llvm-project/pull/69023 but with cleanups. Reduced complexity by avoiding CRTP and preprocessor defines in favor of free functions Original description by @unterumarmung: --- This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671. There are two types of min/max operations for floating-point numbers: `minf`/`maxf` and `minimumf`/`maximumf`. The code generation for these operations should differ from that of other vector reduction kinds. This difference arises because CL and GL operations for floating-point min and max do not have the same semantics when handling NaNs. Therefore, we must enforce the desired semantics with additional ops. ~~However, since the code generation for floating-point min/max operations shares the same functionality as extracting values for the vector, we have decided to refactor the existing code using the CRTP pattern.~~ This change does not alter the actual behavior of the code and is necessary for future fixes to the codegen for floating-point min/max operations. --------- Co-authored-by: Daniil Dudkin <unterumarmung@yandex.ru>	2023-11-24 15:24:45 -05:00
Alexander Batashev	a4ee55fe6e	[MLIR][NFC] Fix build on recent GCC with C++20 enabled (#73308 ) The following pattern fails on recent GCC versions with -std=c++20 flag passed and succeeds with -std=c++17. Such behavior is not observed on Clang 16.0. ``` template <typename T> struct Foo { Foo<T>(int a) {} }; ``` This patch removes template parameter from constructor in two occurences to make the following command complete successfully: bazel build -c fastbuild --cxxopt=-std=c++20 --host_cxxopt=-std=c++20 @llvm-project//mlir/... This patch is similar to https://reviews.llvm.org/D154782 Co-authored-by: Alexander Batashev <a.batashev@partner.samsung.com>	2023-11-24 15:13:39 +03:00
Kai Wang	3049c76e43	[mlir][vector][spirv] Lower vector.load and vector.store to SPIR-V (#71674 ) Add patterns to lower vector.load to spirv.load and vector.store to spirv.store.	2023-11-24 10:29:43 +00:00
Oleksandr "Alex" Zinenko	43bc81d748	[mlir] fix LLVM type converter for structs (#73231 ) Existing implementation of the LLVM type converter for LLVM structs containing incompatible types was attempting to change identifiers of the struct in case of name clash post-conversion (all identified structs have different names post-conversion since one cannot change the body of the struct once initialized). Beyond a trivial error of not updating the counter in renaming, this approach was broken for recursive structs that can't be made aware of the renaming and would use the pre-existing struct with clashing name instead. For example, given `!llvm.struct<"_Converted.foo", (struct<"_Converted.foo">, f32)>` the following type `!llvm.struct<"foo", (struct<"foo", index>)>` would incorrectly convert to ``` !llvm.struct<"_Converted_1.foo", (struct<"_Converted.foo", (struct<"_Converted.foo">, f32)>)> ``` Remove this incorrect renaming and simply refuse to convert types if it would lead to identifier clashes for structs with different bodies. Document the expectation that such generated names are reserved and must not be present in the input IR of the converter. If we ever actually need to use handle such cases, this can be achieved by temporarily renaming structs with reserved identifiers to an unreserved name and back in a pre/post-processing pass that does _not_ use the type conversion infra.	2023-11-23 22:21:39 +01:00
Benjamin Maxwell	dbb8643333	[mlir][LLVM] Support `immargs` in LLVM_IntrOpBase intrinsics (#73013 ) This extends `LLVM_IntrOpBase` so that it can be passed a list of `immArgPositions` and a list (of the same length) of `immArgAttrNames`. `immArgPositions` contains the positions of `immargs` on the LLVM IR intrinsic, and `immArgAttrNames` maps those to a corresponding MLIR attribute. This allows modeling LLVM `immargs` as MLIR attributes, which is the closest match semantically (and had already been done manually for the LLVM dialect intrinsics). This has two upsides: * It's slightly easier to implement intrinsics with immargs now (especially if they make use of other features, such as overloads) * It clearly defines that `immargs` should map to attributes, before there was no mention of `immargs` in LLVMOpBase.td, so implementing them was unclear This works with other features of the `LLVM_IntrOpBase`, so `immargs` can be marked as overloaded too (which is used in some intrinsics). As part of this patch (and to test correctness) existing intrinsics have been updated to use these new parameters. This also uncovered a few issues with the `llvm.intr.vector.insert/extract` intrinsics. First, the argument order for insert did not match the LLVM intrinsic, and secondly, both were missing a mlirBuilder (so failed to import from LLVM IR). This is corrected with this patch (and a test case added).	2023-11-23 10:12:12 +00:00

1 2 3 4 5 ...

2341 Commits