clang-p2996

Author	SHA1	Message	Date
vfdev	f136c800b6	Enabled freethreading support in MLIR python bindings (#122684 ) Reland reverted https://github.com/llvm/llvm-project/pull/107103 with the fixes for Python 3.8 cc @jpienaar Co-authored-by: Peter Hawkins <phawkins@google.com>	2025-01-13 03:00:31 -08:00
xiaoleis-nv	d03f35f9b6	[MLIR][NVVM] Fix the datatype error for nvvm.mma.sync when the operand is bf16 (#122664 ) The PR fixes the datatype error for `nvvm.mma.sync` when the operand is `bf16`. This operation originally requires the A/B type to be `f16x2` for the `bf16` MMA. However, it violates the NVVM intrinsic [[here](`372044ee09/llvm/include/llvm/IR/IntrinsicsNVVM.td (L119)`)], where the A/B operand type should be `i32`. This is a bug, and there are no tests in MLIR that cover this datatype. ``` // mma bf16 -> s32 @ m16n8k16/m16n8k8 !eq(gft,"m16n8k16:a:bf16") : !listsplat(llvm_i32_ty, 4), !eq(gft,"m16n8k16:b:bf16") : !listsplat(llvm_i32_ty, 2), !eq(gft,"m16n8k8:a:bf16") : !listsplat(llvm_i32_ty, 2), !eq(gft,"m16n8k8:b:bf16") : [llvm_i32_ty], ``` This PR addresses this bug and adds tests to guarantee correctness. Co-authored-by: Xiaolei Shi <xiaoleis@nvidia.com>	2025-01-13 15:03:05 +05:30
Clément Fournier	36c3466aef	[mlir][linalg] Fix neutral elt for softmax (#118952 ) The decomposition of `linalg.softmax` uses `maxnumf`, but the identity element that is used in the generated code is the one for `maximumf`. They are not the same, as the identity for `maxnumf` is `NaN`, while the one of `maximumf` is `-Infty`. This is wrong and prevents the maxnumf from being folded. Related to #114595, which fixed the folder for maxnumf.	2025-01-13 15:21:07 +08:00
Jacques Pienaar	3f1486f08e	Revert "Added free-threading CPython mode support in MLIR Python bindings (#107103 )" Breaks on 3.8, rolling back to avoid breakage while fixing. This reverts commit `9dee7c4449`.	2025-01-12 18:30:42 +00:00
vfdev	9dee7c4449	Added free-threading CPython mode support in MLIR Python bindings (#107103 ) Related to https://github.com/llvm/llvm-project/issues/105522 Description: This PR is a joint work with Peter Hawkins (@hawkinsp) originally done by myself for pybind11 and then reworked to nanobind based on Peter's branch: https://github.com/hawkinsp/llvm-project/tree/nbdev . - Added free-threading CPython mode support for MLIR Python bindings - Added a test which can reveal data races when cpython and LLVM/MLIR compiled with TSAN Context: - Related to https://github.com/google/jax/issues/23073 Co-authored-by: Peter Hawkins <phawkins@google.com>	2025-01-12 09:56:49 -08:00
Twice	b91d5af1ac	[MLIR][Vector] Allow any strided memref for one-element vector.load in lowering vector.gather (#122437 ) In `Gather1DToConditionalLoads`, currently we will check if the stride of the most minor dim of the input memref is 1. And if not, the rewriting pattern will not be applied. However, according to the verification of `vector.load` here: `4e32271e8b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp (L4971-L4975)` .. if the output vector type of `vector.load` contains only one element, we can ignore the requirement of the stride of the input memref, i.e. the input memref can be with any stride layout attribute in such case. So here we can allow more cases in lowering `vector.gather` by relaxing such check. As shown in the test case attached in this patch [here](`1933fbad58/mlir/test/Dialect/Vector/vector-gather-lowering.mlir (L151)`), now `vector.gather` of memref with non-trivial stride can be lowered successfully if the result vector contains only one element. --------- Signed-off-by: PragmaTwice <twice@apache.org> Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>	2025-01-12 16:02:41 +00:00
Matthias Springer	6422546e99	[mlir][LLVM] Fix conversion of non-standard MLIR float types (#122634 ) Certain non-standard float types were directly passed through in the LLVM type converter, resulting in invalid IR or failed assertions: ``` mlir-opt: mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp:638: FailureOr<Type> mlir::LLVMTypeConverter::convertVectorType(VectorType) const: Assertion `LLVM::isCompatibleVectorType(vectorType) && "expected vector type compatible with the LLVM dialect"' failed. ``` The LLVM type converter should not define invalid type conversion rules for such types. If there is no type conversion rule, conversion patterns will not apply to ops with such operand types.	2025-01-12 15:17:12 +01:00
Kareem Ergawy	42da12063f	[flang][OpenMP] Extend delayed privatization for `omp.simd` (#122156 ) Adds support for delayed privatization for `simd` directives. This PR includes PFT down to LLVM IR lowering.	2025-01-12 07:46:58 +01:00
Kazu Hirata	4f4e2abb1a	[mlir] Migrate away from PointerUnion::{is,get} (NFC) (#122591 ) Note that PointerUnion::{is,get} have been soft deprecated in PointerUnion.h: // FIXME: Replace the uses of is(), get() and dyn_cast() with // isa<T>, cast<T> and the llvm::dyn_cast<T> I'm not touching PointerUnion::dyn_cast for now because it's a bit complicated; we could blindly migrate it to dyn_cast_if_present, but we should probably use dyn_cast when the operand is known to be non-null.	2025-01-11 13:16:43 -08:00
William Moses	38fcf62483	[MLIR] Import LLVM add flag to disable loadalldialects (#122574 ) Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>	2025-01-11 09:11:22 -05:00
William Moses	b306eff56f	[MLIR] Enable inlining for private symbols (#122572 ) The inlining code for llvm funcs seems to have needlessly forbidden inlining of private (e.g. non-cloning) symbols.	2025-01-11 09:10:27 -05:00
Kazu Hirata	35e89897a4	[Dialect] Migrate away from PointerUnion::{is,get} (NFC) (#122568 ) Note that PointerUnion::{is,get} have been soft deprecated in PointerUnion.h: // FIXME: Replace the uses of is(), get() and dyn_cast() with // isa<T>, cast<T> and the llvm::dyn_cast<T>	2025-01-11 02:06:33 -08:00
Kazu Hirata	26d513d197	[TableGen] Migrate away from PointerUnion::{is,get} (NFC) (#122569 ) Note that PointerUnion::{is,get} have been soft deprecated in PointerUnion.h: // FIXME: Replace the uses of is(), get() and dyn_cast() with // isa<T>, cast<T> and the llvm::dyn_cast<T>	2025-01-11 00:17:40 -08:00
Kazu Hirata	129ec84574	[Conversion] Migrate away from PointerUnion::{is,get} (NFC) (#122421 ) Note that PointerUnion::{is,get} have been soft deprecated in PointerUnion.h: // FIXME: Replace the uses of is(), get() and dyn_cast() with // isa<T>, cast<T> and the llvm::dyn_cast<T> I'm not touching PointerUnion::dyn_cast for now because it's a bit complicated; we could blindly migrate it to dyn_cast_if_present, but we should probably use dyn_cast when the operand is known to be non-null.	2025-01-10 15:10:17 -08:00
Guray Ozen	2e6030ef6a	[MLIR][NVVM] Add missing cmake dependency Another fix	2025-01-10 12:22:20 +01:00
Guray Ozen	1ef2580972	[MLIR][NVVM] Add missing cmake dependency NVVMdialect uses InferIntRangeInterface, but its dependence was missing in cmake. This PR adds that.	2025-01-10 11:26:59 +01:00
Guray Ozen	66e41a1a20	[MLIR][NVVM] Declare InferIntRangeInterface for RangeableRegisterOp (#122263 )	2025-01-10 10:32:25 +01:00
Lukas Sommer	4adeb6cf55	[mlir][spirv] Add convergent attribute to builtin (#122131 ) Add the `convergent` attribute to builtin functions and builtin function calls when lowering SPIR-V non-uniform group functions to LLVM dialect. --------- Signed-off-by: Lukas Sommer <lukas.sommer@codeplay.com>	2025-01-10 09:15:18 +01:00
Longsheng Mou	9190e1c0ef	[mlir][linalg] Handle reassociationIndices correctly for 0D tensor (#121683 ) This PR fixes a bug where a value is assigned to a 0-sized reassociationIndices, preventing a crash. Fixes #116043.	2025-01-10 09:23:50 +08:00
Krzysztof Drewniak	0aa831e0ed	[mlir][GPU] Implement ValueBoundsOpInterface for GPU ID operations (#122190 ) The GPU ID operations already implement InferIntRangeInterface, which gives constant lower and upper bounds on those IDs when appropriate metadata is present on the operations or in the surrounding context. This commit uses that existing code to implement the ValueBoundsOpInterface, which is used when analyzing affine operations (unlike the integer range interface, which is used for arithmetic optimization). It also implements the interface for gpu.launch, where we can use it to express the constraint that block/grid sizes are equal to their value from outside the launch op and that the corresponding IDs are bounded above by that size. As a consequence, the test pass for this inference is updated to work on a FunctionOpInterface and not a func.func, creating minor churn in other tests.	2025-01-09 11:42:22 -08:00
Razvan Lupusoru	cbcb7ad32e	[mlir][acc] Introduce MappableType interface (#122146 ) OpenACC data clause operations previously required that the variable operand implemented PointerLikeType interface. This was a reasonable constraint because the dialects currently mixed with `acc` do use pointers to represent variables. However, this forces the "pointer" abstraction to be exposed too early and some cases are not cleanly representable through this approach (more specifically FIR's `fir.box` abstraction). Thus, relax this by allowing a variable to be a type which implements either `PointerLikeType` interface or `MappableType` interface.	2025-01-09 10:27:37 -08:00
Andrea Faulds	7724be9728	[mlir][spirv] Do SPIR-V serialization in -test-vulkan-runner-pipeline (#121494 ) This commit is a further incremental step toward moving the whole mlir-vulkan-runner MLIR pass pipeline into mlir-opt (see #73457). The previous step was b225b3adf7b78387c9fcb97a3ff0e0a1e26eafe2, which moved all device passes prior to SPIR-V serialization into a new mlir-opt test pass, `-test-vulkan-runner-pipeline`. This commit changes how SPIR-V serialization is accomplished for Vulkan runner tests. Until now, this was done by the Vulkan-specific ConvertGpuLaunchFuncToVulkanLaunchFunc pass. With this commit, this responsibility is removed from that pass, and is instead done with the existing generic GpuModuleToBinaryPass. In addition, the SPIR-V serialization step is no longer done inside mlir-vulkan-runner, but rather inside mlir-opt (in the `-test-vulkan-runner-pipeline` pass). Both of these changes represent a greater alignment between mlir-vulkan-runner and the other GPU integration tests. Notably, the IR shapes produced by the mlir-opt pipelines for the Vulkan and SYCL runners are now much more similar, with both using a gpu.binary op for the serialized SPIR-V kernel. In order to enable this, this commit includes these supporting changes: - ConvertToSPIRVPass is enhanced to support producing the IR shape where a spirv.module is nested inside a gpu.module, since this is what GpuModuleToBinaryPass expects. - ConvertGPULaunchFuncToVulkanLaunchFunc is changed to remove its SPIR-V serialization functionality, and instead now extracts the SPIR-V from a gpu.binary operation (as produced by ConvertToSPIRVPass). - `-test-vulkan-runner-pipeline` now attaches SPIR-V target information required by GpuModuleToBinaryPass. 
- The WebGPU pass option, which had been removed from mlir-vulkan-runner in the previous commit in this series, is restored as an option to `-test-vulkan-runner-pipeline` instead, so that the WebGPU pass continues being inserted into the pipeline just before SPIR-V serialization.	2025-01-09 17:58:51 +01:00
Alexander Belyaev	d056c756ae	[mlir][scf] Fix unrolling when the yielded value is defined above the loop. (#122177 )	2025-01-09 17:31:17 +01:00
Andrzej Warzyński	21ba7aef3b	[mlir][vector][nfc] Update `alignedConversionPrecondition` (#122136 ) Adds some comments and renames variables to clarify the usage.	2025-01-09 15:14:34 +00:00
Kareem Ergawy	6f9e688203	[flang][OpenMP] Fix reduction init region block management (#122079 ) Replaces https://github.com/llvm/llvm-project/pull/121886 Fixes https://github.com/llvm/llvm-project/issues/120254 (hopefully 🤞) ## Problem Consider the following example: ```fortran program test real :: x(1) integer :: i !$omp parallel do reduction(+:x) do i = 1,1 x = 1 end do !$omp end parallel do end program ``` The HLFIR+OMP IR for this example looks like this: ```mlir func.func @_QQmain() { ... omp.parallel { %5 = fir.embox %4#0(%3) : (!fir.ref<!fir.array<1xf32>>, !fir.shape<1>) -> !fir.box<!fir.array<1xf32>> %6 = fir.alloca !fir.box<!fir.array<1xf32>> ... omp.wsloop private(@_QFEi_private_ref_i32 %1#0 -> %arg0 : !fir.ref<i32>) reduction(byref @add_reduction_byref_box_1xf32 %6 -> %arg1 : !fir.ref<!fir.box<!fir.array<1xf32>>>) { omp.loop_nest (%arg2) : i32 = (%c1_i32) to (%c1_i32_0) inclusive step (%c1_i32_1) { ... omp.yield } } omp.terminator } return } ``` The problem addressed by this PR is related to: the `alloca` in the `omp.parallel` region + the related `reduction` clause on the `omp.wsloop` op. When we try to translate the reduction from MLIR to LLVM, we have to choose an `alloca` insertion point. This happens in `convertOmpWsloop` where at entry to that function, this is what the LLVM module looks like: ```llvm define void @_QQmain() { %tid.addr = alloca i32, align 4 ... entry: %omp_global_thread_num = call i32 @__kmpc_global_thread_num(ptr @1) br label %omp.par.entry omp.par.entry: %tid.addr.local = alloca i32, align 4 ... br label %omp.par.region omp.par.region: br label %omp.par.region1 omp.par.region1: ... %5 = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8 ``` Now, when we choose an `alloca` insertion point for the reduction, this is the chosen block `omp.par.entry` (without the changes in this PR). The problem is that the allocation needed for the reduction needs to reference the `%5` SSA value. 
This results in inserting allocations in `omp.par.entry` that reference allocations in a later block `omp.par.region1`, which causes the `Instruction does not dominate all uses!` error. ## Possible solution - take 2: This PR contains a more localized solution than https://github.com/llvm/llvm-project/pull/121886. It makes sure that on entry to `initReductionVars`, the IR builder is at a point where we can start inserting the initialization region; to make things cleaner, we still split the builder insertion point to a dedicated `omp.reduction.init`. This way we avoid splitting after the latest allocation block, which is what was causing the issue.	2025-01-09 16:11:18 +01:00
Pietro Ghiglio	cdd652eb28	[MLIR][GPU] Support bf16 and i1 gpu::shuffles to LLVMSPIRV conversion (#119675 ) This PR adds support to the `bf16` and `i1` data types when converting `gpu::shuffle` to the `LLVMSPV` dialect, by inserting `bitcast` to/from `i16` (for `bf16`) and extending/truncating to `i8` (for `i1`).	2025-01-09 13:16:18 +01:00
Benjamin Kramer	35c5e56b61	Clean up -Wdangling-assignment-gsl in clang and mlir These are triggering after `b037bceef6`.	2025-01-08 14:46:15 +01:00
William Moses	1c067a513c	[MLIR] Enable import of non self referential alias scopes (#121987 ) Fixes #121965. --------- Co-authored-by: Christian Ulmann <christianulmann@gmail.com> Co-authored-by: Alex Zinenko <git@ozinenko.com>	2025-01-08 13:40:05 +01:00
Jack Frankland	360a03c980	[mlir][tosa] Add acc_type to Tosa-v1.0 Conv Ops (#121466 ) Tosa v1.0 adds accumulator type attributes to the various convolution operations defined in the spec. Update the dialect and any lit tests to include these attributes. Signed-off-by: Tai Ly <tai.ly@arm.com> Co-authored-by: Tai Ly <tai.ly@arm.com>	2025-01-08 12:12:26 +02:00
Longsheng Mou	c1d01b2fc2	[mlir][tosa] Add missing verifier for `tosa.pad` (#120934 ) This PR adds a missing verifier for `tosa.pad`, ensuring that the padding shape matches [2*rank(shape1)] according to V1.0.0 Specification. Fixes #119840.	2025-01-08 10:45:59 +02:00
Alex MacLean	4583f6d344	[NVPTX] Switch front-ends and tests to ptx_kernel cc (#120806 ) The `ptx_kernel` calling convention is a more idiomatic and standard way of specifying an NVPTX kernel than using the metadata, which is not supposed to change the meaning of the program. Further, checking the calling convention is significantly faster than traversing the metadata, improving compile time. This change updates the clang and mlir frontends as well as the NVPTXCtorDtorLowering pass to emit kernels using the calling convention. In addition, this updates all NVPTX unit tests to use the calling convention as well.	2025-01-07 18:24:50 -08:00
Krzysztof Drewniak	c6f67b8e39	[mlir][affine] Add ValueBoundsOpInterface to [de]linearize_index (#121833 ) Since a need for it came up downstream (in proving that loops run at least once), this commit implements the ValueBoundsOpInterface for affine.delinearize_index and affine.linearize_index, using affine map representations of the operations they perform. These implementations also use information from outer bounds to impose additional constraints when those are available.	2025-01-07 16:28:14 -06:00
vfdev	a0f5bbcfb7	Fixed typo in dunder get/set methods in PyAttrBuilderMap (#121794 ) Description: - fixed a typo in the method name: dunde -> dunder	2025-01-07 10:33:01 -05:00
Michael Jungmair	1fb98b5a7e	[mlir][Transforms] Make LocationSnapshotPass respect OpPrintingFlags (#119373 ) The current implementation of LocationSnapshotPass takes an OpPrintingFlags argument and stores it as member, but does not use it for printing. Properly implement the printing flags, also supporting command line args. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2025-01-07 12:14:35 +01:00
William Moses	5656cbca52	[MLIR][CAPI] export LLVMFunctionType param getter and setters (#121888 )	2025-01-07 02:39:44 -05:00
MaheshRavishankar	8cd94e0b6d	[mlir][Affine] Add nsw to lowering of `AffineMulExpr`. (#121535 ) Since index operations have no set bitwidth, it is ill-defined to use signed/unsigned wrapping behavior. The corollary to which is that it is always safe to add nsw/nuw to lowering of affine ops. Also add a folder to fold `div(s|u)i (mul (a, v), v) -> a` Signed-off-by: MaheshRavishankar <mravisha@amd.com>	2025-01-06 14:57:24 -08:00
Matthias Springer	599c739905	[mlir][GPU] Add NVVM-specific `cf.assert` lowering (#120431 ) This commit adds an NVIDIA-specific lowering of `cf.assert` to `__assertfail`. Note: `getUniqueFormatGlobalName`, `getOrCreateFormatStringConstant` and `getOrDefineFunction` are moved to `GPUOpsLowering.h`, so that they can be reused.	2025-01-06 12:00:11 +01:00
Oleksandr "Alex" Zinenko	f6bfbc8777	[mlir] flush output in transform.print (#121382 ) Print operations are often used for debugging, immediately before the compiler aborts. In such cases, it is sometimes possible that the output isn't fully produced yet. Make sure it is by explicitly flushing the output.	2025-01-06 10:47:40 +01:00
Matthias Springer	5f7568a32c	[mlir][Transforms] Fix mapping in `findOrBuildReplacementValue` (#121644 ) Fixes two minor issues in `findOrBuildReplacementValue`: * Remove a redundant `mapping.map`. * Map `repl` instead of `value`. We used to overwrite an existing mapping, which could introduce extra materializations. Note: We generally do not want to overwrite mappings, but create a chain of mappings. There are still a few more places, where a mapping is overwritten. Once those are fixed, I will put an assertion into `ConversionValueMapping::map`.	2025-01-06 08:55:18 +01:00
Matthias Springer	2dcb3b9f37	[mlir][ArmSME] Remove func patterns from vector lowering (#121640 ) Remove `func.call` and `func.return` patterns from `populateArmSVELegalizeForLLVMExportPatterns`. This function is called from `ConvertVectorToLLVMPass::runOnOperation`. That pass should lower only `vector` dialect ops, not `func` dialect ops. These patterns also seem to be unnecessary, as no test cases are failing without them. Also note that there is no `func.func` pattern, so any application of the above-mentioned patterns produces invalid IR.	2025-01-05 17:44:13 +01:00
Matthias Springer	486f83faa3	[mlir][Transforms][NFC] Simplify `buildUnresolvedMaterialization` implementation (#121651 ) The `buildUnresolvedMaterialization` implementation used to check if a materialization is necessary. A materialization is not necessary if the desired types already match the input. However, this situation can never happen: we look for mapped values with the desired type at the call sites before requesting a new unresolved materialization. The previous implementation seemed incorrect because `buildUnresolvedMaterialization` created a mapping that is never rolled back. (When in reality that code was never executed, so it is technically not incorrect.) Also fix a comment in `findOrBuildReplacementValue` that was incorrect.	2025-01-05 17:32:07 +01:00
William Moses	b5f21671ef	MLIR: Enable importing inlineasm calls (#121624 )	2025-01-05 11:02:49 -05:00
Matthias Springer	afef716e83	[mlir][Transforms] Fix build after #116524 (part 2) (#121662 ) Since #116524, an integration test started to become flaky (failure rate ~15%). ``` bin/mlir-opt mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir --sparsifier="enable-arm-sve=true enable-runtime-library=false vl=2 reassociate-fp-reductions=true enable-index-optimizations=true" | mlir-cpu-runner --march=aarch64 --mattr="+sve" -e main -entry-point-result=void -shared-libs=./lib/libmlir_runner_utils.so,./lib/libmlir_c_runner_utils.so | bin/FileCheck mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir # executed command: bin/mlir-opt mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir '--sparsifier=enable-arm-sve=true enable-runtime-library=false vl=2 reassociate-fp-reductions=true enable-index-optimizations=true' # .---command stderr------------ # | mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir:71:10: error: null operand found # | %0 = linalg.generic #trait_mul # | ^ # | mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_block_matmul.mlir:71:10: note: see current operation: %70 = "arith.mulf"(<<NULL VALUE>>, %69) <{fastmath = #arith.fastmath<none>}> : (<<NULL TYPE>>, vector<[2]xf64>) -> vector<[2]xf64> # `----------------------------- # error: command failed with exit status: 1 ``` I traced the issue back to the `DenseMap<ValueVector, ValueVector, ValueVectorMapInfo> mapping;` data structure: previously, some `mapping.erase(foo)` calls were unsuccessful (returning `false`), even though the `mapping` contains `foo` as a key.	2025-01-04 21:28:59 +01:00
Matthias Springer	c9d61cde2b	[mlir][Transforms][NFC] Delete unused `nTo1TempMaterializations` (#121647 ) `nTo1TempMaterializations` is no longer used since the conversion value mapping supports 1:N mappings.	2025-01-04 15:16:35 +01:00
Matthias Springer	95c5c5d4ba	[mlir][Transforms][NFC] Use `DominanceInfo` to compute materialization insertion point (#120746 ) In the dialect conversion driver, use `DominanceInfo` to compute a suitable insertion point for N:1 source materializations.	2025-01-04 09:23:15 +01:00
Matthias Springer	2d424765f4	[mlir][IR][NFC] `DominanceInfo`: Share same impl for block/op dominance (#115587 ) The `properlyDominates` implementations for blocks and ops are very similar. This commit replaces them with a single implementation that operates on block iterators. That implementation can be used to implement both `properlyDominates` variants. Before: ```c++ template <bool IsPostDom> bool DominanceInfoBase<IsPostDom>::properlyDominatesImpl(Block a, Block b) const; template <bool IsPostDom> bool DominanceInfoBase<IsPostDom>::properlyDominatesImpl( Operation a, Operation b, bool enclosingOpOk) const; ``` After: ```c++ template <bool IsPostDom> bool DominanceInfoBase<IsPostDom>::properlyDominatesImpl( Block aBlock, Block::iterator aIt, Block bBlock, Block::iterator bIt, bool enclosingOk) const; ``` Note: A subsequent commit will add a new public `properlyDominates` overload that accepts block iterators. That functionality can then be used to find a valid insertion point at which a range of values is defined (by utilizing post dominance).	2025-01-04 09:12:03 +01:00
Krzysztof Drewniak	9f5cefebb4	[mlir][Affine] Generalize the linearize(delinearize()) simplifications (#117637 ) The existing canonicalization patterns would only cancel out cases where the entire result list of an affine.delinearize_index was passed to an affine.linearize_index and the basis elements matched exactly (except possibly for the outer bounds). This was correct, but limited, and left open many cases where a delinearize_index would take a series of divisions and modulos only for a subsequent linearize_index to use additions and multiplications to undo all that work. This sort of simplification is reasonably easy to observe at the level of splitting and merging indexes, but difficult to perform once the underlying arithmetic operations have been created. Therefore, this commit generalizes the existing simplification logic. Now, any run of two or more delinearize_index results that appears within the argument list of a linearize_index operation with the same basis (or where they're both at the outermost position and so can be unbounded, or when `linearize_index disjoint` implies a bound not present on the `delinearize_index`) will be reduced to one single delinearize_index output, whose basis element (that is, size or length) is equal to the product of the sizes that were simplified away. That is, we can now simplify `%0:2 = affine.delinearize_index %n into (8, 8) : index, index` `%1 = affine.linearize_index [%x, %0#0, %0#1, %y] by (3, 8, 8, 5) : index` to the simpler `%1 = affine.linearize_index [%x, %n, %y] by (3, 64, 5) : index` This new pattern also works with dynamically-sized basis values. While I'm here, I fixed a bunch of typos in existing tests, and added a new getPaddedBasis() method to make processing potentially-underspecified basis elements simpler in some cases.	2025-01-03 15:12:39 -06:00
Jeff Niu	9d8e634e85	[mlir][scf] Always remove for iter args that are loop invariant (#121555 ) This alters the condition in ForOpIterArgsFolder to always remove iter args when their initial value equals the yielded value, not just when the arg has no use.	2025-01-03 11:44:46 -08:00
Ivan Butygin	1cade86997	[mlir][arith] Fold `(a * b) / b -> a` (#121534 ) If overflow flags allow it. Alive2 check: https://alive2.llvm.org/ce/z/5XWjWE	2025-01-03 20:02:59 +03:00
agozillon	fa56e8bb64	[OpenMP][MLIR] Fix threadprivate lowering when compiling for target when target operations are in use (#119310 ) Currently the compiler will ICE in programs like the following on the device lowering pass: ``` program main implicit none type i1_t integer :: val(1000) end type i1_t integer :: i type(i1_t), pointer :: newi1 type(i1_t), pointer :: tab=>null() integer, dimension(:), pointer :: tabval !$omp THREADPRIVATE(tab) allocate(newi1) tab=>newi1 tab%val(:)=1 tabval=>tab%val !$omp target teams distribute parallel do do i = 1, 1000 tabval(i) = i end do !$omp end target teams distribute parallel do end program main ``` This is due to the fact that THREADPRIVATE returns a result operation, and this operation can actually be used by other LLVM dialect (or other dialect) operations. However, we currently skip the lowering of threadprivate, so we effectively never generate and bind an LLVM-IR result to the threadprivate operation result. So when we later go on to lower dependent LLVM dialect operations, we are missing the required LLVM-IR result, try to access and use it and then ICE. The fix in this particular PR is to allow compilation of threadprivate for device as well as host, and simply treat the device compilation as a no-op, binding the LLVM-IR result of threadprivate with no alterations and binding it, which will allow the rest of the compilation to proceed, where we'll eventually discard the host segment in any case. The other possible solution to this I can think of, is doing something similar to Flang's passes that occur prior to CodeGen to the LLVM dialect, where they erase/no-op certain unrequired operations or transform them to lower level series of operations. And we would erase/no-op threadprivate on device as we'd never have these in target regions. 
The main issues I can see with this are that we currently do not specialise this stage based on whether we're compiling for device or host, so it's setting a precedent and adding another point of having to understand the separation between target and host compilation. I am also not sure we'd necessarily want to enforce this at a dialect level in case someone else wishes to add a different lowering flow or translation flow. Another possible issue is that a target operation we have/utilise would depend on the result of threadprivate, meaning we'd not be allowed to entirely erase/no-op it, I am not sure of any situations where this may be an issue currently though.	2025-01-03 18:01:01 +01:00
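
Note on 9f5cefebb4 (linearize(delinearize()) simplification): the rewrite described in that commit message can be sanity-checked numerically. The following is a minimal Python sketch, not MLIR code; `delinearize`, `linearize` and `check_simplification` are hypothetical helper names that model the index arithmetic under the assumption that bases are listed outermost first and that `%n` lies in `[0, 64)`.

```python
from itertools import product


def delinearize(index, basis):
    """Split a flat index into per-dimension indices (outermost first)."""
    result = []
    for size in reversed(basis):
        result.append(index % size)  # innermost remainder first
        index //= size
    return list(reversed(result))


def linearize(indices, basis):
    """Combine per-dimension indices (outermost first) into a flat index."""
    flat = 0
    for idx, size in zip(indices, basis):
        flat = flat * size + idx
    return flat


def check_simplification():
    """Compare both forms from the commit message over the whole index range."""
    for x, n, y in product(range(3), range(64), range(5)):
        hi, lo = delinearize(n, (8, 8))
        before = linearize([x, hi, lo, y], (3, 8, 8, 5))
        after = linearize([x, n, y], (3, 64, 5))
        if before != after:
            return False
    return True
```

Calling `check_simplification()` compares the two forms for every in-range `(x, n, y)`; the merged basis element 64 is the product of the two sizes (8 and 8) that were simplified away.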

16190 Commits