Commit Graph

10288 Commits

Author SHA1 Message Date
Rik Huijzer
061e4f24b2 [mlir][doc] Escape effects, interfaces, and traits (#76297)
Fixes https://github.com/llvm/llvm-project/issues/76270.

Thanks to @scottamain for the clear description.


Co-authored-by: Scott Main <scott@modular.com>
2023-12-23 21:48:33 +01:00
Adam Paszke
85b2327192 [mlir][nvvm] Fix the PTX lowering of wgmma.mma_async (#76150) 2023-12-22 14:46:34 +01:00
Dominik Adamski
ffabf73553 [NFC][OpenMP][MLIR] Add test for lowering parallel workshare GPU loop (#76144)
This test checks if MLIR code is lowered according to schema presented
below:

func1() {
    call __kmpc_parallel_51(..., func2, ...)
}

func2() {
   call __kmpc_for_static_loop_4u(..., func3, ...)
}

func3() {
   //loop body
}
2023-12-22 11:58:04 +01:00
Jakub Kuderski
a03c53c530 [mlir][spirv] Add physical storage buffer extension test. NFC. (#76196)
This test demonstrates how the PhysicalStorageBuffer extension can be
used end-2-end in a spir-v module.

This module has been verified to pass serialization, deserialization,
and validation with spirv-val.
2023-12-21 23:06:26 -05:00
Ryan Holt
847a6f8f0a [mlir][MemRef] Add runtime bounds checking (#75817)
This change adds (runtime) bounds checks for `memref` ops using the
existing `RuntimeVerifiableOpInterface`. For `memref.load` and
`memref.store`, we check that the indices are in-bounds of the memref's
index space. For `memref.reinterpret_cast` and `memref.subview` we check
that the resulting address space is in-bounds of the input memref's
address space.
2023-12-22 11:49:15 +09:00
Matthias Springer
c99670ba51 [mlir][vector] LoadOp/StoreOp: Allow 0-D vectors (#76134)
Similar to `vector.transfer_read`/`vector.transfer_write`, allow 0-D
vectors.

This commit fixes
`mlir/test/Dialect/Vector/vector-transfer-to-vector-load-store.mlir`
when verifying the IR after each pattern (#74270). That test produces a
temporary 0-D load/store op.
2023-12-22 11:12:58 +09:00
Andrzej Warzyński
e6f5762879 [mlir][vector][nfc] Add a test case for scalable vectors (#76138)
Extends fold-arith-extf-into-vector-contract.mlir by adding a test case
for scalable vectors.
2023-12-21 18:45:00 +00:00
Billy Zhu
34a65980d7 [MLIR] Erase location of folded constants (#75415)
Follow up to the discussion from #75258, and serves as an alternate
solution for #74670.

Set the location to Unknown for deduplicated / moved / materialized
constants by OperationFolder. This makes sure that the folded constants
don't end up with an arbitrary location of one of the original ops that
became it, and that hoisted ops don't confuse the stepping order.
2023-12-21 09:54:48 -08:00
Finn Plummer
88151dd428 [mlir][spirv] Add folding for SNegate, [Logical]Not (#74992)
Add missing constant propogation folder for SNegate, [Logical]Not.

Implement additional folding when !(!x) for all ops.

This helps for readability of lowered code into SPIR-V.

Part of work for #70704
2023-12-21 18:24:01 +01:00
Maksim Levental
537b2aa264 [mlir][python] meta region_op (#75673) 2023-12-21 11:20:29 -06:00
Oleksandr "Alex" Zinenko
11140cc238 [mlir] mark ChangeResult as nodiscard (#76147)
This enum is used by dataflow analyses to indicate whether further
propagation is necessary to reach the fix point. Accidentally discarding
such a value will likely lead to propagation stopping early, leading to
incomplete or incorrect results. The most egregious example is the
duality between `join` on the analysis class, which triggers propagation
internally, and `join` on the lattice class that does not and expects
the caller to trigger it depending on the returned `ChangeResult`.
2023-12-21 17:58:53 +01:00
Jakub Kuderski
72003adf6b [mlir][gpu] Allow subgroup reductions over 1-d vector types (#76015)
Each vector element is reduced independently, which is a form of
multi-reduction.

The plan is to allow for gradual lowering of multi-reduction that
results in fewer `gpu.shuffle` ops at the end:
1d `vector.multi_reduction` --> 1d `gpu.subgroup_reduce` --> smaller 1d
`gpu.subgroup_reduce` --> packed `gpu.shuffle` over i32

For example we can perform 2 independent f16 reductions with a series of
`gpu.shuffles` over i32, reducing the final number of `gpu.shuffles` by 2x.
2023-12-21 11:55:43 -05:00
Andrzej Warzyński
17afa5befb [mlir][nfc] Update tests for Contract -> Op transforms (#76054)
Updates two tests for vector.contract -> vector.outerproduct
transformations:

1. Rename "vector-contract-to-outerproduct-transforms.mlir" as
   "vector-contract-to-outerproduct-matmul-transforms.mlir". The new
   name more accurate captures what's being tested. it is also
   consistent with
   "vector-contract-to-outerproduct-matvec-transforms.mlir", which
   covers vector matvec operations and makes finding relevant tests
   easier.

2. For matmul tests, move the traits definining the iteration spaces to
   the top of the file. This is consistent with how matvec tests are
   defined and also makes it easy to quickly identify what cases are
   covered.

3. For matmul tests, use more meaningful names for function arguments.
   This helps keep things consistent across the file (i.e. function
   definitions wih check lines and comments).

4. For matvec test, move a few tests around so that the most basic case
   (without masking) is first.

5. Update comments.
2023-12-21 13:20:16 +00:00
Alex Zinenko
78bd124649 Revert "[mlir][python] Make the Context/Operation capsule creation methods work as documented. (#76010)"
This reverts commit bbc2976868.

This change seems to be at odds with the non-owning part semantics of
MlirOperation in C API. Since downstream clients can only take and
return MlirOperation, it does not sound correct to force all returns of
MlirOperation transfer ownership. Specifically, this makes it impossible
for downstreams to implement IR-traversing functions that, e.g., look at
neighbors of an operation.

The following patch triggers the exception, and there does not seem to
be an alternative way for a downstream binding writer to express this:

```
diff --git a/mlir/lib/Bindings/Python/IRCore.cpp b/mlir/lib/Bindings/Python/IRCore.cpp
index 39757dfad5be..2ce640674245 100644
--- a/mlir/lib/Bindings/Python/IRCore.cpp
+++ b/mlir/lib/Bindings/Python/IRCore.cpp
@@ -3071,6 +3071,11 @@ void mlir::python::populateIRCore(py::module &m) {
                   py::arg("successors") = py::none(), py::arg("regions") = 0,
                   py::arg("loc") = py::none(), py::arg("ip") = py::none(),
                   py::arg("infer_type") = false, kOperationCreateDocstring)
+      .def("_get_first_in_block", [](PyOperation &self) -> MlirOperation {
+        MlirBlock block = mlirOperationGetBlock(self.get());
+        MlirOperation first = mlirBlockGetFirstOperation(block);
+        return first;
+      })
       .def_static(
           "parse",
           [](const std::string &sourceStr, const std::string &sourceName,
diff --git a/mlir/test/python/ir/operation.py b/mlir/test/python/ir/operation.py
index f59b1a26ba48..6b12b8da5c24 100644
--- a/mlir/test/python/ir/operation.py
+++ b/mlir/test/python/ir/operation.py
@@ -24,6 +24,25 @@ def expect_index_error(callback):
     except IndexError:
         pass

+@run
+def testCustomBind():
+    ctx = Context()
+    ctx.allow_unregistered_dialects = True
+    module = Module.parse(
+        r"""
+    func.func @f1(%arg0: i32) -> i32 {
+      %1 = "custom.addi"(%arg0, %arg0) : (i32, i32) -> i32
+      return %1 : i32
+    }
+  """,
+        ctx,
+    )
+    add = module.body.operations[0].regions[0].blocks[0].operations[0]
+    op = add.operation
+    # This will get a reference to itself.
+    f1 = op._get_first_in_block()
+
+

 # Verify iterator based traversal of the op/region/block hierarchy.
 # CHECK-LABEL: TEST: testTraverseOpRegionBlockIterators
```
2023-12-21 10:06:44 +00:00
Tobias Gysi
9971b9ab19 [mlir][llvm] Improve alloca handling during inlining (#75961)
This revision changes the alloca handling in the LLVM inliner.
It ensures that alloca operations, even those nested within a
region operation, can be relocated to the entry block of the function,
or the closest ancestor region that is marked with either the
isolated from above or automatic allocation scope trait.

While the LLVM dialect does not have any region operations,
the inlining interface may be used on IR that mixes different
dialects.
2023-12-21 08:11:17 +01:00
Valentin Clement
a25da1a921 [mlir][openacc] Add device_type support for compute operations (#75864)
Re-land PR after being reverted because of buildbot failures.

This patch adds representation for `device_type` clause information on
compute construct (parallel, kernels, serial).

The `device_type` clause on compute construct impacts clauses that
appear after it. The values impacted by `device_type` are now tied with
an attribute array that represent the device_type associated with them.
`DeviceType::None` is used to represent the value produced by a clause
before any `device_type`. The operands and the attribute information are
parser/printed together.

This is an example with `vector_length` clause. The first value (64) is
not impacted by `device_type` so it will be represented with
DeviceType::None. None is not printed. The second value (128) is tied
with the `device_type(multicore)` clause.
```
!$acc parallel vector_length(64) device_type(multicore) vector_length(256)
```
```
acc.parallel vector_length(%c64 : i32, %c128 : i32 [#acc.device_type<multicore>]) {
}
```

When multiple values can be produced for a single clause like
`num_gangs` and `wait`, an extra attribute describe the number of values
belonging to each `device_type`. Values and attributes are
parsed/printed together.

```
acc.parallel num_gangs({%c2 : i32, %c4 : i32}, {%c4 : i32} [#acc.device_type<nvidia>])
```

While preparing this patch I noticed that the wait devnum is not part of
the operations and is not lowered. It will be added in a follow up
patch.
2023-12-20 20:36:09 -08:00
Valentin Clement
553748356c Revert "[mlir][openacc] Add device_type support for compute operations (#75864)"
This reverts commit 8b885eb90f.
2023-12-20 16:08:10 -08:00
Maksim Levental
acaff70841 [mlir][python] move transform extras (#76102) 2023-12-20 17:29:11 -06:00
Valentin Clement (バレンタイン クレメン)
8b885eb90f [mlir][openacc] Add device_type support for compute operations (#75864)
This patch adds representation for `device_type` clause information on
compute construct (parallel, kernels, serial).

The `device_type` clause on compute construct impacts clauses that
appear after it. The values impacted by `device_type` are now tied with
an attribute array that represent the device_type associated with them.
`DeviceType::None` is used to represent the value produced by a clause
before any `device_type`. The operands and the attribute information are
parser/printed together.

This is an example with `vector_length` clause. The first value (64) is
not impacted by `device_type` so it will be represented with
DeviceType::None. None is not printed. The second value (128) is tied
with the `device_type(multicore)` clause.
```
!$acc parallel vector_length(64) device_type(multicore) vector_length(256)
```
```
acc.parallel vector_length(%c64 : i32, %c128 : i32 [#acc.device_type<multicore>]) {
}
```

When multiple values can be produced for a single clause like
`num_gangs` and `wait`, an extra attribute describe the number of values
belonging to each `device_type`. Values and attributes are
parsed/printed together.

```
acc.parallel num_gangs({%c2 : i32, %c4 : i32}, {%c4 : i32} [#acc.device_type<nvidia>])
```

While preparing this patch I noticed that the wait devnum is not part of
the operations and is not lowered. It will be added in a follow up
patch.
2023-12-20 13:45:47 -08:00
Stella Laurenzo
bbc2976868 [mlir][python] Make the Context/Operation capsule creation methods work as documented. (#76010)
This fixes a longstanding bug in the `Context._CAPICreate` method
whereby it was not taking ownership of the PyMlirContext wrapper when
casting to a Python object. The result was minimally that all such
contexts transferred in that way would leak. In addition, counter to the
documentation for the `_CAPICreate` helper (see
`mlir-c/Bindings/Python/Interop.h`) and the `forContext` /
`forOperation` methods, we were silently upgrading any unknown
context/operation pointer to steal-ownership semantics. This is
dangerous and was causing some subtle bugs downstream where this
facility is getting the most use.

This patch corrects the semantics and will only do an ownership transfer
for `_CAPICreate`, and it will further require that it is an ownership
transfer (if already transferred, it was just silently succeeding).
Removing the mis-aligned behavior made it clear where the downstream was
doing the wrong thing.

It also adds some `_testing_` functions to create unowned context and
operation capsules so that this can be fully tested upstream, reworking
the tests to verify the behavior.

In some torture testing downstream, I was not able to trigger any memory
corruption with the newly enforced semantics. When getting it wrong, a
regular exception is raised.
2023-12-20 12:18:58 -08:00
Han-Chung Wang
b33a131c82 [mlir][arith] Add support for expanding arith.maxnumf/minnumf ops. (#75989)
The maxnum/minnum semantics can be found at
https://llvm.org/docs/LangRef.html#llvm-minnum-intrinsic.

The revision also updates function names in lit tests to match op name.

Take arith.maxnumf as example:

```
func.func @maxnumf(%lhs: f32, %rhs: f32) -> f32 {
  %result = arith.maxnumf %lhs, %rhs : f32
  return %result : f32
}
```

will be expanded to

```
func.func @maxnumf(%lhs: f32, %rhs: f32) -> f32 {
  %0 = arith.cmpf ugt, %lhs, %rhs : f32
  %1 = arith.select %0, %lhs, %rhs : f32
  %2 = arith.cmpf uno, %lhs, %lhs : f32
  %3 = arith.select %2, %rhs, %1 : f32
  return %3 : f32
}
```

Case 1: Both LHS and RHS are not NaN; LHS > RHS

In this case, `%1` is LHS. `%3` and `%1` have the same value, so `%3` is
LHS.

Case 2: LHS is NaN and RHS is not NaN

In this case, `%2` is true, so `%3` is always RHS.

Case 3: LHS is not NaN and RHS is NaN

In this case, `%0` is true and `%1` is LHS. `%2` is false, so `%3` and
`%1` have the same value, which is LHS.

Case 4: Both LHS and RHS are NaN:

`%1` and RHS are all NaN, so the result is still NaN.
2023-12-20 10:35:12 -08:00
Paul C Fuqua
11141bc68a Fix what seems to be a silly bug in gpu.set_default_device rewriting. Smoke test included. (#75756) 2023-12-20 09:35:42 -06:00
Gil Rapaport
d9803841f2 [mlir][emitc] Add op modelling C expressions (#71631)
Add an emitc.expression operation that models C expressions, and provide
transforms to form and fold expressions. The translator emits the body
of
emitc.expression ops as a single C expression.
This expression is emitted by default as the RHS of an EmitC SSA value,
but if
possible, expressions with a single use that is not another expression
are
instead inlined. Specific expression's inlining can be fine tuned by
lowering
passes and transforms.
2023-12-20 15:04:46 +02:00
Andrzej Warzyński
354adb44c9 [mlir][vector] Extend CreateMaskFolder (#75842)
Extends `CreateMaskFolder` pattern so that the following:
```mlir
  %c8 = arith.constant 8 : index
  %c16 = arith.constant 16 : index
  %0 = vector.vscale
  %1 = arith.muli %0, %c16 : index
  %10 = vector.create_mask %c8, %1 : vector<8x[16]xi1>
```

is folded as:

```mlir
  %0 = vector.constant_mask [8, 16] : vector<8x[16]xi1>
```
2023-12-20 11:08:54 +00:00
Andrzej Warzyński
d5abd8a1a9 [mlir][vector][nfc] Move tests for scalable outer-product (#76035)
Tests for vector.outerproduct for scalable vectors from
"vector-scalable-outerproduct.mlir" are moved to:

  * ops.mlir and invalid.mlir.

These files are effectively used to document what Ops are supported and
That's basically what the original file was testing (but specifically
for scalable vectors).
2023-12-20 10:53:00 +00:00
Finn Plummer
4c83c27c91 [mlir][spirv] Add folding for [I|Logical][Not]Equal (#74194) 2023-12-20 11:00:28 +01:00
Matthias Springer
f7096428b4 [mlir][GPU] Add RecursiveMemoryEffects to gpu.launch (#75315)
Infer the side effects of `gpu.launch` from its body.
2023-12-20 15:25:25 +09:00
Matthias Springer
c4457e10fe [mlir][IR] Change block/region walkers to enumerate this block/region (#75020)
This change makes block/region walkers consistent with operation
walkers. An operation walk enumerates the current operation. Similarly,
block/region walks should enumerate the current block/region.

Example:
```
// Current behavior:
op1->walk([](Operation *op2) { /* op1 is enumerated */ });
block1->walk([](Block *block2) { /* block1 is NOT enumerated */ });
region1->walk([](Block *block) { /* blocks of region1 are NOT enumerated */ });
region1->walk([](Region *region2) { /* region1 is NOT enumerated });

// New behavior:
op1->walk([](Operation *op2) { /* op1 is enumerated */ });
block1->walk([](Block *block2) { /* block1 IS enumerated */ });
region1->walk([](Block *block) { /* blocks of region1 ARE enumerated */ });
region1->walk([](Region *region2) { /* region1 IS enumerated });
```
2023-12-20 14:51:45 +09:00
Matthias Springer
f10302e3fa [mlir] Require folders to produce Values of same type (#75887)
This commit adds extra assertions to `OperationFolder` and `OpBuilder`
to ensure that the types of the folded SSA values match with the result
types of the op. There used to be checks that discard the folded results
if the types do not match. This commit makes these checks stricter and
turns them into assertions.

Discarding folded results with the wrong type (without failing
explicitly) can hide bugs in op folders. Two such bugs became apparent
in MLIR (and some more in downstream projects) and are fixed with this
change.

Note: The existing type checks were introduced in
https://reviews.llvm.org/D95991.

Migration guide: If you see failing assertions (`folder produced value
of incorrect type`; make sure to run with assertions enabled!), run with
`-debug` or dump the operation right before the failing assertion. This
will point you to the op that has the broken folder. A common mistake is
a mismatch between static/dynamic dimensions (e.g., input has a static
dimension but folded result has a dynamic dimension).
2023-12-20 14:39:22 +09:00
Jakub Kuderski
560564f51c [mlir][vector][gpu] Align minf/maxf reduction kind names with arith (#75901)
This is to avoid confusion when dealing with reduction/combining kinds.
For example, see a recent PR comment:
https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175.

Previously, they were picked to mostly mirror the names of the llvm
vector reduction intrinsics:
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In
isolation, it was not clear if `<maxf>` has `arith.maxnumf` or
`arith.maximumf` semantics. The new reduction kind names map 1:1 to
arith ops, which makes it easier to tell/look up their semantics.

Because both the vector and the gpu dialect depend on the arith dialect,
it's more natural to align names with those in arith than with the
lowering to llvm intrinsics.

Issue: https://github.com/llvm/llvm-project/issues/72354
2023-12-20 00:14:43 -05:00
Matthias Springer
10056c821a [mlir][SCF] scf.parallel: Make reductions part of the terminator (#75314)
This commit makes reductions part of the terminator. Instead of
`scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops.
`scf.reduce` may contain an arbitrary number of reductions, with one
region per reduction.

Example:
```mlir
%init = arith.constant 0.0 : f32
%r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init)
    -> f32, f32 {
  %elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32>
  %elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32>
  scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.addf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }, {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.mulf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }
}
```

`scf.reduce` operations can no longer be interleaved with other ops in
the body of `scf.parallel`. This simplifies the op and makes it possible
to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was
not possible before because the op was not a terminator, causing the op
to be DCE'd.)
2023-12-20 11:06:27 +09:00
long.chen
227bfa1fb1 [mlir] fix a crash when lower parallel loop to gpu (#75811) (#75946) 2023-12-20 09:13:15 +08:00
Kunwar Grover
282d501476 [mlir][Transform] Fix crash with invalid ir for transform libraries (#75649)
This patch fixes a crash caused when the transform library interpreter
is given an IR that fails to parse.
2023-12-19 23:16:19 +05:30
Han-Chung Wang
899c2bed9e [mlir][TilingInterface] Early return cloned ops if tile sizes are zeros. (#75410)
It is a trivial early-return case. If the cloned ops are not returned,
it will generate `extract_slice` op that extracts the whole slice.
However, it is not folded away. Early-return to avoid the case.

E.g.,

```mlir
func.func @matmul_tensors(
  %arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>)
    -> tensor<?x?xf32> {
  %0 = linalg.matmul  ins(%arg0, %arg1: tensor<?x?xf32>, tensor<?x?xf32>)
                     outs(%arg2: tensor<?x?xf32>)
    -> tensor<?x?xf32>
  return %0 : tensor<?x?xf32>
}

module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
    %0 = transform.structured.match ops{["linalg.matmul"]} in %arg1 : (!transform.any_op) -> !transform.any_op
    %1 = transform.structured.tile_using_for %0 [0, 0, 0] : (!transform.any_op) -> (!transform.any_op)
    transform.yield
  }
}
```

Apply the transforms and canonicalize the IR:

```
mlir-opt --transform-interpreter -canonicalize input.mlir
```

we will get

```mlir
module {
  func.func @matmul_tensors(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>) -> tensor<?x?xf32> {
    %c1 = arith.constant 1 : index
    %c0 = arith.constant 0 : index
    %dim = tensor.dim %arg0, %c0 : tensor<?x?xf32>
    %dim_0 = tensor.dim %arg0, %c1 : tensor<?x?xf32>
    %dim_1 = tensor.dim %arg1, %c1 : tensor<?x?xf32>
    %extracted_slice = tensor.extract_slice %arg0[0, 0] [%dim, %dim_0] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
    %extracted_slice_2 = tensor.extract_slice %arg1[0, 0] [%dim_0, %dim_1] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
    %extracted_slice_3 = tensor.extract_slice %arg2[0, 0] [%dim, %dim_1] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
    %0 = linalg.matmul ins(%extracted_slice, %extracted_slice_2 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%extracted_slice_3 : tensor<?x?xf32>) -> tensor<?x?xf32>
    return %0 : tensor<?x?xf32>
  }
}
```

The revision early-return the case so we can get:

```mlir
func.func @matmul_tensors(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>) -> tensor<?x?xf32> {
  %0 = linalg.matmul ins(%arg0, %arg1 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%arg2 : tensor<?x?xf32>) -> tensor<?x?xf32>
  return %0 : tensor<?x?xf32>
}
```
2023-12-19 09:14:43 -08:00
Ivan Butygin
c0d2ea9d42 [mlir][scf] Improve scf.parallel fusion pass (#75852)
Abort fusion if memref load may alias write, but not the exact alias. 
Add alias check hook to `naivelyFuseParallelOps`, so user can customize
alias checking.
Use builtin alias analysis in `ParallelLoopFusion` pass.
2023-12-19 18:07:46 +03:00
Oleksandr "Alex" Zinenko
9519e3ecbf [mlir] support dialect attribute translation to LLVM IR (#75309)
Extend the `amendOperation` mechanism for translating dialect attributes
attached to operations from another dialect when translating MLIR to
LLVM IR. Previously, this mechanism would have no knowledge of the LLVM
IR instructions created for the given operation, making it impossible
for it to perform local modifications such as attaching operation-level
metadata. Collect instructions inserted by the LLVM IR builder and pass
them to `amendOperation`.
2023-12-19 14:18:16 +01:00
Guray Ozen
5caae72d1a [mlir][gpu] Productize test-lower-to-nvvm as gpu-lower-to-nvvm (#75775)
The `test-lower-to-nvvm` pipeline serves as the common and proper
pipeline for nvvm+host compilation, and it's used across our CUDA
integration tests.

This PR updates the `test-lower-to-nvvm` pipeline to `gpu-lower-to-nvvm`
and moves it within `InitAllPasses.h`. The aim is to call it from
Python, also having a standardize compilation process for nvvm.
2023-12-19 08:40:46 +01:00
Jakub Kuderski
07677113ff [mlir][vector] Add pattern to break down reductions into arith ops (#75727)
The number of vector elements considered 'small' enough to extract is
parameterized.                                                   
                                                                 
This is to avoid going into specialized reduction lowering when a
single/couple of arith ops can do. Targets without dedicated reduction  
intrinsics can use that as an emulation path too.                  
                                                                   
Depends on https://github.com/llvm/llvm-project/pull/75846.
2023-12-18 17:54:54 -05:00
Jakub Kuderski
a528cee224 [mlir][vector] Improve makeArithReduction expansion (#75846)
Propagate fast math flags.
Distinguish `minf`/`maxf` and `minimumf`/`maximumf`.

Required for future patterns in
https://github.com/llvm/llvm-project/pull/75727.
2023-12-18 17:47:46 -05:00
srcarroll
b26ee97537 [MLIR][Linalg] Support dynamic sizes in lower_unpack (#75494) 2023-12-18 19:02:04 +01:00
Rik Huijzer
672f1a036a [mlir][memref] Make LoadOp::verify error more clear (#75831)
While debugging https://github.com/llvm/llvm-project/issues/71326, the
`LoadOp::verify` code and error were very confusing. This PR improves
that.

This code was a part from the reverted PR
https://github.com/llvm/llvm-project/pull/75519. Fixing the
`-convert-vector-to-scf` issue is going to take a bit longer and this
code was out of scope anyway.

Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>
2023-12-18 18:41:05 +01:00
Oleksandr "Alex" Zinenko
4d9d105c70 [mlir] fix filecheck prefixes in a dataflow test (#75794)
-SAME and -LITERAL do not compose in CHECK commands.
2023-12-18 17:11:21 +01:00
Oleksandr "Alex" Zinenko
32a4e3fcca [mlir] support non-interprocedural dataflow analyses (#75583)
The core implementation of the dataflow anlysis framework is
interpocedural by design. While this offers better analysis precision,
it also comes with additional cost as it takes longer for the analysis
to reach the fixpoint state. Add a configuration mechanism to the
dataflow solver to control whether it operates inteprocedurally or not
to offer clients a choice.

As a positive side effect, this change also adds hooks for explicitly
processing external/opaque function calls in the dataflow analyses,
e.g., based off of attributes present in the the function declaration or
call operation such as alias scopes and modref available in the LLVM
dialect.

This change should not affect existing analyses and the default solver
configuration remains interprocedural.

Co-authored-by: Jacob Peng <jacobmpeng@gmail.com>
2023-12-18 14:16:52 +01:00
Dominik Adamski
6deb5d4e44 [NFC][OpenMP][MLIR] Verify if empty workshare loop is lowered correctly (#75518)
Check if workshare loop without loop body is lowered correctly i.e.:
  1) null pointer is passed to OpenMP device RTL function as a
     parameter which denotes loop function body aggregated parameters
  2) Outlined loop function body has only one parameter - loop counter
2023-12-18 11:59:35 +01:00
Kareem Ergawy
d777504355 [MLIR][OpenMP][Offload] Lower target update op to DeviceRT (#75159)
Adds support for lowring `UpdateDataOp` to the DeviceRT. This reuses the
existing utils used by other device directive.
2023-12-18 11:14:46 +01:00
Rik Huijzer
6561efe142 [mlir][python][nfc] Test -print-ir-after-all (#75742)
The functionality to `-print-ir-after-all` was added in
caa159f044.
This PR adds a test and, with that, some documentation.

---------

Co-authored-by: Maksim Levental <maksim.levental@gmail.com>
2023-12-17 20:24:47 +01:00
Rik Huijzer
9f5afc3de9 Revert "[mlir][vector] Fix invalid LoadOp indices being created (#75519)"
This reverts commit 3a1ae2f46d.
2023-12-17 12:34:17 +01:00
Rik Huijzer
3a1ae2f46d [mlir][vector] Fix invalid LoadOp indices being created (#75519)
Fixes https://github.com/llvm/llvm-project/issues/71326.

The cause of the issue was that a new `LoadOp` was created which looked
something like:
```mlir
%arg4 = 
func.func main(%arg1 : index, %arg2 : index) {
  %alloca_0 = memref.alloca() : memref<vector<1x32xi1>>
  %1 = vector.type_cast %alloca_0 : memref<vector<1x32xi1>> to memref<1xvector<32xi1>>
  %2 = memref.load %1[%arg1, %arg2] : memref<1xvector<32xi1>>
  return
}
```
which crashed inside the `LoadOp::verify`. Note here that `%alloca_0` is
0 dimensional, `%1` has one dimension, but `memref.load` tries to index
`%1` with two indices.

This is now fixed by using the fact that `unpackOneDim` always unpacks
one dim


1bce61e6b0/mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp (L897-L903)

and so the `loadOp` should just index only one dimension.

---------

Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>
2023-12-17 11:42:35 +01:00
Quinn Dawkins
82ab0f7f36 [mlir][linalg] Fix rank-reduced cases for extract/insert slice in DropUnitDims (#74723)
Inferring the reshape reassociation indices for extract/insert slice ops
based on the read sizes of the original slicing op will generate an
invalid expand/collapse shape op for already rank-reduced cases. Instead
just infer from the shape of the slice.

Ported from Differential Revision: https://reviews.llvm.org/D147488
2023-12-16 10:08:51 -05:00
Peiming Liu
6c06bde7c4 [mlir][sparse] support loop range query using SparseTensorLevel. (#75670) 2023-12-15 16:33:31 -08:00