Updates:
1. Infer lvlToDim from dimToLvl
2. Add more tests for block sparsity
3. Finish TODOs related to lvlToDim, including adding lvlToDim to the Python
bindings
Verification of the lvlToDim mapping that the user provides will be implemented
in the next PR.
This makes these match the behaviour of optional attributes (which are
omitted when they have their default value of none). This allows for
concise assembly formats without a custom printer.
An extra print of " " is also removed. This does not change any existing
uses of oilists, but if the parameter before the oilist was optional,
it would previously add an extra space.
This is #68694 plus some fixes for the MLIR Python tests; unfortunately, GitHub
does not allow re-opening PRs 😕
This PR creates the necessary files to support bindings for operations
in the affine dialect.
This is the first of many PRs that will progressively introduce
affine.load, affine.for, and other operations. I would like to
acknowledge the work by Nelli's author @makslevental:
https://github.com/makslevental/nelli/blob/main/nelli/mlir/affine/affine.py
which jump-started this work.
This PR adds the additional generation of what I'm calling "value
builders" (a term I'm not married to) that look like this:
```python
def empty(sizes, element_type, *, loc=None, ip=None):
    return get_result_or_results(tensor.EmptyOp(sizes=sizes, element_type=element_type, loc=loc, ip=ip))
```
which instantiates a `tensor.EmptyOp`, immediately grabs the
result (`OpResult`), and returns that *instead of a handle to the
op*.
What's the point of adding these when `EmptyOp.result` already exists?
My claim/feeling/intuition is that eDSL users are more comfortable with
a value-centric programming model (i.e., passing values as operands) as
opposed to an operator-instantiation programming model. Thus this change
enables (or at least moves towards) the bindings supporting such users
and use cases. For example,
```python
i32 = IntegerType.get_signless(32)
...
ten1 = tensor.empty((10, 10), i32)
ten2 = tensor.empty((10, 10), i32)
ten3 = arith.addi(ten1, ten2)
```
Note, in order to present a "pythonic" API and enable "pythonic" eDSLs,
the generated identifiers (op names and operand names) are snake case
instead of camel case, and thus `llvm::convertToSnakeFromCamelCase`
needed a small fix. This PR is therefore stacked on top of
https://github.com/llvm/llvm-project/pull/68375.
In addition, as a kind of victory lap, this PR adds a "rangefor" that
looks and acts exactly like python's `range` but emits `scf.for`.
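As a rough illustration of the idea (a minimal sketch, not the exact helper added by this PR), such a range-like generator could be built on top of `scf.ForOp`; the name `range_for`, the requirement that the bounds are pre-built index-typed SSA values, and the exact builder signatures used here are assumptions:
```python
# Illustrative sketch only; not necessarily the helper added by this PR.
# Assumes `lb`, `ub`, and `step` are index-typed SSA values the caller created.
from mlir.ir import InsertionPoint
from mlir.dialects import scf


def range_for(lb, ub, step):
    loop = scf.ForOp(lb, ub, step)
    with InsertionPoint(loop.body):
        # While the caller's Python `for` body runs, the insertion point stays
        # inside the scf.for body, so ops created there land in the loop.
        yield loop.induction_variable
        scf.YieldOp([])


# Usage (under an active insertion point and location):
#   for iv in range_for(c0, c10, c1):
#       ...  # emit ops that use iv
```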
This patch updates `transform.loop.peel` so that this op returns two
handles rather than one:
* one for the peeled loop, and
* one for the remainder loop.
Also, following this change, this op will fail if peeling fails. This is
consistent with other similar ops that also fail if no transformation
takes place.
Relands #67482 with an extra fix for transform_loop_ext.py
Rename and restructure tiling-related transform ops from the structured
extension to be more homogeneous. In particular, all ops now follow a
consistent naming scheme:
- `transform.structured.tile_using_for`;
- `transform.structured.tile_using_forall`;
- `transform.structured.tile_reduction_using_for`;
- `transform.structured.tile_reduction_using_forall`.
This drops the "_op" naming artifact from `tile_to_forall_op` that
shouldn't have been included in the first place, consistently specifies
the name of the control flow op to be produced for loops (instead of
`tile_reduction_using_scf` since `scf.forall` also belongs to `scf`),
and opts for the `using` connector to avoid ambiguity.
The loops produced by tiling are now systematically placed as *trailing*
results of the transform op. While this required changing 3 out of 4 ops
(except for `tile_using_for`), this is the only choice that makes sense
when producing multiple `scf.for` ops that can be associated with a
variadic number of handles. This choice is also most consistent with
*other* transform ops from the structured extension, in particular with
fusion ops, that produce the structured op as the leading result and the
loop as the trailing result.
This PR adds a new transform op that replaces `memref.alloca`s with
`memref.get_global`s that refer to newly inserted `memref.global`s. This is useful,
for example, for allocations that should reside in the shared memory of
a GPU, which have to be declared as globals.
This PR renames the vectorization transform ops as follows:
* `structured.masked_vectorize` => `structured.vectorize`. This reflects
the fact that since [recently](https://reviews.llvm.org/D157774) the op
can also handle the unmasked case.
* `structured.vectorize` =>
`structured.vectorize_children_and_apply_patterns`. This reflects the
fact that the op does not just vectorize the given payload op but all
vectorizable children contained in it, and applies patterns before and
after for preparation and clean-up.
This rename was discussed first
[here](https://reviews.llvm.org/D157774).
The PR also adapts and cleans up the tablegen description of the
`VectorizeChildrenAndApplyPatternsOp` (formerly `VectorizeOp`).
This commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate
any memrefs as this should be entirely handled by the
ownership-based-buffer-deallocation pass going forward. This means the
`allow-return-allocs` pass option will default to true now,
`create-deallocs` defaults to false and they, as well as the escape
attribute indicating whether a memref escapes the current region, will
be removed. A new `allow-return-allocs-from-loops` option is added as a
temporary workaround for some bufferization limitations.
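For readers who want to see what the changed options look like in practice, here is a minimal sketch of assembling the pass from the Python bindings; the option name is taken from the text above, while the pipeline string syntax and the nesting under `builtin.module` are assumptions for illustration:
```python
# Minimal sketch: run One-Shot Bufferize with the new loop option enabled and
# without any deallocation insertion (now the default behavior).
from mlir.ir import Context
from mlir.passmanager import PassManager

with Context():
    pm = PassManager.parse(
        "builtin.module(one-shot-bufferize{allow-return-allocs-from-loops=true})"
    )
    # pm.run(module) would then bufferize without creating any deallocations.
```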
This PR cleans up the tests of the mix-ins of this dialect. Most of the
character diff is due to factoring out the creation of the top-level
sequence into a decorator. This decorator significantly shortens the
definition of the individual tests and can be used in all but one test,
where the top-level op is a PDL op. The only functional diff is due to
the fact that the decorator uses `transform.any_op` instead of
`pdl.operation` for the type of the root handle. The only remaining
usage of the PDL dialect is now in the test of a PDL-related op.
This is the first commit in a series with the goal to rework the
BufferDeallocation pass. Currently, this pass heavily relies on copies
to perform correct deallocations, which leads to very slow code and
potentially high memory usage. Additionally, there are unsupported cases,
such as returning memrefs, which this series of commits aims to add
support for as well.
This first commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate any
memrefs as this should be entirely handled by the buffer-deallocation pass
going forward. This means the allow-return-allocs pass option will
default to true now, create-deallocs defaults to false, and they, as well
as the escape attribute indicating whether a memref escapes the current region,
will be removed.
The documentation w.r.t. these pass option changes is also updated in this
commit.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D156662
This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.
This commit addresses Task 1.2 of the mentioned RFC. By renaming these operations, we align their names with LLVM intrinsics that have corresponding semantics.
This patch adds attribute builders for all buildable attributes from the
builtin dialect that did not previously have any. These builders can be
used to construct attributes of a particular type identified by a string
from a Python argument without knowing the details of how to pass that
Python argument to the attribute constructor. This is used, for example,
in the generated code of the Python bindings of ops.
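As a small illustration of what these builders enable, the following sketch looks up and invokes one of them by name; the call pattern mirrors what the generated `*_ops_gen.py` files do, and the attribute name `I64Attr` and the `context` keyword are assumptions taken from that pattern:
```python
# Sketch: build an attribute from a plain Python value via a registered builder.
from mlir.ir import AttrBuilder, Context

with Context() as ctx:
    builder = AttrBuilder.get("I64Attr")  # look up the builder registered for I64Attr
    attr = builder(42, context=ctx)       # construct the attribute from a Python int
    print(attr)                           # e.g. 42 : i64
```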
The list of "all" attributes was produced with:
(
grep -h "ods_ir.AttrBuilder.get" $(find ../build/ -name "*_ops_gen.py") \
| cut -f2 -d"'"
git grep -ho "^def [a-zA-Z0-9_]*" -- include/mlir/IR/CommonAttrConstraints.td \
| cut -f2 -d" "
) | sort -u
Then, I only retained those that had an occurrence in
`mlir/include/mlir/IR`. In particular, this drops many dialect-specific
attributes; registering those builders is something that those dialects
should do. Finally, I removed those attributes that already had a match in
`mlir/python/mlir/ir.py` and implemented the remaining ones. The
only ones that still lack a builder now are the following:
* Represent more than one possible attribute type:
- `Any.*Attr` (9x)
- `IntNonNegative`
- `IntPositive`
- `IsNullAttr`
- `ElementsAttr`
* I am not sure what "constant attributes" are:
- `ConstBoolAttrFalse`
- `ConstBoolAttrTrue`
- `ConstUnitAttr`
* `Location` not exposed by Python bindings:
- `LocationArrayAttr`
- `LocationAttr`
* `get` function not implemented in Python bindings:
- `StringElementsAttr`
This patch also fixes a compilation problem with
`I64SmallVectorArrayAttr`.
Reviewed By: makslevental, rkayaith
Differential Revision: https://reviews.llvm.org/D159403
The mix-in of the `MultiTileSizesOp` set the default value of its
`divisor` argument. This repeats information from the tablegen
definition, is not necessary (since the generic code deals with `None`
and default values), and runs the risk of getting out of sync without
people noticing. This patch removes the setting of the value and forwards
`None` to the generic constructor instead.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D159416
This patch simplifies and improves the mix-in of the `TileOp`. In
particular:
* Accept all types of sizes (static, dynamic, scalable) in a single
argument `sizes`.
* Use the existing convenience function to dispatch different types of
sizes instead of repeating the implementation in the mix-in.
* Pass `None` values of optional arguments on as-is to the init function
of the super class.
* Reformat with default indentation width (4 spaces vs 2 spaces).
* Add a test for providing scalable sizes.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D159417
That commit changed the mix-ins for the Python bindings of the PadOp
including some tests, but did not change the corresponding `CHECK`
statements. This patch does that.
The mix-in did not allow *not* setting many of the arguments, even though
they represent optional attributes. Instead, it set default values,
which have different semantics in some cases. In other cases, setting
the default values is already done by the C++ layer, in which case doing
so in the mix-in is redundant and may become wrong with some potential
future change in the TD or C++ files. With this patch, `None` is preserved
until the generated binding, which handles it as desired.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D158844
This reverts a feature introduced in commit
2a5d497494. The goal of that commit was to
allow `StringAttr`s to be used transparently wherever Python `str`s are
expected. But, as the tests in https://reviews.llvm.org/D159182 reveal,
pybind11 doesn't do this conversion based on `__str__` automatically,
unlike for the other types introduced in the commit above. At the same
time, changing `__str__` breaks the symmetry with other attributes of
`print(attr)` printing the assembly of the attribute, so the change
probably has more disadvantages than advantages.
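To make the resulting behavior concrete, here is a small sketch of the expected semantics after the revert (illustrative only, not a test from the patch):
```python
# After the revert, printing a StringAttr shows its assembly form again,
# while the raw Python string stays available via .value.
from mlir.ir import Context, StringAttr

with Context():
    attr = StringAttr.get("foo")
    print(attr)        # assembly form: "foo" (with quotes)
    print(attr.value)  # raw Python string: foo
    print(str(attr) == attr.value)  # False: str() is no longer the raw value
```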
Reviewed By: springerm, rkayaith
Differential Revision: https://reviews.llvm.org/D159255
The printing of `StringAttr` was changed in
https://reviews.llvm.org/D158974, such that some test cases relying on
that output had to be changed as well.
Extends the existing mix-in for VectorizeOp with support for the missing unit attributes.
Also fixes the unintuitive implementation where
`structured.VectorizeOp(target=target, vectorize_padding=False)` still resulted in the creation of the UnitAttr `vectorize_padding`.
Reviewed By: ingomueller-net
Differential Revision: https://reviews.llvm.org/D158726
This PR implements python enum bindings for *all* the enums - this includes `I*Attrs` (including positional/bit) and `Dialect/EnumAttr`.
There are a few parts to this:
1. CMake: a small addition to `declare_mlir_dialect_python_bindings` and `declare_mlir_dialect_extension_python_bindings` to generate the enum, a boolean arg `GEN_ENUM_BINDINGS` to make it opt-in (even though it works for basically all of the dialects), and an optional `GEN_ENUM_BINDINGS_TD_FILE` for handling corner cases.
2. EnumPythonBindingGen.cpp: there are two weedy aspects here that took investigation:
1. If an enum attribute is not a `Dialect/EnumAttr` then the `EnumAttrInfo` record is canonical, as far as both the cases of the enum **and the `AttrDefName`** are concerned. On the other hand, if an enum is a `Dialect/EnumAttr` then the `EnumAttr` record has the correct `AttrDefName` ("load bearing", i.e., it populates `ods.ir.AttributeBuilder('<NAME>')`) but its `enum` field, an instance of `EnumAttrInfo`, contains the cases. The solution is to generate one enum class for both `Dialect/EnumAttr` and "independent" `EnumAttrInfo` but to make that class interoperable with two builder registrations that both do the right thing (see next sub-bullet).
2. Because we don't have a good connection to the C++ `EnumAttr`, i.e., only the `enum class` getters are exposed (like `DimensionAttr::get(Dimension value)`), we have to resort to parsing, e.g., `Attribute.parse(f'#gpu<dim {x}>')`. This means that the set of supported `assemblyFormat`s (for the enum) is fixed at compile time of MLIR (currently 2, the only 2 I saw). There might be some things that could be done here, but they would require quite a bit more C API work to support generically (e.g., casting ints to enum cases and binding all the getters, or going generically through the `symbolize*` methods, like `symbolizeDimension(uint32_t)` or `symbolizeDimension(StringRef)`).
A few small changes:
1. In addition, since this patch registers default builders for attributes where people might've had their own builders already written, I added a `replace` param to `AttributeBuilder.insert` (`False` by default); see the sketch after this list.
2. `makePythonEnumCaseName` can't handle all the different ways in which people write their enum cases, e.g., `llvm.CConv.Intel_OCL_BI`, which gets turned into `INTEL_O_C_L_B_I` (because `llvm::convertToSnakeFromCamelCase` doesn't look for runs of caps), so I dropped it. On the other hand, some regularization does need to be done because some enums have `None` as a case (and others might have other Python keywords).
3. I turned on `llvm` dialect generation here in order to test `nvvm.WGMMAScaleIn`, which is an enum with no explicit discriminator for the `neg` case (see `mlir/include/mlir/IR/EnumAttr.td`, lines 22-25 at d7e26b5620).
Note, dialects that didn't get a `GEN_ENUM_BINDINGS` don't have any enums to generate.
Let me know if I should add more tests (the three trivial ones I added exercise both the supported `assemblyFormat`s and `replace=True`).
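As a sketch of how the new `replace` parameter can be used (the registry appears as `AttrBuilder` in the generated `*_ops_gen.py` files referenced earlier; the attribute kind and builder signature below are illustrative assumptions):
```python
# Sketch: register a custom builder over a default one using replace=True.
from mlir.ir import AttrBuilder, IntegerAttr, IntegerType


def my_i64_builder(x, context):
    # Always produce a signless 64-bit IntegerAttr from a Python int.
    return IntegerAttr.get(IntegerType.get_signless(64, context=context), x)


# replace=True lets this registration take precedence over the default builder
# registered for the same attribute kind.
AttrBuilder.insert("I64Attr", my_i64_builder, replace=True)
```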
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D157934
In particular:
* Fix and extend the support for constructing possibly nested ArrayAttrs
from lists of Python ints. This can probably be generalized further
and used in many more places.
* Add arguments for `pad_to_multiple_of` and `copy_back_op`.
* Format with black and reorder (keyword-only) arguments to match
tablegen and (`*_gen.py`) order.
* Extend tests for new features.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D157789
The tests of the mix-in classes of the Python bindings currently pass
even if the ops constructed by the mix-ins under test fail to verify.
This is because the assembled IR is still printed in generic form even
if it does not verify, and the `CHECK` statements are formulated in such
a lenient way that they also match that generic form. This patch adds
explicit verification to the decorator that is used for all test
functions.
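For context, a decorator of this kind could look roughly like the following minimal sketch (the real decorator in the tests also builds the surrounding transform sequence and may differ in details):
```python
# Sketch of a test decorator that builds a module, runs the test body inside it,
# and now also verifies the result before printing it.
from mlir.ir import Context, InsertionPoint, Location, Module


def run(test_fn):
    with Context(), Location.unknown():
        module = Module.create()
        with InsertionPoint(module.body):
            test_fn()
        assert module.operation.verify()  # fails if the constructed IR is invalid
        print(module)
    return test_fn
```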
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D157790
Reland of the original patch after updating the Python binding tests,
a few CUDA/GPU MLIR tests, and ensuring the assembly format is
round-trippable.
This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.
The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.
To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:
vector.print punctuation <comma>
lowers to
llvm.call @printComma() : () -> ()
The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).
Reviewed By: awarzynski, c-rhodes, aartbik
Differential Revision: https://reviews.llvm.org/D156519
This renaming started with the native ODS support for properties; this
completes it.
A mass automated textual rename seems safe for most codebases.
Also drop the ods prefix to keep the accessors the same as they were before
this change:
properties.odsOperandSegmentSizes
reverts back to:
properties.operandSegmentSizes
The ODS prefix was creating divergence between all the places and made it
harder to be consistent.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D157173
Reland of the original patch after updating the Python binding tests and
a few CUDA/GPU MLIR tests.
This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.
The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.
To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:
vector.print <comma>
lowers to
llvm.call @printComma() : () -> ()
The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).
Reviewed By: awarzynski, c-rhodes, aartbik
Differential Revision: https://reviews.llvm.org/D156519
This patch adds a mix-in class for the only transform op of the tensor
dialect that can benefit from one: the MakeLoopIndependentOp. It adds an
overload that makes providing the return type optional.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D156918
This patch uses the new enum binding generation to add the enums of the
dialect to the Python bindings and uses them in the mix-in class where
they were still missing (namely, the `LayoutMapOption` for the
`function_boundary_type_conversion` of the `OneShotBufferizeOp`).
The patch also piggy-backs a few smaller clean-ups:
* Order the keyword-only arguments alphabetically.
* Add the keyword-only arguments to an overload where they were left out
by accident.
* Change some of the attribute values used in the tests to non-default
values such that they show up in the output IR and check for that
output.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D156664