This patch is a first pass at making the syntax consistent across the
`LinalgTransformOp`s that use dynamic index lists for size parameters.
Previously, there were two different forms: either the types were inlined
in the list, or they were placed in the functional-style tuple. This patch
settles on the latter.
In order to do this, the `printPackedOrDynamicIndexList`,
`printDynamicIndexList` and their `parse` counterparts were modified so
that the types can be optionally provided to the corresponding custom
directives.
All affected ops now use the tablegen `assemblyFormat`, so their custom
`parse`/`print` functions have been removed. A couple of ops will likely
add dynamic size support later; once that happens, care should be taken
that their assembly remains consistent with the changes in this patch.
The affected ops are as follows: `pack`, `pack_greedily`,
`tile_using_forall`. The `tile_using_for` and `vectorize` ops already
used this syntax, but their custom assembly was removed.
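For illustration, a hedged sketch of the settled-on form for `tile_using_forall`, where dynamic sizes appear inline but their types live only in the trailing functional-style tuple (operand names and result types are illustrative):
```mlir
// The type of the dynamic size %size appears only in the functional-style
// tuple, not inline in the index list.
%results:2 = transform.structured.tile_using_forall %target
    tile_sizes [%size, 32]
    : (!transform.any_op, !transform.param<i64>)
    -> (!transform.any_op, !transform.any_op)
```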
---------
Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>
This patch modifies the definition of `PadOp` to take transform params
and handles for the `pad_to_multiple_of` operand.
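A hedged sketch of what this could look like, assuming the usual `transform.param.constant` op (clause placement and result arity are illustrative):
```mlir
// A transform param can now drive pad_to_multiple_of.
%c16 = transform.param.constant 16 : i64 -> !transform.param<i64>
%res:3 = transform.structured.pad %target pad_to_multiple_of [%c16, 2]
    : (!transform.any_op, !transform.param<i64>)
    -> (!transform.any_op, !transform.any_op, !transform.any_op)
```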
---------
Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>
1. Explicit value means the non-zero value in a sparse tensor. If
explicitVal is set, then all the non-zero values in the tensor have the
same explicit value. The default value Attribute() indicates that it is
not set.
2. Implicit value means the "zero" value in a sparse tensor. If
implicitVal is set, then the "zero" value in the tensor is equal to the
implicit value. For now, we only support `0` as the implicit value, but
it could be extended in the future. The default value Attribute()
indicates that the implicit value is `0` (same type as the tensor
element type).
Example:
```
#CSR = #sparse_tensor.encoding<{
  map = (d0, d1) -> (d0 : dense, d1 : compressed),
  posWidth = 64,
  crdWidth = 64,
  explicitVal = 1 : i64,
  implicitVal = 0 : i64
}>
```
Note: this PR tests that implicitVal can be set to other values as well.
A follow-up PR will add a verifier and reject any non-zero value for
implicitVal.
The Python bindings generated for the "async" dialect didn't include any
of the "async" dialect ops. This PR fixes issues with the generation of
Python bindings for the "async" dialect and adds a test case that uses them.
Arithmetic constants for vector types can be constructed from objects
implementing Python buffer protocol such as `array.array`. Note that
until Python 3.12, there is no typing support for buffer protocol
implementers, so the annotations use array explicitly.
Reverts llvm/llvm-project#84103
Arithmetic constants for vector types can be constructed from objects
implementing Python buffer protocol such as `array.array`. Note that
until Python 3.12, there is no typing support for buffer protocol
implementers, so the annotations use array explicitly.
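A minimal sketch, assuming the upstream `mlir` Python package and its `arith.constant` value builder:
```Python
import array

from mlir import ir
from mlir.dialects import arith

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    with ir.InsertionPoint(module.body):
        vec = ir.VectorType.get([4], ir.F32Type.get())
        # The buffer's contents are converted to a dense elements attribute.
        cst = arith.constant(vec, array.array("f", [1.0, 2.0, 3.0, 4.0]))
```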
From https://reviews.llvm.org/D153245
This adds support for native PDL (and PDLL) C++ constraints to return
results.
This is useful for situations where a pattern checks constraints on
multiple interdependent attributes and computes a new attribute value
based on them. Currently, such a case requires escaping to C++ during
matching to perform the check, and then escaping to native C++ again
during the rewriting part of the pattern to perform the computation. With
this work we can do the computation in C++ during matching and use the
result in the rewriting part of the pattern. Effectively, this enables a
choice in the trade-off between memory consumption during matching and
recomputation of values.
This is an example of a situation where this is useful: we have two
operations with certain attributes that have interdependent constraints,
for instance `attr_foo: one_of [0, 2, 4, 8], attr_bar: one_of [0, 2, 4,
8]` and `attr_foo == attr_bar`. The pattern should only match if all
conditions are true. The new operation should be created with a new
attribute which is computed from the two matched attributes, e.g.
`attr_baz = attr_foo * attr_bar`. For the check we already escape to
native C++ and have all values at hand, so it makes sense to directly
compute the new attribute value as well:
```
Constraint checkAndCompute(attr0: Attr, attr1: Attr) -> Attr;

Pattern example with benefit(1) {
  let foo = op<test.foo>() {attr = attr_foo : Attr};
  let bar = op<test.bar>(foo) {attr = attr_bar : Attr};
  let attr_baz = checkAndCompute(attr_foo, attr_bar);
  rewrite bar with {
    let baz = op<test.baz> {attr = attr_baz};
    replace bar with baz;
  };
}
```
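At the PDL level, the same constraint can now produce a result. A hedged sketch of the extended `pdl.apply_native_constraint` form (value names taken from the example above, types abbreviated):
```mlir
// The native constraint returns the computed attribute on success.
%attr_baz = pdl.apply_native_constraint "checkAndCompute"(%attr_foo, %attr_bar : !pdl.attribute, !pdl.attribute) : !pdl.attribute
```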
To achieve this, the following notable changes were necessary:
PDLL:
- Remove the check in the PDLL parser that prevented native constraints
from returning results
PDL:
- Change the PDL definition of `pdl.apply_native_constraint` to allow
variadic results
PDL_interp:
- Change the PDL_interp definition of `pdl_interp.apply_constraint` to
allow variadic results
PDLToPDLInterp Pass:
The input to the pass is an arbitrary number of PDL patterns. The pass
collects the predicates that are required to match all of the PDL
patterns and establishes an ordering that allows the creation of a single
efficient matcher function to match all of them. Values that are matched
and possibly used in the rewriting part of a pattern are represented as
positions; this enables fusion, i.e., reusing a single position across
multiple matching patterns. Accordingly, we introduce
ConstraintPosition, which records the type and index of the result of
the constraint. The problem is that, for the corresponding value to be
usable in the rewriting part of a pattern, it has to be an input to the
`pdl_interp.record_match` operation, which is generated early during the
pass so that its surrounding block can be referred to by branching
operations. Consequently, the value has to be materialized after the
original `pdl.apply_native_constraint` has been deleted, but before we
get the chance to generate the corresponding `pdl_interp.apply_constraint`
operation. We solve this by emitting a placeholder value when a
ConstraintPosition is evaluated. These placeholder values (due to fusion
there may be multiple for one constraint result) are replaced later, when
the actual `pdl_interp.apply_constraint` operation is created.
Changes since the Phabricator review:
- Addressed all comments
- In particular, removed registerConstraintFunctionWithResults and
instead changed registerConstraintFunction so that constraint functions
always have results (empty by default)
- Thus we don't need to reuse `rewriteFunctions` to store constraint
functions with results anymore, and can instead use
`constraintFunctions`
- Perform a stable sort of ConstraintQuestions, so that
ConstraintQuestions appear before other ConstraintQuestions that use
their results.
- Don't create placeholders for `pdl_interp::ApplyConstraintOp`. Instead,
generate the `pdl_interp::ApplyConstraintOp` before generating the
successor block.
- Fixed a test failure in the PDL Python bindings
Original code by @martin-luecke
Co-authored-by: martin-luecke <martinpaul.luecke@amd.com>
Expose the API for constructing and inspecting StructTypes from the LLVM
dialect. Separate constructor methods are used instead of overloads for
better readability, similarly to IntegerType.
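A hedged sketch of the exposed API (method names follow this commit; exact signatures are assumptions):
```Python
from mlir import ir
from mlir.dialects import llvm

with ir.Context():
    i32 = ir.IntegerType.get_signless(32)
    # Literal struct, uniqued by its element types.
    literal = llvm.StructType.get_literal([i32, i32])
    # Identified struct, referenced by name; its body is set separately.
    pair = llvm.StructType.get_identified("pair")
    pair.set_body([i32, i32], packed=False)
```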
…ct LevelType from LevelFormat and properties instead.
**Rationale**
We used to explicitly declare every possible combination of
`LevelFormat` and `LevelProperties`, which becomes difficult to scale as
more properties/level formats are introduced.
1. Add Python test for n out of m
2. Add more methods for the Python bindings
3. Add verification for n:m and invalid encoding tests
4. Add e2e test for n:m
Previous PRs for n:m: #80501, #79935
1. The C++ enum is set through `enum class LevelType : uint64_t`.
2. The C enum is set through `typedef uint64_t level_type`. This is due
to limitations in the Windows build: setting the enum width to 64 bits is
not supported in C.
The current implementation of the `nvvm.wgmma.mma_async` Op deduces the
data type of the output matrix from the data type of the struct members,
which can be non-intuitive, especially in cases where types like `2xf16`
are packed into `i32`.
This PR addresses the issue by extending the Op with an explicit data
type for the output matrix.
The modified Op now carries an explicit data type for Matrix-D (`<f16>`)
and looks as follows:
```
%result = llvm.mlir.undef : !llvm.struct<(struct<(i32, i32, ...
nvvm.wgmma.mma_async
    %descA, %descB, %result,
    #nvvm.shape<m = 64, n = 32, k = 16>,
    D [<f16>, #nvvm.wgmma_scale_out<zero>],
    A [<f16>, #nvvm.wgmma_scale_in<neg>, <col>],
    B [<f16>, #nvvm.wgmma_scale_in<neg>, <col>]
```
Add overflow flags support to the following ops:
* `arith.addi`
* `arith.subi`
* `arith.muli`
Example of the new syntax:
```
%res = arith.addi %arg1, %arg2 overflow<nsw> : i64
```
This is similar to the existing LLVM dialect syntax:
```
%res = llvm.add %arg1, %arg2 overflow<nsw> : i64
```
The tablegen canonicalization patterns were updated to always drop the
flags; proper support with tests will be added later.
The LLVM IR translation was also updated as part of this commit, since it
is currently written in a way that would otherwise crash when new
attributes are added to arith ops.
The `arith` overflow flags are also lowered to the corresponding SPIR-V
op decorations.
Discussion:
https://discourse.llvm.org/t/rfc-integer-overflow-flags-support-in-arith-dialect/76025
This effectively rolls forward #77211, #77700 and #77714 while adding a
test to ensure the Python usage is not broken. More follow-up is needed,
but it is unrelated to the core change here. The changes here are minimal
and just correspond to "textual namespacing" on the ODS side; no C++ or
Python changes were needed.
---------
Co-authored-by: Ivan Butygin <ivan.butygin@gmail.com>
Co-authored-by: Yi Wu <yi.wu2@arm.com>
In addition to the existing `OpHandle`, which provides an abstraction for
emitting transform ops targeting operations, this introduces a similar
concept for _values_ and _parameters_ in the form of `ValueHandle` and
`ParamHandle`.
New core transform abstractions:
- `constant_param`
- `OpHandle.get_result`
- `OpHandle.print`
- `ValueHandle.get_defining_op`
This adds Python abstractions for the different handle types of the
transform dialect.
The abstractions allow for straightforward chaining of transforms by
calling their member functions.
As an initial PR for this infrastructure, only a single transform is
included: `transform.structured.match`.
With a future `tile` transform abstraction, an example of the usage is:
```Python
def script(module: OpHandle):
    module.match_ops(MatchInterfaceEnum.TilingInterface).tile(tile_sizes=[32, 32])
```
to generate the following IR:
```mlir
%0 = transform.structured.match interface{TilingInterface} in %arg0
%tiled_op, %loops = transform.structured.tile_using_for %0 [32, 32]
```
These abstractions are intended to enhance the usability and flexibility
of the transform dialect by providing an accessible interface that
allows for easy assembly of complex transformation chains.
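As a further hedged sketch of how the new handle types chain together (matching by op name and the result index argument are assumptions):
```Python
def script(module: OpHandle):
    # OpHandle.get_result yields a ValueHandle; ValueHandle.get_defining_op
    # leads back to an OpHandle, which can then be printed.
    value = module.match_ops("linalg.matmul").get_result(0)
    value.get_defining_op().print()
```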
The "Dim" prefix is a legacy left-over that no longer makes sense, since
we have a very strict "Dimension" vs. "Level" definition for sparse
tensor types and their storage.