This PR https://github.com/llvm/llvm-project/pull/123902 broke python
bindings for `tensor.pack`/`unpack`. This PR fixes that. It also
1. adds convenience wrappers for pack/unpack
2. cleans up matmul-like ops in the linalg bindings
3. fixes linalg docs missing pack/unpack
As linalg.batch_matmul has been moved into tablegen from OpDSL, its
derived python wrapper no longer exist.This patch adds the required
python wrapper.
Also refactors the BatchmatmulOp printer to make it consistent with its
parser.
Implement the feature about perf by stage(llvm-ir -> isa, isa->binary).
The results will be stored into the properties, then users can use them
after using GpuModuleToBinary Pass.
Now that linalg.matmul is in tablegen, "hand write" the Python wrapper
that OpDSL used to derive. Similarly, add a Python wrapper for the new
linalg.contract op.
Required following misc. fixes:
1) make linalg.matmul's parsing and printing consistent w.r.t. whether
indexing_maps occurs before or after operands, i.e. per the tests cases
it comes _before_.
2) tablegen for linalg.contract did not state it accepted an optional
cast attr.
3) In ODS's C++-generating code, expand partial support for `$_builder`
access in `Attr::defaultValue` to full support. This enables access to
the current `MlirContext` when constructing the default value (as is
required when the default value consists of affine maps).
Model the `IndexType` as `uint64_t` when converting to a python integer.
With the python bindings,
```python
DenseIntElementsAttr(op.attributes["attr"])
```
used to `assert` when `attr` had `index` type like `dense<[1, 2, 3, 4]>
: vector<4xindex>`.
---------
Co-authored-by: Christopher McGirr <christopher.mcgirr@amd.com>
Co-authored-by: Tiago Trevisan Jost <tiago.trevisanjost@amd.com>
This is a companion to #118583, although it can be landed independently
because since #117922 dialects do not have to use the same Python
binding framework as the Python core code.
This PR ports all of the in-tree dialect and pass extensions to
nanobind, with the exception of those that remain for testing pybind11
support.
This PR also:
* removes CollectDiagnosticsToStringScope from NanobindAdaptors.h. This
was overlooked in a previous PR and it is duplicated in Diagnostics.h.
---------
Co-authored-by: Jacques Pienaar <jpienaar@google.com>
This PR allows out-of-tree dialects to write Python dialect modules
using nanobind instead of pybind11.
It may make sense to migrate in-tree dialects and some of the ODS Python
infrastructure to nanobind, but that is a topic for a future change.
This PR makes the following changes:
* adds nanobind to the CMake and Bazel build systems. We also add
robin_map to the Bazel build, which is a dependency of nanobind.
* adds a PYTHON_BINDING_LIBRARY option to various CMake functions, such
as declare_mlir_python_extension, allowing users to select a Python
binding library.
* creates a fork of mlir/include/mlir/Bindings/Python/PybindAdaptors.h
named NanobindAdaptors.h. This plays the same role, using nanobind
instead of pybind11.
* splits CollectDiagnosticsToStringScope out of PybindAdaptors.h and
into a new header mlir/include/mlir/Bindings/Python/Diagnostics.h, since
it is code that is no way related to pybind11 or for that matter,
Python.
* changed the standalone Python extension example to have both pybind11
and nanobind variants.
* changed mlir/python/mlir/dialects/python_test.py to have both pybind11
and nanobind variants.
Notes:
* A slightly unfortunate thing that I needed to do in the CMake
integration was to use FindPython in addition to FindPython3, since
nanobind's CMake integration expects the Python_ names for variables.
Perhaps there's a better way to do this.
FuncOps can have `arg_attrs`, an array of dictionary attributes
associated with their arguments.
E.g.,
```mlir
func.func @main(%arg0: tensor<8xf32> {test.attr_name = "value"}, %arg1: tensor<8x16xf32>)
```
These are exposed via the MLIR Python bindings with
`my_funcop.arg_attrs`.
In this case, it would return `[{test.attr_name = "value"}, {}]`, i.e.,
`%arg1` has an empty `DictAttr`.
However, if I try and access this property from a FuncOp with an empty
`arg_attrs`, e.g.,
```mlir
func.func @main(%arg0: tensor<8xf32>, %arg1: tensor<8x16xf32>)
```
This raises the error:
```python
return ArrayAttr(self.attributes[ARGUMENT_ATTRIBUTE_NAME])
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'attempt to access a non-existent attribute'
```
This PR fixes this by returning the expected `[{}, {}]`.
The zero points of UniformQuantizedPerAxisType should be List[int].
And there are two methods missing return value.
Co-authored-by: 牛奕博 <niuyibo@niuyibodeMacBook-Pro.local>
The affine.delinearize_index and affine.linearize_index operations, as
currently defined, require providing a length N basis to [de]linearize N
values. The first value in this basis is never used during lowering and
is unused during lowering. (Note that, even though it isn't used during
lowering it can still be used to, for example, remove length-1 outputs
from a delinearize).
This dead value makes sense in the original context of these operations,
which is linearizing or de-linearizing indexes to memref<>s, vector<>s,
and other shaped types, where that outer bound is avaliable and may be
useful for analysis.
However, other usecases exist where the outer bound is not known. For
example:
%thread_id_x = gpu.thread_id x : index
%0:3 = affine.delinearize_index %thread_id_x into (4, 16) : index,index,
index
In this code, we don't know the upper bound of the thread ID, but we do
want to construct the ?x4x16 grid of delinearized values in order to
further partition the GPU threads.
In order to support such usecases, we broaden the definition of
affine.delinearize_index and affine.linearize_index to make the outer
bound optional.
In the case of affine.delinearize_index, where the number of results is
a function of the size of the passed-in basis, we augment all existing
builders with a `hasOuterBound` argument, which, for backwards
compatibilty and to preserve the natural usage of the op, defaults to
`true`. If this flag is true, the op returns one result per basis
element, if it is false, it returns one extra result in position 0.
We also update existing canonicalization patterns (and move one of them
into the folder) to handle these cases. Note that disagreements about
the outer bound now no longer prevent delinearize/linearize
cancelations.
Fix the AffineIfOp's default builder such that it takes in an
IntegerSetAttr. AffineIfOp has skipDefaultBuilders=1 which effectively
skips the creation of the default AffineIfOp::builder on the C++ side.
(AffineIfOp has two custom OpBuilder defined in the
extraClassDeclaration.) However, on the python side, _affine_ops_gen.py
shows that the default builder is being created, but it does not accept
IntegerSet and thus is useless. This fix at line 411 makes the default
python AffineIfOp builder take in an IntegerSet input and does not
impact the C++ side of things.
The earlier PR(https://github.com/llvm/llvm-project/pull/104783) which
introduces
transpose and broadcast semantic to linalg.matmul was reverted due to
two failing
OpDSL test for linalg.matmul.
Since linalg.matmul is now defined using TableGen ODS instead of
Python-based OpDSL,
these test started failing and needs to be removed/updated.
This commit removes/updates the failing obsolete tests from below files.
All other files
were part of earlier PR and just cherry picked.
"mlir/test/python/integration/dialects/linalg/opsrun.py"
"mlir/test/python/integration/dialects/transform.py"
---------
Co-authored-by: Renato Golin <rengolin@systemcall.eu>
This commit makes `affine.delinealize` join other indexing operators,
like `vector.extract`, which store a mixed static/dynamic set of sizes,
offsets, or such. In this case, the `basis` (the set of values that will
be used to decompose the linear index) is now stored as an array of
index attributes where the basis is statically known, eliminating the
need to cretae constants.
This commit also adds copies of the delinearize utility in the affine
dialect to allow it to take an array of `OpFoldResult`s and extends te
DynamicIndexList parser/printer to allow specifying the delimiters in
tablegen (this is needed to avoid breaking existing syntax).
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
The pack_paddings attribute in the structure.pad TD Op is used to set
the `nofold` attribute in the generated tensor.pad Op. The current name
is confusing and suggests that there's a relation with the tensor.pack
Op. This patch renames it as `nofold_flags` to better match the actual
usage.
The main goal of this patch is to extend the semantic of 'linalg.matmul'
named op to include per operand transpose semantic while also laying out
a way to move ops definition from OpDSL to tablegen. Hence, it is
implemented in tablegen. Transpose semantic is as follows.
By default 'linalg.matmul' behavior will remain as is. Transpose
semantics can be appiled on per input operand by specifying the optional
permutation attributes (namely 'permutationA' for 1st input and
'permutationB' for 2nd input) for each operand explicitly as needed. By
default, no transpose is mandated for any of the input operand.
Example:
```
%val = linalg.matmul ins(%arg0, %arg1 : memref<5x3xf32>,
memref<5x7xf32>)
outs(%arg2: memref<3x7xf32>)
permutationA = [1, 0]
permutationB = [0, 1]
```
Hi @xurui1995 @makslevental,
I think in https://github.com/llvm/llvm-project/pull/103087 there's
unintended regression where user can no longer create sparse tensors
with `tensor.empty`.
Previously I could pass:
```python
out = tensor.empty(tensor_type, [])
```
where `tensor_type` contained `shape`, `dtype`, and `encoding`.
With the latest
```python
tensor.empty(sizes: Sequence[Union[int, Value]], element_type: Type, *, loc=None, ip=None)
```
it's no longer possible.
I propose to add `encoding` argument which is passed to
`RankedTensorType.get(static_sizes, element_type, encoding)` (I updated
one of the tests to check it).
Without this fix, `scf.if` operations would be created without a parent.
Since `scf.if` operations often have no results, this caused silent bugs
where the generated code was straight-up missing the operation.
As reported in https://github.com/llvm/llvm-project/issues/101132, this
fixes two bugs:
1. When accessing variadic operands inside an operation, it must be
accessed as `self.operation.operands` instead of `operation.operands`
2. The implementation of the `equally_sized_accessor` function is doing
wrong arithmetics when calculating the resulting index and group sizes.
I have added a test for the `equally_sized_accessor` function, which did
not have a test previously.
This patch adds the `#gpu.kernel_metadata` and `#gpu.kernel_table`
attributes. The `#gpu.kernel_metadata` attribute allows storing metadata
related to a compiled kernel, for example, the number of scalar
registers used by the kernel. The attribute only has 2 required
parameters, the name and function type. It also has 2 optional
parameters, the arguments attributes and generic dictionary for storing
all other metadata.
The `#gpu.kernel_table` stores a table of `#gpu.kernel_metadata`,
mapping the name of the kernel to the metadata.
Finally, the function `ROCDL::getAMDHSAKernelsELFMetadata` was added to
collect ELF metadata from a binary, and to test the class methods in
both attributes.
Example:
```mlir
gpu.binary @binary [#gpu.object<#rocdl.target<chip = "gfx900">, kernels = #gpu.kernel_table<[
#gpu.kernel_metadata<"kernel0", (i32) -> (), metadata = {sgpr_count = 255}>,
#gpu.kernel_metadata<"kernel1", (i32, f32) -> (), arg_attrs = [{llvm.read_only}, {}]>
]> , bin = "BLOB">]
```
The motivation behind these attributes is to provide useful information
for things like tunning.
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
Since we have extended `EmptyOp`, maybe we should also provide a
corresponding `tensor.empty` method. In the downstream usage, I tend to
use APIs with all lowercase letters to create ops, so having a
`tensor.empty` to replace the extended `tensor.EmptyOp` would keep my
code style consistent.
At the moment, the in_bounds attribute has two confusing/contradicting
properties:
1. It is both optional _and_ has an effective default-value.
2. The default value is "out-of-bounds" for non-broadcast dims, and
"in-bounds" for broadcast dims.
(see the `isDimInBounds` vector interface method for an example of this
"default" behaviour [1]).
This PR aims to clarify the logic surrounding the `in_bounds` attribute
by:
* making the attribute mandatory (i.e. it is always present),
* always setting the default value to "out of bounds" (that's
consistent with the current behaviour for the most common cases).
#### Broadcast dimensions in tests
As per [2], the broadcast dimensions requires the corresponding
`in_bounds` attribute to be `true`:
```
vector.transfer_read op requires broadcast dimensions to be in-bounds
```
The changes in this PR mean that we can no longer rely on the
default value in cases like the following (dim 0 is a broadcast dim):
```mlir
%read = vector.transfer_read %A[%base1, %base2], %f, %mask
{permutation_map = affine_map<(d0, d1) -> (0, d1)>} :
memref<?x?xf32>, vector<4x9xf32>
```
Instead, the broadcast dimension has to explicitly be marked as "in
bounds:
```mlir
%read = vector.transfer_read %A[%base1, %base2], %f, %mask
{in_bounds = [true, false], permutation_map = affine_map<(d0, d1) -> (0, d1)>} :
memref<?x?xf32>, vector<4x9xf32>
```
All tests with broadcast dims are updated accordingly.
#### Changes in "SuperVectorize.cpp" and "Vectorization.cpp"
The following patterns in "Vectorization.cpp" are updated to explicitly
set the `in_bounds` attribute to `false`:
* `LinalgCopyVTRForwardingPattern` and `LinalgCopyVTWForwardingPattern`
Also, `vectorizeAffineLoad` (from "SuperVectorize.cpp") and
`vectorizeAsLinalgGeneric` (from "Vectorization.cpp") are updated to
make sure that xfer Ops created by these hooks set the dimension
corresponding to broadcast dims as "in bounds". Otherwise, the Op
verifier would complain
Note that there is no mechanism to verify whether the corresponding
memory access are indeed in bounds. Still, this is consistent with the
current behaviour where the broadcast dim would be implicitly assumed
to be "in bounds".
[1]
4145ad2bac/mlir/include/mlir/Interfaces/VectorInterfaces.td (L243-L246)
[2]
https://mlir.llvm.org/docs/Dialects/Vector/#vectortransfer_read-vectortransferreadop
The following logic can lead to a class name mismatch when using
`linalg.powf` in Python. This PR fixed the issue and also renamed
`NegfOp` to `NegFOp` in linalg to adhere to the naming convention, as
exemplified by `arith::NegFOp`.
173514d58e/mlir/python/mlir/dialects/linalg/opdsl/lang/dsl.py (L140-L143)
```
# linalg.powf(arg0, arg1, outs=[init_result.result])
NotImplementedError: Unknown named op_name / op_class_name: powf / PowfOp
```
This patch enables continuous tiling of a target structured op using
diminishing tile sizes. In cases where the tensor dimensions are not
exactly divisible by the tile size, we are left with leftover tensor
chunks that are irregularly tiled. This approach enables tiling of the
leftover chunk with a smaller tile size and repeats this process
recursively using exponentially diminishing tile sizes. This eventually
generates a chain of loops that apply tiling using diminishing tile
sizes.
Adds `continuous_tile_sizes` op to the transform dialect. This op, when
given a tile size and a dimension, computes a series of diminishing tile
sizes that can be used to tile the target along the given dimension.
Additionally, this op also generates a series of chunk sizes that the
corresponding tile sizes should be applied to along the given dimension.
Adds `multiway` attribute to `transform.structured.split` that enables
multiway splitting of a single target op along the given dimension, as
specified in a list enumerating the chunk sizes.
Using `for_` is very hand with python bindings. Currently, it doesn't
support results, we had to fallback to two lines scf.for.
This PR yields results of scf.for in `for_`
---------
Co-authored-by: Maksim Levental <maksim.levental@gmail.com>
This patch is a first pass at making consistent syntax across the
`LinalgTransformOp`s that use dynamic index lists for size parameters.
Previously, there were two different forms: inline types in the list, or
place them in the functional style tuple. This patch goes for the
latter.
In order to do this, the `printPackedOrDynamicIndexList`,
`printDynamicIndexList` and their `parse` counterparts were modified so
that the types can be optionally provided to the corresponding custom
directives.
All affected ops now use tablegen `assemblyFormat`, so custom
`parse`/`print` functions have been removed. There are a couple ops that
will likely add dynamic size support, and once that happens it should be
made sure that the assembly remains consistent with the changes in this
patch.
The affected ops are as follows: `pack`, `pack_greedily`,
`tile_using_forall`. The `tile_using_for` and `vectorize` ops already
used this syntax, but their custom assembly was removed.
---------
Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>
This patch modifies the definition of `PadOp` to take transform params
and handles for the `pad_to_multiple_of` operand.
---------
Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>
1. Explicit value means the non-zero value in a sparse tensor. If
explicitVal is set, then all the non-zero values in the tensor have the
same explicit value. The default value Attribute() indicates that it is
not set.
2. Implicit value means the "zero" value in a sparse tensor. If
implicitVal is set, then the "zero" value in the tensor is equal to the
implicit value. For now, we only support `0` as the implicit value but
it could be extended in the future. The default value Attribute()
indicates that the implicit value is `0` (same type as the tensor
element type).
Example:
```
#CSR = #sparse_tensor.encoding<{
map = (d0, d1) -> (d0 : dense, d1 : compressed),
posWidth = 64,
crdWidth = 64,
explicitVal = 1 : i64,
implicitVal = 0 : i64
}>
```
Note: this PR tests that implicitVal could be set to other values as
well. The following PR will add verifier and reject any value that's not
zero for implicitVal.
The Python bindings generated for "async" dialect didn't include any of
the "async" dialect ops. This PR fixes issues with generation of Python
bindings for "async" dialect and adds a test case to use them.
Arithmetic constants for vector types can be constructed from objects
implementing Python buffer protocol such as `array.array`. Note that
until Python 3.12, there is no typing support for buffer protocol
implementers, so the annotations use array explicitly.
Reverts llvm/llvm-project#84103
Arithmetic constants for vector types can be constructed from objects
implementing Python buffer protocol such as `array.array`. Note that
until Python 3.12, there is no typing support for buffer protocol
implementers, so the annotations use array explicitly.