For extremely large models, it may be inefficient to load the model into
memory in Python prior to passing it to the MLIR C APIs for
deserialization. This change adds an API to parse a ModuleOp directly
from a file path.
Re-lands
[4e14b8a](4e14b8afb4).
Implement the feature about perf by stage(llvm-ir -> isa, isa->binary).
The results will be stored into the properties, then users can use them
after using GpuModuleToBinary Pass.
Now that linalg.matmul is in tablegen, "hand write" the Python wrapper
that OpDSL used to derive. Similarly, add a Python wrapper for the new
linalg.contract op.
Required following misc. fixes:
1) make linalg.matmul's parsing and printing consistent w.r.t. whether
indexing_maps occurs before or after operands, i.e. per the tests cases
it comes _before_.
2) tablegen for linalg.contract did not state it accepted an optional
cast attr.
3) In ODS's C++-generating code, expand partial support for `$_builder`
access in `Attr::defaultValue` to full support. This enables access to
the current `MlirContext` when constructing the default value (as is
required when the default value consists of affine maps).
For extremely large models, it may be inefficient to load the model into
memory in Python prior to passing it to the MLIR C APIs for
deserialization. This change adds an API to parse a ModuleOp directly
from a file path.
If the large element limit is specified, large elements are hidden from
the asm but large resources are not. This change extends the large
elements limit to apply to printed resources as well.
Model the `IndexType` as `uint64_t` when converting to a python integer.
With the python bindings,
```python
DenseIntElementsAttr(op.attributes["attr"])
```
used to `assert` when `attr` had `index` type like `dense<[1, 2, 3, 4]>
: vector<4xindex>`.
---------
Co-authored-by: Christopher McGirr <christopher.mcgirr@amd.com>
Co-authored-by: Tiago Trevisan Jost <tiago.trevisanjost@amd.com>
Use `mlir_target_link_libraries()` to link dependencies of libraries
that are not included in libMLIR, to ensure that they link to the dylib
when they are used in Flang. Otherwise, they implicitly pull in all
their static dependencies, effectively causing Flang binaries to
simultaneously link to the dylib and to static libraries, which is never
a good idea.
I have only covered the libraries that are used by Flang. If you wish, I
can extend this approach to all non-libMLIR libraries in MLIR, making
MLIR itself also link to the dylib consistently.
[v3 with more `-DBUILD_SHARED_LIBS=ON` fixes]
Use `mlir_target_link_libraries()` to link dependencies of libraries
that are not included in libMLIR, to ensure that they link to the dylib
when they are used in Flang. Otherwise, they implicitly pull in all
their static dependencies, effectively causing Flang binaries to
simultaneously link to the dylib and to static libraries, which is never
a good idea.
I have only covered the libraries that are used by Flang. If you wish, I
can extend this approach to all non-libMLIR libraries in MLIR, making
MLIR itself also link to the dylib consistently.
[v2 with fixed `-DBUILD_SHARED_LIBS=ON` build]
Use `mlir_target_link_libraries()` to link dependencies of libraries
that are not included in libMLIR, to ensure that they link to the dylib
when they are used in Flang. Otherwise, they implicitly pull in all
their static dependencies, effectively causing Flang binaries to
simultaneously link to the dylib and to static libraries, which is never
a good idea.
I have only covered the libraries that are used by Flang. If you wish, I
can extend this approach to all non-libMLIR libraries in MLIR, making
MLIR itself also link to the dylib consistently.
In order to optionally run some checks that depend on the `ml_dtypes`
python module we have to remove the `CHECK` lines for those tests or
they will be required and missed in the test output.
I've changed to use asserts as recommended in [1].
[1]:
https://github.com/llvm/llvm-project/pull/123061#issuecomment-2596116023
We noticed that `mlir/python/requirements.txt` lists `ml_dtypes` as a requirement but when looking at the code in `mlir/python`, the only `import` is guarded:
```python
try:
import ml_dtypes
except ModuleNotFoundError:
# The third-party ml_dtypes provides some optional low precision data-types for NumPy.
ml_dtypes = None
```
This makes `ml_dtypes` an optional dependency.
Some python tests however partially depend on `ml_dtypes` and should not run if that module is unavailable. That is what this change does.
This is a replacement for #123051 which was excluding tests too broadly.
Gives option post as global list as well as arg to control which
dialects are loaded during context creation. This enables setting either
a good base set or skipping in individual cases.
This is a companion to #118583, although it can be landed independently
because since #117922 dialects do not have to use the same Python
binding framework as the Python core code.
This PR ports all of the in-tree dialect and pass extensions to
nanobind, with the exception of those that remain for testing pybind11
support.
This PR also:
* removes CollectDiagnosticsToStringScope from NanobindAdaptors.h. This
was overlooked in a previous PR and it is duplicated in Diagnostics.h.
---------
Co-authored-by: Jacques Pienaar <jpienaar@google.com>
Do not run `cf-to-llvm` as part of `func-to-llvm`. This commit fixes
https://github.com/llvm/llvm-project/issues/70982.
This commit changes the way how `func.func` ops are lowered to LLVM.
Previously, the signature of the entire region (i.e., entry block and
all other blocks in the `func.func` op) was converted as part of the
`func.func` lowering pattern.
Now, only the entry block is converted. The remaining block signatures
are converted together with `cf.br` and `cf.cond_br` as part of
`cf-to-llvm`. All unstructured control flow is not converted as part of
a single pass (`cf-to-llvm`). `func-to-llvm` no longer deals with
unstructured control flow.
Also add more test cases for control flow dialect ops.
Note: This PR is in preparation of #120431, which adds an additional
GPU-specific lowering for `cf.assert`. This was a problem because
`cf.assert` used to be converted as part of `func-to-llvm`.
Note for LLVM integration: If you see failures, add
`-convert-cf-to-llvm` to your pass pipeline.
Do not run `arith-to-llvm` as part of `func-to-llvm`. This commit partly
fixes#70982.
Also simplify the pass pipeline for two math dialect integration tests.
Note for LLVM integration: If you see failures, add `arith-to-llvm` to your pass pipeline.
Relands #118583, with a fix for Python 3.8 compatibility. It was not
possible to set the buffer protocol accessers via slots in Python 3.8.
Why? https://nanobind.readthedocs.io/en/latest/why.html says it better
than I can, but my primary motivation for this change is to improve MLIR
IR construction time from JAX.
For a complicated Google-internal LLM model in JAX, this change improves
the MLIR
lowering time by around 5s (out of around 30s), which is a significant
speedup for simply switching binding frameworks.
To a large extent, this is a mechanical change, for instance changing
`pybind11::` to `nanobind::`.
Notes:
* this PR needs Nanobind 2.4.0, because it needs a bug fix
(https://github.com/wjakob/nanobind/pull/806) that landed in that
release.
* this PR does not port the in-tree dialect extension modules. They can
be ported in a future PR.
* I removed the py::sibling() annotations from def_static and def_class
in `PybindAdapters.h`. These ask pybind11 to try to form an overload
with an existing method, but it's not possible to form mixed
pybind11/nanobind overloads this ways and the parent class is now
defined in nanobind. Better solutions may be possible here.
* nanobind does not contain an exact equivalent of pybind11's buffer
protocol support. It was not hard to add a nanobind implementation of a
similar API.
* nanobind is pickier about casting to std::vector<bool>, expecting that
the input is a sequence of bool types, not truthy values. In a couple of
places I added code to support truthy values during casting.
* nanobind distinguishes bytes (`nb::bytes`) from strings (e.g.,
`std::string`). This required nb::bytes overloads in a few places.
Why? https://nanobind.readthedocs.io/en/latest/why.html says it better
than I can, but my primary motivation for this change is to improve MLIR
IR construction time from JAX.
For a complicated Google-internal LLM model in JAX, this change improves
the MLIR
lowering time by around 5s (out of around 30s), which is a significant
speedup for simply switching binding frameworks.
To a large extent, this is a mechanical change, for instance changing
`pybind11::`
to `nanobind::`.
Notes:
* this PR needs Nanobind 2.4.0, because it needs a bug fix
(https://github.com/wjakob/nanobind/pull/806) that landed in that
release.
* this PR does not port the in-tree dialect extension modules. They can
be ported in a future PR.
* I removed the py::sibling() annotations from def_static and def_class
in `PybindAdapters.h`. These ask pybind11 to try to form an overload
with an existing method, but it's not possible to form mixed
pybind11/nanobind overloads this ways and the parent class is now
defined in nanobind. Better solutions may be possible here.
* nanobind does not contain an exact equivalent of pybind11's buffer
protocol support. It was not hard to add a nanobind implementation of a
similar API.
* nanobind is pickier about casting to std::vector<bool>, expecting that
the input is a sequence of bool types, not truthy values. In a couple of
places I added code to support truthy values during casting.
* nanobind distinguishes bytes (`nb::bytes`) from strings (e.g.,
`std::string`). This required nb::bytes overloads in a few places.
This PR allows out-of-tree dialects to write Python dialect modules
using nanobind instead of pybind11.
It may make sense to migrate in-tree dialects and some of the ODS Python
infrastructure to nanobind, but that is a topic for a future change.
This PR makes the following changes:
* adds nanobind to the CMake and Bazel build systems. We also add
robin_map to the Bazel build, which is a dependency of nanobind.
* adds a PYTHON_BINDING_LIBRARY option to various CMake functions, such
as declare_mlir_python_extension, allowing users to select a Python
binding library.
* creates a fork of mlir/include/mlir/Bindings/Python/PybindAdaptors.h
named NanobindAdaptors.h. This plays the same role, using nanobind
instead of pybind11.
* splits CollectDiagnosticsToStringScope out of PybindAdaptors.h and
into a new header mlir/include/mlir/Bindings/Python/Diagnostics.h, since
it is code that is no way related to pybind11 or for that matter,
Python.
* changed the standalone Python extension example to have both pybind11
and nanobind variants.
* changed mlir/python/mlir/dialects/python_test.py to have both pybind11
and nanobind variants.
Notes:
* A slightly unfortunate thing that I needed to do in the CMake
integration was to use FindPython in addition to FindPython3, since
nanobind's CMake integration expects the Python_ names for variables.
Perhaps there's a better way to do this.
FuncOps can have `arg_attrs`, an array of dictionary attributes
associated with their arguments.
E.g.,
```mlir
func.func @main(%arg0: tensor<8xf32> {test.attr_name = "value"}, %arg1: tensor<8x16xf32>)
```
These are exposed via the MLIR Python bindings with
`my_funcop.arg_attrs`.
In this case, it would return `[{test.attr_name = "value"}, {}]`, i.e.,
`%arg1` has an empty `DictAttr`.
However, if I try and access this property from a FuncOp with an empty
`arg_attrs`, e.g.,
```mlir
func.func @main(%arg0: tensor<8xf32>, %arg1: tensor<8x16xf32>)
```
This raises the error:
```python
return ArrayAttr(self.attributes[ARGUMENT_ATTRIBUTE_NAME])
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'attempt to access a non-existent attribute'
```
This PR fixes this by returning the expected `[{}, {}]`.
The zero points of UniformQuantizedPerAxisType should be List[int].
And there are two methods missing return value.
Co-authored-by: 牛奕博 <niuyibo@niuyibodeMacBook-Pro.local>
The affine.delinearize_index and affine.linearize_index operations, as
currently defined, require providing a length N basis to [de]linearize N
values. The first value in this basis is never used during lowering and
is unused during lowering. (Note that, even though it isn't used during
lowering it can still be used to, for example, remove length-1 outputs
from a delinearize).
This dead value makes sense in the original context of these operations,
which is linearizing or de-linearizing indexes to memref<>s, vector<>s,
and other shaped types, where that outer bound is avaliable and may be
useful for analysis.
However, other usecases exist where the outer bound is not known. For
example:
%thread_id_x = gpu.thread_id x : index
%0:3 = affine.delinearize_index %thread_id_x into (4, 16) : index,index,
index
In this code, we don't know the upper bound of the thread ID, but we do
want to construct the ?x4x16 grid of delinearized values in order to
further partition the GPU threads.
In order to support such usecases, we broaden the definition of
affine.delinearize_index and affine.linearize_index to make the outer
bound optional.
In the case of affine.delinearize_index, where the number of results is
a function of the size of the passed-in basis, we augment all existing
builders with a `hasOuterBound` argument, which, for backwards
compatibilty and to preserve the natural usage of the op, defaults to
`true`. If this flag is true, the op returns one result per basis
element, if it is false, it returns one extra result in position 0.
We also update existing canonicalization patterns (and move one of them
into the folder) to handle these cases. Note that disagreements about
the outer bound now no longer prevent delinearize/linearize
cancelations.
Fix the AffineIfOp's default builder such that it takes in an
IntegerSetAttr. AffineIfOp has skipDefaultBuilders=1 which effectively
skips the creation of the default AffineIfOp::builder on the C++ side.
(AffineIfOp has two custom OpBuilder defined in the
extraClassDeclaration.) However, on the python side, _affine_ops_gen.py
shows that the default builder is being created, but it does not accept
IntegerSet and thus is useless. This fix at line 411 makes the default
python AffineIfOp builder take in an IntegerSet input and does not
impact the C++ side of things.
This PR re-introduces the functionality of
https://github.com/llvm/llvm-project/pull/113064, which was reverted in
0a68171b3c
due to memory lifetime issues.
Notice that I was not able to re-produce the ASan results myself, so I
have not been able to verify that this PR really fixes the issue.
---
Currently it is unsupported to:
1. Convert a MlirAttribute with type i1 to a numpy array
2. Convert a boolean numpy array to a MlirAttribute
Currently the entire Python application violently crashes with a quite
poor error message https://github.com/pybind/pybind11/issues/3336
The complication handling these conversions, is that MlirAttribute
represent booleans as a bit-packed i1 type, whereas numpy represents
booleans as a byte array with 8 bit used per boolean.
This PR proposes the following approach:
1. When converting a i1 typed MlirAttribute to a numpy array, we can not
directly use the underlying raw data backing the MlirAttribute as a
buffer to Python, as done for other types. Instead, a copy of the data
is generated using numpy's unpackbits function, and the result is send
back to Python.
2. When constructing a MlirAttribute from a numpy array, first the
python data is read as a uint8_t to get it converted to the endianess
used internally in mlir. Then the booleans are bitpacked using numpy's
bitpack function, and the bitpacked array is saved as the MlirAttribute
representation.
The earlier PR(https://github.com/llvm/llvm-project/pull/104783) which
introduces
transpose and broadcast semantic to linalg.matmul was reverted due to
two failing
OpDSL test for linalg.matmul.
Since linalg.matmul is now defined using TableGen ODS instead of
Python-based OpDSL,
these test started failing and needs to be removed/updated.
This commit removes/updates the failing obsolete tests from below files.
All other files
were part of earlier PR and just cherry picked.
"mlir/test/python/integration/dialects/linalg/opsrun.py"
"mlir/test/python/integration/dialects/transform.py"
---------
Co-authored-by: Renato Golin <rengolin@systemcall.eu>
This commit makes `affine.delinealize` join other indexing operators,
like `vector.extract`, which store a mixed static/dynamic set of sizes,
offsets, or such. In this case, the `basis` (the set of values that will
be used to decompose the linear index) is now stored as an array of
index attributes where the basis is statically known, eliminating the
need to cretae constants.
This commit also adds copies of the delinearize utility in the affine
dialect to allow it to take an array of `OpFoldResult`s and extends te
DynamicIndexList parser/printer to allow specifying the delimiters in
tablegen (this is needed to avoid breaking existing syntax).
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
Currently it is unsupported to:
1. Convert a `MlirAttribute` with type `i1` to a numpy array
2. Convert a boolean numpy array to a `MlirAttribute`
Currently the entire Python application violently crashes with a quite
poor error message https://github.com/pybind/pybind11/issues/3336
The complication handling these conversions, is that `MlirAttribute`
represent booleans as a bit-packed `i1` type, whereas numpy represents
booleans as a byte array with 8 bit used per boolean.
This PR proposes the following approach:
1. When converting a `i1` typed `MlirAttribute` to a numpy array, we can
not directly use the underlying raw data backing the `MlirAttribute` as
a buffer to Python, as done for other types. Instead, a copy of the data
is generated using numpy's unpackbits function, and the result is send
back to Python.
2. When constructing a `MlirAttribute` from a numpy array, first the
python data is read as a `uint8_t` to get it converted to the endianess
used internally in mlir. Then the booleans are bitpacked using numpy's
bitpack function, and the bitpacked array is saved as the
`MlirAttribute` representation.
Please note that I am not sure if this approach is the desired solution.
I'd appreciate any feedback.
The pack_paddings attribute in the structure.pad TD Op is used to set
the `nofold` attribute in the generated tensor.pad Op. The current name
is confusing and suggests that there's a relation with the tensor.pack
Op. This patch renames it as `nofold_flags` to better match the actual
usage.