Extend MLIR to LLVM lowering by adding support for `omp.wsloop` for
delayed privatization. This also refactors a few bits of code to isolate
the logic needed for `firstprivate` initialization into a shared util that
can be used across constructs that need it. The same is done for
`dealloc` regions.
Parent PR: https://github.com/llvm/llvm-project/pull/118447. Only the
latest commit is relevant for this PR.
This refactors the logic needed to emit init logic for reductions by
moving some duplicated code into a shared util. The logic for doing so
is quite involved and is needed for any construct that has reductions.
Moreover, when a construct has both private and reduction clauses, both
sets of clauses need to cooperate with each other when emitting the
logic needed for allocation and initialization. Therefore, this PR
clearly sets the boundaries for the logic needed to initialize
reductions.
As a follow-on for #87986, moves the VectorType convenience wrappers
(`FixedVectorType` and `ScalableVectorType`) to BuiltinTypes.h. This
allows us to use the new wrappers in "CommonTypeConstraints.td".
The current implementation of lowering `vector.extract` to LLVM
incorrectly assumes that if the number of indices is zero, the operation
can be folded away. This PR removes this condition and relies on the
folder to do it instead.
This PR also unifies the logic for scalar extracts and slice extracts,
which as a side effect also enables lowering of n-D `vector.extract` ops
with a dynamic innermost dimension. (This was previously prevented only
by a conservative check in the old implementation.)
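For illustration, the unified lowering now also handles cases like the
following (shapes and value names are made up, not taken from the patch):
```mlir
// n-D extract whose innermost position is dynamic.
%1 = vector.extract %0[2, %i] : vector<16xf32> from vector<4x8x16xf32>
```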
Unsigned integer types are uncommon enough in MLIR that there is no
operation to cast a scalar from signless to unsigned and vice versa.
Currently tosa.rescale uses builtin.unrealized_conversion_cast which
does not lower. Instead, this commit introduces optional attributes to
indicate unsigned input or output, named similarly to those in the TOSA
specification. This is more in line with the rest of MLIR where specific
operations rather than values are signed/unsigned.
If using a variadic operand, the error message given when the number of
types and operands does not match would be along the lines of:
```
3 operands present, but expected 2
```
This error message is confusing for multiple reasons, particularly for
beginners:
* If the intention is to have 3 operands, it does not point out why it
expects 2. The user may actually just want to add a type to the type
list.
* It reads like a verifier error rather than a parser error, giving the
impression the Op only supports 2 operands.
This PR attempts to improve the error message by first noting the issue
("number of operands and types mismatch") and mentioning how many
operands and types it received.
This PR fixes a bug in `RemoveDeadValues` where the
`FunctionOpInterface` does not have the `isDeclaration` method. As a
result, we should use the `isExternal` method instead. Fixes #116347.
The extract <-> shape cast folder was conservatively asserting and
failing on 0-d vectors. This PR fixes that.
This PR also adds more tests for 0-d cases and updates related tests to
better reflect what they test.
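A minimal sketch of the 0-d case (illustrative only; the exact IR in the
new tests may differ):
```mlir
%0 = vector.shape_cast %v : vector<1xf32> to vector<f32>
%1 = vector.extract %0[] : f32 from vector<f32>
```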
This PR allows out-of-tree dialects to write Python dialect modules
using nanobind instead of pybind11.
It may make sense to migrate in-tree dialects and some of the ODS Python
infrastructure to nanobind, but that is a topic for a future change.
This PR makes the following changes:
* adds nanobind to the CMake and Bazel build systems. We also add
robin_map to the Bazel build, which is a dependency of nanobind.
* adds a PYTHON_BINDING_LIBRARY option to various CMake functions, such
as declare_mlir_python_extension, allowing users to select a Python
binding library.
* creates a fork of mlir/include/mlir/Bindings/Python/PybindAdaptors.h
named NanobindAdaptors.h. This plays the same role, using nanobind
instead of pybind11.
* splits CollectDiagnosticsToStringScope out of PybindAdaptors.h and
into a new header mlir/include/mlir/Bindings/Python/Diagnostics.h, since
it is code that is in no way related to pybind11 or, for that matter,
Python.
* changes the standalone Python extension example to have both pybind11
and nanobind variants.
* changes mlir/python/mlir/dialects/python_test.py to have both pybind11
and nanobind variants.
Notes:
* A slightly unfortunate thing that I needed to do in the CMake
integration was to use FindPython in addition to FindPython3, since
nanobind's CMake integration expects the Python_ names for variables.
Perhaps there's a better way to do this.
This change doesn't introduce any functional differences but aligns the
implementation more closely with LLVM's representation. Previously, the
code generated a lookup table to map MLIR enums to LLVM enums due to the
lack of one-to-one correspondence. With this refactoring, the generated
code now casts directly from one enum to another.
This PR adds two small convenience Vector types:
* `ScalableVectorType` and `FixedVectorType`.
The goal of these new types is two-fold:
* Enable idiomatic checks like `isa<ScalableVectorType>(...)`.
* Make the split into "Scalable" and "Fixed-width" vectors a bit more
explicit and more visible in the code-base.
The new types are added in mlir/include/mlir/IR (instead of e.g.
mlir/include/mlir/Dialect/Vector) so that the new types can be used
without requiring any new dependency (e.g. on the Vector dialect).
`DecomposeCallGraphTypes.cpp` was a workaround for missing 1:N support
in the dialect conversion. Now that 1:N support has been added, the
workaround can be deleted. The test remains in place as an example of
how to write such a transformation with the dialect conversion
framework.
Note for LLVM integration: If you are using
`DecomposeCallGraphTypes.cpp`, switch to the patterns that are used in
`TestDecomposeCallGraphTypes.cpp`.
FuncOps can have `arg_attrs`, an array of dictionary attributes
associated with their arguments.
E.g.,
```mlir
func.func @main(%arg0: tensor<8xf32> {test.attr_name = "value"}, %arg1: tensor<8x16xf32>)
```
These are exposed via the MLIR Python bindings with
`my_funcop.arg_attrs`.
In this case, it would return `[{test.attr_name = "value"}, {}]`, i.e.,
`%arg1` has an empty `DictAttr`.
However, if I try to access this property from a FuncOp with no
`arg_attrs` set, e.g.,
```mlir
func.func @main(%arg0: tensor<8xf32>, %arg1: tensor<8x16xf32>)
```
This raises the error:
```python
return ArrayAttr(self.attributes[ARGUMENT_ATTRIBUTE_NAME])
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'attempt to access a non-existent attribute'
```
This PR fixes this by returning the expected `[{}, {}]`.
If expand(collapse) has a dimension that gets collapsed and then
expanded back to the same shape, the pattern would fail to canonicalize
this to a single `collapse_shape`. Line 341 was changed because the
expand(collapse) could be a reinterpret-cast-like sequence where the
shapes differ but the rank is the same, which cannot be represented by a
single `collapse_shape` op.
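A sketch of the newly canonicalized case, using tensor ops and made-up
shapes: the second reassociation group is collapsed and then expanded
back to the same 4x5 shape, so the pair can be rewritten as a single
collapse of the leading dimensions only.
```mlir
%c = tensor.collapse_shape %t [[0, 1], [2, 3]]
    : tensor<2x3x4x5xf32> into tensor<6x20xf32>
%e = tensor.expand_shape %c [[0], [1, 2]] output_shape [6, 4, 5]
    : tensor<6x20xf32> into tensor<6x4x5xf32>

// Canonicalizes to:
%r = tensor.collapse_shape %t [[0, 1], [2], [3]]
    : tensor<2x3x4x5xf32> into tensor<6x4x5xf32>
```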
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Adds a new test for invalid `tensor.unpack` operations where the output
rank does not match the expected rank (input rank + num inner tile
sizes). For example:
```mlir
tensor.unpack %output
inner_dims_pos = [0, 1]
inner_tiles = [32, 16]
into %input : tensor<64x32x16xf32> -> tensor<256x128xf32>
```
In addition, updates the corresponding error message to make it more
informative:
BEFORE:
```mlir
error: packed rank must equal unpacked rank + tiling factors}
```
AFTER:
```mlir
error: packed rank != (unpacked rank + num tiling factors), got 3 != 4
```
-- Fixed a typo in the member variable `static_basis` of the
`AffineDelinearizeIndexOp` operation in AffineOps.td
Signed-off-by: Abdul Raheem Beigh <abdulraheembeigh@gmail.com>
This commit adds a new `matchAndRewrite` overload to `ConversionPattern`
to support 1:N replacements. This is the first of two main PRs that
merge the 1:1 and 1:N dialect conversion drivers.
The existing `matchAndRewrite` function supports only 1:1 replacements,
as can be seen from the `ArrayRef<Value>` parameter.
```c++
LogicalResult ConversionPattern::matchAndRewrite(
Operation *op, ArrayRef<Value> operands /*adaptor values*/,
ConversionPatternRewriter &rewriter) const;
```
This commit adds a `matchAndRewrite` overload that is called by the
dialect conversion driver. By default, this new overload dispatches to
the original 1:1 `matchAndRewrite` implementation. Existing
`ConversionPattern`s do not need to be changed as long as there are no
1:N type conversions or value replacements.
```c++
LogicalResult ConversionPattern::matchAndRewrite(
Operation *op, ArrayRef<ValueRange> operands /*adaptor values*/,
ConversionPatternRewriter &rewriter) const {
// Note: getOneToOneAdaptorOperands produces a fatal error if at least one
// ValueRange has 0 or more than 1 value.
return matchAndRewrite(op, getOneToOneAdaptorOperands(operands), rewriter);
}
```
The `ConversionValueMapping`, which keeps track of value replacements
and materializations, still does not support 1:N replacements. We still
rely on argument materializations to convert N replacement values back
into a single value. The `ConversionValueMapping` will be generalized to
1:N mappings in the second main PR.
Before handing the adaptor values to a `ConversionPattern`, all argument
materializations are "unpacked". The `ConversionPattern` receives N
replacement values and does not see any argument materializations. This
implementation strategy allows us to use the 1:N infrastructure/API in
`ConversionPattern`s even though some functionality is still missing in
the driver. This strategy was chosen to keep the sizes of the PRs
smaller and to make it easier for downstream users to adapt to API
changes.
This commit also updates the "decompose call graphs" transformation
and the "sparse tensor codegen" transformation to use the new 1:N
`ConversionPattern` API.
Note for LLVM integration: If you are using a type converter with 1:N
type conversion rules or if your patterns are performing 1:N
replacements (via `replaceOpWithMultiple` or
`applySignatureConversion`), conversion pattern applications will start
failing (fatal LLVM error) with this error message: `pattern 'name' does
not support 1:N conversion`. The name of the failing pattern is shown in
the error message. These patterns must be updated to the new 1:N
`matchAndRewrite` API.
These operations were introduced as counterparts to the following LLVM
intrinsics:
* `@llvm.masked.expandload.*`,
* `@llvm.masked.compressstore.*`.
Currently, there is minimal test coverage for scalable vector use cases
involving these Ops (in both LLVM and MLIR). Additionally, the verifier
is flawed: it incorrectly allows mixing fixed-width and scalable vectors.
To address these issues, scalable vector support for these Ops is being
disabled for now. This decision can be revisited if a clear need arises
for their use with scalable vectors in the future.
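For example, a scalable variant like the following (illustrative IR, not
taken from the patch) is now rejected by the verifier:
```mlir
%0 = vector.expandload %base[%c0], %mask, %pass_thru
    : memref<?xf32>, vector<[16]xi1>, vector<[16]xf32> into vector<[16]xf32>
```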
Disables `vector.from_elements` for scalable vectors. Given that the
length of scalable vectors is unknown at compile time, the semantics of
this Op are unclear in this context.
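For example (illustrative IR), the following is now rejected, since the
number of scalar operands cannot match a vector length that is unknown
at compile time:
```mlir
%v = vector.from_elements %a, %b : vector<[2]xf32>
```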
This PR aims to solve the following problem:
https://github.com/llvm/llvm-project/pull/117721.
To efficiently implement the thread-to-data mapping relationship, I
introduced AffineScope in `gpu.func` (data or thread layout).
Currently, the Linalg vectorizer disallows non-trailing parallel
dimensions to be scalable, e.g., `vector_sizes [[8], 1]` (*), for cases
like:
```mlir
%0 = linalg.fill ins(%arg0 : f32) outs(%A : tensor<?x?xf32>) -> tensor<?x?xf32>
```
This restriction exists to avoid generating "scalable" arrays of
aggregates, which LLVM does not support (multi-dim vectors are lowered
into arrays of aggregates at the LLVM level).
This patch relaxes that restriction when the trailing parallel vector
dimension is `1`, e.g., for `vector_sizes [[8], 1]`. Such cases are safe
since trailing unit dimensions can be collapsed. This relaxation is
necessary to support scalable vectorization for tensor.pack, where inner
tile sizes are `[8]` (scalable) and `1` (scalar).
(*) Transform Dialect notation
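For reference, in transform dialect notation this corresponds to
something like the following (the enclosing transform sequence is
omitted and the handle name is assumed):
```mlir
transform.structured.vectorize %fill vector_sizes [[8], 1] : !transform.any_op
```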
-- This commit updates `::fold()` to use constant (CST) attributes for
the basis of affine.delinearize_index/linearize_index ops wherever
applicable.
-- Essentially, the code checks whether the mixed basis OpFoldResult set
contains any constant SSA values and converts them to constant integers
instead.
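A rough sketch of the effect, assuming the mixed static/dynamic basis
assembly and made-up values:
```mlir
%c3 = arith.constant 3 : index
%0:2 = affine.delinearize_index %idx into (%c3, %dim) : index, index

// The constant SSA basis value is folded into the static basis:
%1:2 = affine.delinearize_index %idx into (3, %dim) : index, index
```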
Signed-off-by: Abhishek Varma <abhvarma@amd.com>
When an unresolved materialization (`unrealized_conversion_cast` op) is
rolled back, the mapping should be rolled back as well, regardless of
whether it is a source, target or argument materialization. Otherwise,
we accumulate pointers to erased IR in the `mapping`. This is harmless
in most cases, but can cause issues when a new operation is allocated at
the same memory location and the pointer is "reused".
It is not possible to write a test case for this because I cannot
trigger the pointer reuse programmatically.
Fix a number of dependency issues when building the
mlir-linalg-ods-yaml-gen host binary which made a cross-build using the
Make generator fail.
Namely:
- do not use binary path for the custom target created when
LLVM_USE_HOST_TOOLS is true;
- use target name instead of name of variable holding the target name
for add_custom_target and set_target_properties in setup_host_tool();
- force setting of the executable and target cache variables which are
only used as global variables;
- remove dependency on a target defined in a different directory in
add_linalg_ods_yaml_gen() since add_custom_target DEPENDS can only be
used on "files and outputs of custom commands created with
add_custom_command() command calls in the same directory";
- remove unneeded dependency on ${MLIR_LINALG_ODS_YAML_GEN_EXE}, the
target dependency will ensure the binary will be built.
Note that we keep using ${MLIR_LINALG_ODS_YAML_GEN_EXE} in the COMMAND
rather than use ${MLIR_LINALG_ODS_YAML_GEN_TARGET} because when
LLVM_NATIVE_TOOL_DIR is used the latter is an empty string.
Testing-wise, all three codepaths in get_host_tool_path() were tested
with both GNU Make and Ninja generators:
- cross-compiling with LLVM_NATIVE_TOOL_DIR checks the if path;
- cross-compiling without LLVM_NATIVE_TOOL_DIR checks the elseif path;
- native build without LLVM_NATIVE_TOOL_DIR checks the else path.
The printing and parsing logic for struct types was still using ad-hoc
functions instead of the more conventional `print` and `parse` methods
whose declarations are automatically generated by TableGen.
This PR effectively renames these functions and uses them directly as
implementations for `print` and `parse` of `LLVMStructType`.
This additionally fixes linking errors when users or auto-generated
code call `print` and `parse` directly.
Fixes https://github.com/llvm/llvm-project/issues/117927
This PR extends the MLIR representation for `omp.target` ops by adding a
`map_idx` to `private` vars. This annotation stores the index of the map
info operand corresponding to the private var. If the variable does not
have a map operand, the `map_idx` attribute is either not present at all
or its value is `-1`.
This makes matching the private variable to its map info op easier (see
https://github.com/llvm/llvm-project/pull/116576 for usage).
The terms "legal type" and "illegal type" are ambiguous when talking
about materializations. E.g., for target materializations we do not
necessarily convert from illegal to legal types. We convert from the
most recently mapped value to the type that was produced by converting
the original type.
---------
Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
This PR adds a command line argument `--mlir-disable-diagnostic` for
disabling diagnostic information for mlir-opt.
When debugging with mlir-opt, some developers would like to disable the
diagnostic information and focus specifically on the dumped IR. For
example, https://github.com/triton-lang/triton/pull/5250
Adds a CMake option MLIR_DISABLE_CONFIGURE_PYTHON_DEV_PACKAGES which
gates MLIR's package discovery and configuration of Python dev packages
(this was made opt-out to preserve compatibility with find_package(MLIR)
based uses which do not set the standard options).
The default Python setup that MLIR does has been a problem for
super-projects that include LLVM for a long time because it forces a
very specific package discovery mechanism that is not uniform in all
uses.
When reviewing #117922, I noted that this would effectively be a
break-the-world event for downstreams, forcing them to adapt their nanobind
dep to the exact way that MLIR does it. Adding the option to just
wholesale skip the built-in configuration heuristics at least gives us a
mechanism to tell downstreams to migrate to, giving them complete
control and not requiring packaging workarounds. This seemed a better
option than (once again) creating a situation where downstreams could
not integrate the dep change without doing tricky infra upgrades, and it
removes the burden on the author of that patch of needing to think
about how this affects super-projects that include MLIR (i.e. they can
just be told to do it themselves as needed vs being in a wedged state
and unable to upgrade).
This patch simplifies and extends the logic used when compressing masks
emitted by `vector.constant_mask` to support extracting 1-D vectors from
multi-dimensional vector loads. It streamlines mask computation, making
it applicable for multi-dimensional mask generation, improving the
overall handling of masked load operations.