As a follow-on for #87986, moves the VectorType convenience wrappers
(`FixedVectorType` and `ScalableVectorType`) to BuiltinTypes.h. This
allows us to use the new wrappers in "CommonTypeConstraints.td".
Unsigned integer types are uncommon enough in MLIR that there is no
operation to cast a scalar from signless to unsigned and vice versa.
Currently tosa.rescale uses builtin.unrealized_conversion_cast which
does not lower. Instead, this commit introduces optional attributes to
indicate unsigned input or output, named similarly to those in the TOSA
specification. This is more in line with the rest of MLIR where specific
operations rather than values are signed/unsigned.
If using a variadic operand, the error message given if the number of
types and operands do not match would be along the lines of:
```
3 operands present, but expected 2
```
This error message is confusing for multiple reasons, particular for
beginners:
* If the intention is to have 3 operands, it does not point out why it
expects 2. The user may actually just want to add a type to the type
list
* It reads as if a verifier error rather than a parser error, giving the
impression the Op only supports 2 operands.
This PR attempts to improve the error message by first noting the issue
("number of operands and types mismatch") and mentioning how many
operands and types it received.
This PR allows out-of-tree dialects to write Python dialect modules
using nanobind instead of pybind11.
It may make sense to migrate in-tree dialects and some of the ODS Python
infrastructure to nanobind, but that is a topic for a future change.
This PR makes the following changes:
* adds nanobind to the CMake and Bazel build systems. We also add
robin_map to the Bazel build, which is a dependency of nanobind.
* adds a PYTHON_BINDING_LIBRARY option to various CMake functions, such
as declare_mlir_python_extension, allowing users to select a Python
binding library.
* creates a fork of mlir/include/mlir/Bindings/Python/PybindAdaptors.h
named NanobindAdaptors.h. This plays the same role, using nanobind
instead of pybind11.
* splits CollectDiagnosticsToStringScope out of PybindAdaptors.h and
into a new header mlir/include/mlir/Bindings/Python/Diagnostics.h, since
it is code that is no way related to pybind11 or for that matter,
Python.
* changed the standalone Python extension example to have both pybind11
and nanobind variants.
* changed mlir/python/mlir/dialects/python_test.py to have both pybind11
and nanobind variants.
Notes:
* A slightly unfortunate thing that I needed to do in the CMake
integration was to use FindPython in addition to FindPython3, since
nanobind's CMake integration expects the Python_ names for variables.
Perhaps there's a better way to do this.
This change doesn't introduce any functional differences but aligns the
implementation more closely with LLVM's representation. Previously, the
code generated a lookup table to map MLIR enums to LLVM enums due to the
lack of one-to-one correspondence. With this refactoring, the generated
code now casts directly from one enum to another.
This PR adds two small convenience Vector types:
* `ScalableVectorType` and `FixedVectorType`.
The goal of these new types is two-fold:
* Enable idiomatic checks like `isa<ScalableVectorType>(...)`.
* Make the split into "Scalable" and "Fixed-wdith" vectors a bit more
explicit and more visible in the code-base.
The new types are added in mlir/include/mlir/IR (instead of e.g.
mlir/include/mlir/Dialect/Vector) so that the new types can be used
without requiring any new dependency (e.g. on the Vector dialect).
`DecomposeCallGraphTypes.cpp` was a workaround around missing 1:N
support in the dialect conversion. Now that 1:N support was added, the
workaround can be deleted. The test remains in place, as an example for
how to write such a transformation with the dialect conversion
framework.
Note for LLVM integration: If you are using
`DecomposeCallGraphTypes.cpp`, switch to the patterns that are used in
`TestDecomposeCallGraphTypes.cpp`.
If expand(collapse) has a dimension that gets collapsed and then
expanded to the same shape, the pattern would fail to canonicalize this
to a single collapse shape. Line 341 was changed because the
expand(collapse) could be a reinterpret-cast like sequence where the
shapes differ but the rank is the same. This cannot be represented by a
single `collapse_shape` op.
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
-- Fixed a typo of member variable static_basis of
AffineDelinearizeIndexOp operation in AffineOps.td
Signed-off: Abdul Raheem Beigh <abdulraheembeigh@gmail.com>
This commit adds a new `matchAndRewrite` overload to `ConversionPattern`
to support 1:N replacements. This is the first of two main PRs that
merge the 1:1 and 1:N dialect conversion drivers.
The existing `matchAndRewrite` function supports only 1:1 replacements,
as can be seen from the `ArrayRef<Value>` parameter.
```c++
LogicalResult ConversionPattern::matchAndRewrite(
Operation *op, ArrayRef<Value> operands /*adaptor values*/,
ConversionPatternRewriter &rewriter) const;
```
This commit adds a `matchAndRewrite` overload that is called by the
dialect conversion driver. By default, this new overload dispatches to
the original 1:1 `matchAndRewrite` implementation. Existing
`ConversionPattern`s do not need to be changed as long as there are no
1:N type conversions or value replacements.
```c++
LogicalResult ConversionPattern::matchAndRewrite(
Operation *op, ArrayRef<ValueRange> operands /*adaptor values*/,
ConversionPatternRewriter &rewriter) const {
// Note: getOneToOneAdaptorOperands produces a fatal error if at least one
// ValueRange has 0 or more than 1 value.
return matchAndRewrite(op, getOneToOneAdaptorOperands(operands), rewriter);
}
```
The `ConversionValueMapping`, which keeps track of value replacements
and materializations, still does not support 1:N replacements. We still
rely on argument materializations to convert N replacement values back
into a single value. The `ConversionValueMapping` will be generalized to
1:N mappings in the second main PR.
Before handing the adaptor values to a `ConversionPattern`, all argument
materializations are "unpacked". The `ConversionPattern` receives N
replacement values and does not see any argument materializations. This
implementation strategy allows us to use the 1:N infrastructure/API in
`ConversionPattern`s even though some functionality is still missing in
the driver. This strategy was chosen to keep the sizes of the PRs
smaller and to make it easier for downstream users to adapt to API
changes.
This commit also updates the the "decompose call graphs" transformation
and the "sparse tensor codegen" transformation to use the new 1:N
`ConversionPattern` API.
Note for LLVM conversion: If you are using a type converter with 1:N
type conversion rules or if your patterns are performing 1:N
replacements (via `replaceOpWithMultiple` or
`applySignatureConversion`), conversion pattern applications will start
failing (fatal LLVM error) with this error message: `pattern 'name' does
not support 1:N conversion`. The name of the failing pattern is shown in
the error message. These patterns must be updated to the new 1:N
`matchAndRewrite` API.
These operations were introduced as counterparts to the following LLVM
intrinsics:
* `@llvm.masked.expandload.*`,
* `@llvm.masked.compressstore.*`.
Currently, there is minimal test coverage for scalable vector use cases
involving these Ops (both LLVM and MLIR). Additionally, the verifier is
flawed - it incorrectly allows mixing fixed-width and scalable vectors.
To address these issues, scalable vector support for these Ops is being
disabled for now. This decision can be revisited if a clear need arises
for their use with scalable vectors in the future.
Disables `vector.from_elements` for scalable vectors. Given that the
length of scalable vectors is unknown at compile time, the semantics of
this Op are unclear in this context.
This PR in order to solve the following problem.
https://github.com/llvm/llvm-project/pull/117721.
To efficiently implement the thread-to-data mapping relationship, I
introduced AffineScope in gpu.func(Data or thread layout).
Fix a number of dependencies issue to build mlir-linalg-ods-yaml-gen
host binary which make a cross-build using the Make generator fail.
Namely:
- do not use binary path for the custom target created when
LLVM_USE_HOST_TOOLS is true;
- use target name instead of name of variable holding the target name
for add_custom_target and set_target_properties in setup_host_tool();
- force setting of executable and target cache variable which are only
used as global variables;
- remove dependency on target defined in different directory in
add_linalg_ods_yaml_gen() since add_custom_target DEPENDS can only be
used on "files and outputs of custom commands created with
add_custom_command() command calls in the same directory";
- remove unneeded dependency on ${MLIR_LINALG_ODS_YAML_GEN_EXE}, the
target dependency will ensure the binary will be built.
Note that we keep using ${MLIR_LINALG_ODS_YAML_GEN_EXE} in the COMMAND
rather than use ${MLIR_LINALG_ODS_YAML_GEN_TARGET} because when
LLVM_NATIVE_TOOL_DIR is used the latter is an empty string.
Testing-wise, all three codepaths in get_host_tool_path() were tested
with both GNU Make and Ninja generators:
- cross-compiling with LLVM_NATIVE_TOOL_DIR checks the if path;
- cross-compiling without LLVM_NATIVE_TOOL_DIR checks the elseif path;
- native build without LLVM_NATIVE_TOOL_DIR checks the else path.
The printing and parsing logic for struct types was still using ad-hoc
functions instead of the more conventional `print` and `parse` methods
whose declarations are automatically generated by TableGen.
This PR effectively renames these functions and uses them directly as
implementations for `print` and `parse` of `LLVMStructType`.
This additionally fixes linking errors when users or auto generated code
may call `print` and `parse` directly.
Fixes https://github.com/llvm/llvm-project/issues/117927
This PR extends the MLIR representation for `omp.target` ops by adding a
`map_idx` to `private` vars. This annotation stores the index of the map
info operand corresponding to the private var. If the variable does not
have a map operand, the `map_idx` attribute is either not present at all
or its value is `-1`.
This makes matching the private variable to its map info op easier (see
https://github.com/llvm/llvm-project/pull/116576 for usage).
The terms "legal type" and "illegal type" are ambiguous when talking
about materializations. E.g., for target materializations we do not
necessarily convert from illegal to legal types. We convert from the
most recently mapped value to the type that was produced by converting
the original type.
---------
Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
This PR adds a command line argument `--mlir-disable-diagnostic` for
disabling diagnostic information for mlir-opt.
When debugging with mlir-opt, some developers would like to disable the
diagnostic information and focus specifically on the dumped IR. For
example, https://github.com/triton-lang/triton/pull/5250
Fix#114594
#### Context
[IEEE754-2019](https://ieeexplore.ieee.org/document/8766229) Sec 9.6
defines 2 minimum and 2 maximum operations. They are termed
- `maximum` and `maximumNumber`
- `minimum` and `minimumNumber`
In the arith dialect they are respectively named `maximumf` and
`maxnumf`, `minimumf` and `minnumf` so I use these names.
These operations only differ in how they handle NaN values. For
`maximumf` and `minimumf`, if any operand is NaN, then the result is
NaN, ie, NaN is propagated. For `maxnumf` and `minnumf`, if any operand
is NaN, then the other operand is returned, ie, NaN is absorbed. The
following identities hold:
```
maximumf(x, NaN) = maximumf(NaN, x) = NaN
maxnumf(x, NaN) = maxnumf(NaN, x) = x
```
(and same for min).
#### Arith folders
In the following I am talking about the folders for the arith
operations. The folders implement the following canonicalizations (`op`
is one of maximumf, maxnumf, minimumf, minnumf):
1. `op(x, x)` folds to `x`
2. for `op(x, y)`, if `y` folds to the neutral element of the `op`, then
the `op` is folded to `x`.
1. The neutral element of `maximumf` is -Infty
2. The neutral element of `minimumf` is +Infty
3. The neutral element of `maxnumf` and `minnumf` is NaN as shown above.
3. for `op(x, y)`, if both `x` and `y` fold to constants `x'` and `y'`,
then the `op` is folded and the result is calculated with a
corresponding runtime function.
The folders are properly implemented for `maximumf` and `minimumf`, but
the same implementations were copied for the respective `maxnumf` and
`minnumf` functions. This means the neutral element of the second folder
above is wrong:
- `maxnumf(x, -Infty)` is folded to `x`, but that's wrong, because if
`x` is NaN then -Infty should be the result
- `minnumf(x, +Infty)` is folded to `x`, but same thing, the result
should be +Infty when `x` is NaN.
This is fixed by using `NaN` as neutral element for the `maxnumf` and
`minnumf` ops.[^1]
Again because of copy paste mistake, the third pattern above is using
`llvm::maximum` instead of `llvm::maximumnum` to calculate the result in
case both arguments fold to a constant:
- `maxnumf(NaN, x')` would have been folded to `llvm::maximum(NaN, x')`
which is `NaN`, whereas the result should be `x'`.
This folder for `minnumf` already correctly uses `llvm::minnum`, but I
fixed the one for `maxnumf` in this PR.
[^1]: this is by the way already correctly implemented in
[`arith::getIdentityValueAttr`](a821964e03/mlir/lib/Dialect/Arith/IR/ArithOps.cpp (L2493-L2498))
We've had the ability to define LLVM's `range` attribute through
#llvm.constant_range for some time, and have used this for some GPU
intrinsics. This commit allows using `llvm.range` as a parameter or
result attribute on function declarations and definitions.
Use `llvm.func`'s `intel_reqd_sub_group_size` attribute instead of
SPIR-V environment attributes in the `gpu.shuffle` conversion pattern.
This metadata is needed to check the semantics of the operation are
supported, i.e., it has a constant width and its value is equal to the
sub-group size.
As the pass also converts `gpu.func` to `llvm.func`, adding a
discardable attribute of name `intel_reqd_sub_group_size` attribute to
the latter is enough for this pattern to work.
We no longer have a notion of "default" sub-group size, so this
attribute needs to be set in the parent function for `gpu.shuffle`
operations to be converted.
Drop dependency on the SPIR-V dialect as we no longer require creating
attributes from this dialect to lower `gpu.shuffle` instances.
---------
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
This commit removes the last remaining components of the dialect
conversion-based bufferization passes.
Note for LLVM integration: If you depend on these components, migrate to
One-Shot Bufferize or copy them to your codebase.
`isOpOperandCanBeDroppedAfterFusedLinalgs` crashes when `indexingMaps`
is empty. This can occur when `producer` only has DPS init operands and
`consumer ` only has a single DPS input operand (all operands are
ignored and nothing gets added to `indexingMaps`). This is because
`concatAffineMaps` wasn't handling the maps being empty properly.
Similar to `canOpOperandsBeDroppedImpl`, I added an early return when
the maps are of size zero. Additionally, `concatAffineMaps`'s
declaration comment says it returns an empty map when `maps` is empty
but it has no way to get the `MLIRContext` needed to construct the empty
affine map when the array is empty. So, I changed this to take the
context.
__NOTE: concatAffineMaps now takes an MLIRContext to be able to
construct an empty map in the case where `maps` is empty.__
---------
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Co-authored-by: Quinn Dawkins <quinn.dawkins@gmail.com>
As described in issue llvm/llvm-project#91518, a previous PR
llvm/llvm-project#78484 introduced the `defaultMemorySpaceFn` into
bufferization options, allowing one to inform OneShotBufferize that it
should use a specified function to derive the memory space attribute
from the encoding attribute attached to tensor types.
However, introducing this feature exposed unhandled edge cases,
examples of which are introduced by this change in the new test under
`test/Dialect/Bufferization/Transforms/one-shot-bufferize-encodings.mlir`.
Fixing the inconsistencies introduced by `defaultMemorySpaceFn` is
pretty simple. This change:
- Updates the `bufferization.to_memref` and `bufferization.to_tensor`
operations to explicitly include operand and destination types,
whereas previously they relied on type inference to deduce the
tensor types. Since the type inference cannot recover the correct
tensor encoding/memory space, the operand and result types must be
explicitly included. This is a small assembly format change, but it
touches a large number of test files.
- Makes minor updates to other bufferization functions to handle the
changes in building the above ops.
- Updates bufferization of `tensor.from_elements` to handle memory
space.
Integration/upgrade guide:
In downstream projects, if you have tests or MLIR files that explicitly
use
`bufferization.to_tensor` or `bufferization.to_memref`, then update
them to the new assembly format as follows:
```
%1 = bufferization.to_memref %0 : memref<10xf32>
%2 = bufferization.to_tensor %1 : memref<10xf32>
```
becomes
```
%1 = bufferization.to_memref %0 : tensor<10xf32> to memref<10xf32>
%2 = bufferization.to_tensor %0 : memref<10xf32> to tensor<10xf32>
```
Currently, the Vector dialect TD file includes the following "vector"
type definitions:
```mlir
def AnyVector : VectorOf<[AnyType]>;
def AnyVectorOfAnyRank : VectorOfAnyRankOf<[AnyType]>;
def AnyFixedVector : FixedVectorOf<[AnyType]>;
def AnyScalableVector : ScalableVectorOf<[AnyType]>;
```
In short:
* `AnyVector` _excludes_ 0-D vectors.
* `AnyVectorOfAnyRank`, `AnyFixedVector`, and `AnyScalableVector`
_include_ 0-D vectors.
The naming for "groups" that include 0-D vectors is inconsistent and can
be misleading, and `AnyVector` implies that 0-D vectors are included,
which is not the case.
This patch renames these definitions for clarity:
```mlir
def AnyVectorOfNonZeroRank : VectorOfNonZeroRankOf<[AnyType]>;
def AnyVectorOfAnyRank : VectorOfAnyRankOf<[AnyType]>;
def AnyFixedVectorOfAnyRank : FixedVectorOfAnyRank<[AnyType]>;
def AnyScalableVectorOfAnyRank : ScalableVectorOfAnyRank<[AnyType]>;
```
Rationale:
* The updated names are more explicit about 0-D vector support.
* It becomes clearer that scalable vectors currently allow 0-D vectors -
this might warrant a revisit.
* The renaming paves the way for adding a new group for "fixed-width
vectors excluding 0-D vectors" (e.g., AnyFixedVector), which I plan to
introduce in a follow-up patch.
Drop commas from split barrier operations assembly format.
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Depends on #116648, review ec8d35471602cd88aa2ebaf239b698ef3ba353bd
only.
---------
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Add support for denormal in the Arith dialect (binary and unary
operations).
Denormal are attached to every operation, and they can be of three
different
kinds:
1) ieee, denormal are preserved and processed as defined by IEEE 754
rules.
2) preserve sign, a mode where denormal numbers are flushed to zero, but
the
sign of the zero (+0 or -0) is preserved.
3) positive zero, a mode where all denormal numbers are flushed to
positive zero
(+0), ignoring the sign of the original number.
Denormal refers to both the operands and the result. Currently only
lowering for
ieee is supported.
Currently, `GeneralizePadOpPattern` is grouped under
`populatePadOpVectorizationPatterns`. However, as noted in #111349, this
transformation "decomposes" rather than "vectorizes" `tensor.pad`. As
such, it functions as:
* a vectorization _pre-processing_ transformation, not
* a vectorization transformation itself.
To clarify its purpose, this PR turns `GeneralizePadOpPattern` into a
standalone transformation by:
* introducing a dedicated `populateDecomposePadPatterns` method,
* adding a `apply_patterns.linalg.decompose_pad` Transform Dialect Op,
* removing it from `populatePadOpVectorizationPatterns`.
In addition, to better reflect its role, it is renamed as "decomposition"
rather then "generalization". This is in line with the recent renaming
of similar ops, i.e. tensor.pack/tensor.unpack Ops in #116439.
Declarative assemblyFormat ODS is more concise and requires less
boilerplate than filling out cpp interfaces.
Changes:
updates the PtrAccessChainOp and InBoundPtrAccessChainOp defined in
SPIRVMemoryOps.td to use assemblyFormat. Removes part print/parse from
MemoryOps.cpp which is now generated by assemblyFormat
Updates tests to updated format
Issue: #73359
This PR extracts NFC changes out of
https://github.com/llvm/llvm-project/pull/116035 to reap as many of the
same benefits without any of the semantic changes.
More concretely, moving `LLVMStructType` to ODS has the benefits of
being able to generate much of the required boilerplate, such as
interface definitions, documentation and more, automatically.
Furthermore, `LLVMStructType` is then treated less special and its
definition can be found at the same place where all other complex type
definitions are found in the LLVM dialect.
Future changes could leverage more automatically generated code from
TableGen such as `assemblyFormat`. As these are not as trivial, they
have been left for future PRs.
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
This patch adds the `ConvertToLLVMAttrInterface` and
`ConvertToLLVMOpInterface` interfaces. It also modifies the
`convert-to-llvm` pass to use these interfaces when available.
The `ConvertToLLVMAttrInterface` interfaces allows attributes to
configure conversion to LLVM, including the conversion target, LLVM type
converter, and populating conversion patterns. See the `NVVMTargetAttr`
implementation of this interface for an example of how this interface
can be used to configure conversion to LLVM.
The `ConvertToLLVMOpInterface` interface collects all convert to LLVM
attributes stored in an operation.
Finally, the `convert-to-llvm` pass was modified to use these interfaces
when available. This allows applying `convert-to-llvm` to GPU modules
and letting the `NVVMTargetAttr` decide which patterns to populate.
This location type represents a contiguous range inside a file. It is
effectively a pair of FileLineCols. Add new type and make FileLineCol a
view for case where it matches existing previous one.
The location includes filename and optional start line & col, and end
line & col. Considered common cases are file:line, file:line:col,
file:line:start_col to file:line:end_col and general range within same
file. In memory its encoded as trailing objects. This keeps the memory
requirement the same as FileLineColLoc today (makes the rather common
File:Line cheaper) at the expense of extra work at decoding time. Kept the unsigned
type.
There was the option to always have file range be castable to
FileLineColLoc. This cast would just drop other fields. That may result
in some simpler staging. TBD.
This is a rather minimal change, it does not yet add bindings (C or
Python), lowering to LLVM debug locations etc. that supports end line:cols.
---------
Co-authored-by: River Riddle <riddleriver@gmail.com>
As the documentation for -affine-expand-index-ops says,
affine.delinearize_index and affine.linearize_index don't need to be
expanded into the affine dialect.
Expanding these operations into affine.apply operations can introduce
unwanted "simplifications", mainly translations of `(dN mod C + ...)` to
`(dN + ... - (dN floordiv C) * C)` and similar, which create worse
generated code. This commit resolves this issue by expanding out
affine.delanierize_index directly.
In addition, the lowering of affine.linearize_index now sorts the
operands by loop-independence, allowing an increased amount of
loop-invariant code motion after lowering.
The old behavior is preserved as -expand-affine-index-ops-as-affine but
is no longer the default
This PR is simply adding the Broadcom vendor ID to the SPIRV list. In
order to enable the use of this vendor ID in a SPIRV pipeline for the
Videocore GPUs.
When inferring the mask of a transfer operation that results in a single `i1` element,
we could represent it using either `vector<i1>` or vector<1xi1>. To avoid type mismatches,
this PR updates the mask inference logic to consistently generate `vector<1xi1>` for
these cases. We can enable 0-D masks if they are needed in the future.
See: https://github.com/llvm/llvm-project/issues/116197
The dialect conversion-based bufferization passes have been migrated to
One-Shot Bufferize about two years ago. To clean up the code base, this
commit removes the `finalizing-bufferize` pass, one of the few remaining
parts of the old infrastructure. Most bufferization passes have already
been removed.
Note for LLVM integration: If you depend on this pass, migrate to
One-Shot Bufferize or copy the pass to your codebase.
Depends on #114152.
Here is the [merged
MR](https://github.com/llvm/llvm-project/pull/116007) which caused a
failure and [was
reverted](https://github.com/llvm/llvm-project/pull/116811).
Thanks to @joker-eph for the help, I fix it (miss constructing
`ModuleObject` with callback functions in
`mlir/lib/Target/LLVM/NVVM/Target.cpp`) and split unit tests from origin
test which don't need `ptxas` to make the test runs more widely.
This commit refactors part of the code in preparation for the migration
of other *matmul* variants from OpDSL to ODS.
Moves getDefaultIndexingmaps() helper into the MatmulOp class.