1. Extract the main logic from `foldTensorCastPrecondition` into a
dedicated helper hook: `hasFoldableTensorCastOperand`. This allows
for reusing the corresponding checks.
2. Rename `getNewOperands` to `getUpdatedOperandsAfterCastOpFolding` for
better clarity and documentation of its functionality.
3. These updated hooks will be reused in:
* https://github.com/llvm/llvm-project/pull/123902. This PR makes
them public.
**Note:** Moving these hooks to `Tensor/Utils` is not feasible because
`MLIRTensorUtils` depends on `MLIRTensorDialect` (CMake targets). If
these hooks were moved to `Utils`, it would create a dependency of
`MLIRTensorDialect` on `MLIRTensorUtils`, leading to a circular
dependency.
When MappableType was introduced alongside PointerLikeType, the data
clause operation builders were duplicated to accept a `TypedValue` of
one of the two type options. However, the underlying builder takes a
`Value` and this difference is not relevant for it. The only difference
is that `varType` is set differently depending on the type.
Having two duplicated builders can lead to clunky building since a
`Value` must always be cast to one of the two options. Thus, simply
clean this up - the verifier already checks that it is a type that
implements one of the two interfaces.
For extremely large models, it may be inefficient to load the model into
memory in Python prior to passing it to the MLIR C APIs for
deserialization. This change adds an API to parse a ModuleOp directly
from a file path.
Re-lands
[4e14b8a](4e14b8afb4).
A collection of fixes to the mesh dialect
- allow constants in sharding propagation/spmdization
- fixes to tensor replication (e.g. 0d tensors)
- improved canonicalization
- sharding propagation incorrectly generated too many ShardOps
New operation `mesh.GetShardOp` enables exchanging sharding information
(like on function boundaries)
Enable ops with only read side effects in scf.for to be hoisted with a
scf.if guard that checks against the trip count
This patch takes a step towards a less conservative LICM in MLIR as
discussed in the following discourse thread:
[Speculative LICM?](https://discourse.llvm.org/t/speculative-licm/80977)
This patch in particular does the following:
1. Relaxes the original constraint for hoisting that only hoists ops
without any side effects. This patch also allows the ops with only read
side effects to be hoisted into an scf.if guard only if every op in the
loop or its nested regions is side-effect free or has only read side
effects. This scf.if guard wraps the original scf.for and checks for
**trip_count > 0**.
2. To support this, two new interface methods are added to
**LoopLikeInterface**: _wrapInTripCountCheck_ and
_unwrapTripCountCheck_. The implementation starts by wrapping the scf.for
loop in an scf.if guard using _wrapInTripCountCheck_. If no op has been
hoisted into this guard once the worklist has been processed, it unwraps
the guard by calling _unwrapTripCountCheck_.
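The effect of the guarded hoist can be illustrated outside of MLIR. The following is a minimal standalone C++ sketch, not the pass implementation: `Cell`, `tripCount`, `sumOriginal`, and `sumHoisted` are hypothetical names, and the loop is assumed to have a positive step.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical memory location that counts how often it is read.
struct Cell {
  int64_t value = 0;
  int reads = 0;
  int64_t load() {
    ++reads;
    return value;
  }
};

// Iterations of `for (i = lb; i < ub; i += step)`; assumes step > 0.
inline int64_t tripCount(int64_t lb, int64_t ub, int64_t step) {
  assert(step > 0 && "sketch only handles positive steps");
  return ub > lb ? (ub - lb + step - 1) / step : 0;
}

// Before: the loop-invariant read is executed on every iteration.
inline int64_t sumOriginal(Cell &cell, int64_t lb, int64_t ub, int64_t step) {
  int64_t acc = 0;
  for (int64_t i = lb; i < ub; i += step)
    acc += cell.load() * i;
  return acc;
}

// After: the read is hoisted, but only under a `trip count > 0` guard, so
// it is never executed when the original loop body would not have run.
inline int64_t sumHoisted(Cell &cell, int64_t lb, int64_t ub, int64_t step) {
  int64_t acc = 0;
  if (tripCount(lb, ub, step) > 0) {
    int64_t hoisted = cell.load();
    for (int64_t i = lb; i < ub; i += step)
      acc += hoisted * i;
  }
  return acc;
}
```

Without the guard, the hoisted read would execute even for an empty iteration space, which is why read-only ops cannot simply be moved in front of the loop.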
This warning is causing lots of build spam when I use a recent Clang as
my host compiler. It's a potential false positive, so silence it until
https://github.com/llvm/llvm-project/issues/126600 is resolved.
Fix variable casing while I'm here.
Adds a small note to VectorOps.td on what "dim-1" broadcast is. Also
updates comments to consistently use quotes, i.e.
* "dim-1" broadcasting instead of dim-1 broadcasting.
This way it is clear that we are referring to "stretching" one of the
trailing dims rather than e.g. broadcasting a dim at idx 1.
Change the shift operand for the mul operator to be a required operand.
Also define shift to be Tosa_ScalarInt8Tensor, which requires that it is
a rank-1 tensor whose shape is [1] (i.e., a tensor containing a single
element).
Signed-off-by: Tai Ly <tai.ly@arm.com>
This changes Tosa ClampOp attributes to min_val and max_val which are
either integer attributes or float attributes, and adds verify checks
that these attribute element types must match element types of input and
output
Co-authored-by: Tai Ly <tai.ly@arm.com>
Refactors the XeGPU scatter attribute, introducing the following:
- improved docs formatting
- default initialized parameters
- invariant checks in attribute verifier
- removal of additional parsing error
The attribute's getters now provide default values simplifying their
usage and scattered tensor descriptor handling.
Related descriptor verifier is updated to avoid check duplication.
Update the llvm.call/llvm.invoke pretty printer/parser and the LLVM IR import/export
to deal with the argument and result attributes.
This patch is made on top of PR 123176 that modified the
CallOpInterface and added the argument and result attributes to
llvm.call and llvm.invoke without doing anything with them.
RFC: https://discourse.llvm.org/t/mlir-rfc-adding-argument-and-result-attributes-to-llvm-call/84107
lib/libMLIRLLVMIRTransforms.a fails to build from scratch with the
following error:
In file included from llvm/include/llvm/Frontend/OpenMP/OMPConstants.h:19,
from llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h:19,
from mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h:26,
from mlir/include/mlir/Dialect/LLVMIR/NVVMDialect.h:24,
from mlir/lib/Dialect/LLVMIR/Transforms/InlinerInterfaceImpl.cpp:17:
llvm/include/llvm/Frontend/OpenMP/OMP.h:16:10:
fatal error: llvm/Frontend/OpenMP/OMP.h.inc: No such file or directory
Use a forward declaration for OpenMPIRBuilder in ModuleTranslation.h to
avoid pulling in OpenMP frontend headers that require generated headers.
The existing `mlir::populateMathPolynomialApproximationPatterns` is
coarse-grained and inflexible:
- It populates 2 distinct classes of patterns: (1) polynomial
approximations, (2) expansions of operands to f32.
- It does not offer knobs to select which math functions to apply the
rewrites to.
This PR adds finer-grained populate-patterns functions, which take a
predicate lambda allowing the caller to control which math functions to
apply rewrites to.
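The idea of a predicate-controlled populate function can be sketched in standalone C++. This is only a model of the approach, not the MLIR API: `OpPredicate`, `populateApproximationPatterns`, the candidate op list, and the string-based "patterns" are all assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical predicate type: receives an op name, returns whether the
// caller wants a rewrite for that op.
using OpPredicate = std::function<bool(const std::string &)>;

// Hypothetical populate function. A "pattern" is modelled as a plain string
// so that the sketch stays independent of MLIR.
inline std::vector<std::string>
populateApproximationPatterns(const OpPredicate &shouldRewrite) {
  static const char *const candidates[] = {"math.tanh", "math.exp",
                                           "math.log", "math.erf"};
  std::vector<std::string> patterns;
  for (const char *opName : candidates)
    if (shouldRewrite(opName))
      patterns.push_back(std::string("approximate(") + opName + ")");
  return patterns;
}
```

A caller that only wants `math.tanh` rewritten would pass `[](const std::string &name) { return name == "math.tanh"; }`, while a predicate that always returns `true` reproduces the coarse-grained behavior.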
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
OpenACC specification describes the following type categories: scalar,
array, composite, and aggregate (which includes arrays, composites, and
others such as Fortran pointer/allocatable).
Decision for how to do implicit mapping is dependent on a variable's
category. Since acc dialect's only means of distinguishing between types
is through the interfaces attached, add API to be able to get the type
category.
In addition to defining the new API, attempt to provide a base
implementation for memref which matches what OpenACC spec describes.
Now that linalg.matmul is in tablegen, "hand write" the Python wrapper
that OpDSL used to derive. Similarly, add a Python wrapper for the new
linalg.contract op.
This required the following misc. fixes:
1) make linalg.matmul's parsing and printing consistent w.r.t. whether
indexing_maps occurs before or after operands, i.e. per the test cases
it comes _before_.
2) tablegen for linalg.contract did not state it accepted an optional
cast attr.
3) In ODS's C++-generating code, expand partial support for `$_builder`
access in `Attr::defaultValue` to full support. This enables access to
the current `MlirContext` when constructing the default value (as is
required when the default value consists of affine maps).
PR #126091 adds intrinsics for tcgen05 wait/fence/commit operations.
This patch adds NVVM Dialect Ops for them.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
Fix private memref creation bug in affine fusion exposed in the case of
the same memref being loaded from/stored to in producer nest. Make the
private memref replacement sound.
Change affine fusion debug string to affine-fusion - more compact.
Fixes: https://github.com/llvm/llvm-project/issues/48703
Adds XeGPU tensor descriptor type verifier.
The type verifier covers general tensor descriptor invariants w.r.t. Xe
ISA semantics.
Related operation verifiers are updated to account for the new
descriptor checks and avoid duplication.
I'm seeing build errors in a downstream project using torch-mlir that
are fixed by this change. See
https://github.com/iree-org/iree/pull/19903#discussion_r1946899561 for
more context. The build error on MSVC is:
```
C:\home\runner\_work\iree\iree\third_party\llvm-project\mlir\include\mlir/Dialect/Tosa/Utils/ConversionUtils.h(148): error C2872: 'OpTrait': ambiguous symbol
C:\home\runner\_work\iree\iree\third_party\llvm-project\mlir\include\mlir/Dialect/Tosa/IR/TosaOps.h(49): note: could be 'mlir::OpTrait'
C:\home\runner\_work\iree\iree\third_party\torch-mlir\include\torch-mlir/Dialect/Torch/IR/TorchTraits.h(23): note: or 'mlir::torch::Torch::OpTrait'
C:\home\runner\_work\iree\iree\third_party\llvm-project\mlir\include\mlir/Dialect/Tosa/Utils/ConversionUtils.h(148): note: the template instantiation context (the oldest one first) is
C:\home\runner\_work\iree\iree\third_party\torch-mlir\lib\Conversion\TorchToTosa\TosaLegalizeCommon.cpp(126): note: see reference to function template instantiation 'TosaOp mlir::tosa::CreateOpAndInfer<mlir::tosa::MulOp,mlir::Value&,mlir::Value&,mlir::Value&>(mlir::PatternRewriter &,mlir::Location,mlir::Type,mlir::Value &,mlir::Value &,mlir::Value &)' being compiled
with
[
TosaOp=mlir::tosa::MulOp
]
C:\home\runner\_work\iree\iree\third_party\torch-mlir\include\torch-mlir/Conversion/TorchToTosa/TosaLegalizeUtils.h(83): note: see reference to function template instantiation 'TosaOp mlir::tosa::CreateOpAndInfer<TosaOp,mlir::Value&,mlir::Value&,mlir::Value&>(mlir::ImplicitLocOpBuilder &,mlir::Type,mlir::Value &,mlir::Value &,mlir::Value &)' being compiled
with
[
TosaOp=mlir::tosa::MulOp
]
C:\home\runner\_work\iree\iree\third_party\torch-mlir\include\torch-mlir/Conversion/TorchToTosa/TosaLegalizeUtils.h(76): note: see reference to function template instantiation 'TosaOp mlir::tosa::CreateOpAndInferShape<TosaOp,mlir::Value&,mlir::Value&,mlir::Value&>(mlir::ImplicitLocOpBuilder &,mlir::Type,mlir::Value &,mlir::Value &,mlir::Value &)' being compiled
with
[
TosaOp=mlir::tosa::MulOp
]
```
I think the torch-mlir code here is causing the issue, but I'm not sure
why builds only started failing now:
https://github.com/llvm/torch-mlir/blob/main/include/torch-mlir/Dialect/Torch/IR/TorchTraits.h.
Given that `mlir::OpTrait` already exists, torch-mlir should not be
creating an ambiguous symbol `mlir::torch::Torch::OpTrait`. So while a
better fix would be to the downstream project, being explicit here
doesn't seem that unreasonable to me.
The shape operand is changed to input shape type since V1.0
Change-Id: I508cc1d67e9b017048b3f29fecf202cb7d707110
Co-authored-by: Won Jeon <won.jeon@arm.com>
A new constraint is also added to restrict attribute values for SPIR-V
attributes. Ideally this should use `ConfinedAttr` with a custom
constraint directly on the operand, however it seems TableGen does not
allow using that with SPIR-V attributes. I suspect it is because SPIR-V
attributes do not derive from the generic MLIR attribute class -
TableGen complains about missing enum field.
* Remove duplicate `TypeOrContainer`. There is an identical class with
the same name: `TypeOrValueSemanticsContainer`.
* Remove `TypeOrContainerOfAnyRank` and use
`TypeOrValueSemanticsContainer` instead. `TypeOrContainerOfAnyRank` is
inconsistent with the other classes because it explicitly checks for
`VectorType` and `TensorType` instead of utilizing the value semantics
type trait.
* Remove `SignlessIntegerOrIndexLikeOfAnyRank` etc. and use
`SignlessIntegerOrIndexLike` instead. `SignlessIntegerOrIndexLike` etc.
already allow 0-d vectors, so there is no difference with
`SignlessIntegerOrIndexLikeOfAnyRank`.
The config is currently not movable and because there are constructors
the default move won't be generated, which prevents it from being moved.
Also, it is not copyable because of the unique_ptr. This PR adds a move
constructor to allow moving it.
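The situation can be reproduced with a small standalone C++ sketch. `Config` and `Listener` here are hypothetical stand-ins, not the actual MLIR classes, and the sketch uses a user-declared destructor as the declaration that suppresses the implicit move operations; which declaration plays that role in the real class is an assumption.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical payload owned by the config.
struct Listener {
  std::string name;
};

// Hypothetical config type owning a unique_ptr member.
class Config {
public:
  explicit Config(std::string listenerName)
      : listener(new Listener{std::move(listenerName)}) {}

  // A user-declared destructor suppresses the implicit move operations.
  ~Config() {}

  // Explicitly defaulted move operations make the type movable again. The
  // copy operations stay unavailable because unique_ptr is not copyable.
  Config(Config &&) = default;
  Config &operator=(Config &&) = default;

  const Listener *getListener() const { return listener.get(); }

private:
  std::unique_ptr<Listener> listener;
};
```

With these declarations, `Config b(std::move(a));` compiles and transfers ownership of the listener, while `Config c(b);` is still rejected.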
Goals:
1. To add syntax and semantics to 'batch_matmul' without changing any of
the existing syntax expectations for current usage. batch_matmul is
still just batch_matmul.
2. Move the definition of batch_matmul from linalg OpDsl to tablegen ODS
infra.
Scope of this patch:
To expose broadcast and transpose semantics on the 'batch_matmul'.
The broadcast and transpose semantics are as follows:
By default, 'linalg.batch_matmul' behavior will remain as is. Broadcast
and Transpose semantics can be applied by specifying the explicit
attribute 'indexing_maps' as shown below. This is a list attribute, so
the list must include all the maps if specified.
Example Transpose:
```
linalg.batch_matmul indexing_maps = [
affine_map< (d0, d1, d2, d3) -> (d0, d3, d1)>, //transpose
affine_map< (d0, d1, d2, d3) -> (d0, d3, d2)>,
affine_map< (d0, d1, d2, d3) -> (d0, d1, d2)>
]
ins (%arg0, %arg1: memref<2x5x3xf32>,memref<2x5x7xf32>)
outs (%arg2: memref<2x3x7xf32>)
```
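To make the transpose maps concrete, the loop nest they describe can be written out by hand. The following standalone C++ sketch is a reference computation only, assuming row-major flattened buffers; `batchMatmulTransposeA` is a hypothetical name, and `d0`..`d3` mirror the dimensions of the affine maps above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes C[d0, d1, d2] += A[d0, d3, d1] * B[d0, d3, d2], i.e. the first
// operand is read through the transposed map (d0, d3, d1).
// Shapes: A is batch x k x m, B is batch x k x n, C is batch x m x n.
inline void batchMatmulTransposeA(const std::vector<float> &a,
                                  const std::vector<float> &b,
                                  std::vector<float> &c, std::size_t batch,
                                  std::size_t m, std::size_t n,
                                  std::size_t k) {
  assert(a.size() == batch * k * m && "A must be batch x k x m");
  assert(b.size() == batch * k * n && "B must be batch x k x n");
  assert(c.size() == batch * m * n && "C must be batch x m x n");
  for (std::size_t d0 = 0; d0 < batch; ++d0)
    for (std::size_t d1 = 0; d1 < m; ++d1)
      for (std::size_t d2 = 0; d2 < n; ++d2)
        for (std::size_t d3 = 0; d3 < k; ++d3)
          c[(d0 * m + d1) * n + d2] +=
              a[(d0 * k + d3) * m + d1] * b[(d0 * k + d3) * n + d2];
}
```

For the memref shapes in the example, the call would use `batch = 2`, `m = 3`, `n = 7`, and `k = 5`.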
Example Broadcast:
```
linalg.batch_matmul indexing_maps = [
affine_map< (d0, d1, d2, d3) -> (d3)>, //broadcast
affine_map< (d0, d1, d2, d3) -> (d0, d3, d2)>,
affine_map< (d0, d1, d2, d3) -> (d0, d1, d2)>
]
ins (%arg0, %arg1: memref<5xf32>,memref<2x5x7xf32>)
outs (%arg2: memref<2x3x7xf32>)
```
Example Broadcast and transpose:
```
linalg.batch_matmul indexing_maps = [
affine_map< (d0, d1, d2, d3) -> (d1, d3)>, //broadcast
affine_map< (d0, d1, d2, d3) -> (d0, d2, d3)>, //transpose
affine_map< (d0, d1, d2, d3) -> (d0, d1, d2)>
]
ins (%arg0, %arg1: memref<3x5xf32>, memref<2x7x5xf32>)
outs (%arg2: memref<2x3x7xf32>)
```
RFCs and related PR:
* https://discourse.llvm.org/t/rfc-linalg-opdsl-constant-list-attribute-definition/80149
* https://discourse.llvm.org/t/rfc-op-explosion-in-linalg/82863
* https://discourse.llvm.org/t/rfc-mlir-linalg-operation-tree/83586
* https://github.com/llvm/llvm-project/pull/115319
LLVM itself is generally moving away from using `undef` and towards
using `poison`, to the point of having a lint that catches new uses of
`undef` in tests.
In order to not trip the lint on new patterns and to conform to the
evolution of LLVM
- Rename various ::undef() methods on StructBuilder subclasses to
::poison()
- Audit the uses of UndefOp in the MLIR libraries and replace almost all
of them with PoisonOp
The remaining uses of `undef` are initializing `uninitialized` memrefs,
explicit conversions to undef from SPIR-V, and a few cases in
AMDGPUToROCDL where usage like
%v = insertelement <M x iN> undef, iN %v, i32 0
%arg = bitcast <M x iN> %v to i(M * N)
is used to handle "i32" arguments that are really packed vectors of
smaller types that won't always be fully initialized.
There were a bunch of spots in ROCDL.td where we were defining our own
llvmBuilder call which could have been generated using the default
built-in one on LLVM_IntrOpBase.
This commit cleans up such usages in the interests of potentially
enabling ROCDL import in the future and of making best practices more
obvious.
The one breaking change is renaming WaitcntOp to SWaitcntOp, which
should have minimal impact.
This Pull Request adds OpImageWrite as defined in section 3.52.10.
(Image Instructions). The tests in
`mlir/test/Target/SPIRV/image-ops.mlir` are also updated (and extended
with the new op), so they now pass validation with `spirv-val` after
serialization into SPIR-V. The test was missing `ImageQuery` capability
and entry points. For entry points dummy `main` functions were added.
The newly introduced `TensorRelayoutOpInterface` is created specifically
for `tensor.pack` + `tensor.unpack`. Although the interface is
currently empty, it enables us to refactor the logic in
`FoldTensorCastProducerOp` within the Tensor dialect as follows:
```cpp
// OLD
// Reject tensor::PackOp - there's dedicated pattern for that instead.
if (!foldTensorCastPrecondition(op) ||
isa<tensor::PackOp, tensor::UnPackOp>(*op))
return failure();
```
is replaced with:
```cpp
// NEW
// Reject tensor::PackOp - there's dedicated pattern for that instead.
if (!foldTensorCastPrecondition(op) ||
isa<tensor::RelayoutOpInterface>(*op))
return failure();
```
This will be crucial once `tensor.pack` + `tensor.unpack` are replaced
with `linalg.pack` + `linalg.unpack` (i.e. moved to Linalg):
* https://github.com/llvm/llvm-project/pull/123902,
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg/.
Note that the interface itself will later be moved to the Linalg
dialect. This decoupling ensures that the Tensor dialect does not
require an understanding of Linalg ops, thus keeping the dependency
lightweight.
This PR is effectively a preparatory step for moving PackOp and UnpackOp
to Linalg. Once that's completed, most CMake changes from this PR will
be effectively reverted.
This includes support for module translation and module import, and adds tests for both.
Fix https://github.com/llvm/llvm-project/issues/115390
ClangIR cannot currently lower global aliases to LLVM because of missing support for this.
Create `VectorToLLVMDialectInterface`, which allows automatic conversion
discovery by the generic `--convert-to-llvm` pass. This only covers the
final dialect conversion step and not any previous preparation steps.
Also, there is currently no way to pass any additional parameters through
this conversion interface, but most users use the default parameters anyway.
For extremely large models, it may be inefficient to load the model into
memory in Python prior to passing it to the MLIR C APIs for
deserialization. This change adds an API to parse a ModuleOp directly
from a file path.
Removed the TOSA quantization attribute used in various MLIR TOSA
dialect operations in favour of using builtin attributes.
Update any lit tests, conversions and transformations appropriately.
Signed-off-by: Tai Ly <tai.ly@arm.com>
Co-authored-by: Tai Ly <tai.ly@arm.com>
Drop arbitrary checks and hacks from affine fusion MDG construction and
handle all ops using memory read/write effects. This has been a long
pending change and it now makes affine fusion more powerful in the
presence of non-affine ops and does not limit fusion in parts of the
block where it is feasible simply because of non-affine ops elsewhere or
intervening non-affine users.
Populate memref read and write ops in non-affine region holding ops and
non-affine ops at the top level of the Block properly; add the
appropriate edges to MDG. Use memory read-write effects and drop
assumptions and special handling of ops due to historic reasons.
Update MDG to drop unnecessary "unhandled region" hack. This hack is no
longer needed with the update to fully and properly construct the MDG.
MDG edges now capture dependences between nodes completely. Drop
non-affine users check. With the MDG generalization to properly include
edges between non-affine nodes/operations, the non-affine users on path
check in fusion is no longer needed. Add more test cases to exercise MDG
generalization.
Drop unnecessary failure when encountering side-effect-free affine.if
ops.
Improve documentation on MDG.
Resource keys have the problem that you can’t parse them from MLIR
assembly if they have special or non-printable characters, but nothing
prevents you from specifying such a key when you create e.g. a
DenseResourceElementsAttr, and it works fine in other ways, including
bytecode emission and parsing.
This PR solves the parsing by quoting and escaping keys with special or
non-printable characters in MLIR assembly, in the same way as symbols,
e.g.:
```
module attributes {
fst = dense_resource<resource_fst> : tensor<2xf16>,
snd = dense_resource<"resource\09snd"> : tensor<2xf16>
} {}
{-#
dialect_resources: {
builtin: {
resource_fst: "0x0200000001000200",
"resource\09snd": "0x0200000008000900"
}
}
#-}
```
By not quoting keys without special or non-printable characters, the
change is effectively backwards compatible.
The change is tested by:
1. adding a test with a dense resource handle key with special
characters to `dense-resource-elements-attr.mlir`
2. adding special and unprintable characters to some resource keys in
the existing lit tests `pretty-resources-print.mlir` and
`mlir/test/Bytecode/resources.mlir`
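The quoting rule can be sketched as a small standalone C++ function. This is not the MLIR printer: `isBareKey` and `printResourceKey` are hypothetical names, and both the identifier rule used to decide when a key can stay unquoted and the escaping of a backslash as `\\` are assumptions. Only the `\09` style hex escape for non-printable characters is taken from the example above.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Assumed rule: a key may stay unquoted if it looks like an identifier,
// i.e. [A-Za-z_][A-Za-z0-9_$.]*.
inline bool isBareKey(const std::string &key) {
  if (key.empty())
    return false;
  unsigned char first = static_cast<unsigned char>(key[0]);
  if (!(std::isalpha(first) || first == '_'))
    return false;
  for (char ch : key) {
    unsigned char c = static_cast<unsigned char>(ch);
    if (!(std::isalnum(c) || c == '_' || c == '$' || c == '.'))
      return false;
  }
  return true;
}

// Prints a key either bare or as a quoted string in which the quote and
// non-printable characters are written as a backslash plus two hex digits.
inline std::string printResourceKey(const std::string &key) {
  if (isBareKey(key))
    return key;
  static const char hexDigits[] = "0123456789ABCDEF";
  std::string out = "\"";
  for (char ch : key) {
    unsigned char c = static_cast<unsigned char>(ch);
    if (c == '\\') {
      out += "\\\\";
    } else if (std::isprint(c) && c != '"') {
      out += ch;
    } else {
      out += '\\';
      out += hexDigits[c >> 4];
      out += hexDigits[c & 0xF];
    }
  }
  out += '"';
  return out;
}
```

Under these assumptions, `resource_fst` is printed unchanged, while a key containing a tab character is printed as `"resource\09snd"`, matching the two keys in the example.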
Add Rocdl support for the following GFX950 instructions:
CVT_SCALE_PK_FP8_F32
CVT_SCALE_PK_BF8_F32
CVT_SCALE_SR_FP8_F32
CVT_SCALE_SR_BF8_F32
CVT_SCALE_PK_F32_FP8
CVT_SCALE_PK_F32_BF8
CVT_SCALE_F32_FP8
CVT_SCALE_F32_BF8
The current implementation of OpenACC lowering includes explicit
expansion of the following cases:
- Creation of `acc.bounds` operations for all arrays, including those
whose dimensions are captured in the type (eg `!fir.array<100xf32>`)
- Expansion of box types by only putting the box's address in the data
clause. The address was extracted with a `fir.box_addr` operation and
the bounds were filled with `fir.box_dims` operation.
However, with the creation of the new type interface `MappableType`, the
idea is that specific type-based semantics can now be used. This also
really simplifies representation in the IR. Consider the following
example:
```
subroutine sub(arr)
real :: arr(:)
!$acc enter data copyin(arr)
end subroutine
```
Before the current PR, the relevant acc dialect IR looked like:
```
func.func @_QPsub(%arg0: !fir.box<!fir.array<?xf32>> {fir.bindc_name =
"arr"}) {
...
%1:2 = hlfir.declare %arg0 dummy_scope %0 {uniq_name = "_QFsubEarr"} :
(!fir.box<!fir.array<?xf32>>, !fir.dscope) ->
(!fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>)
%c1 = arith.constant 1 : index
%c0 = arith.constant 0 : index
%2:3 = fir.box_dims %1#0, %c0 : (!fir.box<!fir.array<?xf32>>, index)
-> (index, index, index)
%c0_0 = arith.constant 0 : index
%3 = arith.subi %2#1, %c1 : index
%4 = acc.bounds lowerbound(%c0_0 : index) upperbound(%3 : index)
extent(%2#1 : index) stride(%2#2 : index) startIdx(%c1 : index)
{strideInBytes = true}
%5 = fir.box_addr %1#0 : (!fir.box<!fir.array<?xf32>>) ->
!fir.ref<!fir.array<?xf32>>
%6 = acc.copyin varPtr(%5 : !fir.ref<!fir.array<?xf32>>) bounds(%4) ->
!fir.ref<!fir.array<?xf32>> {name = "arr", structured = false}
acc.enter_data dataOperands(%6 : !fir.ref<!fir.array<?xf32>>)
```
After the current change, it looks like:
```
func.func @_QPsub(%arg0: !fir.box<!fir.array<?xf32>> {fir.bindc_name =
"arr"}) {
...
%1:2 = hlfir.declare %arg0 dummy_scope %0 {uniq_name = "_QFsubEarr"} :
(!fir.box<!fir.array<?xf32>>, !fir.dscope) ->
(!fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>)
%2 = acc.copyin var(%1#0 : !fir.box<!fir.array<?xf32>>) ->
!fir.box<!fir.array<?xf32>> {name = "arr", structured = false}
acc.enter_data dataOperands(%2 : !fir.box<!fir.array<?xf32>>)
```
Restoring the old behavior can be done with the following command line
options:
`--openacc-unwrap-fir-box=true --openacc-generate-default-bounds=true`