In case the distributed dim of the dest vector is also a dim of the src vector, each lane inserts a smaller part of the source vector. Otherwise, one lane inserts the entire src vector and the other lanes do nothing.
Differential Revision: https://reviews.llvm.org/D137953
In case of a distribution, only one lane inserts the scalar value. In case of a broadcast, every lane inserts the scalar.
Differential Revision: https://reviews.llvm.org/D137929
Ops such as `%1 = vector.extract %0[2] : vector<5x96xf32>`.
Distribute the source vector, then extract. In case of a 1d extract, rewrite to vector.extractelement.
Differential Revision: https://reviews.llvm.org/D137646
When establishing the correspondence between transform values and
payload operations or parameters, check that the latter are non-null and
report errors. This was previously allowed for exotic cases of partially
successfull transformations with "apply each" trait, but was dangerous.
The "apply each" implementation was reworked to remove the need for this
functionality, so this can now be hardned to avoid null pointer
dereferences.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D141142
Conservatively only allow inlining for loads and stores that don't carry
any attributes that require handling while inlining. This can later be
relaxed when proper handling is introduced.
Reviewed By: Dinistro, gysit
Differential Revision: https://reviews.llvm.org/D141115
Add a folder for LogicalNotEqual when rhs is false. This pattern shows
up after lowering to SPIRV.
Differential Revision: https://reviews.llvm.org/D141163
As of several months ago, both ArithToLLVM and ArithToSPIRV have
native support for integer min and max operations. Since these are all
the targets available in MLIR core, the need to "expand" arith.minui,
arith.minsi, arith,maxsi, and arith.manxui to more primitive
operations is to longer present.
Therefore, the expanding of integer min and max operations in Arith,
while correct, is likely to lead to performance loss by way of
misoptimization further down the line, and is no longer needed for
anyone's correctness.
This change may break downstream tests, but will not affect the
semantics of MLIR programs.
arith.minf and arith.maxf have a lot of underlying complexity due to
the many different possible NaN and signed zero semantics available on
various platforms, and so removing their expansion is left to a future
commit.
Reviewed By: ThomasRaoux, Mogball
Differential Revision: https://reviews.llvm.org/D140856
This change introduces new LLVMIR dialect operations to represent
TBAA root, type descriptor and access tag metadata nodes.
For the purpose of importing TBAA metadata from LLVM IR it only
supports the current version of TBAA format described in
https://llvm.org/docs/LangRef.html#tbaa-metadata (i.e. size-aware
representation introduced in D41501 is not supported).
TBAA attribute support is only added for LLVM::LoadOp and LLVM::StoreOp.
Support for intrinsics operations (e.g. LLVM::MemcpyOp) may be added later.
The TBAA attribute is represented as an array of access tags, though,
LLVM IR supports only single access tag per memory accessing instruction.
I implemented it as an array anticipating similar support in LLVM IR
to combine TBAA graphs with different roots for Flang - one of the options
described in https://docs.google.com/document/d/16kKZVmI585wth01VSaJAqZMZpoX68rcdBmgfj0kNAt0/edit#heading=h.jzzheaz9vqac
It should be easy to restrict MLIR operation to a single access tag,
if we end up using a different approach for Flang.
Differential Revision: https://reviews.llvm.org/D140768
In several cases, the splitting may be known to be a noop, i.e., produce
no second part. Thread this information through the transform utilities
to the transform dialect, and differentiate it from the error state.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D141138
Adapt the implementation of TransformEachOpTrait to the existence of
parameter values recently introduced into the transform dialect. In
particular, allow `applyToOne` hooks to return a list containing a mix
of `Operation *` that will be associated with handles and `Attribute`
that will be associated with parameter values by the trait
implementation of the transform interface's `apply` method.
Disentangle the "transposition" of the list of per-payload op partial
results to decrease its overall complexity and detemplatize the code
that doesn't really need templates. This removes the poorly documented
special handling for single-result ops with TransformEachOpTrait that
could have assigned null pointer values to handles.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D140979
This makes it more consistent with the recently added
TransformParamTypeInterface.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D140977
Introduce a new kind of values into the transform dialect -- parameter
values. These values have a type implementing the new
`TransformParamTypeInterface` and are associated with lists of
attributes rather than lists of payload operations. This mechanism
allows one to wrap numeric calculations, typically heuristics, into
transform operations separate from those at actually applying the
transformation. For example, tile size computation can be now separated
from tiling itself, and not hardcoded in the transform dialect. This
further improves the separation of concerns between transform choice and
implementation.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D140976
This pattern is similar to `FoldFillWithTensorReshape`, which performs the same swapping with reshapes.
Fill the smaller extracted tensor slice instead of `x`. This allows for additional simplifications in case `x` is the result of another extract_slice.
Differential Revision: https://reviews.llvm.org/D141117
Fix an off-by-one error in extended umul extension for WebGPU.
Revert to the long multiplication algorithm originally added to wide
integer emulation, which was deleted in D139776. It is much easier
to see why it is correct.
Add runtime tests based on the mlir-vulkan-runner. These run both with
and without umul extension.
Issue: https://github.com/llvm/llvm-project/issues/59563
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D141085
A transformation tiling a reduction dimension of a Linalg op needs a
tile size for said dimension. When an insufficient number of dimensions
was provided, it would segfault due to out-of-bounds access to a vector.
Also fix incorrect error reporting in the structured transform op
exercising this functionality.
Reviewed By: springerm, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D141046
Use an array of structures to represent the indices for the tailing COO region
of a sparse tensor.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D140870
Collapsing / expanding a splatted value can be replaced with a single `tensor.splat` operation. Replace
these cases with a simple `tensor.splat` operation.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D140552
This is needed because WGSL does not yet support extended multiplication
ops.
Set up pattern/pass stuff and handle the first op: `UMulExtended`.
`SMulExtended` handling will go to a separate patch.
Issue: https://github.com/llvm/llvm-project/issues/59563
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D140995
This new option is set to `false` by default. It should be set only in Canonicalizer tests to detect faulty canonicalization patterns. I.e., patterns that prevent the canonicalizer from converging. The canonicalizer should always convergence on such small unit tests that we have in `canonicalize.mlir`.
Two faulty canonicalization patterns were detected and fixed with this change.
Differential Revision: https://reviews.llvm.org/D140873
The decomposition was no longer correct for transpose_conv2d to conv2d
after the updated TOSA specification. Specifically the behavior for
padding was changed to refer to padding the tranpsose_conv2d instead
of referencing the conv applied to the inverse transform.
Test was validated using the TOSA conformance tests.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D140503
Added tosa.transpose canonicalization for case where a tosa.transpose is
equivalent to a tosa.reshape. This occurs when the permutation does not
permutate non-unary dimensions.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D140356
There's currently no way to get accurate cube roots in the math dialect.
powf(x, 1/3.0) is too inaccurate in some cases.
Reviewed By: akuegel
Differential Revision: https://reviews.llvm.org/D140842
Fix affine LICM pass for unknown region-holding ops. The logic was
completely ignoring regions of unknown ops leading to generation of
invalid IR on hoisting. Handle affine.parallel op among those with
regions that are supported.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D140738
This revision folds arith.shrui, arith.shrsi and arith.shli with zero
rhs to lhs.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D140749
We were missing to check for transpose when folding.
Also add a new file to test folding independently of
canonicalization as canonicalization was hiding the bug.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D140533