Rationale:
The LLVM dialect supports passing fastmath flags from floating-point ops to LLVM IR instructions. However, not all LLVM dialect ops have the required attribute. This change adds fastmath flag support to `llvm.intr.vector.reduce.{fmin,fmax}`. One scenario where this is useful is lowering `llvm.intr.vector.reduce.{fmax,fmin}` to LLVM IR with the `nnan` (NoNaNs) flag so it may be [[ 115c7beda7/llvm/lib/CodeGen/ExpandReductions.cpp (L159) | lowered to a shuffle reduction ]].
Changes:
- Make `LLVM_VecReductionF` implement the `FastmathFlagsInterface`; the change is modeled on `LLVM_UnaryIntrOpF`.
- Add an assembly format for `LLVM_VecReductionF` ops. The purpose is to keep existing functionality: avoid printing the fastmath flags attribute when it has its default value (`none`). The change is modeled on `LLVM_UnaryIntrOpBase`.
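With the interface in place, the flags round-trip through the assembly. A minimal sketch (the `nnan` flag set is illustrative):
```mlir
// The default #llvm.fastmath<none> is elided when printing.
%0 = llvm.intr.vector.reduce.fmax(%v) : (vector<4xf32>) -> f32
// Non-default flags are printed explicitly.
%1 = llvm.intr.vector.reduce.fmax(%v) {fastmathFlags = #llvm.fastmath<nnan>} : (vector<4xf32>) -> f32
```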
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D145692
This patch adds masking support for more contraction flavors including those
with any combiner operation (add, mul, min, max, and, or, etc.) and
regular matmul contractions.
Combiner operations that are performing vertical reductions (and,
therefore, they are not represented with a horizontal reduction
operation) can be executed unmasked. However, the previous value of
the accumulator must be propagated for lanes that shouldn't accumulate.
We achieve this goal by introducing a select operation after the
accumulator to choose between the combined and the previous accumulator
value. This design decision is made to avoid introducing masking support
to all the arithmetic and logical operations in the Arith dialect. VP
intrinsics do not support pass-thru values either so we would have to
generate the same sequence when lowering to LLVM. The op + select
pattern (sketched below) is peepholed by backends with native masking
support for those operations.
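A minimal sketch of the pattern, with op names and shapes chosen for illustration:
```mlir
// Combine unmasked, then select the previous accumulator value on the
// inactive lanes so they do not accumulate.
%combined = arith.addf %acc, %partial : vector<8xf32>
%new_acc = arith.select %mask, %combined, %acc : vector<8xi1>, vector<8xf32>
```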
Consequently, this patch removes masking support from the vector.fma
operation to follow the same approach for all the combiner operations.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D144239
This patch adds support for masking vector.contract ops with the
vector.mask approach. This also includes the lowering of vector.contract
through the vector.outerproduct path to LLVM. For now, this only adds
support for one of the many potential flavors of
vector.contract/vector.outerproduct but unsupported cases will fail
gracefully.
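A rough sketch of the masked form (the `#matvec_trait` attribute is elided and the shapes are illustrative):
```mlir
// Mask a matvec-flavored vector.contract with vector.mask.
%res = vector.mask %m {
  vector.contract #matvec_trait %a, %b, %acc
    : vector<3x4xf32>, vector<4xf32> into vector<3xf32>
} : vector<3x4xi1> -> vector<3xf32>
```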
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D143965
This patch adds the conversion patterns to lower masked reduction
operations to the corresponding vp intrinsics in LLVM.
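For example, a hypothetical sketch (the exact intrinsic, types, and explicit-vector-length handling may differ):
```mlir
// A masked floating-point add reduction maps onto the corresponding
// llvm.intr.vp.* reduction intrinsic.
%r = "llvm.intr.vp.reduce.fadd"(%init, %v, %mask, %evl)
    : (f32, vector<8xf32>, vector<8xi1>, i32) -> f32
```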
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D142177
This is generated by running
```
sed --in-place 's/[[:space:]]\+$//' mlir/**/*.td
sed --in-place 's/[[:space:]]\+$//' mlir/**/*.mlir
```
Reviewed By: rriddle, dcaballe
Differential Revision: https://reviews.llvm.org/D138866
This is required for the D126305 code to propagate fastmath attributes
for Arith operations that are converted to LLVM IR intrinsic operations.
LLVM IR intrinsic operations now use a custom assembly format to avoid
printing `{fastmathFlags = #llvm.fastmath<none>}`, which is too verbose.
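For instance, a minimal sketch (op and flag choice are illustrative):
```mlir
// The default attribute is elided when printing...
%0 = llvm.intr.exp(%arg0) : (f32) -> f32
// ...while non-default flags still appear.
%1 = llvm.intr.exp(%arg0) {fastmathFlags = #llvm.fastmath<fast>} : (f32) -> f32
```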
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D136225
Currently vector.gather only supports reading memory into a 1-D result vector.
This patch extends it to support an n-D result vector with the indices, masks,
and passthroughs in n-D vectors.
As we are trying to vectorize tensor.extract with vector.gather
(https://github.com/iree-org/iree/issues/9198), it will need to gather the
elements into an n-D vector. Having vector.gather with n-D results allows us
to avoid flattening and reshaping at the vectorization stage. The backends can then
decide the optimal ways to lower the vector.gather op.
Note that this is different from n-D gathering, which is about reading n-D
memory with the n-D indices. The indices here are still only 1-D offsets on
the base.
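A sketch of the extended form (shapes are illustrative):
```mlir
// Gather into a 2-D result; the indices, mask, and pass-through are
// also 2-D, while each index is still a 1-D offset from the base.
%g = vector.gather %base[%c0][%indices], %mask, %pass_thru
    : memref<16xf32>, vector<2x3xi32>, vector<2x3xi1>, vector<2x3xf32>
      into vector<2x3xf32>
```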
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D131905
This patch moves `LLVM::ShuffleVectorOp` to assembly format and in the
process drops the extra type that can be inferred (both operand types
are required to be the same) and switches to a dense integer array.
The syntax change:
```
// Before
%2 = llvm.shufflevector %0, %1 [0 : i32, 0 : i32, 0 : i32, 0 : i32] : vector<4xf32>, vector<4xf32>
// After
%2 = llvm.shufflevector %0, %1 [0, 0, 0, 0] : vector<4xf32>
```
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D132038
This commit adds support for 0-D vectors in ReductionOp.
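For example (current syntax; illustrative):
```mlir
// Reducing a 0-D vector yields its single element.
%0 = vector.reduction <add>, %v : vector<f32> into f32
```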
Reviewed By: nicolasvasilache, dcaballe
Differential Revision: https://reviews.llvm.org/D131896
Adding the accumulator value after the `vector.contract` changes the
precision of the operation. This makes sure the accumulator is carried
through to `vector.reduce` (and down to LLVM).
Differential Revision: https://reviews.llvm.org/D128674
For example, we could do the following eliminations:
```
fold vector.shuffle V1, V2, [0, 1, 2, 3] : <4xi32>, <2xi32> -> V1
fold vector.shuffle V1, V2, [4, 5] : <4xi32>, <2xi32> -> V2
```
Differential Revision: https://reviews.llvm.org/D122706
We are using "enable-index-optimizations" and "indexOptimizations" as
names for an optimization that consists of using i32 for indices within
a vector. For instance, when building a vector comparison for mask
generation. The name is confusing and suggests a scope beyond these
vector indices. This change makes the function of the option explicit
in its name.
Differential Revision: https://reviews.llvm.org/D122415
The way vector.create_mask is currently lowered is
vector-length-dependent, and therefore incompatible with scalable vector
types. This patch adds an alternative lowering path for create_mask
operations that return a scalable vector mask.
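A sketch of the scalable case (illustrative):
```mlir
// A mask with a scalable result type now takes the new,
// vector-length-agnostic lowering path.
%m = vector.create_mask %n : vector<[4]xi1>
```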
Differential Revision: https://reviews.llvm.org/D118248
Currently, the transfer mask is materialized by generating the vector
comparison: [offset + 0, .., offset + length - 1] < [dim, .., dim]
A better alternative is to materialize the transfer mask by using the
operation: `vector.create_mask (dim - offset)`, which will generate
simpler code and compose better with scalable vectors.
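A sketch of the new materialization (names are illustrative):
```mlir
// Instead of comparing an index vector against the dimension size,
// build the mask directly from the remaining length.
%len = arith.subi %dim, %offset : index
%mask = vector.create_mask %len : vector<8xi1>
```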
Differential Revision: https://reviews.llvm.org/D120487
This is part of the larger effort to split the standard dialect. This will also allow for pruning some
additional dependencies on Standard (done in a follow-up).
Differential Revision: https://reviews.llvm.org/D118202
The 0-D case gets lowered in almost the same way that the 1-D case does
in VectorCreateMaskOpConversion. I also had to slightly update the
verifier for the op to always require exactly 1 operand in the 0-D case.
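For example (illustrative):
```mlir
// The 0-D case takes exactly one operand.
%m = vector.create_mask %c1 : vector<i1>
```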
Depends On D115220
Reviewed by: ftynse
Differential revision: https://reviews.llvm.org/D115221
To support creating a 0-D mask with a single `true` or `false` value,
I had to relax the restriction in the verifier that the rank is always equal to
the length of the attribute array. In other words, we now allow:
- `vector.constant_mask [0] : vector<i1>` which gets lowered to
`arith.constant dense<false> : vector<i1>`
- `vector.constant_mask [1] : vector<i1>` which gets lowered to
`arith.constant dense<true> : vector<i1>`
(the attribute list for the 0-D case must be a singleton containing
either `0` or `1`)
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115023
The implementation only allows bit-casting between two 0-D vectors. We could
probably support casting from/to vectors like `vector<1xf32>`, but I wasn't
convinced that this would be important and it would require breaking the
invariant that `BitCastOp` works only on vectors with equal rank.
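For example (illustrative):
```mlir
// Bit-cast between two 0-D vectors of equal bitwidth.
%0 = vector.bitcast %v : vector<f32> to vector<i32>
```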
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114854
This reverts commit 29a50c5864.
After LLVM lowering, the original patch incorrectly moved alignment
information across an unconstrained GEP operation. This is only correct
for some index offsets in the GEP. It seems that the best approach is,
in fact, to rely on LLVM to propagate information from the llvm.assume()
to users.
Thanks to Thomas Raoux for catching this.
This changes the op to produce `AnyVectorOfAnyRank`, mostly following the code for 1-D vectors.
Depends On D114598
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114550
This revision makes concrete use of 0-d vectors to extend the semantics of
InsertElementOp.
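For example (illustrative):
```mlir
// Inserting into a 0-D vector takes no position.
%1 = vector.insertelement %s, %v[] : vector<f32>
```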
Reviewed By: dcaballe, pifon2a
Differential Revision: https://reviews.llvm.org/D114388
This revision starts making concrete use of 0-d vectors to extend the semantics of
ExtractElementOp.
In the process, a new `VectorOfAnyRank` Tablegen constraint is added to OpBase.td to allow a progressive transition to supporting 0-d vectors by gradually opting in.
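For example (illustrative):
```mlir
// Extracting from a 0-D vector takes no position.
%0 = vector.extractelement %v[] : vector<f32>
```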
Differential Revision: https://reviews.llvm.org/D114387
`vector::InsertElementOp` and `vector::ExtractElementOp` have had their `position`
operand changed to accept `AnySignlessIntegerOrIndex` for better operability with
operations that use `index`, such as affine loops.
LLVM's `extractelement` and `insertelement` can also accept `i64`, so lowering
directly to these operations without explicitly inserting casts is allowed. SPIRV's
equivalent ops can also accept `i64`.
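A sketch of the `index`-typed form (illustrative):
```mlir
// An index-typed position composes directly with loop induction variables.
%e = vector.extractelement %v[%i : index] : vector<4xf32>
```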
Reviewed By: nicolasvasilache, jpienaar
Differential Revision: https://reviews.llvm.org/D114139
FMAOp -> LLVM conversion is done progressively by peeling off one dimension from the FMAOp at each pattern iteration. Add the recursively bounded property declaration to the pattern so that the rewriter can apply it multiple times.
Without this, FMAOps of rank 3 or higher do not lower to LLVM.
Differential Revision: https://reviews.llvm.org/D113886
This also fixes the vector.shuffle C++ builder, which had an incorrect type assumption that triggered with this new rewrite.
The vector.shuffle semantics were correct though.
Differential revision: https://reviews.llvm.org/D112578
The current implementation invokes materializations
whenever an input operand does not have a mapping for the
desired type, i.e. it requires materialization at the earliest possible
point. This conflicts with the goal of dialect conversion (and also the
current documentation) which states that a materialization is only
required if the materialization is supposed to persist after the
conversion process has finished.
This revision refactors this such that whenever a target
materialization "might" be necessary, we insert an
unrealized_conversion_cast to act as a temporary materialization.
This allows for deferring the invocation of the user
materialization hooks until the end of the conversion process,
where we actually have a better sense if it's actually
necessary. This has several benefits:
* In some cases a target materialization hook is no longer necessary.
  When performing a full conversion, there are some situations where a
  temporary materialization is necessary. Moving forward, these users
  won't need to provide any target materializations, as the temporary
  materializations do not require the user to provide materialization
  hooks.
* getRemappedValue can now handle values that haven't been converted yet.
  Before this commit, it wasn't well supported to get the remapped value
  of a value that hadn't been converted yet (making it
  difficult/impossible to convert multiple operations in many
  situations). This commit updates getRemappedValue to properly handle
  this case by inserting temporary materializations when necessary.
Another code-health benefit is that with this change we can move most
of the complexity related to materializations to the end of the
conversion process, instead of handling it ad hoc while the conversion
is happening.
Differential Revision: https://reviews.llvm.org/D111620
Precursor: https://reviews.llvm.org/D110200
Removed redundant ops from the standard dialect that were moved to the
`arith` or `math` dialects.
Renamed all instances of operations in the codebase and in tests.
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D110797
Historically the builtin dialect has had an empty namespace. This has unfortunately created a very awkward situation, where many utilities either have to special case the empty namespace, or just don't work at all right now. This revision adds a namespace to the builtin dialect, and starts to cleanup some of the utilities to no longer handle empty namespaces. For now, the assembly form of builtin operations does not require the `builtin.` prefix. (This should likely be re-evaluated though)
Differential Revision: https://reviews.llvm.org/D105149
This simplifies the vector to LLVM lowering. Previously, both vector.load/store and vector.transfer_read/write lowered directly to LLVM. With this commit, there is a single path to LLVM vector load/store instructions and vector.transfer_read/write ops must first be lowered to vector.load/store ops.
* Remove vector.transfer_read/write to LLVM lowering.
* Allow non-unit memref strides on all but the most minor dimension for vector.load/store ops.
* Add maxTransferRank option to populateVectorTransferLoweringPatterns.
* vector.transfer_reads with changing element type can no longer be lowered to LLVM. (This functionality is needed only for SPIRV.)
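A sketch of the single lowering path (shapes are illustrative):
```mlir
// An in-bounds transfer_read is first rewritten to vector.load...
%v0 = vector.transfer_read %m[%c0], %pad {in_bounds = [true]}
    : memref<16xf32>, vector<8xf32>
// ...and only vector.load/store is then lowered to LLVM.
%v1 = vector.load %m[%c0] : memref<16xf32>, vector<8xf32>
```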
Differential Revision: https://reviews.llvm.org/D106118
The dialect-specific cast between builtin (ex-standard) types and LLVM
dialect types was introduced a long time before built-in support for
unrealized_conversion_cast. It has a similar purpose, but is restricted
to compatible builtin and LLVM dialect types, which may hamper
progressive lowering and composition with types from other dialects.
Replace llvm.mlir.cast with unrealized_conversion_cast, and drop the
operation that became unnecessary.
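For example (illustrative):
```mlir
// Before:
%1 = llvm.mlir.cast %0 : index to i64
// After:
%1 = builtin.unrealized_conversion_cast %0 : index to i64
```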
Also make unrealized_conversion_cast legal by default in
LLVMConversionTarget, as the majority of conversions using it are
partial conversions that actually want the casts to persist in the IR.
The standard-to-llvm conversion, which is still expected to run last,
cleans up the remaining casts.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D105880
Simplify the vector unrolling pattern to be more aligned with the rest
of the patterns and closer to vector distribution.
The new implementation uses ExtractStridedSlice/InsertStridedSlice
instead of the Tuple ops. After this change, the Tuple-based ops no
longer have any uses, so they can be removed.
This allows removing a significant amount of dead code and will allow
extending the unrolling code going forward.
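A sketch of the slicing the new implementation builds on (shapes are illustrative):
```mlir
// Extract one 2x2 tile of a 4x4 vector for per-tile processing.
%tile = vector.extract_strided_slice %v
    {offsets = [0, 0], sizes = [2, 2], strides = [1, 1]}
    : vector<4x4xf32> to vector<2x2xf32>
```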
Differential Revision: https://reviews.llvm.org/D105381
vector.transfer_read and vector.transfer_write operations are converted
to llvm intrinsics with specific alignment information; however, there
doesn't seem to be a way in LLVM to take information from llvm.assume
intrinsics and change this alignment information. In any event, due to
the structure of the llvm.assume intrinsic, applying this information
at the LLVM level is more cumbersome. Instead, let's generate the
masked vector load and store intrinsics with the right alignment
information from MLIR in the first place. Since we're bothering to do
this, let's just emit the proper alignment for loads, stores, scatter,
and gather ops too.
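For instance, a sketch in current syntax (the alignment value is illustrative):
```mlir
// The masked load intrinsic carries explicit alignment from MLIR.
%v = llvm.intr.masked.load %ptr, %mask, %pass_thru {alignment = 4 : i32}
    : (!llvm.ptr, vector<8xi1>, vector<8xf32>) -> vector<8xf32>
```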
Differential Revision: https://reviews.llvm.org/D100444
The patch enables the use of index type in vectors. It is a prerequisite to support vectorization for indexed Linalg operations. This refactoring became possible due to the newly introduced data layout infrastructure. The data layout of a module defines the bitwidth of the index type needed to verify bitcasts and similar vector operations.
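For example (current syntax; illustrative):
```mlir
// Vectors with index elements are now valid.
%v = arith.constant dense<0> : vector<4xindex>
```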
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D99948
Also factors out out-of-bounds mask generation from vector.transfer_read/write into a new MaterializeTransferMask pattern.
Differential Revision: https://reviews.llvm.org/D100001
This is in preparation for adding a new "mask" operand. The existing "masked" attribute was used to specify dimensions that may be out-of-bounds. Such transfers can be lowered to masked load/stores. The new "in_bounds" attribute is used to specify dimensions that are guaranteed to be within bounds. (Semantics is inverted.)
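For example (illustrative), a dimension formerly written `masked = [false]` now reads:
```mlir
// in_bounds = [true] guarantees the access stays within bounds along
// that dimension, so no mask is needed there.
%v = vector.transfer_read %m[%i], %pad {in_bounds = [true]}
    : memref<?xf32>, vector<8xf32>
```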
Differential Revision: https://reviews.llvm.org/D99639