Summary:
This replaces the direct lowering of vector.outerproduct to LLVM with progressive lowering into elementary vectors ops to avoid having the similar lowering logic at several places.
NOTE1: with the new progressive rule, the lowered llvm is slightly more elaborate than with the direct lowering, but the generated assembly is just as optimized; still if we want to stay closer to the original, we should add a "broadcast on extract" to shuffle rewrite (rather than special cases all the lowering steps)
NOTE2: the original outerproduct lowering code should now be removed but some linalg test work directly on vector and contain some dead code, so this requires another CL
Reviewers: nicolasvasilache, andydavis1
Reviewed By: nicolasvasilache, andydavis1
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75956
Summary: This op mirrors the llvm.intr counterpart and allows lowering + type conversions in a progressive fashion.
Differential Revision: https://reviews.llvm.org/D75775
Summary:
The `vector.fma` operation is portable enough across targets that we do not want
to keep it wrapped under `vector.outerproduct` and `llvm.intrin.fmuladd`.
This revision lifts the op into the vector dialect and implements the lowering to LLVM by using two patterns:
1. a pattern that lowers from n-D to (n-1)-D by unrolling when n > 2
2. a pattern that converts from 1-D to the proper LLVM representation
Reviewers: ftynse, stellaraccident, aartbik, dcaballe, jsetoain, tetuante
Reviewed By: aartbik
Subscribers: fhahn, dcaballe, merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74075
Summary:
This revision exposes the portable `llvm.fma` intrinsic in LLVMOps and uses it
in lieu of `llvm.fmuladd` when lowering the `vector.outerproduct` op to LLVM.
This guarantees proper `fma` instructions will be emitted if the target ISA
supports it.
`llvm.fmuladd` does not have this guarantee in its semantics, despite evidence
that the proper x86 instructions are emitted.
For more details, see https://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic.
Reviewers: ftynse, aartbik, dcaballe, fhahn
Reviewed By: aartbik
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74219
Summary:
Rationale:
When lowering to LLVM for different rank insert (n vs k), the offset
arrays needs to drop one dimension (becomes n-1), but the strides
array needs to be preserved (remains k). With regression test.
Note that this example was actually in the documentation, so
extra important to do it right :-)
Reviewers: nicolasvasilache, andydavis1, ftynse
Reviewed By: nicolasvasilache, ftynse
Subscribers: Joonsoo, merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73733
Summary:
This diff implements the progressive lowering of insert_strided_slice.
Two cases appear:
1. when the source and dest vectors have different ranks, extract the dest
subvector at the proper offset and reduce to case 2.
2. when they have the same rank N:
a. if the source and dest type are the same, the insertion is trivial:
just forward the source
b. otherwise, iterate over all N-1 D subvectors and create an
extract/insert_strided_slice/insert replacement, reducing the problem
to vecotrs of the same N-1 rank.
This combines properly with the other conversion patterns to lower all the way to LLVM.
Reviewers: ftynse, rriddle, AlexEichenberger, andydavis1, tetuante, nicolasvasilache
Reviewed By: andydavis1
Subscribers: merge_guards_bot, mehdi_amini, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72317
Summary:
This diff implements the progressive lowering of strided_slice to either:
1. extractelement + insertelement for the 1-D case
2. extract + optional strided_slice + insert for the n-D case.
This combines properly with the other conversion patterns to lower all the way to LLVM.
Appropriate tests are added.
Reviewers: ftynse, rriddle, AlexEichenberger, andydavis1, tetuante
Reviewed By: andydavis1
Subscribers: merge_guards_bot, mehdi_amini, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72310
Introduces some centralized methods to move towards
consistent use of i32 as vector subscripts.
Note: sizes/strides/offsets attributes are still i64
PiperOrigin-RevId: 286434133
Similar to insert/extract vector instructions but
(1) work on 1-D vectors only
(2) allow for a dynamic index
%c3 = constant 3 : index
%0 = vector.insertelement %arg0, %arg1[%c : index] : vector<4xf32>
%1 = vector.extractelement %arg0[%c3 : index] : vector<4xf32>
PiperOrigin-RevId: 285792205
For example
%0 = vector.shuffle %x, %y [3 : i32, 2 : i32, 1 : i32, 0 : i32] : vector<2xf32>, vector<2xf32>
yields a vector<4xf32> result with a permutation of the elements of %x and %y
PiperOrigin-RevId: 284657191
Since these operations lower to [insert|extract][element|value] at LLVM
dialect level, neither element nor value would correctly reflect the meaning.
PiperOrigin-RevId: 284240727
This CL refactors some of the MLIR vector dependencies to allow decoupling VectorOps, vector analysis, vector transformations and vector conversions from each other.
This makes the system more modular and allows extracting VectorToVector into VectorTransforms that do not depend on vector conversions.
This refactoring exhibited a bunch of cyclic library dependencies that have been cleaned up.
PiperOrigin-RevId: 283660308
This is step 1/n in refactoring infrastructure along the Vector dialect to make it ready for retargetability and composable progressive lowering.
PiperOrigin-RevId: 280529784
Following up on the consolidation of MemRef descriptor conversion, update
Vector-to-LLVM conversion to use the helper class that abstracts away the
implementation details of the MemRef descriptor. This also makes the types of
the attributes in emitted llvm.insert/extractelement operations consistently
i64 instead of a mix of index and i64.
PiperOrigin-RevId: 280441451
This CL moves VectorOps to Tablegen and cleans up the implementation.
This is almost NFC but 2 changes occur:
1. an interface change occurs in the padding value specification in vector_transfer_read:
the value becomes non-optional. As a shortcut we currently use %f0 for all paddings.
This should become an OpInterface for vectorization in the future.
2. the return type of vector.type_cast is trivial and simplified to `memref<vector<...>>`
Relevant roundtrip and invalid tests that used to sit in core are moved to the vector dialect.
The op documentation is moved to the .td file.
PiperOrigin-RevId: 280430869
This CL adds an extra pointer to the memref descriptor to allow specifying alignment.
In a previous implementation, we used 2 types: `linalg.buffer` and `view` where the buffer type was the unit of allocation/deallocation/alignment and `view` was the unit of indexing.
After multiple discussions it was decided to use a single type, which conflates both, so the memref descriptor now needs to carry both pointers.
This is consistent with the [RFC-Proposed Changes to MemRef and Tensor MLIR Types](https://groups.google.com/a/tensorflow.org/forum/#!searchin/mlir/std.view%7Csort:date/mlir/-wKHANzDNTg/4K6nUAp8AAAJ).
PiperOrigin-RevId: 279959463
A VectorTypeCastOp can only be used to lower between statically sized contiguous memrefs of scalar and matching vector type. The sizes and strides are thus fully static and easy to determine.
A relevant test is added.
This is a step towards solving tensorflow/mlir#189.
PiperOrigin-RevId: 275538981
Some of the operations in the LLVM dialect are required to model the LLVM IR in
MLIR, for example "constant" operations are needed to declare a constant value
since MLIR, unlike LLVM, does not support immediate values as operands. To
avoid confusion with actual LLVM operations, we prefix such axuiliary
operations with "mlir.".
PiperOrigin-RevId: 266942838
LLVM intrinsics have an open name space and their names can potentially overlap
with names of LLVM instructions (LLVM intrinsics are functions, not
instructions). In MLIR, LLVM intrinsics are modeled as operations, so it needs
to make sure their names cannot clash with the instructions. Use the "intr."
prefix for intrinsics in the LLVM dialect.
PiperOrigin-RevId: 264372173
This CL adds an optional third argument to the vector.outerproduct instruction.
When such a third argument is specified, it is added to the result of the outerproduct and is lowered to FMA intrinsic when the lowering supports it.
In the future, we can add an attribute on the `vector.outerproduct` instruction to modify the operations for which to emit code (e.g. "+/*", "max/+", "min/+", "log/exp" ...).
This CL additionally performs minor cleanups in the vector lowering and adds tests to improve coverage.
This has been independently verified to result in proper fma instructions for haswell as follows.
Input:
```
func @outerproduct_add(%arg0: vector<17xf32>, %arg1: vector<8xf32>, %arg2: vector<17x8xf32>) -> vector<17x8xf32> {
%2 = vector.outerproduct %arg0, %arg1, %arg2 : vector<17xf32>, vector<8xf32>
return %2 : vector<17x8xf32>
}
}
```
Command:
```
mlir-opt vector-to-llvm.mlir -vector-lower-to-llvm-dialect --disable-pass-threading | mlir-opt -lower-to-cfg -lower-to-llvm | mlir-translate --mlir-to-llvmir | opt -O3 | llc -O3 -march=x86-64 -mcpu=haswell -mattr=fma,avx2
```
Output:
```
outerproduct_add: # @outerproduct_add
# %bb.0:
...
vmovaps 112(%rbp), %ymm8
vbroadcastss %xmm0, %ymm0
...
vbroadcastss 64(%rbp), %ymm15
vfmadd213ps 144(%rbp), %ymm8, %ymm0 # ymm0 = (ymm8 * ymm0) + mem
...
vfmadd213ps 400(%rbp), %ymm8, %ymm9 # ymm9 = (ymm8 * ymm9) + mem
...
```
PiperOrigin-RevId: 263743359
This CL is step 3/n towards building a simple, programmable and portable vector abstraction in MLIR that can go all the way down to generating assembly vector code via LLVM's opt and llc tools.
This CL adds support for converting MLIR n-D vector types to (n-1)-D arrays of 1-D LLVM vectors and a conversion VectorToLLVM that lowers the `vector.extractelement` and `vector.outerproduct` instructions to the proper mix of `llvm.vectorshuffle`, `llvm.extractelement` and `llvm.mulf`.
This has been independently verified to produce proper avx2 code.
Input:
```
func @vec_1d(%arg0: vector<4xf32>, %arg1: vector<8xf32>) -> vector<8xf32> {
%2 = vector.outerproduct %arg0, %arg1 : vector<4xf32>, vector<8xf32>
%3 = vector.extractelement %2[0 : i32]: vector<4x8xf32>
return %3 : vector<8xf32>
}
```
Command:
```
mlir-opt vector-to-llvm.mlir -vector-lower-to-llvm-dialect --disable-pass-threading | mlir-opt -lower-to-cfg -lower-to-llvm | mlir-translate --mlir-to-llvmir | opt -O3 | llc -O3 -march=x86-64 -mcpu=haswell -mattr=fma,avx2
```
Output:
```
vec_1d: # @vec_1d
# %bb.0:
vbroadcastss %xmm0, %ymm0
vmulps %ymm1, %ymm0, %ymm0
retq
```
PiperOrigin-RevId: 262895929