clang-p2996

Author	SHA1	Message	Date
aartbik	078776a679	[mlir] [VectorOps] Progressively lower vector.outerproduct to LLVM Summary: This replaces the direct lowering of vector.outerproduct to LLVM with progressive lowering into elementary vectors ops to avoid having the similar lowering logic at several places. NOTE1: with the new progressive rule, the lowered llvm is slightly more elaborate than with the direct lowering, but the generated assembly is just as optimized; still if we want to stay closer to the original, we should add a "broadcast on extract" to shuffle rewrite (rather than special cases all the lowering steps) NOTE2: the original outerproduct lowering code should now be removed but some linalg test work directly on vector and contain some dead code, so this requires another CL Reviewers: nicolasvasilache, andydavis1 Reviewed By: nicolasvasilache, andydavis1 Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75956	2020-03-12 13:45:42 -07:00
Nicolas Vasilache	63b683a816	[mlir][Vector] Add a vector.matrix_multiply op on 1-D vectors Summary: This op mirrors the llvm.intr counterpart and allows lowering + type conversions in a progressive fashion. Differential Revision: https://reviews.llvm.org/D75775	2020-03-09 13:34:03 -04:00
aartbik	e83b7b99da	[mlir] [VectorOps] Implement vector.reduce operation Summary: This new operation operates on 1-D vectors and forms the bridge between vector.contract and llvm intrinsics for vector reductions. Reviewers: nicolasvasilache, andydavis1, ftynse Reviewed By: nicolasvasilache Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74370	2020-02-11 11:31:59 -08:00
Nicolas Vasilache	681f929f59	[mlir][VectorOps] Introduce a `vector.fma` op that works on n-D vectors and lowers to `llvm.intrin.fmuladd` Summary: The `vector.fma` operation is portable enough across targets that we do not want to keep it wrapped under `vector.outerproduct` and `llvm.intrin.fmuladd`. This revision lifts the op into the vector dialect and implements the lowering to LLVM by using two patterns: 1. a pattern that lowers from n-D to (n-1)-D by unrolling when n > 2 2. a pattern that converts from 1-D to the proper LLVM representation Reviewers: ftynse, stellaraccident, aartbik, dcaballe, jsetoain, tetuante Reviewed By: aartbik Subscribers: fhahn, dcaballe, merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74075	2020-02-07 15:44:53 -05:00
Nicolas Vasilache	499ad45877	[mlir][VectorOps] Expose and use llvm.intrin.fma* Summary: This revision exposes the portable `llvm.fma` intrinsic in LLVMOps and uses it in lieu of `llvm.fmuladd` when lowering the `vector.outerproduct` op to LLVM. This guarantees proper `fma` instructions will be emitted if the target ISA supports it. `llvm.fmuladd` does not have this guarantee in its semantics, despite evidence that the proper x86 instructions are emitted. For more details, see https://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic. Reviewers: ftynse, aartbik, dcaballe, fhahn Reviewed By: aartbik Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74219	2020-02-07 15:38:40 -05:00
aartbik	e52414b1ae	[mlir][VectorOps] Generalized vector.print to i32/i64 Summary: Lowering to LLVM IR was restricted to float/double. This CL also adds the integral values. Reviewers: andydavis1, nicolasvasilache, ftynse Reviewed By: nicolasvasilache, ftynse Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74179	2020-02-07 09:25:30 -08:00
aartbik	c8fc76a99b	[mlir] [VectorOps] fixed bug in vector.insert_strided_slice lowering Summary: Rationale: When lowering to LLVM for different rank insert (n vs k), the offset arrays needs to drop one dimension (becomes n-1), but the strides array needs to be preserved (remains k). With regression test. Note that this example was actually in the documentation, so extra important to do it right :-) Reviewers: nicolasvasilache, andydavis1, ftynse Reviewed By: nicolasvasilache, ftynse Subscribers: Joonsoo, merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73733	2020-01-31 11:29:46 -08:00
aartbik	459cf6e500	[mlir] [VectorOps] Lowering of vector.extract/insert_slices to LLVM IR Summary: Uses progressive lowering to convert vector.extract_slices and vector_insert_slices to equivalent vector operations that can be subsequently lowered into LLVM. Reviewers: nicolasvasilache, andydavis1, rriddle Reviewed By: nicolasvasilache, rriddle Subscribers: merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72808	2020-01-27 10:35:48 -08:00
Nicolas Vasilache	2d515e49d8	[mlir][VectorOps] Implement insert_strided_slice conversion Summary: This diff implements the progressive lowering of insert_strided_slice. Two cases appear: 1. when the source and dest vectors have different ranks, extract the dest subvector at the proper offset and reduce to case 2. 2. when they have the same rank N: a. if the source and dest type are the same, the insertion is trivial: just forward the source b. otherwise, iterate over all N-1 D subvectors and create an extract/insert_strided_slice/insert replacement, reducing the problem to vecotrs of the same N-1 rank. This combines properly with the other conversion patterns to lower all the way to LLVM. Reviewers: ftynse, rriddle, AlexEichenberger, andydavis1, tetuante, nicolasvasilache Reviewed By: andydavis1 Subscribers: merge_guards_bot, mehdi_amini, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72317	2020-01-09 03:13:01 -05:00
Nicolas Vasilache	65678d9384	[mlir][VectorOps] Implement strided_slice conversion Summary: This diff implements the progressive lowering of strided_slice to either: 1. extractelement + insertelement for the 1-D case 2. extract + optional strided_slice + insert for the n-D case. This combines properly with the other conversion patterns to lower all the way to LLVM. Appropriate tests are added. Reviewers: ftynse, rriddle, AlexEichenberger, andydavis1, tetuante Reviewed By: andydavis1 Subscribers: merge_guards_bot, mehdi_amini, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72310	2020-01-09 03:03:51 -05:00
Aart Bik	1d47564a53	[VectorOps] unify vector dialect "subscripts" PiperOrigin-RevId: 286650682	2019-12-20 15:33:04 -08:00
Aart Bik	15f800f4bc	[VectorOps] minor cleanup: vector dialect "subscripts" are i32 Introduces some centralized methods to move towards consistent use of i32 as vector subscripts. Note: sizes/strides/offsets attributes are still i64 PiperOrigin-RevId: 286434133	2019-12-19 11:51:08 -08:00
Aart Bik	d9b500d3bb	[VectorOps] Add vector.print definition, with lowering support Examples: vector.print %f : f32 vector.print %x : vector<4xf32> vector.print %y : vector<3x4xf32> vector.print %z : vector<2x3x4xf32> LLVM lowering replaces these with fully unrolled calls into a small runtime support library that provides some basic printing operations (single value, opening closing bracket, comma, newline). PiperOrigin-RevId: 286230325	2019-12-18 11:31:34 -08:00
Aart Bik	cd5dab8ad7	[VectorOps] Add [insert/extract]element definition together with lowering to LLVM Similar to insert/extract vector instructions but (1) work on 1-D vectors only (2) allow for a dynamic index %c3 = constant 3 : index %0 = vector.insertelement %arg0, %arg1[%c : index] : vector<4xf32> %1 = vector.extractelement %arg0[%c3 : index] : vector<4xf32> PiperOrigin-RevId: 285792205	2019-12-16 09:52:46 -08:00
Aart Bik	1c81adf362	[VectorOps] Add lowering of vector.shuffle to LLVM IR For example, a shuffle %1 = vector.shuffle %arg0, %arg1 [0 : i32, 1 : i32] : vector<2xf32>, vector<2xf32> becomes a direct LLVM shuffle 0 = llvm.shufflevector %arg0, %arg1 [0 : i32, 1 : i32] : !llvm<"<2 x float>">, !llvm<"<2 x float>"> but %1 = vector.shuffle %a, %b[1 : i32, 0 : i32, 2: i32] : vector<1x4xf32>, vector<2x4xf32> becomes the more elaborate (note the index permutation that drives argument selection for the extract operations) %0 = llvm.mlir.undef : !llvm<"[3 x <4 x float>]"> %1 = llvm.extractvalue %arg1[0] : !llvm<"[2 x <4 x float>]"> %2 = llvm.insertvalue %1, %0[0] : !llvm<"[3 x <4 x float>]"> %3 = llvm.extractvalue %arg0[0] : !llvm<"[1 x <4 x float>]"> %4 = llvm.insertvalue %3, %2[1] : !llvm<"[3 x <4 x float>]"> %5 = llvm.extractvalue %arg1[1] : !llvm<"[2 x <4 x float>]"> %6 = llvm.insertvalue %5, %4[2] : !llvm<"[3 x <4 x float>]"> PiperOrigin-RevId: 285268164	2019-12-12 14:11:56 -08:00
Aart Bik	9826fe5c9f	[VectorOps] Add lowering of vector.insert to LLVM IR For example, an insert %0 = vector.insert %arg0, %arg1[3 : i32] : f32 into vector<4xf32> becomes %0 = llvm.mlir.constant(3 : i32) : !llvm.i32 %1 = llvm.insertelement %arg0, %arg1[%0 : !llvm.i32] : !llvm<"<4 x float>"> A more elaborate example, inserting an element in a higher dimension vector %0 = vector.insert %arg0, %arg1[3 : i32, 7 : i32, 15 : i32] : f32 into vector<4x8x16xf32> becomes %0 = llvm.extractvalue %arg1[3 : i32, 7 : i32] : !llvm<"[4 x [8 x <16 x float>]]"> %1 = llvm.mlir.constant(15 : i32) : !llvm.i32 %2 = llvm.insertelement %arg0, %0[%1 : !llvm.i32] : !llvm<"<16 x float>"> %3 = llvm.insertvalue %2, %arg1[3 : i32, 7 : i32] : !llvm<"[4 x [8 x <16 x float>]]"> PiperOrigin-RevId: 284882443	2019-12-10 17:12:49 -08:00
Aart Bik	1fe65688d4	[VectorOps] Add a ShuffleOp to the VectorOps dialect For example %0 = vector.shuffle %x, %y [3 : i32, 2 : i32, 1 : i32, 0 : i32] : vector<2xf32>, vector<2xf32> yields a vector<4xf32> result with a permutation of the elements of %x and %y PiperOrigin-RevId: 284657191	2019-12-09 16:15:41 -08:00
Aart Bik	d37f27251f	[VecOps] Rename vector.[insert\|extract]element to just vector.[insert\|extract] Since these operations lower to [insert\|extract][element\|value] at LLVM dialect level, neither element nor value would correctly reflect the meaning. PiperOrigin-RevId: 284240727	2019-12-06 12:39:25 -08:00
Aart Bik	b36aaeafb1	[VectorOps] Add lowering of vector.broadcast to LLVM IR For example, a scalar broadcast %0 = vector.broadcast %x : f32 to vector<2xf32> return %0 : vector<2xf32> which expands scalar x into vector [x,x] by lowering to the following LLVM IR dialect to implement the duplication over the leading dimension. %0 = llvm.mlir.undef : !llvm<"<2 x float>"> %1 = llvm.mlir.constant(0 : index) : !llvm.i64 %2 = llvm.insertelement %x, %0[%1 : !llvm.i64] : !llvm<"<2 x float>"> %3 = llvm.shufflevector %2, %0 [0 : i32, 0 : i32] : !llvm<"<2 x float>">, !llvm<"<2 x float>"> return %3 : vector<2xf32> In the trailing dimensions, the operand is simply "passed through", unless a more elaborate "stretch" is required. For example %0 = vector.broadcast %arg0 : vector<1xf32> to vector<4xf32> return %0 : vector<4xf32> becomes %0 = llvm.mlir.undef : !llvm<"<4 x float>"> %1 = llvm.mlir.constant(0 : index) : !llvm.i64 %2 = llvm.extractelement %arg0[%1 : !llvm.i64] : !llvm<"<1 x float>"> %3 = llvm.mlir.constant(0 : index) : !llvm.i64 %4 = llvm.insertelement %2, %0[%3 : !llvm.i64] : !llvm<"<4 x float>"> %5 = llvm.shufflevector %4, %0 [0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm<"<4 x float>">, !llvm<"<4 x float>"> llvm.return %5 : !llvm<"<4 x float>"> PiperOrigin-RevId: 284219926	2019-12-06 11:02:29 -08:00
Nicolas Vasilache	5c0c51a997	Refactor dependencies to expose Vector transformations as patterns - NFC This CL refactors some of the MLIR vector dependencies to allow decoupling VectorOps, vector analysis, vector transformations and vector conversions from each other. This makes the system more modular and allows extracting VectorToVector into VectorTransforms that do not depend on vector conversions. This refactoring exhibited a bunch of cyclic library dependencies that have been cleaned up. PiperOrigin-RevId: 283660308	2019-12-03 17:52:10 -08:00
Nicolas Vasilache	0b271b7dfe	Refactor the LowerVectorTransfers pass to use the RewritePattern infra - NFC This is step 1/n in refactoring infrastructure along the Vector dialect to make it ready for retargetability and composable progressive lowering. PiperOrigin-RevId: 280529784	2019-11-14 15:40:07 -08:00
Alex Zinenko	bf5916e7a4	Use MemRefDescriptor in Vector-to-LLVM convresion Following up on the consolidation of MemRef descriptor conversion, update Vector-to-LLVM conversion to use the helper class that abstracts away the implementation details of the MemRef descriptor. This also makes the types of the attributes in emitted llvm.insert/extractelement operations consistently i64 instead of a mix of index and i64. PiperOrigin-RevId: 280441451	2019-11-14 09:05:42 -08:00
Nicolas Vasilache	f2b6ae9991	Move VectorOps to Tablegen - (almost) NFC This CL moves VectorOps to Tablegen and cleans up the implementation. This is almost NFC but 2 changes occur: 1. an interface change occurs in the padding value specification in vector_transfer_read: the value becomes non-optional. As a shortcut we currently use %f0 for all paddings. This should become an OpInterface for vectorization in the future. 2. the return type of vector.type_cast is trivial and simplified to `memref<vector<...>>` Relevant roundtrip and invalid tests that used to sit in core are moved to the vector dialect. The op documentation is moved to the .td file. PiperOrigin-RevId: 280430869	2019-11-14 08:15:23 -08:00
Nicolas Vasilache	f51a155337	Add support for alignment attribute in std.alloc. This CL adds an extra pointer to the memref descriptor to allow specifying alignment. In a previous implementation, we used 2 types: `linalg.buffer` and `view` where the buffer type was the unit of allocation/deallocation/alignment and `view` was the unit of indexing. After multiple discussions it was decided to use a single type, which conflates both, so the memref descriptor now needs to carry both pointers. This is consistent with the [RFC-Proposed Changes to MemRef and Tensor MLIR Types](https://groups.google.com/a/tensorflow.org/forum/#!searchin/mlir/std.view%7Csort:date/mlir/-wKHANzDNTg/4K6nUAp8AAAJ). PiperOrigin-RevId: 279959463	2019-11-12 07:06:54 -08:00
Nicolas Vasilache	2823b68580	Implement lowering of VectorTypeCastOp to LLVM A VectorTypeCastOp can only be used to lower between statically sized contiguous memrefs of scalar and matching vector type. The sizes and strides are thus fully static and easy to determine. A relevant test is added. This is a step towards solving tensorflow/mlir#189. PiperOrigin-RevId: 275538981	2019-10-18 14:00:06 -07:00
Alex Zinenko	c335d9d313	LLVM dialect: prefix auxiliary operations with "mlir." Some of the operations in the LLVM dialect are required to model the LLVM IR in MLIR, for example "constant" operations are needed to declare a constant value since MLIR, unlike LLVM, does not support immediate values as operands. To avoid confusion with actual LLVM operations, we prefix such axuiliary operations with "mlir.". PiperOrigin-RevId: 266942838	2019-09-03 09:10:56 -07:00
Alex Zinenko	0f974817b5	LLVM dialect: prefix operations that correspond to intrinsics with "intr." LLVM intrinsics have an open name space and their names can potentially overlap with names of LLVM instructions (LLVM intrinsics are functions, not instructions). In MLIR, LLVM intrinsics are modeled as operations, so it needs to make sure their names cannot clash with the instructions. Use the "intr." prefix for intrinsics in the LLVM dialect. PiperOrigin-RevId: 264372173	2019-08-20 06:38:52 -07:00
Nicolas Vasilache	f826ceef3c	Extend vector.outerproduct with an optional 3rd argument This CL adds an optional third argument to the vector.outerproduct instruction. When such a third argument is specified, it is added to the result of the outerproduct and is lowered to FMA intrinsic when the lowering supports it. In the future, we can add an attribute on the `vector.outerproduct` instruction to modify the operations for which to emit code (e.g. "+/", "max/+", "min/+", "log/exp" ...). This CL additionally performs minor cleanups in the vector lowering and adds tests to improve coverage. This has been independently verified to result in proper fma instructions for haswell as follows. Input: ``` func @outerproduct_add(%arg0: vector<17xf32>, %arg1: vector<8xf32>, %arg2: vector<17x8xf32>) -> vector<17x8xf32> { %2 = vector.outerproduct %arg0, %arg1, %arg2 : vector<17xf32>, vector<8xf32> return %2 : vector<17x8xf32> } } ``` Command: ``` mlir-opt vector-to-llvm.mlir -vector-lower-to-llvm-dialect --disable-pass-threading \| mlir-opt -lower-to-cfg -lower-to-llvm \| mlir-translate --mlir-to-llvmir \| opt -O3 \| llc -O3 -march=x86-64 -mcpu=haswell -mattr=fma,avx2 ``` Output: ``` outerproduct_add: # @outerproduct_add # %bb.0: ... vmovaps 112(%rbp), %ymm8 vbroadcastss %xmm0, %ymm0 ... vbroadcastss 64(%rbp), %ymm15 vfmadd213ps 144(%rbp), %ymm8, %ymm0 # ymm0 = (ymm8 ymm0) + mem ... vfmadd213ps 400(%rbp), %ymm8, %ymm9 # ymm9 = (ymm8 * ymm9) + mem ... ``` PiperOrigin-RevId: 263743359	2019-08-16 03:53:26 -07:00
Nicolas Vasilache	252ada4932	Add lowering of vector dialect to LLVM dialect. This CL is step 3/n towards building a simple, programmable and portable vector abstraction in MLIR that can go all the way down to generating assembly vector code via LLVM's opt and llc tools. This CL adds support for converting MLIR n-D vector types to (n-1)-D arrays of 1-D LLVM vectors and a conversion VectorToLLVM that lowers the `vector.extractelement` and `vector.outerproduct` instructions to the proper mix of `llvm.vectorshuffle`, `llvm.extractelement` and `llvm.mulf`. This has been independently verified to produce proper avx2 code. Input: ``` func @vec_1d(%arg0: vector<4xf32>, %arg1: vector<8xf32>) -> vector<8xf32> { %2 = vector.outerproduct %arg0, %arg1 : vector<4xf32>, vector<8xf32> %3 = vector.extractelement %2[0 : i32]: vector<4x8xf32> return %3 : vector<8xf32> } ``` Command: ``` mlir-opt vector-to-llvm.mlir -vector-lower-to-llvm-dialect --disable-pass-threading \| mlir-opt -lower-to-cfg -lower-to-llvm \| mlir-translate --mlir-to-llvmir \| opt -O3 \| llc -O3 -march=x86-64 -mcpu=haswell -mattr=fma,avx2 ``` Output: ``` vec_1d: # @vec_1d # %bb.0: vbroadcastss %xmm0, %ymm0 vmulps %ymm1, %ymm0, %ymm0 retq ``` PiperOrigin-RevId: 262895929	2019-08-12 04:08:57 -07:00

29 Commits