This helper handles non-trivial cases of broadcast + optional transpose creation
that should not leak to the outside world.
Differential Revision: https://reviews.llvm.org/D139003
This is generated by running
```
sed --in-place 's/[[:space:]]\+$//' mlir/**/*.td
sed --in-place 's/[[:space:]]\+$//' mlir/**/*.mlir
```
Reviewed By: rriddle, dcaballe
Differential Revision: https://reviews.llvm.org/D138866
Fold InsertStridedSliceOp(ConstantOp into ConstantOp) -> ConstantOp.
This pattern comes with a vector size threshold to make sure we do not
introduce too many large constants.
This helps clean up code created by the Wide Integer Emulation pass.
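As a rough sketch (with made-up values), the fold turns IR like this into a
single constant:
```
// Inserting a constant slice into a constant vector...
%cst = arith.constant dense<0> : vector<4xi32>
%slice = arith.constant dense<1> : vector<2xi32>
%r = vector.insert_strided_slice %slice, %cst
       {offsets = [1], strides = [1]} : vector<2xi32> into vector<4xi32>
// ...folds into a single constant.
%r = arith.constant dense<[0, 1, 1, 0]> : vector<4xi32>
```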
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D138739
This revision fixes a bug in the vector.extract folding that was missing
handling the "dim-1" broadcasting case in vector.broadcast.
Differential Revision: https://reviews.llvm.org/D138804
This pattern comes with a vector size threshold to make sure we do not
introduce too many large constants.
This helps clean up code created by the Wide Integer Emulation pass.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D138733
This allows us to better canonicalize/clean up code created by the Wide
Integer Emulation pass.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D138606
This generalizes the existing fold for `ExtractOp(non-splat constant)`
to work with vector results. The vector case is handled by extracting
the subrange of the attribute array.
My main use case is to clean up code generated by the Wide Integer Emulation
pass.
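A sketch of the generalized fold, with made-up values:
```
// Extracting a row from a non-splat constant...
%cst = arith.constant dense<[[0, 1], [2, 3]]> : vector<2x2xi32>
%row = vector.extract %cst[1] : vector<2x2xi32>
// ...now folds to the corresponding sub-range of the dense attribute.
%row = arith.constant dense<[2, 3]> : vector<2xi32>
```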
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D138690
This op significantly improves transform dialect usage when using vector abstractions.
It also brings us closer to writing simple end-to-end unit tests that guard against subtle regressions in how patterns combine.
Differential Revision: https://reviews.llvm.org/D138723
Masking hasn't been widely used in vector transfer ops and the semantics
of the mask operand were a bit loose. This patch states that the mask
operand in a vector transfer op is applied to the read/write part of the
operation and, therefore, its shape should match the shape of the
elements read/written from/into the memref/tensor regardless of any
permutation/broadcasting also applied by the transfer operation.
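For example (illustrative shapes), with a transposing permutation map the mask
matches the memref-side shape, not the result vector shape:
```
// %mask : vector<2x4xi1> covers the 2x4 elements read from the memref,
// even though the (transposed) result vector is 4x2.
%v = vector.transfer_read %mem[%c0, %c0], %pad, %mask
       {permutation_map = affine_map<(d0, d1) -> (d1, d0)>}
       : memref<?x?xf32>, vector<4x2xf32>
```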
Reviewers: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D138079
Not all vector shapes can be cast into other types, so we add a check
to not fold insertOp into bitcast if the shape does not support it.
Examples of unsupported shape casts are f16 vectors to f32 if the
shape is not a multiple of 2, or int8 to int32 if the shape is not a
multiple of 4.
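For example (illustrative shapes):
```
// Foldable: the element count divides evenly, so f16 elements regroup into f32.
%ok = vector.bitcast %a : vector<4xf16> to vector<2xf32>
// Not foldable: a vector<3xf16> has no whole-element view as f32, so the
// insert-into-bitcast fold must not fire for such shapes.
```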
Reviewed By: antiagainst, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D137802
Support distributing ops such as `%1 = vector.extractelement %0[%pos : index] : vector<96xf32>`.
In case of an extract from a 1-D vector, the source vector is distributed. The lane into which the requested position falls extracts the element and shuffles it to all other lanes.
Differential Revision: https://reviews.llvm.org/D137336
This is useful for breaking down extract_strided_slice and potentially
canceling it with other extract / insert ops before or after.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D137471
In D134622 the printed form of a pass manager is changed to include the
name of the op that the pass manager is anchored on. This updates the
`-pass-pipeline` argument format to include the anchor op as well, so
that the printed form of a pipeline can be directly passed to
`-pass-pipeline`. In most cases this requires updating
`-pass-pipeline='pipeline'` to
`-pass-pipeline='builtin.module(pipeline)'`.
This also fixes an outdated assert that prevented running a
`PassManager` anchored on `'any'`.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D134900
When a value used in the forOp is defined outside the region but within
the parent warpOp, we need to return and distribute the value to pass it
to the new operations created within the loop.
Also simplify the lambda interface.
Differential Revision: https://reviews.llvm.org/D137146
This patch introduces the lowering for xfer ops masked with `vector.mask`.
Vector reductions are not lowered yet because new LLVM intrinsics are needed
in the LLVM dialect.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D136741
We currently only support one level of aliases, which isn't great
in situations where an attribute/type can have multiple duplicated
components nested within it (e.g. debuginfo metadata). This commit
refactors alias generation to support nested aliases, which requires
changing alias grouping to take into account the depth of child
aliases, to ensure that attributes/types aren't printed before the
aliases they use.
The only real user-facing change here was that we no longer print
0 as an alias suffix, which would be unnecessarily expensive to keep
in the new alias generation method (and isn't that valuable of a
behavior to preserve).
Differential Revision: https://reviews.llvm.org/D136541
This patch introduces the `vector.mask` operation and the MaskableOpInterface
as described in https://discourse.llvm.org/t/rfc-vector-masking-representation-in-mlir/64964.
The `vector.mask` operation is used to predicate the execution of operations
implementing the MaskableOpInterface. This interface will be implemented by maskable
operations and provides information about their masking constraints and semantics.
For now, only vector transfer and reduction ops implement the MaskableOpInterface
for illustration and testing purposes.
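For example, a masked transfer read looks roughly like this:
```
%0 = vector.mask %m { vector.transfer_read %base[%c0], %pad : memref<?xf32>, vector<8xf32> } : vector<8xi1> -> vector<8xf32>
```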
Reviewed By: nicolasvasilache, rriddle
Differential Revision: https://reviews.llvm.org/D134939
This commit adds a pattern to merge accumulator and result
`vector.transpose` ops into `vector.contract`. This kind of
pattern can be generated for NCHW convolution vectorization,
where we use transposes to convert the 1-D NCW convolution
into NWC during vectorization. Merging the transpose would
mean we can avoid materializing vector extract/insert ops for
the transposes, and it makes further vector-level transformations
easier.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D135496
In https://reviews.llvm.org/D133883, we changed the
`FoldExtractSliceIntoTransferRead` pattern from requiring
full identity map to minor identity map. This effectively
allows rank reducing `vector.transfer_read` ops. However,
the logic for checking whether the `tensor.extract_slice` is rank
reducing still looks at the vector rank, which now could be smaller
than the `tensor.extract_slice`'s output tensor rank.
As a result, we can end up with incorrect index calculation after
folding due to this double rank-reducing behavior.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D134984
Many tests still depend on specific names of SSA values (!!).
This commit is a best-effort cleanup that will set the stage for adding some pretty SSA result names.
Make sure we consider other subviews of the same buffer when doing store
to load forwarding or dead store elimination.
Differential Revision: https://reviews.llvm.org/D134576
Test does a "CHECK-NOT" against the function name when it should check
to ensure that the `vector.warp_execute_on_lane_0` is removed.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D134448
One of the vector transformation patterns has been indiscriminately
converting layouts to affine maps. Leverage the strided form when
possible.
Reviewed By: nicolasvasilache, dcaballe
Differential Revision: https://reviews.llvm.org/D134047
All relevant operations have been switched to primarily use the strided
layout, but still support the affine map layout. Update the relevant
tests to use the strided format instead for compatibility with how ops
now print by default.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D134045
Bufferization already makes the assumption that buffers pass function
boundaries in the strided form and uses the corresponding affine map layouts.
Switch it to use the recently introduced strided layout instead to avoid
unnecessary casts when bufferizing further operations to the memref dialect
counterparts that now largely rely on the strided layout attribute.
Depends On D133947
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D133951
The memref subview operation was initially designed to work on memrefs with
strided layouts only and has never supported anything else. Port it to use the
recently added StridedLayoutAttr instead of extracting the strided form
implicitly from affine maps.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D133938
vector.transfer_read ops with a minor identity indexing map are rank
reducing, with implicit leading unit dimensions. This should be
a natural extension to support in addition to full identity indexing
maps.
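A sketch of such a rank-reducing read (illustrative shapes):
```
// A minor identity map reads a 3-D memref into a 2-D vector; the dropped
// leading dimension behaves as an implicit unit dimension of the access.
%v = vector.transfer_read %m[%c0, %c0, %c0], %pad
       {permutation_map = affine_map<(d0, d1, d2) -> (d1, d2)>}
       : memref<2x3x4xf32>, vector<3x4xf32>
```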
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D133883
Simplify the lowering of warp_execute_on_lane0 of scf.if by making the
logic more generic. Also remove the assumption that the innermost
dimension is the distributed dimension.
Differential Revision: https://reviews.llvm.org/D133826
Add a new pattern to fold `vector.extract` over n-D constants that extract scalars.
The previous code handled n-D splat constants only. The new pattern is conservative and does not handle extractions that produce sub-vectors.
This is to aid the `arith::EmulateWideInt` pass which emits a lot of 2-element vector constants.
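As a rough sketch with made-up values:
```
// Extracting a scalar from a non-splat constant...
%cst = arith.constant dense<[[1, 2], [3, 4]]> : vector<2x2xi32>
%e = vector.extract %cst[1, 0] : vector<2x2xi32>
// ...folds to a scalar constant.
%e = arith.constant 3 : i32
```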
Reviewed By: Mogball, dcaballe
Differential Revision: https://reviews.llvm.org/D133742
This revision significantly improves and tests the broadcast behavior of vector.warp_execute_on_lane_0.
Previously, the implementation of the broadcast behavior of vector.warp_execute_on_lane_0
assumed that the broadcasted value was always of scalar type.
This is not necessarily the case.
Differential Revision: https://reviews.llvm.org/D133767
The logic to figure out if a transfer op can be flattened wasn't
considering the shape being loaded, so it was incorrectly assuming
some transfer ops were reading contiguous data.
Differential Revision: https://reviews.llvm.org/D133544
CombiningKind was implemented before EnumAttr, so it reimplements the same behaviour with custom code. Except for a few places, EnumAttr is a drop-in replacement.
Reviewed By: nicolasvasilache, pifon2a
Differential Revision: https://reviews.llvm.org/D133343
Running: `mlir-opt -test-vector-warp-distribute=rewrite-warp-ops-to-scf-if -canonicalize -verify-each=0`.
Prior to this revision, IR resembling the following would be produced:
```
%4 = "vector.load"(%3, %arg0) : (memref<1x32xf32, 3>, index) -> vector<1x1xf32>
```
This fails verification since it needs 2 indices to load but only 1 is provided.
Differential Revision: https://reviews.llvm.org/D133106
Introduce a new attribute to represent the strided memref layout. Strided
layouts are omnipresent in code generation flows and are the only kind of
layouts produced and supported by half of the operations in the memref dialect
(view-related, shape-related). However, they are internally represented as
affine maps that require a somewhat fragile extraction of the strides from the
linear form that also comes with an overhead. Furthermore, textual
representation of strided layouts as affine maps is difficult to read: compare
`affine_map<(d0, d1, d2)[s0, s1] -> (d0*32 + d1*s0 + s1 + d2)>` with
`strides: [32, ?, 1], offset: ?`. While rudimentary support for parsing a
syntactically sugared version of the strided layout has existed in the codebase
for a long time, it did not go as far as this commit, which makes the strided
layout a first-class attribute in the IR.
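For comparison, the same layout written both ways (element type chosen for
illustration):
```
memref<?x?x?xf32, affine_map<(d0, d1, d2)[s0, s1] -> (d0 * 32 + d1 * s0 + s1 + d2)>>
memref<?x?x?xf32, strided<[32, ?, 1], offset: ?>>
```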
This introduces the attribute and updates the tests that use the pre-existing
sugared form to use the new attribute instead. Most memrefs created
programmatically, e.g., in passes, still use the affine form with further
extraction of strides and will be updated separately.
Update and clean up the memref type documentation that has gotten stale and has
been referring to the details of affine map composition that are long gone.
See https://discourse.llvm.org/t/rfc-materialize-strided-memref-layout-as-an-attribute/64211.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D132864
Currently vector.gather only supports reading memory into a 1-D result vector.
This patch extends it to support an n-D result vector with the indices, masks,
and passthroughs in n-D vectors.
As we are trying to vectorize tensor.extract with vector.gather
(https://github.com/iree-org/iree/issues/9198), it will need to gather the
elements into an n-D vector. Having vector.gather with n-D results allows us
to avoid flattening and reshaping at the vectorization stage. The backends can then
decide the optimal ways to lower the vector.gather op.
Note that this is different from n-D gathering, which is about reading n-D
memory with the n-D indices. The indices here are still only 1-D offsets on
the base.
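A sketch of the extended form (illustrative shapes):
```
// Gather into a 2-D result; indices, mask, and pass-through are also 2-D,
// while the index vector still holds flat 1-D offsets into the base.
%g = vector.gather %base[%c0][%indices], %mask, %pass_thru
       : memref<?xf32>, vector<2x3xi32>, vector<2x3xi1>, vector<2x3xf32>
         into vector<2x3xf32>
```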
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D131905