If decided to resize the instruction, need to drop wrapping flags from
the resulting vector instructions to avoid incorrect
optimizations/assumptions later.
Fixes PR75995.
The RISC-V vector crypto extensions have been ratified. This patch
updates the Clang and LLVM support for these extensions to be
non-experimental, while leaving the C intrinsics as experimental since
the C intrinsics are not yet standardized.
Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
SLP Vectorizer can discard vector entries at unknown positions. This
example shows the behaviour:
https://godbolt.org/z/or43EM594
The following instruction inserts an element at an unknown position:
```
%2 = insertelement <3 x i64> poison, i64 %value, i64 %position
```
The position depends on an argument that is unknown at compile time.
After running SLP, one can see there is no more instruction present
referencing `%position`.
This happens as SLP parallelizes the two adds in the example. It then
needs to merge the original vector with the new vector.
Within `isUndefVector`, the SLP vectorizer constructs a bitmap
indicating which elements of the original vector are poison values. It
does this by walking the insertElement instructions.
If it encounters an insert with a non-constant position, it is ignored.
This will result in poison values to be used for all entries, where
there are no inserts with constant positions.
However, as the position is unknown, the element could be anywhere.
Therefore, I think it is only safe to assume none of the entries are
poison values and to simply take them all over when constructing the
shuffleVector instruction.
This fixes#75437
nodes, having same last instruction.
If the user nodes has the same last-instruction, used as insert points
for the buildvector nodes, finding the proper dependency is crucial.
Before, it depended on the indices of the buildvectors themselves but
looks like it should depend on indices of the user nodes, because it
identifies the vectorization order and, thus, properly aligns
buildvector nodes in terms of def-use chain.
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
The disjoint flag was recently added to IR in #72583
We already set it when we turn an add into an or. This patch sets it on Ors that weren't converted from an Add.
Currently we emit gathers for scalars being vectorized in the tree as
a pair of extractelement/insertelement instructions. Instead we can try
to find all required vectors and emit shuffle vector instructions
directly, improving the code and reducing compile time.
Part of non-power-of-2 vectorization.
Differential Revision: https://reviews.llvm.org/D110978
Currently, MinBWs map uses Value* as a key and stores mapping for each
value to be demoted. It make is it hard to get the actual MinBWs value
for the buildvector scalars(constants), since same constant might be
used in different nodes with the different MinBWs values/decisions.
Also, it consumes extra memory for the vectorized values/instructions
from the same nodes.
Better to map actual nodes. It fixes the bitwidth data fetching for
buildvector scalars and improves memory consumption/analysis time for
other instructions.
If the gather node includes ordered loads only partially (not the whole
node consists of loads) and the other gathered scalar are not loads, and
no other dependency from other nodes is found, we still can improve the
cost of gather, if take into account the fact that these loads still can
be vectorized.
operand.
Need to copy the submask not to the very first part of the common
extractelements vector mask, but to the proper one to avoid wrong code
emission.
If the buildvector node contains extractelement, which vector operand
depends on vector node, need to check if the node is ready and use
vectorized value instead of the original vector operation.
SLP includes analysis for the minimum bitwidth, the actual integer
operations can be emitted. It allows to reduce register pressure and
improve perf. Currently, it includes only cost model and the next
transformation relies on InstructionCombiner. Better to do it directly
in SLP, it allows to reduce compile time and fix cost model issues.
SLP includes analysis for the minimum bitwidth, the actual integer
operations can be emitted. It allows to reduce register pressure and
improve perf. Currently, it includes only cost model and the next
transformation relies on InstructionCombiner. Better to do it directly
in SLP, it allows to reduce compile time and fix cost model issues.
With the recent change 98c90a13 (ISel: introduce vector ISD::LRINT,
ISD::LLRINT; custom RISCV lowering), it is now possible for
SLPVectorizer, LoopVectorize, and Scalarizer to operate on llvm.lrint
and llvm.llrint, with vector codegen for the RISC-V target. Make a
trivial change to VectorUtils, and update the corresponding tests.
A couple of important fixes have been landed since the original patch
was landed and reverted, and it is now safe to re-land the patch:
5e1d81a (LegalizeIntegerTypes: implement PromoteIntRes for xrint) and
fd887a3 (LegalizeVectorTypes: fix bug in widening of vec result in
xrint). See also #71399, which proves that lrint and llrint will indeed
produce vector codegen on RISC-V.
Fixes#55208.
Currently tryToGatherExtractElements function analyzes the whole vector,
regrdless number of actual registers, used in this vector. It may
prevent some optimizations, because per-register analysis may allow to
simplify the final code by reusing more already emitted vectors and
better shuffles.
Differential Revision: https://reviews.llvm.org/D148855
This causes asserts:
llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10082:
Value *llvm::slpvectorizer::BoUpSLP::ShuffleInstructionBuilder::adjustExtracts(
const TreeEntry *, MutableArrayRef<int>, unsigned int, bool &):
Assertion `Part == 0 && "Expected firs part."' failed.
See comment on the code review.
> Currently tryToGatherExtractElements function analyzes the whole vector,
> regrdless number of actual registers, used in this vector. It may
> prevent some optimizations, because per-register analysis may allow to
> simplify the final code by reusing more already emitted vectors and
> better shuffles.
>
> Differential Revision: https://reviews.llvm.org/D148855
This reverts commit 9dfdbd7887.
Currently tryToGatherExtractElements function analyzes the whole vector,
regrdless number of actual registers, used in this vector. It may
prevent some optimizations, because per-register analysis may allow to
simplify the final code by reusing more already emitted vectors and
better shuffles.
Differential Revision: https://reviews.llvm.org/D148855
Remove support for zext and sext constant expressions. All places
creating them have been removed beforehand, so this just removes the
APIs and uses of these constant expressions in tests.
There is some additional cleanup that can be done on top of this, e.g.
we can remove the ZExtInst vs ZExtOperator footgun.
This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
This reverts commit 3e6d7c6d98.
That commit caused miscompilation of ffmpeg's libavcodec/vp9dsp_8bpp.o
on aarch64; the file still compiles correctly, but no longer produces
the right result - see https://reviews.llvm.org/D148855#4655968
for details.
Currently tryToGatherExtractElements function analyzes the whole vector,
regrdless number of actual registers, used in this vector. It may
prevent some optimizations, because per-register analysis may allow to
simplify the final code by reusing more already emitted vectors and
better shuffles.
Differential Revision: https://reviews.llvm.org/D148855
Currently tryToGatherExtractElements function analyzes the whole vector,
regrdless number of actual registers, used in this vector. It may
prevent some optimizations, because per-register analysis may allow to
simplify the final code by reusing more already emitted vectors and
better shuffles.
Differential Revision: https://reviews.llvm.org/D148855
With the recent change 98c90a13 (ISel: introduce vector ISD::LRINT,
ISD::LLRINT; custom RISCV lowering), it is now possible for
SLPVectorizer, LoopVectorize, and Scalarizer to operate on llvm.lrint
and llvm.llrint, with vector codegen for the RISC-V target. Make a
trivial change to VectorUtils, and update the corresponding tests.
the middle of reduction ops.
Need to emit freeze instruction not only in the case, where the root is
bool logical op, but also if we reduce several scalars, but unable to
say precisely, if the root is bool logical op.
If the very first reduction operation is not bool logical op, but some
others are, still need to emit the boo logic op for all the extra
reduction operations to avoid incorrect poison propagation.
Builds on #67982 which recently introduced the nneg flag on a zext
instruction. InstCombine is one of our largest canonicalizers of zext
from non-negative sext instructions, so set the flag there.
type.
Need to check, if the number of vector registers, returned by TTI, is
not greater than total number of mask element and not zero, before
trying to perform any operations. TTI still may return non-valid number
of registers.
After determining the cost of loads that could not be coalesced into
`VectorizedLoads` in SLP, computing the cost of a gather-vectorized
load is carried out. Favour a potentially high valid cost when the
type of a group of loads, whose type is a vector of size dependent
upon `VF`, may be legalized into a scalar value.
Fixes: https://github.com/llvm/llvm-project/issues/68953.
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.
One thing that is not currently possible to differentiate from a missing
datalayout `target datalayout = ""` in the IR file since the current
APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change IR parsers to track whether they have seen such a
directive and change the callback type.
Differential Revision: https://reviews.llvm.org/D141060