1276 Commits

Author SHA1 Message Date
Alexey Bataev
a13148a880 [SLP]Fix PR75995: drop wrapping flags for resized wrapped binops.
If SLP decides to resize an instruction, it needs to drop the wrapping flags from
the resulting vector instructions to avoid incorrect
optimizations/assumptions later.
Fixes PR75995.
2023-12-20 06:51:39 -08:00
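The reason a13148a880 drops wrapping flags can be illustrated with a small model (plain Python, not LLVM code): an add that does not overflow at i32 can overflow once the operation is narrowed to i8, so a `nsw` flag carried over to the narrow instruction would no longer be truthful. The helper names `fits_signed` and `wrap_signed` are hypothetical, and the operand values are chosen only for illustration.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if `value` is representable as a signed `bits`-bit integer."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def wrap_signed(value: int, bits: int) -> int:
    """Keep the low `bits` bits of `value` and reinterpret them as signed."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value


a, b = 100, 100
# At i32 the sum fits, so an `nsw` flag on the original add is truthful.
nsw_ok_at_i32 = fits_signed(a + b, 32)
# At i8 the same sum wraps, so keeping `nsw` would make the result poison.
nsw_ok_at_i8 = fits_signed(a + b, 8)
# The low 8 bits are still correct, which is all the narrowed op promises.
narrow_result = wrap_signed(a + b, 8)
```

In this model the narrowed add still produces the right low bits, but only if it is allowed to wrap, which is why the flag has to go.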
Alexey Bataev
8abf8c948c [SLP][NFC]Add a test with incorrect wrapping flags in the binops with
minbitwidth types.
2023-12-20 06:27:01 -08:00
Nikita Popov
9d4557920f [InstCombine] Don't treat undef as poison in demanded element simplification
We can only set PoisonElts if the element is poison, not if it is
undef.
2023-12-19 12:26:48 +01:00
Nikita Popov
a5f3415533 [InstCombine] Replace non-demanded undef vector with poison
If an operand (esp to shufflevector or insertelement) is not
demanded, canonicalize it from undef to poison.
2023-12-18 16:12:37 +01:00
Maurice Heumann
f42b930af9 [SLP] Pessimistically handle unknown vector entries in SLP vectorizer (#75438)
SLP Vectorizer can discard vector entries at unknown positions. This
example shows the behaviour:

https://godbolt.org/z/or43EM594

The following instruction inserts an element at an unknown position:

```
%2 = insertelement <3 x i64> poison, i64 %value, i64 %position
```

The position depends on an argument that is unknown at compile time.

After running SLP, one can see there is no more instruction present
referencing `%position`.

This happens as SLP parallelizes the two adds in the example. It then
needs to merge the original vector with the new vector.

Within `isUndefVector`, the SLP vectorizer constructs a bitmap
indicating which elements of the original vector are poison values. It
does this by walking the insertElement instructions.

If it encounters an insert with a non-constant position, the insert is ignored.
This results in poison values being used for all entries that have no
inserts with constant positions.

However, as the position is unknown, the element could be anywhere.
Therefore, I think it is only safe to assume none of the entries are
poison values and to simply take them all over when constructing the
shuffleVector instruction.

This fixes #75437
2023-12-14 09:48:23 -05:00
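The pessimistic handling described in f42b930af9 can be sketched as follows. This is a simplified model under stated assumptions, not the actual `isUndefVector` implementation: the function name `poison_mask` is hypothetical, the base vector is assumed to be fully poison, and a `None` position stands for an `insertelement` whose index is not a compile-time constant.

```python
from typing import List, Optional


def poison_mask(num_elts: int, insert_positions: List[Optional[int]]) -> List[bool]:
    """Return a mask where mask[i] is True if lane i is known to be poison.

    `insert_positions` models a chain of insertelement instructions into a
    poison vector; None means the insert position is unknown at compile time.
    """
    if any(pos is None for pos in insert_positions):
        # The inserted value could land in any lane, so no lane is provably
        # poison: assume none of them are.
        return [False] * num_elts
    mask = [True] * num_elts
    for pos in insert_positions:
        if 0 <= pos < num_elts:
            mask[pos] = False  # this lane now holds a real value
    return mask
```

For a 3-element vector, `poison_mask(3, [0])` marks lanes 1 and 2 as poison, while `poison_mask(3, [None])` marks none, so every lane of the original vector is carried over into the shuffle.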
Alexey Bataev
0e1a9e3084 [SLP]Fix PR74607: Fix dependency between buildvector nodes with user
nodes, having same last instruction.

If the user nodes have the same last instruction, which is used as the insert
point for the buildvector nodes, finding the proper dependency is crucial.
Before, it depended on the indices of the buildvectors themselves, but it
looks like it should depend on the indices of the user nodes, because they
identify the vectorization order and thus properly align the
buildvector nodes in terms of the def-use chain.
2023-12-06 10:15:01 -08:00
Nikita Popov
eecb99c5f6 [Tests] Add disjoint flag to some tests (NFC)
These tests rely on SCEV recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
2023-12-05 14:09:36 +01:00
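The equivalence that eecb99c5f6 relies on is plain integer arithmetic and can be checked directly: for non-negative integers, `a + b == (a | b) + (a & b)`, so an "or" equals an "add" exactly when the operands share no set bits. The function names below are hypothetical; this is an illustration of the arithmetic fact, not of how SCEV or ValueTracking query it.

```python
def is_disjoint(a: int, b: int) -> bool:
    """True if a and b have no set bits in common."""
    return (a & b) == 0


def or_equals_add(a: int, b: int) -> bool:
    """True if treating `a | b` as `a + b` gives the same value."""
    return (a | b) == (a + b)


# Exhaustively confirm the equivalence on a small range of unsigned values.
equivalence_holds = all(
    is_disjoint(a, b) == or_equals_add(a, b)
    for a in range(64)
    for b in range(64)
)
```

For example, `8 | 5` and `8 + 5` are both 13 because the operands are disjoint, whereas `6 | 3` is 7 and `6 + 3` is 9.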
Alexey Bataev
1f88e62db4 [SLP]Fix/improve minbitwidth mapping to use TreeEntry as a key.
Currently, the MinBWs map uses Value* as a key and stores a mapping for each
value to be demoted. This makes it hard to get the actual MinBWs value
for the buildvector scalars (constants), since the same constant might be
used in different nodes with different MinBWs values/decisions.
It also consumes extra memory for the vectorized values/instructions
from the same nodes.
It is better to map the actual nodes. This fixes the bitwidth data fetching for
buildvector scalars and improves memory consumption/analysis time for
other instructions.
2023-11-30 06:33:31 -08:00
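The keying problem described in 1f88e62db4 can be shown with a toy model. Everything here is hypothetical (the `Node` class, the scalar names, and the chosen bitwidths); it only illustrates why a per-value map cannot hold two decisions for one shared constant while a per-node map can.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Node:
    """Stand-in for a tree entry: an index plus the scalars it groups."""
    idx: int
    scalars: Tuple[str, ...]


# Two nodes share the constant "7" but were assigned different bitwidths.
node_a = Node(0, ("%x", "7"))
node_b = Node(1, ("%y", "7"))
decisions = ((node_a, 8), (node_b, 16))

# Keyed by value: one entry per scalar, and the shared constant is overwritten.
by_value: Dict[str, int] = {}
for node, bitwidth in decisions:
    for scalar in node.scalars:
        by_value[scalar] = bitwidth

# Keyed by node: one entry per node, and both decisions survive.
by_node: Dict[Node, int] = {node: bitwidth for node, bitwidth in decisions}
```

After this runs, `by_value["7"]` holds only the last decision (16), while `by_node` keeps 8 for `node_a` and 16 for `node_b` in fewer entries.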
Alexey Bataev
12bcd6339d [SLP]Improve detection of gathered loads, if no other deps are detected.
If the gather node includes ordered loads only partially (the node does not
consist entirely of loads), the other gathered scalars are not loads, and
no other dependency on other nodes is found, we can still improve the
cost of the gather by taking into account the fact that these loads can still
be vectorized.
2023-11-22 11:35:51 -08:00
Alexey Bataev
f609d4ba1d [SLP]Fix PR72833: do not crash if only operand is casted but not the user
instruction.

Need to check if only the operand is casted, not the user instruction
itself, if the types of the operands do not match the actual type.
2023-11-20 08:35:35 -08:00
Alexey Bataev
40e46b6eff [SLP]Do not emit int bitcast after minbitwidth analysis.
No need to emit a bitcast op for integer operands if it is detected that
after minbitwidth analysis the type is the same.
2023-11-20 06:25:17 -08:00
Alexey Bataev
206799fcf5 [SLP]Fix PR72524: "Out-of-bounds shuffle mask element" failed.
Need to check if we ran into subvector extract pattern before checking
for identity vector to avoid compiler crash.
2023-11-16 07:39:32 -08:00
Alexey Bataev
95703642e3 [SLP]Fix PR72202: wrong mask emission for the first found vector
operand.

Need to copy the submask not to the very first part of the common
extractelements vector mask, but to the proper one to avoid wrong code
emission.
2023-11-16 07:01:05 -08:00
Alexey Bataev
181b2c1b4a [SLP][NFC]Add a test for PR72202 to show a bug in a mask generation for
vectorized extractelements operands.
2023-11-16 06:36:04 -08:00
Alexey Bataev
8ea8dd9a01 [SLP] Fix crash on trying to reshuffle a scalar that was vectorized.
If the buildvector node contains an extractelement whose vector operand
depends on a vector node, we need to check if the node is ready and use
the vectorized value instead of the original vector operation.
2023-11-15 11:01:45 -08:00
Alexey Bataev
5adfad254e [SLP]Emit actual bitwidth for analyzed MinBitwidth nodes, NFCI.
SLP includes analysis of the minimum bitwidth with which the actual integer
operations can be emitted. This reduces register pressure and
improves performance. Currently, it covers only the cost model, and the subsequent
transformation relies on InstructionCombiner. It is better to do it directly
in SLP, which reduces compile time and fixes cost model issues.
2023-11-14 11:12:52 -08:00
Alexey Bataev
506a30d30f [SLP][NFC]Add a test with cast op, not matching original cast op, NFC.
2023-11-14 10:08:12 -08:00
Alexey Bataev
f2f3050476 Revert "[SLP]Emit actual bitwidth for analyzed MinBitwidth nodes, NFCI."
This reverts commit f6ae50f710 to fix
a crash revealed in the internal testing.
2023-11-14 09:45:54 -08:00
Alexey Bataev
f6ae50f710 [SLP]Emit actual bitwidth for analyzed MinBitwidth nodes, NFCI.
SLP includes analysis of the minimum bitwidth with which the actual integer
operations can be emitted. This reduces register pressure and
improves performance. Currently, it covers only the cost model, and the subsequent
transformation relies on InstructionCombiner. It is better to do it directly
in SLP, which reduces compile time and fixes cost model issues.
2023-11-14 07:57:37 -08:00
Alexey Bataev
dbd00c3b5d [SLP][NFC]Add a test for gather node with mixed load/non-load scalars.
2023-11-10 08:40:58 -08:00
Alexey Bataev
ac254fc055 [SLP]Improve tryToGatherExtractElements by using per-register analysis.
Currently, the tryToGatherExtractElements function analyzes the whole vector,
regardless of the number of actual registers used in this vector. This may
prevent some optimizations, because per-register analysis may allow
simplifying the final code by reusing more already emitted vectors and
better shuffles.

Differential Revision: https://reviews.llvm.org/D148855
2023-11-06 07:29:27 -08:00
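The idea behind the per-register analysis in ac254fc055 can be sketched as slicing a shuffle mask into register-sized parts and inspecting each part separately. This is a rough illustration under stated assumptions, not the logic of `tryToGatherExtractElements`: the function names are hypothetical, `-1` is assumed to mark an undefined lane, and all source vectors are assumed to have the same width.

```python
from typing import List, Set


def split_mask(mask: List[int], num_parts: int) -> List[List[int]]:
    """Split a shuffle mask into `num_parts` equal slices, one per register."""
    if num_parts <= 0 or len(mask) % num_parts != 0:
        raise ValueError("mask does not divide evenly into registers")
    part = len(mask) // num_parts
    return [mask[i * part:(i + 1) * part] for i in range(num_parts)]


def sources_per_part(mask: List[int], num_parts: int, src_width: int) -> List[Set[int]]:
    """For each slice, collect the indices of the source vectors it reads."""
    return [
        {idx // src_width for idx in part if idx >= 0}  # skip undefined lanes
        for part in split_mask(mask, num_parts)
    ]


# Eight lanes drawn from three 4-wide source vectors (lanes 0-3, 4-7, 8-11).
mask = [0, 1, 2, 3, 4, 5, 8, 9]
whole_vector_sources = sources_per_part(mask, 1, 4)
per_register_sources = sources_per_part(mask, 2, 4)
```

Looked at as a whole, the mask reads three source vectors, which a single two-source shuffle cannot express. Looked at per register, the first slice reads one source and the second reads two, so each slice fits a two-source shuffle.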
Hans Wennborg
046c57e705 Revert "[SLP]Improve tryToGatherExtractElements by using per-register analysis."
This causes asserts:

  llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10082:
  Value *llvm::slpvectorizer::BoUpSLP::ShuffleInstructionBuilder::adjustExtracts(
    const TreeEntry *, MutableArrayRef<int>, unsigned int, bool &):
  Assertion `Part == 0 && "Expected firs part."' failed.

See comment on the code review.

> Currently, the tryToGatherExtractElements function analyzes the whole vector,
> regardless of the number of actual registers used in this vector. This may
> prevent some optimizations, because per-register analysis may allow
> simplifying the final code by reusing more already emitted vectors and
> better shuffles.
>
> Differential Revision: https://reviews.llvm.org/D148855

This reverts commit 9dfdbd7887.
2023-11-06 13:56:42 +01:00
Alexey Bataev
9dfdbd7887 [SLP]Improve tryToGatherExtractElements by using per-register analysis.
Currently, the tryToGatherExtractElements function analyzes the whole vector,
regardless of the number of actual registers used in this vector. This may
prevent some optimizations, because per-register analysis may allow
simplifying the final code by reusing more already emitted vectors and
better shuffles.

Differential Revision: https://reviews.llvm.org/D148855
2023-11-03 10:43:58 -07:00
Nikita Popov
e4a4122eb6 [IR] Remove zext and sext constant expressions (#71040)
Remove support for zext and sext constant expressions. All places
creating them have been removed beforehand, so this just removes the
APIs and uses of these constant expressions in tests.

There is some additional cleanup that can be done on top of this, e.g.
we can remove the ZExtInst vs ZExtOperator footgun.

This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
2023-11-03 10:46:07 +01:00
Martin Storsjö
66152f4eed Revert "[SLP]Improve tryToGatherExtractElements by using per-register analysis."
This reverts commit 3e6d7c6d98.

That commit caused miscompilation of ffmpeg's libavcodec/vp9dsp_8bpp.o
on aarch64; the file still compiles correctly, but no longer produces
the right result - see https://reviews.llvm.org/D148855#4655968
for details.
2023-11-03 00:08:17 +02:00
Alexey Bataev
495ed8d8c8 [SLP]Fix PR70507: freeze poisonous insts to avoid poison propagation.
If the reduction instruction is not a bool logical op but is reduced within a bool logical op reduction list, it needs to be frozen to avoid poison propagation.
2023-11-02 10:37:38 -07:00
Alexey Bataev
033d2b71d2 [SLP][NFC]Add a test to show poison propagation in mixed (non)bool
logical ops reduction, NFC.
2023-11-02 09:58:13 -07:00
Alexey Bataev
3e6d7c6d98 [SLP]Improve tryToGatherExtractElements by using per-register analysis.
Currently, the tryToGatherExtractElements function analyzes the whole vector,
regardless of the number of actual registers used in this vector. This may
prevent some optimizations, because per-register analysis may allow
simplifying the final code by reusing more already emitted vectors and
better shuffles.

Differential Revision: https://reviews.llvm.org/D148855
2023-11-01 10:42:35 -07:00
Alexey Bataev
6e8d957a22 Revert "[SLP]Improve tryToGatherExtractElements by using per-register analysis."
This reverts commit 0a34aaedd8 to fix
fails reported in https://lab.llvm.org/buildbot/#/builders/265/builds/40
2023-11-01 08:52:31 -07:00
Alexey Bataev
c28b7eb496 [SLP]Fix handling of -slp-vectorize-hor-store for values with many uses.
2023-11-01 08:41:54 -07:00
Alexey Bataev
c449a64c3e [SLP][NFC]Add the test showing issue with -slp-vectorize-hor-store
option, NFC.
2023-11-01 08:31:18 -07:00
Alexey Bataev
0a34aaedd8 [SLP]Improve tryToGatherExtractElements by using per-register analysis.
Currently, the tryToGatherExtractElements function analyzes the whole vector,
regardless of the number of actual registers used in this vector. This may
prevent some optimizations, because per-register analysis may allow
simplifying the final code by reusing more already emitted vectors and
better shuffles.

Differential Revision: https://reviews.llvm.org/D148855
2023-11-01 07:44:49 -07:00
Alexey Bataev
4c997e1536 [SLP]Fix PR70507: emit freeze whenever required for bool logical ops in
the middle of reduction ops.

Need to emit the freeze instruction not only when the root is
a bool logical op, but also when we reduce several scalars and are unable to
say precisely whether the root is a bool logical op.
2023-10-31 12:23:12 -07:00
Alexey Bataev
0e8cbb6ac8 [SLP][NFC]Add a test with poisonous reduction, seeding bool logical op.
NFC.
2023-10-31 12:10:10 -07:00
Alexey Bataev
9da19e4340 [SLP]Fix PR70507: correctly handle bool logical ops in reductions.
If the very first reduction operation is not a bool logical op, but some
others are, we still need to emit the bool logic op for all the extra
reduction operations to avoid incorrect poison propagation.
2023-10-30 14:09:08 -07:00
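The poison concern behind the PR70507 fixes (9da19e4340, 4c997e1536, 495ed8d8c8) can be modeled in a few lines. This is a toy model of the IR semantics, not SLP code: `POISON` is a hypothetical sentinel, and `freeze` here simply picks a fixed value, whereas in LLVM the frozen value is arbitrary but consistent.

```python
POISON = object()  # hypothetical sentinel standing in for an IR poison value


def bitwise_and(a, b):
    """Models `and i1 a, b`: poison in either operand poisons the result."""
    if a is POISON or b is POISON:
        return POISON
    return a and b


def logical_and(a, b):
    """Models `select i1 a, i1 b, i1 false`: b only matters when a is true."""
    if a is POISON:
        return POISON
    return b if a else False


def freeze(x, pick=False):
    """Models `freeze`: replaces poison with some fixed, non-poison value."""
    return pick if x is POISON else x


# The logical form shields a poison right-hand side when the left is false.
shielded = logical_and(False, POISON)
# Rewriting it as a plain `and` would leak the poison...
leaked = bitwise_and(False, POISON)
# ...unless the possibly-poison operand is frozen first.
repaired = bitwise_and(False, freeze(POISON, pick=True))
```

Here `shielded` and `repaired` are both `False`, while `leaked` is poison, which is the kind of propagation the freeze is meant to stop.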
Alexey Bataev
71bf052ec9 [SLP][NFC]Add a test for bool logic ops reduction, NFC.
2023-10-30 13:38:57 -07:00
Alexey Bataev
af15c46777 [SLP]Do not crash if number of vector registers does not fit the vector
type.

Need to check that the number of vector registers returned by TTI is
not zero and not greater than the total number of mask elements before
trying to perform any operations. TTI may still return an invalid number
of registers.
2023-10-30 07:30:52 -07:00
Alex Richardson
e39f6c1844 [opt] Infer DataLayout from triple if not specified
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.

One thing that is not currently possible is to differentiate a missing
datalayout from an explicit `target datalayout = ""` in the IR file, since the current
APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change IR parsers to track whether they have seen such a
directive and change the callback type.

Differential Revision: https://reviews.llvm.org/D141060
2023-10-26 12:07:37 -07:00
Alexey Bataev
196d154ab7 [SLP]Improve isGatherShuffledEntry by trying per-register shuffle.
Currently, when building a gather/buildvector node, we try to build node
shuffles without taking into account separate vector registers. We can
improve the final codegen and the whole vectorization process by including
this info in the analysis and the vector code emission, which allows emitting
better vectorized code.

Differential Revision: https://reviews.llvm.org/D149742
2023-10-26 08:51:37 -07:00
Alexey Bataev
c65ec9d919 Revert "[SLP]Improve isGatherShuffledEntry by trying per-register shuffle."
This reverts commit 560bad013e to fix
a bug reported in https://lab.llvm.org/buildbot/#/builders/5/builds/37763.
2023-10-26 08:36:50 -07:00
Simon Pilgrim
585da2651f [SLP][X86] Regenerate hadd/hsub tests with full set of check-prefixes
Prep for D148855
2023-10-26 14:39:46 +01:00
Alexey Bataev
560bad013e [SLP]Improve isGatherShuffledEntry by trying per-register shuffle.
Currently, when building a gather/buildvector node, we try to build node
shuffles without taking into account separate vector registers. We can
improve the final codegen and the whole vectorization process by including
this info in the analysis and the vector code emission, which allows emitting
better vectorized code.

Differential Revision: https://reviews.llvm.org/D149742
2023-10-26 05:57:03 -07:00
Valery Dmitriev
3324776d9c [SLP] Improve gather tree nodes matching when users are PHIs. (#70111)
This is a re-commit of #69392 and also fixes issue #69670, which was
uncovered by the prior commit.
For delayed gather emission it may be incorrect to use the stub instruction
as the insertion point if it is a PHI operand. In that case the insertion point
is adjusted to be at the end of the block, ensuring that the prior dependency
vector code is emitted earlier.
2023-10-24 16:39:36 -07:00
Valery Dmitriev
117041dac9 [NFC][SLP] Add test case for issue #69670. (#70088)
Test exposes issue with delayed gather emission, which may lead to
generating an instruction which does not dominate all users.
2023-10-24 12:36:38 -07:00
Alexey Bataev
d79051f894 [SLP]Fix PR70004: Do not change insert point for reduction gather nodes.
No need to change the insert point for reduction gather node, we can use
the ReductionRoot as insert point instead to avoid possible crashes.
2023-10-24 09:19:59 -07:00
Alexey Bataev
8d307f59ee [SLP]Fix PR69246: do not treat resizing mask as identity.
If the mask is resizing and the mask size is greater than the
length of the vector being reused from extractelement instructions, the
mask for undefs cannot be treated as identity; it must be treated as
a broadcast.
2023-10-24 08:14:13 -07:00
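The distinction drawn in 8d307f59ee can be sketched with a small identity check. This is an assumption-laden illustration, not the SLP code path: the function name is hypothetical, `-1` is assumed to mark an undefined lane, and the sketch only covers the "resizing mask is not identity" part of the fix.

```python
from typing import List

UNDEF = -1  # assumed marker for an undefined mask lane


def is_identity_mask(mask: List[int], src_len: int) -> bool:
    """True if applying `mask` to a `src_len`-wide vector changes nothing.

    A mask whose length differs from the source length resizes the vector,
    so it is never an identity, even if every defined lane maps to itself.
    """
    if len(mask) != src_len:
        return False
    return all(m == UNDEF or m == i for i, m in enumerate(mask))
```

With this check, `[0, -1, 2, 3]` is an identity for a 4-wide source, but `[0, 1, -1, -1]` applied to a 2-wide source is not, because it widens the vector.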
Alexey Bataev
254558ac53 [SLP]Fix PR69976: Check for multi-node uses during node building.
Need to check if there is already a node created for the multi-node
instruction before ending up with creating a new node for such
instructions.
2023-10-24 07:01:46 -07:00
Douglas Yung
734b016b66 Revert "[SLP] Improve gather tree nodes matching when users are PHIs. (#69392)"
This reverts commit c80b503496.

This change causes a fatal error in the backend and is filed as issue #69670.
2023-10-20 10:59:07 -07:00
Alexey Bataev
553616a213 [SLP][NFC]Add avx2 test run, NFC.
2023-10-19 09:12:37 -07:00
Valery Dmitriev
c80b503496 [SLP] Improve gather tree nodes matching when users are PHIs. (#69392)
2023-10-18 09:05:11 -07:00