Commit Graph

644 Commits

Author SHA1 Message Date
Matt Arsenault
66073953a5 AMDGPU: Allow vectorization of round intrinsic
There seems to be a small benefit to the legalized sequence for v2f16
round with packed instructions, so allow vectorizing it by reducing
the cost.

An unintended side effect is vectorization of f32 round also
happens. The current FMA logic seems off to me, and isn't checking for
packed instructions.
2020-03-23 17:00:41 -04:00
Craig Topper
f4c67dfa92 [X86] More accurately model the cost of horizontal reductions.
This patch attempts to more accurately model the reduction of
power of 2 vectors of types we natively support. This takes into
account the narrowing of vectors that occur as we go from 512
bits to 256 bits, to 128 bits. It also takes into account the use
of wider elements in the shuffles for the first 2 steps of a
reduction from 128 bits. And uses a v8i16 shift for the final step
of vXi8 reduction.

The default implementation uses the legalized type for the arithmetic
for all levels. And uses the single source permute cost of the
legalized type for all levels. This penalizes things like
lack of v16i8 pshufb on pre-sse3 targets and the splitting and
joining that needs to be done for integer types on AVX1. We never
need v16i8 shuffle for a reduction and we only need split AVX1 ops
when type the type wide and needs to be split. I think we're still
over costing splits and joins for AVX1, but we're closer now.

I've also removed all pairwise special casing because I don't
think we ever want to generate that on X86. I've also adjusted
the add handling to more accurately account for any type splitting
that occurs before we reach a legal type.

Differential Revision: https://reviews.llvm.org/D76478
2020-03-22 14:20:15 -07:00
Huihui Zhang
fc1f205745 [SLPVectorizer][SVE] Bail out early for scalable vector.
Summary:
SLPVectorizer try to vectorize list of scalar instructions of the same type,
instructions already vectorized are rejected through isValidElementType().

Without this patch, tryToVectorizeList() will first try to determine vectorization
factor of a list of Instructions before checking whether each instruction has unsupported
type or not. For instructions already vectorized for SVE, it will crash at getVectorElementSize(),
where it try to return a fixed size.

This patch make sure invalid element types are rejected before trying to get vectorization
factor. This make sure we are not trying to vectorize instructions already vectorized.

Reviewers: sdesmalen, efriedma, spatel, RKSimon, ABataev, apazos, rengolin

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76017
2020-03-13 11:23:31 -07:00
Simon Pilgrim
a2db388dce [CostModel][X86] Improve ISD::CTTZ costs accounting for BSF/TZCNT implementations 2020-03-13 16:51:13 +00:00
Florian Hahn
2d6ecf4648 [SLP] Support vectorizing functions provided by vector libs.
It seems like the SLPVectorizer is currently not aware of vector
versions of functions provided by libraries like Accelerate [1].
This patch updates SLPVectorizer to use the same infrastructure
the LoopVectorizer uses to detect vectorizable library functions.

For calls, it computes the cost of an intrinsic call (existing behavior)
and the cost of a vector function library call, if available. Like
LoopVectorizer, it assumes the cost of the vector function is simply the
cost of a call to a vector function.

[1] https://developer.apple.com/documentation/accelerate

Reviewers: ABataev, RKSimon, spatel

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D75878
2020-03-10 13:10:50 +00:00
Simon Pilgrim
5cbddf7cbc [X86][SSE] Add more accurate costs for fmaxnum/fminnum codegen
Based off llvm-mca reports on codegen in llvm\test\CodeGen\X86\fmaxnum.ll + llvm\test\CodeGen\X86\fminnum.ll
2020-03-10 11:59:40 +00:00
Simon Pilgrim
9b05596eff [SLPVectorizer][X86] Add fmaxnum/fminnum tests 2020-03-10 11:18:28 +00:00
Florian Hahn
b53907bfed [SLP] Precommit vector library test for D75878. 2020-03-10 10:17:34 +00:00
Alexey Bataev
afa45d23e9 [SLP]Update test checks, NFC. 2020-02-28 13:25:44 -05:00
Simon Pilgrim
168a44a70e [CostModel][X86] Improve extract/insert element costs (PR43605)
This tries to improve the accuracy of extract/insert element costs by accounting for subvector extraction/insertion for >128-bit vectors and the shuffling of elements to/from the 0'th index.

It also adds INSERTPS for f32 types and PINSR/PEXTR costs for integer types (at the moment we assume the same cost as MOVD/MOVQ - which isn't always true).

Differential Revision: https://reviews.llvm.org/D74976
2020-02-27 15:54:13 +00:00
Simon Pilgrim
b82438872b [CostModel][X86] We don't need a scale factor for SLM extract costs
D74976 will handle larger vector types, but since SLM doesn't support AVX+ then we will always be extracting from 128-bit vectors so don't need to scale the cost.
2020-02-24 14:23:04 +00:00
Florian Hahn
e32522ca17 [SLPVectorizer] Do not assume extracelement idx is a ConstantInt.
The index of an ExtractElementInst is not guaranteed to be a
ConstantInt. It can be any integer value. Check explicitly for
ConstantInts.

The new test cases illustrate scenarios where we crash without
this patch. I've also added another test case to check the matching
of extractelement vector ops works.

Reviewers: RKSimon, ABataev, dtemirbulatov, vporpo

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D74758
2020-02-18 18:16:06 +01:00
Matt Arsenault
b38940dfb9 TTI: Fix vectorization cost for bswap 2020-02-14 10:14:07 -08:00
Sanjay Patel
bc1148e7bc [PATCH] D73727: [SLP] drop poison-generating flags for shuffle reduction ops (PR44536)
We may calculate reassociable math ops in arbitrary order when creating a shuffle reduction,
so there's no guarantee that things like 'nsw' hold on those intermediate values. Drop all
poison-generating flags for safety.

This change is limited to shuffle reductions because I don't think we have a problem in the
general case (where we intersect flags of each scalar op that goes into a vector op), but if
there's evidence of other cases being wrong, we can extend this fix to cover those cases.

https://bugs.llvm.org/show_bug.cgi?id=44536

Differential Revision: https://reviews.llvm.org/D73727
2020-01-31 09:54:35 -05:00
Andrei Elovikov
e1d6d36852 [SLP] Don't allow Div/Rem as alternate opcodes
Summary:
We don't have control/verify what will be the RHS of the division, so it might
happen to be zero, causing UB.

Reviewers: Vasilis, RKSimon, ABataev

Reviewed By: ABataev

Subscribers: vporpo, ABataev, hiraditya, llvm-commits, vdmitrie

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72740
2020-01-21 15:21:17 -08:00
Andrei Elovikov
757fe53994 [SLP] Add a test showing miscompilation in AltOpcode support
Reviewers: Vasilis, RKSimon, ABataev

Reviewed By: RKSimon, ABataev

Subscribers: ABataev, inglorion, dexonsmith, llvm-commits, vdmitrie

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72739
2020-01-21 14:16:38 -08:00
James Henderson
d68904f957 [NFC] Fix trivial typos in comments
Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D72143

Patch by Kazuaki Ishizaki.
2020-01-06 10:50:26 +00:00
Fangrui Song
a36ddf0aa9 Migrate function attribute "no-frame-pointer-elim"="false" to "frame-pointer"="none" as cleanups after D56351 2019-12-24 16:27:51 -08:00
Fangrui Song
502a77f125 Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 2019-12-24 15:57:33 -08:00
Alexey Bataev
1edb3ea645 [SLP]Fix test arguments, NFC. 2019-12-19 13:36:21 -05:00
Alexey Bataev
bc28f17e4f [SLP]Added test for gathering reused extracts from narrow vector, NFC. 2019-12-19 12:51:56 -05:00
Anton Afanasyev
a315519c17 [SLP] Enhance SLPVectorizer to vectorize different combinations of aggregates
Summary:
Make SLPVectorize to recognize homogeneous aggregates like
`{<2 x float>, <2 x float>}`, `{{float, float}, {float, float}}`,
`[2 x {float, float}]` and so on.
It's a follow-up of https://reviews.llvm.org/D70068.
Merged `findBuildVector()` and `findBuildAggregate()` to
one `findBuildAggregate()` function making it recursive
to recognize multidimensional aggregates. Aggregates required
to be homogeneous.

Reviewers: RKSimon, ABataev, dtemirbulatov, spatel, vporpo

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70587
2019-12-03 19:29:27 +03:00
Sanjay Patel
5c166f1d19 [x86] make SLM extract vector element more expensive than default
I'm not sure what the effect of this change will be on all of the affected
tests or a larger benchmark, but it fixes the horizontal add/sub problems
noted here:
https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc

The costs are based on reciprocal throughput numbers in Agner's tables for
PEXTR*; these appear to be very slow ops on Silvermont.

This is a small step towards the larger motivation discussed in PR43605:
https://bugs.llvm.org/show_bug.cgi?id=43605

Also, it seems likely that insert/extract is the source of perf regressions on
other CPUs (up to 30%) that were cited as part of the reason to revert D59710,
so maybe we'll extend the table-based approach to other subtargets.

Differential Revision: https://reviews.llvm.org/D70607
2019-11-27 14:08:56 -05:00
Anton Afanasyev
80cd6b6e04 [SLP] Enhance SLPVectorizer to vectorize vector aggregate
Summary:
Vector aggregate is homogeneous aggregate of vectors like `{ <2 x float>, <2 x float> }`.
This patch allows `findBuildAggregate()` to consider vector aggregates as
well as scalar ones. For instance, `{ <2 x float>, <2 x float> }` maps to `<4 x float>`.

Fixes vector part of llvm.org/PR42022

Reviewers: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70068
2019-11-22 20:01:59 +03:00
Anton Afanasyev
6d73265ad8 [SLP][Test] Precommit tests for D70068 and D70587. NFC. 2019-11-22 19:47:21 +03:00
Eric Christopher
714aabacfb Temporarily Revert "[SLP] allow forming 2-way reduction patterns" and update testcases.
After speaking with Sanjay - seeing a number of miscompiles and working
on tracking down a testcase. None of the follow on patches seem to
have helped so far.

This reverts commit 8a0aa5310b.
2019-11-20 16:00:53 -08:00
Eric Christopher
8a0aa5310b Temporarily Revert "Temporarily Revert "[SLP] allow forming 2-way reduction patterns""
as there were testcase changes after that need to also be reverted.

This reverts commit cd8748a15f.
2019-11-20 15:39:47 -08:00
Eric Christopher
cd8748a15f Temporarily Revert "[SLP] allow forming 2-way reduction patterns"
After speaking with Sanjay - seeing a number of miscompiles and working
on tracking down a testcase. None of the follow on patches seem to
have helped so far.

This reverts commit 7ff57705ba.
2019-11-20 15:19:31 -08:00
Sanjay Patel
b80033ef65 [SLP] reduce duplicate CHECK lines in tests; NFC 2019-11-20 10:12:54 -05:00
Sanjay Patel
0a8e7ca402 [SLP] fix miscompile on min/max reductions with extra uses (PR43948) (2nd try)
The 1st attempt was reverted because it revealed an existing
bug where we could produce invalid IR (use of value before
definition). That should be fixed with:
rG39de82ecc9c2

The bug manifests as replacing a reduction operand with an undef
value.

The problem appears to be limited to cases where a min/max reduction
has extra uses of the compare operand to the select.

In the general case, we are tracking "ExternallyUsedValues" and
an "IgnoreList" of the reduction operations, but those may not apply
to the final compare+select in a min/max reduction.

For that, we use replaceAllUsesWith (RAUW) to ensure that the new
vectorized reduction values are transferred to all subsequent users.

Differential Revision: https://reviews.llvm.org/D70148
2019-11-19 14:57:35 -05:00
Sanjay Patel
39de82ecc9 [SLP] fix insertion point for min/max reduction
As discussed in D70148 (and caused a revert of the original commit):
if we insert at the select, then we can produce invalid IR because
the replacement for the compare may have uses before the select.
2019-11-19 10:50:10 -05:00
Sanjay Patel
6265be2782 [SLP] add test for reduction miscompile; NFC
See D70148 for discussion.
2019-11-19 10:24:32 -05:00
Eric Christopher
6f1cc4151a Temporarily revert "[SLP] fix miscompile on min/max reductions with extra uses (PR43948)"
as it causes an ICE on valid. A testcase was followed up on the original thread.

This reverts commit a3e61946c5.
2019-11-18 14:41:37 -08:00
Sanjay Patel
b763924bd0 [SLP] reduce duplicated check lines in tests; NFC 2019-11-18 17:03:07 -05:00
Alexey Bataev
bfa32573bf Revert "Temporarily Revert:"
This reverts commit e511c4b0df:

    Temporarily Revert:

     "[SLP] Generalization of stores vectorization."
     "[SLP] Fix -Wunused-variable. NFC"
     "[SLP] Vectorize jumbled stores."

after fixing the problem with compile time.
2019-11-14 16:38:20 -05:00
Sanjay Patel
a3e61946c5 [SLP] fix miscompile on min/max reductions with extra uses (PR43948)
The bug manifests as replacing a reduction operand with an undef
value.

The problem appears to be limited to cases where a min/max reduction
has extra uses of the compare operand to the select.

In the general case, we are tracking "ExternallyUsedValues" and
an "IgnoreList" of the reduction operations, but those may not apply
to the final compare+select in a min/max reduction.

For that, we use replaceAllUsesWith (RAUW) to ensure that the new
vectorized reduction values are transferred to all subsequent users.

Differential Revision: https://reviews.llvm.org/D70148
2019-11-13 15:57:35 -05:00
Sanjay Patel
142cbe73e9 [SLP] improve test readability; NFC 2019-11-13 12:59:00 -05:00
Sanjay Patel
2d06375c3f [SLP] add test for miscompile with reduction (PR43948); NFC 2019-11-12 11:36:41 -05:00
Vasileios Porpodas
6a18a95487 [SLP] Look-ahead operand reordering heuristic.
Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for examples).

Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk

Reviewed By: RKSimon, dtemirbulatov

Subscribers: xbolva00, Carrot, hiraditya, phosek, rnk, rcorcs, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60897
2019-11-11 21:06:51 -08:00
Sanjay Patel
7ff57705ba [SLP] allow forming 2-way reduction patterns
We have a vector compare reduction problem seen in PR39665 comment 2:
https://bugs.llvm.org/show_bug.cgi?id=39665#c2

Or slightly reduced here:

define i1 @cmp2(<2 x double> %a0) {
  %a = fcmp ogt <2 x double> %a0, <double 1.0, double 1.0>
  %b = extractelement <2 x i1> %a, i32 0
  %c = extractelement <2 x i1> %a, i32 1
  %d = and i1 %b, %c
  ret i1 %d
}

SLP would not attempt to turn this into a vector reduction because there is an
artificial lower limit on that transform. We can not completely remove that limit
without inducing regressions though, so this patch just hacks an extra attempt at
creating a 2-way reduction to the end of the analysis.

As shown in the test file, we are still not getting some of the motivating cases,
so follow-on patches will be needed to solve those cases.

Differential Revision: https://reviews.llvm.org/D59710
2019-11-07 06:08:42 -05:00
Eric Christopher
e511c4b0df Temporarily Revert:
"[SLP] Generalization of stores vectorization."
 "[SLP] Fix -Wunused-variable. NFC"
 "[SLP] Vectorize jumbled stores."

As they're causing significant (10-30x) compile time regressions on
vectorizable code.

The primary cause of the compile-time regression is f228b53716.

This reverts commits:

f228b53716
5503455ccb
21d498c9c0
2019-11-06 16:06:15 -08:00
Sanjay Patel
7effd37b00 [SLP] add tests for 2-wide reductions; NFC 2019-11-05 17:18:37 -05:00
Alexey Bataev
b80c41cd3c [SLP]Fix PR43799: Crash on different sizes of GEP indices.
Summary:
If the GEP instructions are going to be vectorized, the indices in those
GEP instructions must be of the same type. Otherwise, the compiler may
crash when trying to build the vector constant.

Reviewers: RKSimon, spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69627
2019-11-04 10:36:26 -05:00
Sanjay Patel
fc98907535 [SLP] avoid 'tmp' value name conflict with auto-generated CHECK script; NFC
The script uses 'TMP#' as its substitute for nameless values,
so if a test already contains 'tmp#' *named* values, then
there could be trouble. We should probably just fix the
script to avoid this problem going forward, but it's easy
enough to change a test too (and explicitly naming variables
'tmp' is always a sad choice).
2019-11-01 09:27:35 -04:00
Sanjay Patel
7faf33484e [SLP] avoid 'tmp' value name conflict with auto-generated CHECK script; NFC
The script uses 'TMP#' as its substitute for nameless values,
so if a test already contains 'tmp#' *named* values, then
there could be trouble. We should probably just fix the
script to avoid this problem going forward, but it's easy
enough to change a test too (and explicitly naming variables
'tmp' is always a sad choice).
2019-11-01 09:27:35 -04:00
Sanjay Patel
37628802be [SLP] avoid 'tmp' value name conflict with auto-generated CHECK script; NFC
The script uses 'TMP#' as its substitute for nameless values,
so if a test already contains 'tmp#' *named* values, then
there could be trouble. We should probably just fix the
script to avoid this problem going forward, but it's easy
enough to change a test too (and explicitly naming variables
'tmp' is always a sad choice).
2019-11-01 09:27:35 -04:00
Alexey Bataev
70ad617dd6 [SLP] Vectorize jumbled stores.
Summary:
Patch adds support for vectorization of the jumbled stores. The value
operands are vectorized and then shuffled in the right order before
store.

Reviewers: RKSimon, spatel, hfinkel, mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43339
2019-10-31 16:02:25 -04:00
Haojian Wu
e65ddcafee Revert "[SLP] Vectorize jumbled stores."
This reverts commit 21d498c9c0.

This commit causes some crashes on some targets.
2019-10-31 10:21:24 +01:00
Alexey Bataev
21d498c9c0 [SLP] Vectorize jumbled stores.
Summary:
Patch adds support for vectorization of the jumbled stores. The value
operands are vectorized and then shuffled in the right order before
store.

Reviewers: RKSimon, spatel, hfinkel, mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43339
2019-10-30 13:33:52 -04:00
Alexey Bataev
f228b53716 [SLP] Generalization of stores vectorization.
Stores are vectorized with maximum vectorization factor of 16. Patch
tries to improve the situation and use maximal vectorization factor.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Differential Revision: https://reviews.llvm.org/D43582
2019-10-29 11:46:36 -04:00