2136 Commits

Author SHA1 Message Date
Florian Hahn
cc39866436 [LV] Reorganize and extend in-loop reduction tests.
Split off min-max in-loop reduction tests into separate file and extend
them by adding tests with
 * min & max intrinsics
 * fmuladd with permuted operands
 * min & max select tests with permuted operands.

Adds extra test coverage as suggested in D155845.
2023-07-26 23:23:14 +01:00
Anna Thomas
e85fd3cbdd Revert "[LV] Complete load groups and release store groups in presence of dependency"
This reverts commit eaf6117f33 (D155520).
There's an ASAN build failure that needs investigation.
2023-07-26 15:07:26 -04:00
Ramkumar Ramachandra
110ec1863a LoopVectorize/iv-select-cmp: add test for decreasing IV, const start
The most straightforward extension to D150851 would involve a loop with
decreasing induction variable, with a constant start value.
iv-select-cmp.ll only contains a negative test for the decreasing
induction variable case when the start value is variable, namely
not_vectorized_select_decreasing_induction_icmp. Hence, add a test for
the most straightforward extension to D150851, in preparation to
vectorize:

  long rdx = 331;
  for (long i = 19999; i >= 0; i--) {
    if (a[i] > 3)
      rdx = i;
  }
  return rdx;
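
As a rough, self-contained sketch of the pattern above (the function name, the `long` element type, and the parameterised bounds are assumptions made for illustration, not part of the test file), the loop can be wrapped as:

```c
#include <assert.h>

/* Hypothetical wrapper around the loop from the commit message.
   Walks the array from the highest index down to 0 and records every
   index whose element is greater than 3. Because the induction variable
   decreases, the value left in rdx is the SMALLEST matching index, or
   `start` when no element matches. */
long select_cmp_decreasing_iv(const long *a, long n, long start) {
    long rdx = start;
    for (long i = n - 1; i >= 0; i--) {
        if (a[i] > 3)
            rdx = i;
    }
    return rdx;
}
```

With `n = 20000` and `start = 331` this matches the loop shown in the message.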

Differential Revision: https://reviews.llvm.org/D156152
2023-07-26 14:15:26 +01:00
Anna Thomas
eaf6117f33 [LV] Complete load groups and release store groups in presence of dependency
This is a complete fix for CompleteLoadGroups introduced in
D154309. We need to check for dependency between A and every member of
the load Group of B.
This patch also fixes another miscompile seen when we incorrectly sink stores
below a depending load (see testcase in
interleaved-accesses-sink-store-across-load.ll). This is fixed by
releasing store groups correctly.

Differential Revision: https://reviews.llvm.org/D155520
2023-07-25 17:32:09 -04:00
Martin Storsjö
245ec675a4 Revert "[LV] Re-use existing broadcast value for live-ins."
This reverts commit eea9258648.

That commit triggered crashes in the following testcase:

$ cat reduced.c
typedef struct {
  int a[8]
} b;
typedef struct {
  b *c;
  short d
} e;
void f() {
  int g;
  char *h;
  e *i = f;
  short j = i->d;
  int a = i->c->a[0];
  for (;;)
    for (; g < a; g++) {
      *h = j * i->d >> 8;
      h++;
    }
}
$ clang -target aarch64-linux-gnu -w -c -O2 reduced.c
2023-07-25 10:35:41 +03:00
Florian Hahn
eea9258648 [LV] Re-use existing broadcast value for live-ins.
When requesting a vector value for a live-in, we can re-use the
broadcast of the live-in of part 0 for parts > 0.
2023-07-24 11:50:47 +01:00
Maciej Gabka
38cdb007a5 Add missing SLEEF mappings to scalable vector functions for log2 and log2f
In the original commit adding SLEEF mappings, https://reviews.llvm.org/D146839
mappings for log2/log2f were missing.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D155801
2023-07-21 13:59:13 +00:00
Maciej Gabka
b172fbff68 Revert "[TLI][AArch64] Add missing SLEEF mappings to scalable vector functions for log2 and log2f"
This reverts commit 791c89600a.
2023-07-21 13:50:10 +00:00
Maciej Gabka
791c89600a [TLI][AArch64] Add missing SLEEF mappings to scalable vector functions for log2 and log2f
In the original commit adding SLEEF mappings, https://reviews.llvm.org/D146839
mappings for log2/log2f were missing.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D155623
2023-07-21 13:46:03 +00:00
David Green
2e0bf67df1 [LV][AArch64] Fix reductions costs in strict-fadd-cost.ll. NFC
These tests were originally added in 0aff1798b5, where they
were measuring the cost of fadd and fmuladd reductions, which should be fairly
high cost. For some reason, due to the forced vector factors, the debug costs
of each instruction are printed twice by the vectorizer. Once as if the
instruction is a simple fadd/fmuladd, and later with the correct reduction
cost.

In d827865e9f the costs were updated to match the first
print statements, whereas it would be better for them to match the second in
order to test the cost of the reduction.

This patch returns them to testing the original reduction costs.
2023-07-20 10:34:05 +01:00
Mel Chen
4ddc1745a8 [LV] Add tests for select-cmp reduction pattern. (NFC)
Add test cases for selecting an increasing integer induction variable.

Reviewed By: fhahn, shiva0217

Differential Revision: https://reviews.llvm.org/D153936
2023-07-19 20:17:36 -07:00
Philip Reames
7cc6b80d9a [RISCV][CostModel] Model vrgather.vv as being quadratic in LMUL
vrgather.vv across multiple vector registers (i.e. LMUL > 1) requires all to all data movement. This includes two conceptual sets of changes:

    For permutes, we were modeling these as being linear in LMUL.
    For reverse, we were modeling them as being fixed cost in LMUL.

Both were wrong, and have been adjusted to O(LMUL^2).  Noticed via code inspection while looking at something else.
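
To make the difference concrete, here is a minimal sketch of the three cost shapes mentioned above. The helper names and the unit cost at LMUL = 1 are assumptions for illustration; this is not the RISC-V TTI interface.

```c
#include <assert.h>

/* Hypothetical cost shapes, normalised so that LMUL = 1 costs 1. */

/* Old model for permutes: linear in LMUL. */
unsigned permute_cost_linear_old(unsigned lmul) { return lmul; }

/* Old model for reverse: fixed cost regardless of LMUL. */
unsigned reverse_cost_fixed_old(unsigned lmul) { (void)lmul; return 1; }

/* Adjusted model: all-to-all data movement between LMUL registers,
   so the cost grows as LMUL squared. */
unsigned vrgather_cost_quadratic(unsigned lmul) { return lmul * lmul; }
```

Under these assumptions, at LMUL = 8 the quadratic model is 8x the old permute estimate and 64x the old reverse estimate.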

It's worth asking whether we should be lowering reverse to something other than a vrgather at high LMULs. That shuffle is quite expensive.  (Future work)

Differential Revision: https://reviews.llvm.org/D152019
2023-07-18 11:52:34 -07:00
Sander de Smalen
08fd44b300 [AArch64] Force streaming-compatible codegen when attributes are set.
Before this patch, the only way to generate streaming-compatible code
was to use the `-force-streaming-compatible-sve` flag, but the compiler
should also avoid the use of instructions invalid in streaming mode
when a function has the aarch64_pstate_sm_enabled/compatible attribute.

Reviewed By: paulwalker-arm, david-arm

Differential Revision: https://reviews.llvm.org/D155428
2023-07-18 10:26:00 +00:00
Florian Hahn
68746a8cea [LV] Move all VPlan transforms after initial VPlan construction.
Reorder VPlan transforms slightly so they are all grouped together,
after disabling Value -> VPValue lookup. In terms of codegen impact,
this should be NFC modulo a small number of instruction reorderings.

Preparation to split up tryToBuildVPlanWithVPRecipes in a follow-up.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D154640
2023-07-18 10:53:30 +01:00
Anna Thomas
a5573bf030 [LV] Precommit test for interleaving miscompile
Identified another miscompile while working on fixing interleaving's
current miscompile in D154309. This is different from the testcases landed in
D154309, since it showcases incorrect sinking of a store, whereas the former
testcases in that review (and the follow-up ones) showed incorrect hoisting of
loads across stores.
2023-07-17 17:24:40 -04:00
zhongyunde
4d2723bd00 [ValueTracking] Support vscale assumes for isKnownToBeAPowerOfTwo
This patch is separated from D154953 to see which tests are affected by this
change alone, as requested in a review comment.
Depends on the related LangRef update in D155193.

Reviewed By: paulwalker-arm, nikic, david-arm
Differential Revision: https://reviews.llvm.org/D155350
2023-07-15 19:42:58 +08:00
Anna Thomas
dfaf4587e4 Precommit follow-up testcase for interleaved miscompile
Follow-up testcase for PR63602.

Suggested by Ayal in D154309. A more complete fix is coming up, which should
handle this testcase as well.
2023-07-14 16:04:56 -04:00
Maciej Gabka
5b0e19a7ab [TLI][AArch64] Add mappings to vectorized functions from ArmPL
Arm Performance Libraries contain a math library which provides
vectorized versions of common math functions.
This patch allows it to be used with clang and llvm via -fveclib=ArmPL or
-vector-library=ArmPL, so loops with such calls can be vectorized.
The executable needs to be linked with the amath library.

Arm Performance Libraries are available at:
https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries

Reviewed by: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D154508
2023-07-12 12:53:18 +00:00
Nikita Popov
edb2fc6dab [llvm] Remove explicit -opaque-pointers flag from tests (NFC)
Opaque pointers mode is enabled by default, no need to explicitly
enable it.
2023-07-12 14:35:55 +02:00
Mel Chen
0158d86ab3 [LV] Change the test cases to ensure that the trip count is not zero. (NFC)
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D154415
2023-07-11 19:12:59 -07:00
Florian Hahn
d7e79bd7d4 [LV] Check if ops can safely be truncated in computeMinimumValueSizes.
Update computeMinimumValueSizes to check if an instruction's operands
can safely be truncated.

If more than MinBW bits are demanded for the operand or if the
operand is a constant and cannot be safely truncated, it is not safe to
evaluate the instruction in the narrower MinBW. Skip those cases.
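
A small sketch of why the operand check matters, using a right shift with MinBW = 8. The function names are hypothetical and this is plain C rather than the vectorizer's code. In C the narrowed operand is promoted back to `int`, so the narrow form is well defined and simply yields a different value; in LLVM IR, an `i8` shift by 9 would be poison.

```c
#include <assert.h>
#include <stdint.h>

/* Wide evaluation: shift in 32 bits, then keep the low 8 bits.
   Assumes s < 32. */
uint8_t shift_then_truncate(uint32_t x, unsigned s) {
    return (uint8_t)(x >> s);
}

/* Narrow evaluation: truncate the operand to 8 bits first, then shift.
   Only equivalent when the demanded bits of x fit in 8 bits.
   Assumes s < 32. */
uint8_t truncate_then_shift(uint32_t x, unsigned s) {
    return (uint8_t)((uint8_t)x >> s);
}
```

For `x = 0x1200` and `s = 9` the wide form gives 9 while the narrow form gives 0, because the bits that the shift moves into the low byte were discarded by the early truncation. For `x = 0x12` and `s = 1` both forms agree.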

Fixes https://github.com/llvm/llvm-project/issues/47927

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D154717
2023-07-11 20:18:55 +01:00
Florian Hahn
1739200654 [LV] Add trunc test variants with shl and ashr.
Add extra tests for D154717 where narrowing results in poison.
2023-07-10 21:04:19 +01:00
Florian Hahn
14ec3f4b06 [LV] Skip VFs > # iterations remaining for epilogue vectorization.
If a candidate VF for epilogue vectorization is greater than the number of
remaining iterations, the epilogue loop would be dead. Skip such factors.
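
A minimal sketch of the check, assuming a known constant trip count and no required scalar epilogue. The function and parameter names are hypothetical and do not correspond to the LoopVectorize implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: decides whether an epilogue vector loop with
   factor epilogue_vf can execute at least once.
   main_vf * main_uf is the number of scalar iterations consumed per
   iteration of the main vector loop and is assumed to be non-zero. */
bool epilogue_vf_is_useful(unsigned trip_count, unsigned main_vf,
                           unsigned main_uf, unsigned epilogue_vf) {
    unsigned main_step = main_vf * main_uf;
    unsigned remaining = trip_count % main_step; /* left after main loop */
    /* If the candidate VF exceeds what is left, the epilogue vector
       loop would never run, so the candidate should be skipped. */
    return epilogue_vf <= remaining;
}
```

For example, with a trip count of 20 and a main loop of VF = 16, UF = 1, four iterations remain: an epilogue VF of 8 would be dead, while a VF of 4 would run once.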

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D154264
2023-07-07 21:43:51 +01:00
Florian Hahn
aee851fd0e Revert "[LV] Skip VFs < iterations remaining for epilogue vectorization."
This reverts commit 7cc0be01a0.

The title of the commit is incorrect, revert to fix the commit message.
2023-07-07 21:41:24 +01:00
Florian Hahn
7cc0be01a0 [LV] Skip VFs < iterations remaining for epilogue vectorization.
If a candidate VF for epilogue vectorization is less than the number of
remaining iterations, the epilogue loop would be dead. Skip such factors.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D154264
2023-07-07 20:33:42 +01:00
Luke Lau
b9af086292 [RISCV] Update loop vectorizer interleaved access test output
02bb33c3ce changed it so it no longer unrolls the
loop.
2023-07-07 15:38:04 +01:00
Nikita Popov
a5e253d659 [LoopVectorize] Regenerate test checks (NFC)
2023-07-07 14:42:31 +02:00
Florian Hahn
4d847bf4d0 [LV] Do not add load to group if it moves across conflicting store.
This patch prevents invalid load groups from being formed, where a load
needs to be moved across a conflicting store.

Once we hit a store that conflicts with a load with an existing
interleave group, we need to stop adding earlier loads to the group, as
this would force hoisting the previous stores in the group across the
conflicting load.

To detect such cases, add a new CompletedLoadGroups set, which is used
to keep track of load groups to which no earlier loads can be added.

Fixes https://github.com/llvm/llvm-project/issues/63602

Reviewed By: anna

Differential Revision: https://reviews.llvm.org/D154309
2023-07-07 11:06:30 +01:00
Florian Hahn
6b289304f6 [LV] Add test case for incorrect shift truncation.
Test for https://github.com/llvm/llvm-project/issues/47927
2023-07-06 15:23:17 +01:00
Florian Hahn
a0fcf84a8c [LV] Consider if scalar epilogue is required in getMaximizedVFForTarget.
When a scalar epilogue is required, at least one iteration of the scalar loop
has to execute. Adjust ConstTripCount accordingly to avoid picking a max VF
that results in a dead vector loop.
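
The adjustment can be sketched roughly as follows, under the assumption that the maximum VF is clamped to a power of two that fits in the constant trip count. The names are hypothetical and the real getMaximizedVFForTarget takes more factors into account.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: largest power-of-two VF that neither exceeds the
   target's maximum nor the iterations available to the vector loop. */
unsigned clamp_max_vf(unsigned const_trip_count,
                      bool requires_scalar_epilogue,
                      unsigned target_max_vf) {
    unsigned vector_iters = const_trip_count;
    /* Reserve one iteration for the scalar loop when it must run. */
    if (requires_scalar_epilogue && vector_iters > 0)
        vector_iters -= 1;
    unsigned vf = 1; /* VF = 1 means no vectorization */
    while (vf * 2 <= vector_iters && vf * 2 <= target_max_vf)
        vf *= 2;
    return vf;
}
```

With a trip count of 8 and a target maximum of 16, this picks VF = 8 when no scalar epilogue is needed, but VF = 4 when one is required, since a VF of 8 would leave nothing for the scalar loop.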

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D154261
2023-07-06 13:35:35 +01:00
Florian Hahn
1746ac42ca [LV] Forget SCEVs for exit phis after vectorization.
After vectorization, the exit blocks of the original loop will have additional
predecessors. Invalidate SCEVs for the exit phis in case SE looked through
single-entry phis.

Fixes https://github.com/llvm/llvm-project/issues/63368
Fixes https://github.com/llvm/llvm-project/issues/63669
2023-07-04 21:28:03 +01:00
Florian Hahn
8a25dc3787 [LV] Regenerate check lines to reduce diff.
Regenerate checks to avoid unnecessary changes in D154264.
2023-07-04 14:01:05 +01:00
Evgeniy Brevnov
d7329653d0 [VPlan] Allow sinking of instructions with no defs
We started seeing a new failure after D142886. It looks like that change enabled new cases, and we now hit an assert:
assert(Current->getNumDefinedValues() == 1 &&
           "only recipes with a single defined value expected");

When we do instruction sinking for a first-order recurrence, we hit the assert if an instruction doesn't have a single def. If the instruction doesn't produce any new def, there are no new users and nothing to sink.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D151204
2023-07-04 16:53:06 +07:00
Florian Hahn
e561edaaa5 [LV] Prepare tests for D154261.
Update trip count of test in
pr56319-vector-exit-cond-optimization-epilogue-vectorization.ll to
make sure epilogue vectorization will still trigger after D154261,
checking for the original issue.

Move the original test to limit-vf-by-tripcount.ll for testing new
functionality of D154261.
2023-07-03 17:49:36 +01:00
Florian Hahn
c14b0a7c55 [LV] Check for vector instruction in main vector loop.
Update the test to check for the vectorization call in the main vector
loop, not the dead epilogue vector loop as it does currently.
2023-07-03 14:16:47 +01:00
Florian Hahn
6954cb5425 [LV] Add test case for #63602.
2023-07-02 22:17:16 +01:00
Nikita Popov
bb3763e497 Revert "[SimplifyCFG] Allow dropping block that only contains ephemeral values"
This reverts commit 20f0c68fd8.

https://reviews.llvm.org/D153966#4464594 reports an optimization
regression in Rust.

Additionally this change has caused an unexpected 0.3% compile-time
regression.
2023-06-30 21:24:05 +02:00
Nikita Popov
20f0c68fd8 [SimplifyCFG] Allow dropping block that only contains ephemeral values
Perform the TryToSimplifyUncondBranchFromEmptyBlock() transform if
the block is empty except for ephemeral values. The ephemeral values
will be dropped in that case.

This makes sure that assumes don't block this transform, as reported
in https://discourse.llvm.org/t/llvm-assume-blocks-optimization/71609.

Differential Revision: https://reviews.llvm.org/D153966
2023-06-30 15:24:01 +02:00
Florian Hahn
9078a9942d [LV] Add additional tests with dead vector epilogues.
2023-06-30 12:17:57 +01:00
Igor Kirillov
17bde328d6 [LV] Add mask support for vectorizing interleaved groups
This patch extends LoopVectorize to handle the vectorization of interleaved
memory accesses with scalable vectors when a mask is required and/or predicated
tail folding is enabled.

Differential Revision: https://reviews.llvm.org/D152258
2023-06-29 17:50:56 +00:00
Michael Platings
54c79fa53c [test] Replace aarch64-*-eabi with aarch64
Also replace aarch64_be-*-eabi with aarch64_be

Using "eabi" for aarch64 targets is a common mistake that the Clang driver warns about.
We want to avoid it elsewhere as well. Just use the common "aarch64" without
other triple components.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D153943
2023-06-29 09:06:00 +01:00
Igor Kirillov
7049393a58 [LV] Precommit masked interleaved access tests
Precommit for D152258.

Differential Revision: https://reviews.llvm.org/D153443
2023-06-28 09:23:23 +00:00
Fangrui Song
ebbfdca586 [test] Replace aarch64-arm-none-eabi with aarch64
Similar to 02e9441d6c, but for llvm/test and one
lld/test/ELF test.
2023-06-27 19:36:27 -07:00
Florian Hahn
dc9f69e483 [LV] Add test with reduction start values that are/may be poison/undef.
Test cases for #62565.
2023-06-22 20:15:23 +01:00
Anna Thomas
ec146cb7c0 [LV] Add support for minimum/maximum intrinsics
{mini|maxi}mum intrinsics are different from {min|max}num intrinsics in
the propagation of NaN and signed zero. Also, the minnum/maxnum
intrinsics require the presence of nsz flags to be valid reductions in the
vectorizer. In this regard, we introduce a new recurrence kind and also
add support for identifying reduction patterns using these intrinsics.
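
The semantic difference can be illustrated with a scalar sketch. Both helpers are hypothetical stand-ins written for this example, not the intrinsics themselves: the first mimics the minnum-style behaviour of returning the non-NaN operand, the second mimics the minimum-style behaviour of propagating NaN and ordering -0.0 below +0.0.

```c
#include <assert.h>
#include <math.h>

/* minnum-like: if exactly one operand is NaN, return the other one.
   The result for (+0.0, -0.0) is not pinned down here. */
double min_ignoring_nan(double a, double b) {
    if (isnan(a)) return b;
    if (isnan(b)) return a;
    return a < b ? a : b;
}

/* minimum-like: any NaN operand makes the result NaN, and -0.0 is
   treated as smaller than +0.0. */
double min_propagating_nan(double a, double b) {
    if (isnan(a) || isnan(b)) return NAN;
    if (a == b) return signbit(a) ? a : b; /* picks -0.0 over +0.0 */
    return a < b ? a : b;
}
```

A reduction over `{1.0, NaN, 2.0}` therefore yields 1.0 with the first helper and NaN with the second, which is why the two families cannot share a recurrence kind.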

The reduction intrinsics and lowering were introduced in 26bfbec5d2.

There are tests added which show how this interacts across chains of
min/max patterns.

Differential Revision: https://reviews.llvm.org/D151482
2023-06-20 13:17:28 -04:00
Florian Hahn
0a246a0c72 [LV] Use VPValues when creating GEP with all invariant indices.
Update VPWidenGEPRecipe::execute to use the VPValue operands of the
recipe when creating the GEP instruction.

Fixes #63340.
2023-06-16 16:14:01 +01:00
Florian Hahn
ea6ca9cb2b [LV] Fix crash when stride isn't a constant.
In some cases, the stride may not be a constant. Just skip those cases
for now. This should only happen for cases where LV interleaves only; if
the loop is vectorized, the stride needs to be versioned to a constant.
2023-06-14 16:53:34 +01:00
Simon Pilgrim
4cbedaeff5 [LoopVectorize][X86] Regenerate slm-no-vectorize.ll
2023-06-13 14:15:37 +01:00
Florian Hahn
d209084720 [VPlan] Replace versioned stride with constant during VPlan opts.
After constructing the initial VPlan, replace VPValues for versioned
strides with their constant counterparts.

Differential Revision: https://reviews.llvm.org/D147783
2023-06-13 08:26:55 +01:00
Nikita Popov
2b7c347c7f [LoopVectorize] Convert test to opaque pointers (NFC)
I'm keeping the bitcast in the input here, because without it
we end up introducing a stride 1 assumption and end up testing
a different case.
2023-06-12 14:49:45 +02:00