Commit Graph

532 Commits

Author SHA1 Message Date
Nikita Popov
b7b0ce6e76 [LoopUnroll] Convert test to opaque pointers (NFC) 2023-01-06 11:48:03 +01:00
Nikita Popov
9c7afbacd8 [LoopUnroll] Name instructions in test (NFC) 2023-01-06 11:48:03 +01:00
Nikita Popov
ef992b6079 [LoopUnroll] Convert some tests to opaque pointers (NFC) 2022-12-23 16:35:26 +01:00
Roman Lebedev
3a8e009f97 Revert "Reland "[SimplifyCFG] FoldBranchToCommonDest(): deal with mismatched IV's in PHI's in common successor block""
One of these two changes is exposing (or causing) some more miscompiles.
A reproducer is in progress, so reverting until resolved.

This reverts commit 428f36401b.
2022-12-20 18:36:42 +03:00
Roman Lebedev
428f36401b Reland "[SimplifyCFG] FoldBranchToCommonDest(): deal with mismatched IV's in PHI's in common successor block"
This reverts commit 37b8f09a4b,
and returns commit 1bd0b82e50.
The miscompile was in InstCombine, and it has been addressed.

This tries to approach the problem noted by @arsenm:
terrible codegen for `__builtin_fpclassify()`:
https://godbolt.org/z/388zqdE37

Just because the PHI in the common successor happens to have different
incoming values for these two blocks, doesn't mean we have to give up.
It's quite easy to deal with this, we just need to produce a select:
https://alive2.llvm.org/ce/z/000srb

Now, the cost model for this transform is rather overly strict,
so this will basically never fire. We tally all (over all preds)
the selects needed to the NumBonusInsts

Differential Revision: https://reviews.llvm.org/D139275
2022-12-17 05:18:54 +03:00
Alexander Kornienko
37b8f09a4b Revert "[SimplifyCFG] FoldBranchToCommonDest(): deal with mismatched IV's in PHI's in common successor block"
This reverts commit 1bd0b82e50, since it leads to
miscompiles. See https://reviews.llvm.org/D139275#3993229 and
https://reviews.llvm.org/D139275#4001580.
2022-12-16 17:23:35 +01:00
Roman Lebedev
da80639ee2 [NFC][IndVar] Autogenerate checklines in one test 2022-12-14 17:39:10 +03:00
Roman Lebedev
a64b2e9e3e [NFC][SCEV][LoopUnroll] Add tests where treating or as add raises expansion cost
From https://reviews.llvm.org/rG46db90cc71d1#1154128
2022-12-12 20:41:56 +03:00
Roman Lebedev
1bd0b82e50 [SimplifyCFG] FoldBranchToCommonDest(): deal with mismatched IV's in PHI's in common successor block
This tries to approach the problem noted by @arsenm:
terrible codegen for `__builtin_fpclassify()`:
https://godbolt.org/z/388zqdE37

Just because the PHI in the common successor happens to have different
incoming values for these two blocks, doesn't mean we have to give up.
It's quite easy to deal with this, we just need to produce a select:
https://alive2.llvm.org/ce/z/000srb

Now, the cost model for this transform is rather overly strict,
so this will basically never fire. We tally all (over all preds)
the selects needed to the NumBonusInsts

Differential Revision: https://reviews.llvm.org/D139275
2022-12-12 18:20:03 +03:00
Bjorn Pettersson
3528e63d89 [test] Remove duplicate RUN lines in Transform tests 2022-12-08 11:47:16 +01:00
Roman Lebedev
86faf2cd88 [NFC] Port all LoopUnroll tests to -passes= syntax 2022-12-08 02:38:47 +03:00
Roman Lebedev
5103ef64fe [NFC] Port all (but one) LoopUnroll tests to -passes= syntax 2022-12-07 20:15:43 +03:00
Jamie Schmeiser
2b6683fd5f Expand loop peeling phi computation to handle binary ops and casts
Summary:
Expand the capabilities of the code for computing how many peels are
needed to make phis determined.  A cast gets the peel count for the
value being casted while a binary op gets the maximum of the operands.

Respond to review comments: remove redundant asserts.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By:mkazantsev (Max Kazantsev),syzaara (Zaara Syeda)
Differential Revision: https://reviews.llvm.org/D138719
2022-12-05 12:10:53 -05:00
Roman Lebedev
b79921a4a8 [NFC] Re-autogenerate checklines in a few tests being affected 2022-12-04 20:58:55 +03:00
Jamie Schmeiser
be1ff1fe58 [NFC] Refactor loop peeling code for calculating phi invariance.
Summary:
Refactor loop peeling code by moving code for calculating phi invariance
into a separate class that does the calculation.  Redescribe and rework
the algorithm in preparation for adding increased functionality.  Add
test case that does not exhibit peeling that will be subsequently supported.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: mkazantsev (Max Kazantsev)
Differential Revision: https://reviews.llvm.org/D138232
2022-11-25 09:07:14 -05:00
Alina Sbirlea
d1b19da854 [LoopPeeling] Add flag to disable support for peeling loops with non-latch exits
Add a flag to allow disabling the changes in
https://reviews.llvm.org/D134803.

Differential Revision: https://reviews.llvm.org/D136643
2022-10-25 12:19:14 -07:00
Yaxun (Sam) Liu
9d5adc7e49 Revert "reland e5581df60a [SimplifyCFG] accumulate bonus insts cost"
This reverts commit bd7949bcd8.

Revert this patch since reviwers have different opinions regarding
the approach in post-commit review.

Will open RFC for further discussion.

Differential Revision: https://reviews.llvm.org/D132408
2022-10-25 12:15:39 -04:00
Yaxun (Sam) Liu
bd7949bcd8 reland e5581df60a [SimplifyCFG] accumulate bonus insts cost
Fixed compile time increase due to always constructing LocalCostTracker.
Now only construct LocalCostTracker when needed.
2022-10-24 15:43:53 -04:00
Florian Hahn
e302fa89aa [LoopUnroll] Forget exit values when making changes.
When unrolling, the exit values in LCSSA phis will get updated.
Invalidate cached SCEV values for those phis in case SCEV looked through
a exit phi.

Fixes #58340.
2022-10-18 15:12:24 +01:00
Florian Hahn
b0ded70ebf [LoopUnroll] Add test for mis-compile due to missing SCEV invalidation.
Test for #58340.
2022-10-18 14:56:44 +01:00
Arthur Eubanks
f3a928e233 [opt] Don't translate legacy -analysis flag to require<analysis>
Tests relying on this should explicitly use -passes='require<analysis>,foo'.
2022-10-07 14:54:34 -07:00
Florian Hahn
ec86e9a99b [LoopUnroll] Add test for crash exposed by 9e931439. 2022-10-07 20:02:58 +01:00
Nikita Popov
b43a4d0850 [LoopPeeling] Support peeling loops with non-latch exits
Loop peeling currently requires that a) the latch is exiting
b) a branch and c) other exits are unreachable/deopt. This patch
removes all of these limitations, and adds the necessary branch
weight updating support. It essentially works the same way as
before with latch -> exiting terminator and
loop trip count -> per exit trip count.

It's worth noting that there are still other limitations in
profitability heuristics: This patch enables peeling of loops to
make conditions invariant (which is pretty much always highly
profitable if possible), while peeling to make loads dereferenceable
still checks that non-latch exits are unreachable and PGO-based
peeling has even more conditions. Those checks could be relaxed
later if we consider those cases profitable.

The motivation for this change is that loops using iterator adaptors
in Rust often optimize very badly, and end up with a loop phi of the
form phi(true, false) in the final result. Peeling eliminates that
phi and conditions based on it, which enables a lot of follow-on
simplification.

Differential Revision: https://reviews.llvm.org/D134803
2022-10-07 12:35:52 +02:00
Florian Hahn
7c0ff64b0f [LAA] Change to function analysis for new PM.
At the moment, LoopAccessAnalysis is a loop analysis for the new pass
manager. The issue with that is that LAI caches SCEV expressions and
modifications in a loop may impact SCEV expressions in other loops, but
we do not have a convenient way to invalidate LAI for other loops
withing a loop pipeline.

To avoid this issue, turn it into a function analysis which returns a
manager object that keeps track of the individual LAI objects per loop.

Fixes #50940.

Fixes #51669.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134606
2022-10-01 15:44:27 +01:00
Florian Hahn
27330882a1 [LoopUnroll] Add cache verification failure test case.
Test case for D134612.
2022-09-26 14:25:37 +01:00
Simon Pilgrim
09cb9fdef9 [InstCombine] Fold ult(add(x,-1),c) -> ule(x,c) iff x != 0 (PR57635)
Alive2: https://alive2.llvm.org/ce/z/sZ6wwS

As detailed on Issue #57635 and #37628 - for unsigned comparisons, we can compare prior to a decrement iff the value is known never to be zero.

Differential Revision: https://reviews.llvm.org/D134172
2022-09-20 16:44:41 +01:00
Nikita Popov
dd61726d5b Revert "[SimplifyCFG] accumulate bonus insts cost"
This reverts commit e5581df60a.

This causes major compile-time regressions, about 2-3% end-to-end
on CTMark.
2022-09-19 14:46:43 +02:00
Yaxun (Sam) Liu
e5581df60a [SimplifyCFG] accumulate bonus insts cost
SimplifyCFG folds

bool foo() {
  if (cond1) return false;
  if (cond2) return false;
  return true;
}

as

bool foo() {
  if (cond1 | cond2) return false
  return true;
}

'cond2' is called 'bonus insts' in branch folding since they introduce overhead
since the original CFG could do early exit but the folded CFG always executes
them. SimplifyCFG calculates the costs of 'bonus insts' of a folding a BB into
its predecessor BB which shares the destination. If it is below bonus-inst-threshold,
SimplifyCFG will fold that BB into its predecessor and cond2 will always be executed.

When SimplifyCFG calculates the cost of 'bonus insts', it only consider 'bonus' insts
in the current BB to be considered for folding. This causes issue for unrolled loops
which share destinations, e.g.

bool foo(int *a) {
  for (int i = 0; i < 32; i++)
    if (a[i] > 0) return false;
  return true;
}

After unrolling, it becomes

bool foo(int *a) {
  if(a[0]>0) return false
  if(a[1]>0) return false;
  //...
  if(a[31]>0) return false;
  return true;
}

SimplifyCFG will merge each BB with its predecessor BB,
and ends up with 32 'bonus insts' which are always executed, which
is much slower than the original CFG.

The root cause is that SimplifyCFG does not consider the
accumulated cost of 'bonus insts' which are folded from
different BB's.

This patch fixes that by introducing a ValueMap to track
costs of 'bonus insts' coming from different BB's into
the same BB, and cuts off if the accumulated cost
exceeds a threshold.

Reviewed by: Artem Belevich, Florian Hahn, Nikita Popov, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D132408
2022-09-18 20:21:14 -04:00
Jamie Schmeiser
5e3ac79690 Loop names used in reporting can grow very large
Summary:
The code for generating a name for loops for various reporting scenarios
created a name by serializing the loop into a string.  This may result in
a very large name for a loop containing many blocks.  Use the getName()
function on the loop instead.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: Whitney (Whitney Tsang), aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D133587
2022-09-09 13:45:14 -04:00
Florian Hahn
555e09c2b0 [LAA] Rename printing pass to print<access-info>.
This updates the naming for the LAA printing pass to be in line with
most other analysis printing passes.

The old name has come up as confusing multiple times already, e.g. in
D131924.
2022-08-26 11:00:09 +01:00
Craig Topper
37c47b2cac [RISCV] Change how mtune aliases are implemented.
The previous implementation translated from names like sifive-7-series
to sifive-7-rv32 or sifive-7-rv64. This also required sifive-7-rv32
and sifive-7-rv64 to be valid CPU names. As those are not real
CPUs it doesn't make sense to accept them in -mcpu.

This patch does away with the translation and adds sifive-7-series
directly to RISCV.td. Removing sifive-7-rv32 and sifive-7-rv64.
sifive-7-series is only allowed in -mtune.

I've also added "rocket" to RISCV.td but have not removed rocket-rv32
or rocket-rv64.

To prevent -mcpu=sifive-7-series or -mcpu=rocket being used with llc,
I've added a Feature32Bit to all rv32 CPUs. And made it an error to
have an rv32 triple without Feature32Bit. sifive-7-series and rocket
do not have Feature32Bit or Feature64Bit set so the user would need
to provide -mattr=+32bit or -mattr=+64bit along with the -mcpu to
avoid the error.

SiFive no longer names their newer products with 3, 5, or 7 series.
Instead we have p200 series, x200 series, p500 series, and p600 series.
Following the previous behavior would require a sifive-p500-rv32 and
sifive-p500-rv64 in order to support -mtune=sifive-p500-series. There
is currently no p500 product, but it could start getting confusing if
there was in the future.

I'm open to hearing alternatives for how to achieve my main goal
of removing sifive-7-rv32/rv64 as a CPU name.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131708
2022-08-18 16:22:25 -07:00
Max Kazantsev
ebabd6bf18 Return "[SCEV] Use context to strengthen flags of BinOps"
This reverts commit 354fa0b480.

Returning as is. The patch was reverted due to a miscompile, but
this patch is not causing it. This patch made it possible to infer
some nuw flags in code guarded by `false` condition, and then someone
else to managed to propagate the flag from dead code outside.

Returning the patch to be able to reproduce the issue.
2022-08-16 14:12:36 +07:00
Max Kazantsev
354fa0b480 Revert "[SCEV] Use context to strengthen flags of BinOps"
This reverts commit 34ae308c73.

Our internal testing found a miscompile. Not sure if it's caused by
this patch or it revealed something else. Reverting while investigating.
2022-08-15 18:51:59 +07:00
Martin Sebor
0dcfe7aa35 [InstCombine] Tighten up known library function signature tests (PR #56463)
Replace a switch statement used to validate arguments to known library
functions with a more consistent table-driven approach and tighten it
up.
2022-08-10 14:15:46 -06:00
Max Kazantsev
34ae308c73 [SCEV] Use context to strengthen flags of BinOps
Sometimes SCEV cannot infer nuw/nsw from something as simple as
```
  len in [0, MAX_INT]
...
  iv = phi(0, iv.next)
  guard(iv <s len)
  guard(iv <u len)
  iv.next = iv + 1
```
just because flag strenthening only relies on definition and does not use local facts.
This patch adds support for the simplest case: inference of flags of `add(x, constant)`
if we can contextually prove that `x <= max_int - constant`.

In case if it has negative CT impact, we can add an option to switch it off. I woudln't
expect that though.

Differential Revision: https://reviews.llvm.org/D129643
Reviewed By: apilipenko
2022-08-03 14:08:57 +07:00
Nikita Popov
534b9246a2 [LoopInfo] Allow cloning of callbr
After D129288, callbr is safe to clone without special handling.
This permits optimizations like loop unroll and loop unswitch on
loops containing callbrs.

Fixes https://github.com/llvm/llvm-project/issues/41834.

Differential Revision: https://reviews.llvm.org/D129993
2022-07-19 09:57:28 +02:00
Nikita Popov
118d8fe46b [LoopUnroll] Regenerate test checks (NFC) 2022-07-18 10:37:22 +02:00
Nikita Popov
2a721374ae [IR] Don't use blockaddresses as callbr arguments
Following some recent discussions, this changes the representation
of callbrs in IR. The current blockaddress arguments are replaced
with `!` label constraints that refer directly to callbr indirect
destinations:

    ; Before:
    %res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo))
    to label %asm.fallthrough [label %foo]
    ; After:
    %res = callbr i8* asm "", "=r,r,!i"(i8* %x)
    to label %asm.fallthrough [label %foo]

The benefit of this is that we can easily update the successors of
a callbr, without having to worry about also updating blockaddress
references. This should allow us to remove some limitations:

* Allow unrolling/peeling/rotation of callbr, or any other
  clone-based optimizations
  (https://github.com/llvm/llvm-project/issues/41834)
* Allow duplicate successors
  (https://github.com/llvm/llvm-project/issues/45248)

This is just the IR representation change though, I will follow up
with patches to remove limtations in various transformation passes
that are no longer needed.

Differential Revision: https://reviews.llvm.org/D129288
2022-07-15 10:18:17 +02:00
Florian Hahn
6d5f814357 [LoopUnrollRuntime] Invalidate SCEV for exit phi in ConnectProlog.
ConnectProlog adds new incoming values to exit phi nodes which can
change the SCEV for the phi after 20d798bd47.

Fix is analog to cfc741bc0e.

Fixes #56286.
2022-06-29 20:28:43 +01:00
Florian Hahn
9a35f19e3e [UnrollRuntime] Invalidate SCEVs for modified phis in ConnectEpilog.
ConnectEpilog adds new incoming values to exit phi nodes which can
change the SCEV for the phi after 20d798bd47.

Fix is analog to cfc741bc0e.

Fixes #56282.
2022-06-29 18:26:00 +01:00
Florian Hahn
cfc741bc0e [LoopPeel] Forget SCEV for updated exit phi values.
LoopPeel add new incoming values to exit phi nodes which can change the
SCEV for the phi after 20d798bd47.

Forget SCEVs for such phis.

Fixes #56044.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D128164
2022-06-20 13:19:27 +02:00
Philip Reames
206f10d3f6 Plumb InstructionCost through unroll costing
Teach the unroller(s) how to handle an invalid cost. This avoids crashes when the backend can't provide a cost due to either a fundemental limitation or an unimplemented cost model case.

Differential Revision: https://reviews.llvm.org/D127305
2022-06-09 15:42:53 -07:00
Nuno Lopes
80b3dcc045 [Support] Make report_fatal_error respect its GenCrashDiag argument so it doesn't generate a backtrace
There are a few places where we use report_fatal_error when the input is broken.
Currently, this function always crashes LLVM with an abort signal, which
then triggers the backtrace printing code.
I think this is excessive, as wrong input shouldn't give a link to
LLVM's github issue URL and tell users to file a bug report.
We shouldn't print a stack trace either.

This patch changes report_fatal_error so it uses exit() rather than
abort() when its argument GenCrashDiag=false.

Reviewed by: nikic, MaskRay, RKSimon

Differential Revision: https://reviews.llvm.org/D126550
2022-05-30 19:19:23 +01:00
Nikita Popov
81c648a3d9 [LoopUnroll] Freeze tripcount rather than condition
This is a followup to D125754. We introduce two branches, one
before the unrolled loop and one before the epilogue (and similar
for the prologue case). The previous patch only froze the
condition on the first branch.

Rather than independently freezing the second condition, this patch
instead freezes TripCount and bases BECount on it. These are the
two quantities involved in the conditions, and this ensures that
both work on a consistent, non-poisonous trip count.

Differential Revision: https://reviews.llvm.org/D125896
2022-05-24 09:42:39 +02:00
Nikita Popov
e44fe27251 [LoopUnroll] Regenerate test checks (NFC) 2022-05-18 17:20:09 +02:00
Nikita Popov
323514de58 [LoopUnroll] Avoid branch on poison for runtime unroll with multiple exits
When performing runtime unrolling with multiple exits, one of the
earlier (non-latch) exits may exit the loop on the first iteration,
such that we never branch on the latch exit condition. As such, we
need to freeze the condition of the new branch that is introduced
before the loop, as it now executes unconditionally.

Differential Revision: https://reviews.llvm.org/D125754
2022-05-18 09:51:22 +02:00
Whitney Tsang
80304c5f88 [LoopUnroll] Always respect user unroll pragma
IMO when user provide unroll pragma, compiler should always respect it.
It is not clear to me why loop unroll pass currently ensure that the
unrolled loop size is limited by PragmaUnrollThreshold.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D119148
2022-04-11 14:33:24 -04:00
Andrew Wei
0af3e6a22d [InstCombine] Sink instructions with multiple users in a successor block.
This patch tries to sink instructions when they are only used in a successor block.

This is a further enhancement patch based on Anna's commit:
D109700, which allows sinking an instruction having multiple uses in a single user.

In this patch, sink instructions with multiple users in a single successor block will be supported.
It could fix a known issue from rust:
  https://github.com/rust-lang/rust/issues/51346#issuecomment-394443610

Reviewed By: nikic, reames

Differential Revision: https://reviews.llvm.org/D121585
2022-03-18 11:53:45 +08:00
William S. Moses
d9da6a535f [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate
LICM will speculatively hoist code outside of loops. This requires removing information, like alias analysis (https://github.com/llvm/llvm-project/issues/53794), range information (https://bugs.llvm.org/show_bug.cgi?id=50550), among others. Prior to https://reviews.llvm.org/D99249 , LICM would only be run after LoopRotate. Running Loop Rotate prior to LICM prevents a instruction hoist from being speculative, if it was conditionally executed by the iteration (as is commonly emitted by clang and other frontends). Adding the additional LICM pass first, however, forces all of these instructions to be considered speculative, even if they are not speculative after LoopRotate. This destroys information, resulting in performance losses for discarding this additional information.

This PR modifies LICM to accept a ``speculative'' parameter which allows LICM to be set to perform information-loss speculative hoists or not. Phase ordering is then modified to not perform the information-losing speculative hoists until after loop rotate is performed, preserving this additional information.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D119965
2022-02-17 20:13:07 -05:00
Roman Lebedev
371fcb720e [SimplifyCFG][PhaseOrdering] Defer lowering switch into an integer range comparison and branch until after at least the IPSCCP
That transformation is lossy, as discussed in
https://github.com/llvm/llvm-project/issues/53853
and https://github.com/rust-lang/rust/issues/85133#issuecomment-904185574

This is an alternative to D119839,
which would add a limited IPSCCP into SimplifyCFG.

Unlike lowering switch to lookup, we still want this transformation
to happen relatively early, but after giving a chance for the things
like CVP to do their thing. It seems like deferring it just until
the IPSCCP is enough for the tests at hand, but perhaps we need to
be more aggressive and disable it until CVP.

Fixes https://github.com/llvm/llvm-project/issues/53853
Refs. https://github.com/rust-lang/rust/issues/85133

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D119854
2022-02-17 12:13:55 +03:00