Commit Graph

5897 Commits

Author SHA1 Message Date
Philip Reames
fa82a3d016 [runtimeunroll] Support epilogue unrolling with a parent loop
This patch adds support for unrolling inner loops using epilogue unrolling. The basic issue is that the original latch exit block of the inner loop could be outside the outer loop.  When we clone the inner loop and split the latch exit, the cloned blocks need to be in the outer loop.

Differential Revision: https://reviews.llvm.org/D108476
2021-09-02 16:29:20 -07:00
Philip Reames
45c672e20d [runtimeunroll] Under EXPENSIVE_CHECKS, validate loop info
Requested in review comment on D108476
2021-09-02 16:28:46 -07:00
Nikita Popov
c86e1ce73b [SCEVExpander] Simplify pointer overflow check
This is a followup to D104662 to generate slightly nicer code for
pointer overflow checks. Bypass expandAddToGEP and instead
explicitly generate i8 GEPs. This saves some bitcasts and negates
the value in a more obvious way. In particular, this prevents SCEV
from looking through the umul.with.overflow, same as in the integer
case.

The wrapping-pointer-ni.ll test deserves a comment: Previously,
this generated a typed GEP which used the umulo argument rather
than the multiplication result. This results in more compact IR in
that case, but effectively does the multiplication twice, the
second one is just hidden in the GEP. Reusing the umulo result
seems pretty reasonable to me.

Differential Revision: https://reviews.llvm.org/D109093
2021-09-02 20:15:59 +02:00
Philip Reames
c3b3aa277a Fix a missing MemorySSA update in breakLoopBackedge
This is a case I'd missed in 6a8237. The odd bit here is that missing the edge removal update seems to produce MemorySSA which verifies, but is still corrupt in a way which bothers following passes. I wasn't able to reduce a single pass test case, which is why the reported test case is taken as is.

Differential Revision: https://reviews.llvm.org/D109068
2021-09-01 16:59:01 -07:00
Philip Reames
e735f2bf37 [SCEVExpander] Prefer pointer expansion for overflow checks
We'd special cased this logic to use pointer types for non-integral pointers, but there's no reason we can't do that for all pointer types.   Doing it this was has a few advantages:
a) The code itself becomes more straight forward, and easier to test.
b) We avoid introducing ptrtoint into programs which didn't have them in the source.
c) The resulting codegen is easier to analyze and simplify (mostly due to lack of ptrtoint).

Note that there are some test diffs, but a) running them through instcombine helps a ton, and b) there's enough missing obvious transforms on both before and after IR that it's clear this isn't performance sensitive.

This is mostly motivated by cleaning up mentions of non-integrals to have a clearer idea of what we actually need to support.

Differential Revision: https://reviews.llvm.org/D104662
2021-09-01 13:11:25 -07:00
Arthur Eubanks
52e6d70c40 [NFC] Use newly introduced *AtIndex methods
Introduced in D108788. These are clearer.
2021-09-01 11:18:41 -07:00
Philip Reames
b604fcb7bc [runtime] Move prolog/epilog block to a post-simplify strategy
The runtime unroller will try to produce a non-loop if the unroll count is 2 and thus the prolog/epilog loop would only run at most one iteration. The old implementation did this by avoiding loop construction entirely. This patches instead constructs the trivial loop and then explicitly breaks the backedge and simplifies. This does result in some additional code churn when triggered, but a) results in better quality code and b) removes a codepath which didn't work properly for multiple exit epilogs.

One oddity that I want to draw to reviewer attention is that this somehow changes revisit order. The new order looks equivalent to me, but I don't understand how creating and erasing an extra loop here creates this effect.

Differential Revision: https://reviews.llvm.org/D108521
2021-08-31 09:29:36 -07:00
Doug Beck
ed6cff667e Fix typo s/beloinging/belonging
Differential Revision: https://reviews.llvm.org/D107099
2021-08-31 12:01:50 +05:30
Roman Lebedev
795d142d23 [NFCI][IndVars] rewriteLoopExitValues(): don't expand SCEV's until needed
Previously, we'd expand *ALL* the SCEV's eagerly, because we needed to
check with `isValidRewrite()`, and discard bad rewrite candidates,
but now that we do not do that, we also don't need to always expand.

In particular, this avoids expanding potentially-huge SCEV's that we
would discard anyways because they are high-cost and we aren't
rewriting aggressively.
2021-08-30 12:28:24 +03:00
Roman Lebedev
7b0d59da9a [IndVars] Drop check for the validity of rewrite
`isValidRewrite()` checks that the both the original SCEV,
and the rewrite SCEV have the same base pointer.
I //believe//, after all the recent SCEV improvements,
this invariant is already enforced by SCEV itself.

I originally tried changing it into an assert in D108043,
but that showed that it triggers on e.g. https://reviews.llvm.org/D108043#2946621,
where SCEV manages to forward the store to load,
test added.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D108655
2021-08-30 12:06:58 +03:00
Nikita Popov
9f7873784d [SCEVExpander] Reuse removePointerBase() for canonical addrecs
ExposePointerBase() in SCEVExpander implements basically the same
functionality as removePointerBase() in SCEV, so reuse it.

The SCEVExpander code assumes that the pointer operand on adds is
the last one -- I'm not sure that always holds. As such this might
not be strictly NFC.
2021-08-29 21:12:35 +02:00
Nikita Popov
0886fd5b3a [SCEVExpander] Remove unnecessary mul/udiv check (NFC)
Pointer-typed SCEV expressions can no longer be mul or udiv, so
we do not need to specially handle them here.
2021-08-29 20:47:00 +02:00
Nikita Popov
3f162e8e6d [SCEVExpander] Assert single pointer op in add (NFC)
There can only be one pointer operand in an add expression, and
we have sorted operands to guarantee that it is the first. As
such, the pointer check for other operands is dead code.
2021-08-29 20:30:56 +02:00
Philip Reames
6a82376012 Special case common branch patterns in breakLoopBackedge (try 2)
Changes since aec08e:
* Adjust placement of a closing brace so that the general case actually runs.  Turns out we had *no* coverage of the switch case.  I added one in eae90fd.
* Drop .llvm.loop.* metadata from the new branch as there is no longer a loop to annotate.

Original commit message:

This special cases an unconditional latch and a conditional branch latch exit to improve codegen and test readability. I am hoping to reuse this function in the runtime unroll code, but without this change, the test diffs are far too complex to assess.
2021-08-27 10:27:16 -07:00
Andrew Litteken
9d2c859ebb [CodeExtractor] Making the arguments outlined easier to access from the outside
The Code Extractor does not provide an easy mechanism for determining the
inputs and outputs after extraction has occurred, this patch gives the
ability to pass in empty SetVectors to be filled with the inputs and
outputs if they need to be analyzed.

Added Tests:
- InputOutputMonitoring in unittests/Transforms/Utils/CodeExtractorTests.cpp

Reviewers: paquette

Differential Revision: https://reviews.llvm.org/D106991
2021-08-26 09:47:53 -07:00
Vyacheslav Zakharin
2e192ab1f4 [CodeExtractor] Preserve topological order for the return blocks.
Differential Revision: https://reviews.llvm.org/D108673
2021-08-25 08:09:01 -07:00
Florian Hahn
90d09eb300 [LoopPeel] Allow peeling with multiple unreachable-terminated exit blocks.
Support for peeling with multiple exit blocks was added in D63921/77bb3a486fa6.

So far it has only been enabled for loops where all non-latch exits are
'de-optimizing' exits (D63923). But peeling of multi-exit loops can be
highly beneficial in other cases too, like if all non-latch exiting
blocks are unreachable.

The motivating case are loops with runtime checks, like the C++ example
below. The main issue preventing vectorization is that the invariant
accesses to load the bounds of B is conditionally executed in the loop
and cannot be hoisted out. If we peel off the first iteration, they
become dereferenceable in the loop, because they must execute before the
loop is executed, as all non-latch exits are terminated with
unreachable. This subsequently allows hoisting the loads and runtime
checks out of the loop, allowing vectorization of the loop.

     int sum(std::vector<int> *A, std::vector<int> *B, int N) {
       int cost = 0;
       for (int i = 0; i < N; ++i)
         cost += A->at(i) + B->at(i);
       return cost;
     }

This gives a ~20-30% increase of score for Geekbench5/HDR on AArch64.

Note that this requires a follow-up improvement to the peeling cost
model to actually peel iterations off loops as above. I will share that
shortly.

Also, peeling of multi-exits might be beneficial for exit blocks with
other terminators, but I would like to keep the scope limited to known
high-reward cases for now.

I removed the option to disable peeling for multi-deopt exits because
the code is more general now. Alternatively, the option could also be
generalized, but I am not sure if there's much value in the option?

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D108108
2021-08-25 13:26:40 +01:00
Philip Reames
1e07f19bfc Revert "Special case common branch patterns in breakLoopBackedge"
This reverts commit aec08e8600.

Several problems have been reported with malformed loopinfo after this change, see discussion on https://reviews.llvm.org/rGaec08e86004b.
2021-08-24 08:53:42 -07:00
Philip Reames
d8d84c9df8 [runtimeunroll] Use early return to reduce nesting [nfc] 2021-08-22 11:34:50 -07:00
Philip Reames
aec08e8600 Special case common branch patterns in breakLoopBackedge
This special cases an unconditional latch and a conditional branch latch exit to improve codegen and test readability.  I am hoping to reuse this function in the runtime unroll code, but without this change, the test diffs are far too complex to assess.
2021-08-22 10:42:23 -07:00
Alexander Potapenko
b0391dfc73 [clang][Codegen] Introduce the disable_sanitizer_instrumentation attribute
The purpose of __attribute__((disable_sanitizer_instrumentation)) is to
prevent all kinds of sanitizer instrumentation applied to a certain
function, Objective-C method, or global variable.

The no_sanitize(...) attribute drops instrumentation checks, but may
still insert code preventing false positive reports. In some cases
though (e.g. when building Linux kernel with -fsanitize=kernel-memory
or -fsanitize=thread) the users may want to avoid any kind of
instrumentation.

Differential Revision: https://reviews.llvm.org/D108029
2021-08-20 14:01:06 +02:00
Roman Lebedev
5d4f37e895 [NFCI][SimplifyCFG] Rewrite createUnreachableSwitchDefault()
The only thing that function should do as per it's semantic,
is to ensure that the switch's default is a block consisting only of
an `unreachable` terminator.

So let's just create such a block and update switch's default
to point to it. There should be no need for all this weird dance
around predecessors/successors.
2021-08-20 13:28:08 +03:00
Akira Hatanaka
898dc4590c Refactor inlineRetainOrClaimRVCalls. NFC
This is in preparation for committing https://reviews.llvm.org/D103000.
2021-08-19 14:55:45 -07:00
Arthur Eubanks
44a3241f10 [NFC] Replace some attribute methods that use confusing indexes 2021-08-19 14:10:26 -07:00
Philip Reames
17b9cb1817 [runtimeunroll] Support multiple exits to latch exit w/prolog loop
This patch extends the runtime unrolling infrastructure to support unrolling a loop with multiple exiting blocks branching to the same exit block used by the latch. It intentionally does not include a cost model change to enable this functionality unless appropriate force flags are used.

This is the prolog companion to D107381. Since this was LGTMed, a problem with DT updating was reported against that patch.  I roled in the analogous fix here as it seemed obvious, and not worth re-review.

As an aside, our prolog form leaves a lot of potential value on the floor when there is an invariant load or invariant condition in the loop being runtime unrolled. We should probably consider a "required prolog" heuristic.  (Alternatively, maybe we should be peeling these cases more aggressively?)

Differential Revision: https://reviews.llvm.org/D108262
2021-08-19 11:43:52 -07:00
Philip Reames
447256f22b [runtimeunroll] Fix reported DT verification error after 94d0914
In 94d0914, I added support for unrolling of multiple exit loops which have multiple exits reaching the latch.  Per reports on the review post commit, I'd missed updating the domtree for one case.  This fix addresses that ommission.

There's no new test as this is covered by existing tests with expensive verification turned on.
2021-08-19 11:06:17 -07:00
Arthur Eubanks
33d44b762e [OpaquePtr][Inline] Use byval type instead of pointee type
Reviewed By: #opaque-pointers, dblaikie

Differential Revision: https://reviews.llvm.org/D105711
2021-08-19 09:56:08 -07:00
Sanjay Patel
ec54e275f5 Revert "[CVP] processSwitch: Remove default case when switch cover all possible values."
This reverts commit 9934a5b2ed.
This patch may cause miscompiles because it missed a constraint
as shown in the examples from:
https://llvm.org/PR51531
2021-08-19 08:43:51 -04:00
Arthur Eubanks
fde0eb1f9a [NFC] A couple more removeAttribute() cleanups 2021-08-18 11:15:20 -07:00
Arthur Eubanks
3f4d00bc3b [NFC] More get/removeAttribute() cleanup 2021-08-17 21:05:41 -07:00
Arthur Eubanks
de0ae9e89e [NFC] Cleanup more AttributeList::addAttribute() 2021-08-17 21:05:41 -07:00
Arthur Eubanks
ad727ab7d9 [NFC] Migrate some callers away from Function/AttributeLists methods that take an index
These methods can be confusing.
2021-08-17 21:05:40 -07:00
Arthur Eubanks
46cf82532c [NFC] Replace Function handling of attributes with less confusing calls
To avoid magic constants and confusing indexes.
2021-08-17 21:05:40 -07:00
Jun Ma
9934a5b2ed [CVP] processSwitch: Remove default case when switch cover all possible values.
Differential Revision: https://reviews.llvm.org/D106056
2021-08-18 10:23:13 +08:00
Philip Reames
94d0914292 [runtimeunroll] Support multiple exits to latch exit w/epilogue loop
This patch extends the runtime unrolling infrastructure to support unrolling a loop with multiple exiting blocks branching to the same exit block used by the latch. It intentionally does not include a cost model change to enable this functionality unless appropriate force flags are used.

I decided to restrict this to the epilogue case. Given the changes ended up being pretty generic, we may be able to unblock the prolog case too, but I want to do that in a separate change to reduce the amount of code we all have to understand at one time.

Differential Revision: https://reviews.llvm.org/D107381
2021-08-17 17:52:04 -07:00
Philip Reames
982da7a20c [SCEVExpander] Stop hoisting IR when reusing phis
his is a fix for PR43678, and is an alternate patch to D105723.

The basic issue we're running into is that LSR + SCEVExpander are moving the very instruction whose operand we're in the process of expanding. This breaks the subtle and ill-documented invariant which let LSR work. (Full story can be found here: https://reviews.llvm.org/D105723#2878473)

Rather than attempting a fix, this change just removes the optimization entirely. The code is entirely untested, and removing it appears to have no impact I can find.  This code was added back in 2014 by 1e12f8563d with a single test which does not seem to actually test the hoisting logic.

From a philosophical standpoint, it also seems very strange to have the expander implementing optimizations which should live in a dedicated transform pass.

Differential Revision: https://reviews.llvm.org/D106178
2021-08-17 09:38:32 -07:00
Arthur Eubanks
0d822da2bd [NFC] Remove/replace some confusing attribute getters on Function 2021-08-16 16:12:37 -07:00
Nikita Popov
735a590471 [MemorySSA] Remove -enable-mssa-loop-dependency option
This option has been enabled by default for quite a while now.
The practical impact of removing the option is that MSSA use
cannot be disabled in default pipelines (both LPM and NPM) and
in manual LPM invocations. NPM can still choose to enable/disable
MSSA using loop vs loop-mssa.

The next step will be to require MSSA for LICM and drop the
AST-based implementation entirely.

Differential Revision: https://reviews.llvm.org/D108075
2021-08-16 20:59:37 +02:00
Nikita Popov
570c9beb8e [MemorySSA] Remove unnecessary MSSA dependencies
LoopLoadElimination, LoopVersioning and LoopVectorize currently
fetch MemorySSA when construction LoopAccessAnalysis. However,
LoopAccessAnalysis does not actually use MemorySSA and we can pass
nullptr instead.

This saves one MemorySSA calculation in the default pipeline, and
thus improves compile-time.

Differential Revision: https://reviews.llvm.org/D108074
2021-08-16 20:40:55 +02:00
Roman Lebedev
febcedf18c Revert "[NFCI][IndVars] rewriteLoopExitValues(): nowadays SCEV should not change GEP base pointer"
https://bugs.llvm.org/show_bug.cgi?id=51490 was filed.

This reverts commit 35a8bdc775.
2021-08-16 14:30:29 +03:00
David Sherwood
9b19b77883 [NFC] Remove unused code in llvm::createSimpleTargetReduction 2021-08-16 09:50:45 +01:00
Roman Lebedev
2eb554a9fe Revert "Reland [SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)"
This is still wrong, as failing bots suggest.

This reverts commit 3d9beefc7d.
2021-08-16 11:07:42 +03:00
Roman Lebedev
3d9beefc7d Reland [SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)
... with test change this time.

LLVM IR SSA form is "implicit" in `@pr51125`. While is a valid LLVM IR,
and does not require any PHI nodes, that completely breaks the further logic
in `CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses()`
that updates the live-out uses of the bonus instructions.

What i believe we need to do, is to first make the SSA form explicit,
by inserting tautological PHI nodes, and rewriting the offending uses.

```
$ /builddirs/llvm-project/build-Clang12/bin/opt -load /repositories/alive2/build-Clang-release/tv/tv.so -load-pass-plugin /repositories/alive2/build-Clang-release/tv/tv.so -tv -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -bonus-inst-threshold=10 -tv -o /dev/null /tmp/test.ll

----------------------------------------
@global_pr51125 = global 4 bytes, align 4

define i32 @pr51125() {
%entry:
  br label %L

%L:
  %ld = load i32, * @global_pr51125, align 4
  %iszero = icmp eq i32 %ld, 0
  br i1 %iszero, label %exit, label %L2

%L2:
  store i32 4294967295, * @global_pr51125, align 4
  %cmp = icmp eq i32 %ld, 4294967295
  br i1 %cmp, label %L, label %exit

%exit:
  %r = phi i32 [ %ld, %L2 ], [ %ld, %L ]
  ret i32 %r
}
=>
@global_pr51125 = global 4 bytes, align 4

define i32 @pr51125() {
%entry:
  %ld.old = load i32, * @global_pr51125, align 4
  %iszero.old = icmp eq i32 %ld.old, 0
  br i1 %iszero.old, label %exit, label %L2

%L2:
  %ld2 = phi i32 [ %ld.old, %entry ], [ %ld, %L2 ]
  store i32 4294967295, * @global_pr51125, align 4
  %cmp = icmp ne i32 %ld2, 4294967295
  %ld = load i32, * @global_pr51125, align 4
  %iszero = icmp eq i32 %ld, 0
  %or.cond = select i1 %cmp, i1 1, i1 %iszero
  br i1 %or.cond, label %exit, label %L2

%exit:
  %ld1 = phi i32 [ poison, %L2 ], [ %ld.old, %entry ]
  %r = phi i32 [ %ld2, %L2 ], [ %ld.old, %entry ]
  ret i32 %r
}
Transformation seems to be correct!

```

Fixes https://bugs.llvm.org/show_bug.cgi?id=51125

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D106317
2021-08-15 19:16:04 +03:00
Roman Lebedev
60dd0121c9 Revert "[SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)"
Forgot to stage the test change.

This reverts commit 78af5cb213.
2021-08-15 19:15:09 +03:00
Roman Lebedev
78af5cb213 [SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)
LLVM IR SSA form is "implicit" in `@pr51125`. While is a valid LLVM IR,
and does not require any PHI nodes, that completely breaks the further logic
in `CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses()`
that updates the live-out uses of the bonus instructions.

What i believe we need to do, is to first make the SSA form explicit,
by inserting tautological PHI nodes, and rewriting the offending uses.

```
$ /builddirs/llvm-project/build-Clang12/bin/opt -load /repositories/alive2/build-Clang-release/tv/tv.so -load-pass-plugin /repositories/alive2/build-Clang-release/tv/tv.so -tv -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -bonus-inst-threshold=10 -tv -o /dev/null /tmp/test.ll

----------------------------------------
@global_pr51125 = global 4 bytes, align 4

define i32 @pr51125() {
%entry:
  br label %L

%L:
  %ld = load i32, * @global_pr51125, align 4
  %iszero = icmp eq i32 %ld, 0
  br i1 %iszero, label %exit, label %L2

%L2:
  store i32 4294967295, * @global_pr51125, align 4
  %cmp = icmp eq i32 %ld, 4294967295
  br i1 %cmp, label %L, label %exit

%exit:
  %r = phi i32 [ %ld, %L2 ], [ %ld, %L ]
  ret i32 %r
}
=>
@global_pr51125 = global 4 bytes, align 4

define i32 @pr51125() {
%entry:
  %ld.old = load i32, * @global_pr51125, align 4
  %iszero.old = icmp eq i32 %ld.old, 0
  br i1 %iszero.old, label %exit, label %L2

%L2:
  %ld2 = phi i32 [ %ld.old, %entry ], [ %ld, %L2 ]
  store i32 4294967295, * @global_pr51125, align 4
  %cmp = icmp ne i32 %ld2, 4294967295
  %ld = load i32, * @global_pr51125, align 4
  %iszero = icmp eq i32 %ld, 0
  %or.cond = select i1 %cmp, i1 1, i1 %iszero
  br i1 %or.cond, label %exit, label %L2

%exit:
  %ld1 = phi i32 [ poison, %L2 ], [ %ld.old, %entry ]
  %r = phi i32 [ %ld2, %L2 ], [ %ld.old, %entry ]
  ret i32 %r
}
Transformation seems to be correct!

```

Fixes https://bugs.llvm.org/show_bug.cgi?id=51125

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D106317
2021-08-15 19:02:34 +03:00
Roman Lebedev
35a8bdc775 [NFCI][IndVars] rewriteLoopExitValues(): nowadays SCEV should not change GEP base pointer
Currently/previously, while SCEV guaranteed that it produces the same value,
the way it was produced may be illegal IR, so we have an ugly check that
the replacement is valid.

But now that the SCEV strictness wrt the pointer/integer types has been improved,
i believe this invariant is already upheld by the SCEV itself, natively.

I think we should add an assertion, wait for a week, and then, if all is good,
rip out all this checking.
Or we could just do the latter directly i guess.

This reverts commit rL127839.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D108043
2021-08-15 18:59:32 +03:00
Arthur Eubanks
c19d7f8af0 [CallPromotion] Check for inalloca/byval mismatch
Previously we would allow promotion even if the byval/inalloca
attributes on the call and the callee didn't match.

It's ok if the byval/inalloca types aren't the same. For example, LTO
importing may rename types.

Fixes PR51397.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D107998
2021-08-13 16:52:04 -07:00
Arthur Eubanks
a9831cce1e [NFC] Remove public uses of AttributeList::getAttributes()
Use methods that better convey the intent.
2021-08-13 11:38:12 -07:00
Arthur Eubanks
80ea2bb574 [NFC] Rename AttributeList::getParam/Ret/FnAttributes() -> get*Attributes()
This is more consistent with similar methods.
2021-08-13 11:16:52 -07:00
Roman Lebedev
c46546bd52 Reland "[NFCI][SimplifyCFG] simplifyCondBranch(): assert that branch is non-tautological""
The commit originally unearthed a problem, reported as
https://reviews.llvm.org/rGf30a7dff8a5b32919951dcbf92e4a9d56c4679ff#1019890
Now that the problem has been fixed, and the assertion no longer fires,
let's see if there are other cases it fires on.

This reverts commit 5c8c24d2de,
relanding commit f30a7dff8a.
2021-08-13 15:45:03 +03:00