274 Commits

Author SHA1 Message Date
David Green
eecba95067 [ARM] Replace arm vendor with none. NFC 2020-04-22 18:19:35 +01:00
Craig Topper
944cc5e0ab [SelectionDAGBuilder][CGP][X86] Move some of SDB's gather/scatter uniform base handling to CGP.
I've always found the "findValue" a little odd and
inconsistent with other things in SDB.

This simplifies the code in SDB to just handle a splat constant
address or a 2 operand GEP in the same BB. This removes the
need for "findValue" since the operands to the GEP are
guaranteed to be available. The splat constant handling is
new, but was needed to avoid regressions due to constant
folding combining GEPs created in CGP.

CGP is now responsible for canonicalizing gather/scatters into
this form. The pattern I'm using for scalarizing, a scalar GEP
followed by a GEP with an all zeroes index, seems to be subject
to constant folding that the insertelement+shufflevector was not.

Differential Revision: https://reviews.llvm.org/D76947
2020-04-16 17:49:22 -07:00
Guozhi Wei
6d20937c29 [CodeGenPrepare] Delete intrinsic call to llvm.assume to enable more tailcall
The attached test case is simplified from tcmalloc. Both function calls should be optimized as tailcall. But llvm can only optimize the first call. The second call can't be optimized because function dupRetToEnableTailCallOpts failed to duplicate ret into block case2.

There are 2 problems blocking the duplication:

  1 Intrinsic call llvm.assume is not handled by dupRetToEnableTailCallOpts.
  2 The control flow is more complex than expected, dupRetToEnableTailCallOpts can only duplicate ret into its predecessor, but here we have an intermediate block between call and ret.

The solutions:

  1 Since CodeGenPrepare is already at the end of LLVM IR phase, we can simply delete the intrinsic call to llvm.assume.
  2 A general solution to the complex control flow is hard, but for this case, after exit2 is duplicated into case1, exit2 is the only successor of exit1 and exit1 is the only predecessor of exit2, so they can be combined through eliminateFallThrough. But this function is called too late, there is no more dupRetToEnableTailCallOpts after it. We can add an earlier call to eliminateFallThrough to solve it.

Differential Revision: https://reviews.llvm.org/D76539
2020-03-31 11:55:51 -07:00
Juneyoung Lee
d82c1e8c56 Rename test name, add more tests for codegenprepare 2020-03-25 20:31:12 +09:00
Juneyoung Lee
e951a48996 Add freeze(and x, const) case to codegenprepare's freeze-cmp.ll 2020-03-25 17:29:01 +09:00
Juneyoung Lee
6ad63606ea [CodeGenPrepare] Freeze condition when transforming select to br
Summary:
This is a simple fix for CodeGenPrepare that freezes branch condition when transforming select to branch.
If it is not frozen, instsimplify or the later pipeline can potentially exploit undefined behavior.

The diff shows the optimized form because D75859 and D76048 already made a few changes to CodeGenPrepare for optimizing freeze(cmp).

Reviewers: jdoerfert, spatel, lebedev.ri, efriedma

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76179
2020-03-16 12:46:20 +09:00
Juneyoung Lee
4ffe3ac729 Revert "[CodeGenPrepare] Freeze condition when transforming select to br"
This reverts commit 10aa7ea951.
2020-03-16 12:45:54 +09:00
Juneyoung Lee
10aa7ea951 [CodeGenPrepare] Freeze condition when transforming select to br
Summary:
This is a simple fix for CodeGenPrepare that freezes branch condition when transforming select to branch.
If it is not frozen, instsimplify or the later pipeline can potentially exploit undefined behavior.

The diff shows the optimized form because D75859 and D76048 already made a few changes to CodeGenPrepare for optimizing freeze(cmp).

Reviewers: jdoerfert, spatel, lebedev.ri, efriedma

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76179
2020-03-15 11:10:46 +09:00
Juneyoung Lee
c39cb1c0dd [CodeGenPrepare] Expand freeze conversion to support fcmp and icmp with null
Summary:
This is a simple patch that expands https://reviews.llvm.org/D75859 to pointer comparison and fcmp

Checked with Alive2

Reviewers: reames, jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76048
2020-03-13 17:21:33 +09:00
Juneyoung Lee
48b901b0e1 Add tests to Transforms/CodeGenPrepare/X86/freeze-cmp.ll before committing D76048 2020-03-13 17:18:42 +09:00
Juneyoung Lee
629cf3c1c5 Apply update_test_check.py to CodeGenPrepare/X86/freeze-icmp.ll test 2020-03-12 16:37:16 +09:00
Juneyoung Lee
8eb2f865c3 [CodeGenPrepare] Fold br(freeze(icmp x, const)) to br(icmp(freeze x, const))
Summary:
This patch helps CodeGenPrepare move freeze into the icmp when it is used by branch.
It reenables generation of efficient conditional jumps.

This is only done when at least one of icmp's operands is constant to prevent the transformation from increasing # of freeze instructions.

Performance degradation of MultiSource/Benchmarks/Ptrdist/yacr2/yacr2.test is resolved with this patch.

Checked with Alive2

Reviewers: reames, fhahn, nlopes

Reviewed By: reames

Subscribers: jdoerfert, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75859
2020-03-12 03:16:15 +09:00
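Why moving the freeze inside the compare is safe can be illustrated with a toy model. The sketch below is not LLVM code and is not how Alive2 checks the transform: it assumes 2-bit unsigned integers, uses `None` as a stand-in for poison, ignores undef, and all helper names are hypothetical. It enumerates the results each form may produce and checks that the rewritten form never produces a result the original form could not.

```python
from typing import Callable, Optional, Set

BITS = 2                     # assumed width, kept tiny so we can enumerate
VALUES = range(1 << BITS)    # all concrete 2-bit unsigned values
POISON = None                # stand-in for a poison value in this toy model

Cmp = Callable[[int, int], bool]


def outcomes_freeze_of_cmp(x: Optional[int], const: int, cmp: Cmp) -> Set[bool]:
    """Possible results of freeze(icmp x, const)."""
    if x is POISON:
        # The compare of poison is poison; freeze may pick either boolean.
        return {False, True}
    return {cmp(x, const)}


def outcomes_cmp_of_freeze(x: Optional[int], const: int, cmp: Cmp) -> Set[bool]:
    """Possible results of icmp(freeze x, const)."""
    if x is POISON:
        # freeze picks some concrete value first, then the compare runs.
        return {cmp(v, const) for v in VALUES}
    return {cmp(x, const)}


def is_refinement(cmp: Cmp) -> bool:
    """True if the rewritten form only yields results the original allows."""
    inputs = [POISON, *VALUES]
    return all(
        outcomes_cmp_of_freeze(x, c, cmp) <= outcomes_freeze_of_cmp(x, c, cmp)
        for x in inputs
        for c in VALUES
    )
```

For example, `is_refinement(lambda a, b: a < b)` holds in this model. With a poison input and the constant 0, the rewritten form can only yield `False`, which is one of the two results the original form permits.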
Florian Hahn
7769030b93 Recommit "[PatternMatch] Match XOR variant of unsigned-add overflow check."
This version fixes a buildbot failure caused by picking the wrong insert
point for XORs. We cannot pick the XOR binary operator as insert point,
as it is not guaranteed that both input operands for the overflow
intrinsic are defined before it.

This reverts the revert commit
c7fc0e5da6.
2020-02-23 18:33:18 +00:00
Florian Hahn
c7fc0e5da6 Revert "[PatternMatch] Match XOR variant of unsigned-add overflow check."
This reverts commit e01a3d49c2.
and commit a6a585b803.

This causes a failure on GreenDragon:
http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/9597
2020-02-19 19:37:08 +01:00
Florian Hahn
e01a3d49c2 [PatternMatch] Match XOR variant of unsigned-add overflow check.
Instcombine folds (a + b <u a) to (a ^ -1 <u b) and that does not match
the expected pattern in CodeGenPrepare via UAddWithOverflow.

This causes a regression over Clang 7 on both X86 and AArch64:
https://gcc.godbolt.org/z/juhXYV

This patch extends UAddWithOverflow to also catch the XOR case, if the
XOR is only used in the ICMP. This covers just a single case, but I'd
like to make sure I am not missing anything before tackling the other
cases.

Reviewers: nikic, RKSimon, lebedev.ri, spatel

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D74228
2020-02-19 15:25:18 +01:00
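The equivalence between the two overflow checks named in this message can be confirmed exhaustively at a small bit width. The sketch below is an illustration in Python, not LLVM code; the 8-bit width and the helper names are assumptions made for the example.

```python
BITS = 8                 # assumed width for the illustration
MASK = (1 << BITS) - 1   # all-ones value, i.e. -1 in two's complement


def overflows_add_form(a: int, b: int) -> bool:
    """Original check: (a + b) <u a, with the add wrapping at BITS bits."""
    return ((a + b) & MASK) < a


def overflows_xor_form(a: int, b: int) -> bool:
    """Form produced by the fold: (a ^ -1) <u b."""
    return (a ^ MASK) < b


def forms_agree() -> bool:
    """Compare both checks for every pair of BITS-wide unsigned values."""
    return all(
        overflows_add_form(a, b) == overflows_xor_form(a, b)
        for a in range(MASK + 1)
        for b in range(MASK + 1)
    )
```

Both forms test the same condition: `a ^ -1` equals `MASK - a`, so `(a ^ -1) <u b` holds exactly when `a + b` exceeds `MASK`, which is when the wrapped sum drops below `a`.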
Florian Hahn
216afd3301 [TargetLower] Update shouldFormOverflowOp check if math is used.
On some targets, like SPARC, forming overflow ops is only profitable if
the math result is used: https://godbolt.org/z/DxSmdB
This patch adds a new MathUsed parameter to allow the targets
to make the decision and defaults to only allowing it
if the math result is used. That is the conservative choice.

This patch also updates AArch64ISelLowering, X86ISelLowering,
ARMISelLowering.h, SystemZISelLowering.h to allow forming overflow
ops if the math result is not used. On those targets using the
overflow intrinsic for the overflow check only generates better code.

Reviewers: nikic, RKSimon, lebedev.ri, spatel

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D74722
2020-02-19 11:28:33 +01:00
Florian Hahn
7cbf710396 [CGP] Precommit tests for D74228. 2020-02-19 09:24:06 +01:00
Florian Hahn
106ae108c1 [CGP] Add uaddo test with math used, SPARC/AArch64 variants. 2020-02-18 12:49:08 +01:00
Clement Courbet
15488ff24b [CodeGen] Fix the computation of the alignment of split stores.
Summary:
Right now the alignment of the lower half of a store is computed as
align/2, which fails for unaligned stores (align = 1), and is overly
pessimistic for, e.g., an 8-byte store aligned to 4 bytes.
Fixes PR44851
Fixes PR44877

Reviewers: gchatelet, spatel, lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74311
2020-02-12 10:37:30 +01:00
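The arithmetic behind this fix can be sketched as follows. This is not the code from D74311; it is a minimal Python illustration that assumes the alignment of each half is the largest power of two dividing both the original alignment and the half's byte offset. The function names are hypothetical.

```python
from typing import Tuple


def part_alignment(store_align: int, byte_offset: int) -> int:
    """Largest power of two dividing both store_align and byte_offset.

    store_align must be a positive power of two. An offset of 0 keeps
    the full alignment of the original store.
    """
    if store_align <= 0 or store_align & (store_align - 1):
        raise ValueError("store_align must be a positive power of two")
    if byte_offset == 0:
        return store_align
    lowest_set_bit = byte_offset & -byte_offset  # isolates the lowest 1 bit
    return min(store_align, lowest_set_bit)


def split_store_alignments(store_align: int, store_bytes: int) -> Tuple[int, int]:
    """Alignments of the two halves of a store split at its midpoint."""
    half = store_bytes // 2
    return part_alignment(store_align, 0), part_alignment(store_align, half)
```

Under this assumption, `split_store_alignments(4, 8)` gives an alignment of 4 for both halves, where `align/2` would give 2. `split_store_alignments(1, 8)` gives 1 for both halves, where `align/2` would round down to 0, which is not a valid alignment.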
Clement Courbet
24856002e0 [CodeGenPrepare] Add more store splitting tests for PR44877. 2020-02-12 09:50:47 +01:00
Fangrui Song
bf70494b94 [test] More tests to target specific directories after CodeGenPrepare requires TargetPassConfig (D73754) 2020-02-02 10:43:02 -08:00
Fangrui Song
eee6a45a13 [CodeGenPrepare][test] Add REQUIRES to two tests after D73754 2020-02-02 09:53:17 -08:00
Fangrui Song
5a56a25b0b [CodeGenPrepare] Make TargetPassConfig required
The code paths in the absence of TargetMachine, TargetLowering or
TargetRegisterInfo are poorly tested. As rL285987 said, requiring
TargetPassConfig allows us to delete many (untested) checks littered
everywhere.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D73754
2020-02-02 09:28:45 -08:00
Fangrui Song
502a77f125 Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 2019-12-24 15:57:33 -08:00
Hiroshi Yamauchi
ed50e6060b [PGO][PGSO] Enable size optimizations in code gen / target passes for cold code.
Summary: Split off of D67120.

Reviewers: davidxl

Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71288
2019-12-13 11:01:19 -08:00
Joerg Sonnenberger
9681ea9560 Reapply r374743 with a fix for the ocaml binding
Add a pass to lower is.constant and objectsize intrinsics

This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.

The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.

Differential Revision: https://reviews.llvm.org/D65280

llvm-svn: 374784
2019-10-14 16:15:14 +00:00
Dmitri Gribenko
1a21f98ac3 Revert "Add a pass to lower is.constant and objectsize intrinsics"
This reverts commit r374743. It broke the build with Ocaml enabled:
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19218

llvm-svn: 374768
2019-10-14 12:22:48 +00:00
Joerg Sonnenberger
e4300c392d Add a pass to lower is.constant and objectsize intrinsics
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.

The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.

Differential Revision: https://reviews.llvm.org/D65280

llvm-svn: 374743
2019-10-13 23:00:15 +00:00
David Bolvansky
3d33e97be6 [NFC] Updated test
llvm-svn: 372093
2019-09-17 09:45:52 +00:00
David Green
a6e944b173 [CGP] Ensure sinking multiple instructions does not invalidate dominance checks
In MVE, as of rL371218, we are attempting to sink chains of instructions such as:
  %l1 = insertelement <8 x i8> undef, i8 %l0, i32 0
  %broadcast.splat26 = shufflevector <8 x i8> %l1, <8 x i8> undef, <8 x i32> zeroinitializer
In certain situations though, we can end up breaking the dominance relations of
instructions. This happens when we sink the instruction into a loop, but cannot
remove the originals. The Use is updated, which might in fact be a Use from the
second instruction to the first.

This attempts to fix that by reversing the order of instructions that are sunk,
and ensuring that we update the uses on new instructions if they have already
been sunk, not the old ones.

Differential Revision: https://reviews.llvm.org/D67366

llvm-svn: 371743
2019-09-12 16:00:07 +00:00
Sam Tebbs
f1cdd95a2f [ARM] Sink add/mul(shufflevector(insertelement())) for MVE instruction selection
This patch sinks add/mul(shufflevector(insertelement())) into the basic block in which they are used so that they can then be selected together.

This is useful for various MVE instructions, such as vmla and others that take R registers.

Loop tests have been added to the vmla test file to make sure vmlas are generated in loops.

Differential revision: https://reviews.llvm.org/D66295

llvm-svn: 371218
2019-09-06 16:01:32 +00:00
Sanjay Patel
acceedb15f [CodeGenPrepare] Fix use-after-free
If OptimizeExtractBits() encountered a shift instruction with no users at all,
it would erase the instruction, but still return false.

This previously didn’t matter because its caller would always return after
processing the instruction, but https://reviews.llvm.org/D63233 changed the
function’s caller to fall through if it returned false, which would then cause
a use-after-free detectable by ASAN.

This change makes OptimizeExtractBits return true if it removes a shift
instruction with no users, terminating processing of the instruction.

Patch by: @brentdax (Brent Royal-Gordon)

Differential Revision: https://reviews.llvm.org/D66330

llvm-svn: 369168
2019-08-16 23:10:34 +00:00
Sanjay Patel
8341a847a2 [CodeGenPrepare] fix RUN line settings
I'm not sure if this was running as expected with a broken triple.

llvm-svn: 369156
2019-08-16 21:37:49 +00:00
Fangrui Song
ac14f7b10c [lit] Delete empty lines at the end of lit.local.cfg NFC
llvm-svn: 363538
2019-06-17 09:51:07 +00:00
Sanjay Patel
c8d88ad1a9 [CodeGenPrepare][x86] shift both sides of a vector select when profitable
This is based on the example/discussion in PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428

Proper vector shift instructions don't appear until AVX2, so we may generate several
extra instructions within a loop trying to compensate for that. It's difficult to
recover from that shift expansion later than this, so use the existing TLI hook and
splat analysis to enable better codegen.

This extends CGP functionality introduced with:
rL201655

Differential Revision: https://reviews.llvm.org/D63233

llvm-svn: 363511
2019-06-16 15:29:03 +00:00
Sanjay Patel
7ea378b940 [CodeGenPrepare] propagate debuginfo when copying a shuffle
llvm-svn: 363409
2019-06-14 15:05:35 +00:00
Sanjay Patel
a1421e8347 [x86] add tests for vector shifts; NFC
llvm-svn: 363203
2019-06-12 21:30:06 +00:00
Sanjay Patel
5ab41a7a05 [CodeGenPrepare] limit overflow intrinsic matching to a single basic block (2nd try)
This is a subset of the original commit from rL359879
which was reverted because it could crash when using the 'RemovedInstructions'
structure that enables delayed deletion of dead instructions. The motivating
compile-time win does not require that change though. We should get most of
that win from this change alone.

Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.

See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html

Differential Revision: https://reviews.llvm.org/D61075

llvm-svn: 359969
2019-05-04 12:46:32 +00:00
Evgeniy Stepanov
46ec57e576 Revert "[CodeGenPrepare] limit overflow intrinsic matching to a single basic block"
This reverts commit r359879, which introduced a compiler crash.

llvm-svn: 359908
2019-05-03 17:31:49 +00:00
Sanjay Patel
8ff072e48e [CodeGenPrepare] limit overflow intrinsic matching to a single basic block
Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.

Also, we were restarting the iterator loops when doing the overflow intrinsic
transforms by marking the dominator tree for update. That was done to prevent
iterating over a removed instruction. But we can postpone the deletion using
the existing "RemovedInsts" structure, and that means we don't need to update
the DT.

See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html

Differential Revision: https://reviews.llvm.org/D61075

llvm-svn: 359879
2019-05-03 13:09:18 +00:00
Eric Christopher
cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher
a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Sanjay Patel
d47eac59ef [CodeGenPrepare] limit formation of overflow intrinsics (PR41129)
This is probably a bigger limitation than necessary, but since we don't have any evidence yet
that this transform led to real-world perf improvements rather than regressions, I'm making a
quick, blunt fix.

In the motivating x86 example from:
https://bugs.llvm.org/show_bug.cgi?id=41129
...and shown in the regression test, we want to avoid an extra instruction in the dominating
block because that could be costly.

The x86 LSR test diff is reversing the changes from D57789. There's no evidence that 1 version
is any better than the other yet.

Differential Revision: https://reviews.llvm.org/D59602

llvm-svn: 356665
2019-03-21 13:57:07 +00:00
Sanjay Patel
fb44f99b73 [CGP][x86] add tests for usubo regression (PR41129); NFC
llvm-svn: 356559
2019-03-20 15:02:35 +00:00
Sanjay Patel
2c9275a790 [CGP] add another bailout for degenerate code (PR41064)
This is almost the same as:
rL355345
...and should prevent any potential crashing from examples like:
https://bugs.llvm.org/show_bug.cgi?id=41064
...although the bug was masked by:
rL355823
...and I'm not sure how to repro the problem after that change.

llvm-svn: 356218
2019-03-14 23:14:31 +00:00
Tim Northover
8935aca9c7 CodeGenPrep: preserve inbounds attribute when sinking GEPs.
Targets can potentially emit more efficient code if they know address
computations never overflow. For example ILP32 code on AArch64 (which only has
64-bit address computation) can ignore the possibility of overflow with this
extra information.

llvm-svn: 355926
2019-03-12 15:22:23 +00:00
Sam Parker
52760bf435 [CGP] Limit distance between overflow math and cmp
Inserting an overflowing arithmetic intrinsic can increase register
pressure by producing two values at a point where only one is needed,
while the second use may be several blocks away. This increase in
pressure is likely to be more detrimental on performance than
rematerialising one of the original instructions.
    
So, check that the arithmetic and compare instructions are no further
apart than their immediate successor/predecessor.

Differential Revision: https://reviews.llvm.org/D59024

llvm-svn: 355823
2019-03-11 13:19:46 +00:00
Rong Xu
ce3be45cac [CodeGenPrepare] Fix ModifiedDT flag in optimizeSelectInst
r44412 fixed a huge compile time regression but it needed ModifiedDT flag to be
maintained correctly in optimizations in optimizeBlock() and optimizeInst().
Function optimizeSelectInst() does not update the flag.
This patch propagates the flag in optimizeSelectInst() back to
optimizeBlock().

This patch also removes ModifiedDT in CodeGenPrepare class (which is not used).
The property of ModifiedDT is now recorded in a ref parameter.

Differential Revision: https://reviews.llvm.org/D59139

llvm-svn: 355751
2019-03-08 22:46:18 +00:00
Florian Hahn
13bbcb3264 [ARM] Sink zext/sext operands for add and sub to enable vsubl generation.
This uses the infrastructure added in rL353152 to sink zext and sexts to
sub/add users, to enable vsubl/vaddl generation when NEON is available.

See https://bugs.llvm.org/show_bug.cgi?id=40025.

Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D58063

llvm-svn: 355460
2019-03-06 00:10:03 +00:00
Sanjay Patel
3b2d0bc7c2 [CodeGenPrepare] avoid crashing on non-canonical/degenerate code
The test is reduced from an example in the post-commit thread for:
rL354746
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190304/632396.html

While we must avoid dying here, the real question should be:
Why is non-canonical and/or degenerate code making it to CGP when
using the new pass manager?

llvm-svn: 355345
2019-03-04 22:47:13 +00:00