Commit Graph

110 Commits

Author SHA1 Message Date
Sanjay Patel
d47eac59ef [CodeGenPrepare] limit formation of overflow intrinsics (PR41129)
This is probably a bigger limitation than necessary, but since we don't have any evidence yet
that this transform led to real-world perf improvements rather than regressions, I'm making a
quick, blunt fix.

In the motivating x86 example from:
https://bugs.llvm.org/show_bug.cgi?id=41129
...and shown in the regression test, we want to avoid an extra instruction in the dominating
block because that could be costly.
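
For illustration, a hedged IR sketch of the kind of shape involved (hypothetical values, not the actual test): forming the intrinsic would hoist the subtract into the dominating block even though it is only needed on one path.

  define i32 @saturating_sub(i32 %x, i32 %y) {
  entry:
    %cmp = icmp ult i32 %x, %y
    br i1 %cmp, label %end, label %subtract
  subtract:                      ; %x >= %y here
    %s = sub i32 %x, %y          ; forming usubo would pull this into %entry
    ret i32 %s
  end:
    ret i32 0
  }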

The x86 LSR test diff reverses the changes from D57789. There's no evidence yet
that one version is better than the other.

Differential Revision: https://reviews.llvm.org/D59602

llvm-svn: 356665
2019-03-21 13:57:07 +00:00
Sanjay Patel
fb44f99b73 [CGP][x86] add tests for usubo regression (PR41129); NFC
llvm-svn: 356559
2019-03-20 15:02:35 +00:00
Sanjay Patel
2c9275a790 [CGP] add another bailout for degenerate code (PR41064)
This is almost the same as:
rL355345
...and should prevent any potential crashing from examples like:
https://bugs.llvm.org/show_bug.cgi?id=41064
...although the bug was masked by:
rL355823
...and I'm not sure how to repro the problem after that change.

llvm-svn: 356218
2019-03-14 23:14:31 +00:00
Tim Northover
8935aca9c7 CodeGenPrep: preserve inbounds attribute when sinking GEPs.
Targets can potentially emit more efficient code if they know address
computations never overflow. For example ILP32 code on AArch64 (which only has
64-bit address computation) can ignore the possibility of overflow with this
extra information.
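
A hedged sketch (hypothetical names): the copy of the GEP sunk next to its memory use now keeps the flag.

  ; original GEP in a dominating block:
  %g = getelementptr inbounds i32, i32* %p, i64 %i
  ; sunk copy in the using block, previously emitted without inbounds:
  %g.sunk = getelementptr inbounds i32, i32* %p, i64 %i
  %v = load i32, i32* %g.sunk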

llvm-svn: 355926
2019-03-12 15:22:23 +00:00
Rong Xu
ce3be45cac [CodeGenPrepare] Fix ModifiedDT flag in optimizeSelectInst
r44412 fixed a huge compile-time regression, but it required the ModifiedDT flag
to be maintained correctly by the optimizations in optimizeBlock() and
optimizeInst(). Function optimizeSelectInst() did not update the flag.
This patch propagates the flag from optimizeSelectInst() back to
optimizeBlock().

This patch also removes the ModifiedDT member of the CodeGenPrepare class
(which is no longer used); that property is now recorded in a reference parameter.

Differential Revision: https://reviews.llvm.org/D59139

llvm-svn: 355751
2019-03-08 22:46:18 +00:00
Sanjay Patel
3b2d0bc7c2 [CodeGenPrepare] avoid crashing on non-canonical/degenerate code
The test is reduced from an example in the post-commit thread for:
rL354746
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190304/632396.html

While we must avoid dying here, the real question should be:
Why is non-canonical and/or degenerate code making it to CGP when
using the new pass manager?

llvm-svn: 355345
2019-03-04 22:47:13 +00:00
Sanjay Patel
cb04ba032f [CGP] add special-cases to form unsigned add with overflow (PR40486)
There's likely a missed IR canonicalization for at least 1 of these
patterns. Otherwise, we wouldn't have needed the pattern-matching
enhancement in D57516.

Note that -- unlike usubo added with D57789 -- the TLI hook for
this transform defaults to 'on'. So if there's any perf fallout
from this, targets should look at how they're lowering the uaddo
node in SDAG and/or override that hook.

The x86 diffs suggest that there's some missing pattern-matching
for forming inc/dec.
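
For reference, a hedged sketch of the increment/decrement shapes (hypothetical values):

  ; increment: the add wraps exactly when the result is 0
  %i  = add i32 %x, 1
  %o1 = icmp eq i32 %i, 0
  ; decrement: adding -1 wraps exactly when %x is nonzero
  %d  = add i32 %x, -1
  %o2 = icmp ne i32 %x, 0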

This should fix the remaining known problems in:
https://bugs.llvm.org/show_bug.cgi?id=40486
https://bugs.llvm.org/show_bug.cgi?id=31754

llvm-svn: 354746
2019-02-24 15:31:27 +00:00
Sanjay Patel
973143ab79 [CGP] add tests for uaddo increment/decrement; NFC
llvm-svn: 354699
2019-02-22 23:19:34 +00:00
Sanjay Patel
198cc305e9 [CGP] match a special-case of unsigned subtract overflow
This is the 'sub0' (negate) pattern from PR31754:
https://bugs.llvm.org/show_bug.cgi?id=31754
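
A hedged sketch of the pattern (hypothetical values): usub.with.overflow of 0 and %x overflows exactly when %x is nonzero.

  %ov  = icmp ne i32 %x, 0
  %neg = sub i32 0, %x
  ; formed into (sketch):
  %u    = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 0, i32 %x)
  %neg2 = extractvalue { i32, i1 } %u, 0
  %ov2  = extractvalue { i32, i1 } %u, 1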

llvm-svn: 354519
2019-02-20 21:23:04 +00:00
Sanjay Patel
8d91faa481 [CGP][x86] add tests for usubo special-case; NFC
This is another example from PR31754:
https://bugs.llvm.org/show_bug.cgi?id=31754

llvm-svn: 354475
2019-02-20 15:40:58 +00:00
Sanjay Patel
d8b4efcb6b [CGP] form usub with overflow from sub+icmp
The motivating x86 cases for forming the intrinsic are shown in PR31754 and PR40487:
https://bugs.llvm.org/show_bug.cgi?id=31754
https://bugs.llvm.org/show_bug.cgi?id=40487
...and those are shown in the IR test file and x86 codegen file.

Matching the usubo pattern is harder than uaddo because we have 2 independent values rather than a def-use.
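
A hedged sketch of the pattern (hypothetical values): the compare and the subtract share operands, but neither uses the other.

  %cmp = icmp ult i32 %x, %y
  %sub = sub i32 %x, %y
  ; formed into (sketch):
  %u    = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 %x, i32 %y)
  %sub2 = extractvalue { i32, i1 } %u, 0
  %cmp2 = extractvalue { i32, i1 } %u, 1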

This adds a TLI hook that should preserve the existing behavior for uaddo formation, but disables usubo
formation by default. Only x86 overrides that setting for now, although other targets will likely benefit
from forming usubo too.

Differential Revision: https://reviews.llvm.org/D57789

llvm-svn: 354298
2019-02-18 23:33:05 +00:00
Sanjay Patel
b696b771bf [CGP] add test for unsigned subtract of 1 with overflow; NFC
llvm-svn: 353179
2019-02-05 15:27:40 +00:00
Sanjay Patel
738e17b5df [CGP] fix bogus test names/comments; NFC
The test names/comments had operand 0 and operand 1 inverted.

llvm-svn: 353106
2019-02-04 22:37:05 +00:00
Sanjay Patel
1f9e23e3cc [CGP] add tests for usubo; NFC
llvm-svn: 353103
2019-02-04 22:27:08 +00:00
Sanjay Patel
c00bdab4c8 [CGP] use IRBuilder to simplify code
This is no-functional-change-intended, although there could
be intermediate variations caused by a difference in the
debug info produced by setting it from the builder's
insertion point.

I'm updating the IR test file associated with this code just
to show that the naming differences from using the builder
are visible.

The motivation for adding a helper function is that we are
likely to extend this code to deal with other overflow ops.

llvm-svn: 353056
2019-02-04 16:30:46 +00:00
Sanjay Patel
84ceae6048 [CGP] adjust target constraints for forming uaddo
There are 2 changes visible here:
1. There's no reason to limit this transform based on the number
   of condition registers. That diff allows PPC to produce
   slightly better code (dot-instructions should generally be good).
   Note: someone that cares about PPC codegen might want to 
   look closer at that output because it seems like we could
   still improve this.

2. We (probably?) should not bother trying to form uaddo (or
   other overflow ops) when there's no target support for such
   an op. This goes beyond checking whether the op is expanded
   because both PPC and AArch64 show better codegen for standard
   types regardless of whether the op is legal/custom.

llvm-svn: 353001
2019-02-03 17:53:09 +00:00
Sanjay Patel
837552fe9f [PatternMatch] add special-case uaddo matching for increment-by-one (2nd try)
This is the most important uaddo problem mentioned in PR31754:
https://bugs.llvm.org/show_bug.cgi?id=31754
...but that was overcome in x86 codegen with D57637.

That patch also corrects the inc vs. add regressions seen with the previous attempt at this.

Still, we want to make this matcher complete, so we can potentially canonicalize the pattern 
even if it's an 'add 1' operation.
Pattern matching, however, shouldn't assume that we have canonicalized IR, so we match 4 
commuted variants of uaddo.

There's also a test with a crazy type to show that the existing CGP transform based on this 
matcher is not limited by target legality checks.
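
For example (a hedged sketch with a hypothetical non-standard type), the matcher still fires here:

  %a  = add i42 %x, 1
  %ov = icmp eq i42 %a, 0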

I'm not sure if the Hexagon diff means the test is no longer testing what it intended to
test, but that should be solvable in a follow-up.

Differential Revision: https://reviews.llvm.org/D57516

llvm-svn: 352998
2019-02-03 16:16:48 +00:00
Sanjay Patel
b961bd68f0 [CGP] move test file to prevent bot failures
The test specifies the triple, so it needs to be in the
x86 directory in case a bot has been configured without
the x86 target.

llvm-svn: 352992
2019-02-03 14:19:45 +00:00
David L. Jones
d81f23071c Revert "Reapply "[CGP] Check for existing inttoptr before creating new one""
This change reverts r351626.

The changes in r351626 cause quadratic work in several cases. (See the r351626 thread on llvm-commits for details.)

llvm-svn: 352722
2019-01-31 03:28:46 +00:00
Roman Tereshin
a0383d6c1f Reapply "[CGP] Check for existing inttoptr before creating new one"
Original commit: r351582

llvm-svn: 351626
2019-01-19 03:37:25 +00:00
Roman Tereshin
022bf3e8e7 Revert "Reapply "[CGP] Check for existing inttoptr before creating new one""
This reverts commit r351618.

Compiler RT + ASAN tests are failing for PowerPC. Not sure
how I would reproduce these on macOS, so reverting (again)
until I do.

llvm-svn: 351619
2019-01-19 01:53:26 +00:00
Roman Tereshin
dd6f9f68bb Reapply "[CGP] Check for existing inttoptr before creating new one"
Original commit: r351582

llvm-svn: 351618
2019-01-19 01:41:03 +00:00
Roman Tereshin
86ac532687 Revert "[CGP] Check for existing inttoptr before creating new one"
This reverts commit r351582.

Bots are failing. Reverting this to fix and re-commit later.

llvm-svn: 351598
2019-01-18 21:38:44 +00:00
Roman Tereshin
85a0467a11 [CGP] Check for existing inttoptr before creating new one
Make sure CodeGenPrepare doesn't emit multiple inttoptr instructions of
the same integer value while sinking address computations, but rather
CSEs them on the fly: excessive inttoptrs confuse SCEV into thinking
that related pointers have nothing to do with each other.
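
A hedged sketch (hypothetical names) of the duplication being avoided:

  ; sinking used to emit a second cast of the same integer:
  %p1 = inttoptr i64 %addr to i32*
  %p2 = inttoptr i64 %addr to i32*
  ; now the existing %p1 is reused instead of creating %p2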

This problem blocks LoadStoreVectorizer from vectorizing some of the
loads / stores in a downstream target.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D56838

llvm-svn: 351582
2019-01-18 20:13:42 +00:00
Vedant Kumar
4760686823 [CodeGenPrepare] Set debug loc when widening a switch condition
Set a debug location on the cast instruction used to widen a switch
condition.
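
A hedged sketch (hypothetical metadata): the cast created to widen the switch condition now carries a location.

  %cond.wide = zext i16 %cond to i32, !dbg !7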

llvm-svn: 340379
2018-08-22 01:23:31 +00:00
Vedant Kumar
1e8a2c963c [CodeGenPrepare] Set debug locations when splitting selects
When splitting a select into a diamond, set debug locations on
newly-created branch instructions and phi nodes.
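
A hedged sketch of the diamond (hypothetical labels and metadata); the new branches and phi now carry the select's location:

  head:
    br i1 %c, label %select.true, label %select.false, !dbg !7
  select.true:
    br label %select.end, !dbg !7
  select.false:
    br label %select.end, !dbg !7
  select.end:
    %r = phi float [ %a, %select.true ], [ %b, %select.false ], !dbg !7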

llvm-svn: 340371
2018-08-22 00:10:37 +00:00
Vedant Kumar
30406fd789 [CodeGenPrepare] Clean up dbg.value use-before-def as late as possible
CodeGenPrepare has a strategy for moving dbg.values so that a value's
definition always dominates its debug users. This cleanup was happening
too early (before certain CGP transforms were run), resulting in some
dbg.value use-before-def errors.

Perform this cleanup as late as possible to avoid use-before-def.

llvm-svn: 340370
2018-08-21 23:43:08 +00:00
Vedant Kumar
8d652b756e [CodeGenPrepare] Pre-commit debug info test for optimizeSelectInst
This test shows that optimizeSelectInst splits a select and sinks an
`fdiv` operation to one side of the diamond. However, the dbg.value for
the operation isn't moved.

llvm-svn: 340369
2018-08-21 23:42:53 +00:00
Guozhi Wei
8c17f9a77d [CodeGenPrepare] Add BothExtension type to PromotedInsts
This patch fixes PR38125.

Instruction extension types are recorded in PromotedInsts so they can be used
later in function canGetThrough. If an instruction has two users with different
extension types, it will be inserted into PromotedInsts twice in function
promoteOperandForOther. The second entry overwrites the first, so the final
extension type is wrong, which later causes problems in canGetThrough.

This patch changes the simple bool extension type to a 2-bit enum type, adding
a BothExtension value in addition to zero/sign extension. When a user sees
BothExtension for an instruction, it knows nothing about how that instruction
is extended.
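
A hedged sketch (hypothetical values) of the conflicting users:

  %t = trunc i32 %v to i16
  %z = zext i16 %t to i32   ; records %t as zero-extended
  %s = sext i16 %t to i32   ; previously overwrote that record with sign-extended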

Differential Revision: https://reviews.llvm.org/D49512

llvm-svn: 339822
2018-08-15 22:08:26 +00:00
Alina Sbirlea
dfd14adeb0 Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred.
Summary:
Two utils methods have essentially the same functionality. This is an attempt to merge them into one.
1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor

Prior to the patch:
1. MergeBasicBlockIntoOnlyPred
Updates either DomTree or DeferredDominance
Moves all instructions from Pred to BB, deletes Pred
Asserts BB has a single predecessor
If the address was taken, replaces the block address with constant 1 (?)

2. MergeBlockIntoPredecessor
Updates DomTree, LoopInfo and MemoryDependenceResults
Moves all instructions from BB to Pred, deletes BB
Bails out if BB doesn't have a single predecessor
Bails out if BB's address was taken

After the patch:
Method 2. MergeBlockIntoPredecessor is attempting to become the new default:
Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
Moves all instructions from BB to Pred, deletes BB
Bails out if BB doesn't have a single predecessor
Bails out if BB's address was taken

Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:

1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
Updated in this patch. No challenges.

2. lib/CodeGen/CodeGenPrepare.cpp
Updated in this patch.
  i. eliminateFallThrough is straightforward, but I added a temporary array to avoid iterator invalidation.
  ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks
Some interesting aspects:
  - Since Pred is not deleted (BB is), the entry block does not need updating.
  - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added an assert to make it obvious that BB == SinglePred.
  - isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
  - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.
  - adding some test owners as subscribers for the interesting tests modified:
    test/CodeGen/X86/avx-cmp.ll
    test/CodeGen/AMDGPU/nested-loop-conditions.ll
    test/CodeGen/AMDGPU/si-annotate-cf.ll
    test/CodeGen/X86/hoist-spill.ll
    test/CodeGen/X86/2006-11-17-IllegalMove.ll

3. lib/Transforms/Scalar/JumpThreading.cpp
Not covered in this patch. It is the only use case using the DeferredDominance.
I would defer to Brian Rzycki to make this replacement.

Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar

Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits

Differential Revision: https://reviews.llvm.org/D48202

llvm-svn: 335183
2018-06-20 22:01:04 +00:00
Guozhi Wei
c4c6b548c5 [CodeGenPrepare] Move Extension Instructions Through Logical And Shift Instructions
The CodeGenPrepare pass moves extension instructions close to load instructions
in a different BB, so they can be combined later. But in the current
implementation the extension instructions can't move through logical and shift
instructions. This patch enables that enhancement, so we can eliminate more
extension instructions.
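
A hedged sketch (hypothetical values) with a logical op, where moving the zero-extension past the 'and' is safe:

  %v = load i16, i16* %p
  %a = and i16 %v, 255
  %e = zext i16 %a to i32
  ; after moving the extension next to the load (sketch):
  %v2 = load i16, i16* %p
  %e2 = zext i16 %v2 to i32
  %a2 = and i32 %e2, 255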

Differential Revision: https://reviews.llvm.org/D45537

This is a re-commit of r331783, which was reverted by r333305. The performance
regression was caused by some unlucky alignment, not a code generation problem.

llvm-svn: 334049
2018-06-05 21:03:52 +00:00
Guozhi Wei
03005d3f03 [CodeGenPrepare] Revert r331783
Patch r331783 caused a regression in one of our internal applications, so
revert it now; will investigate further.

llvm-svn: 333305
2018-05-25 20:30:26 +00:00
Shiva Chen
2c864551df [DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.
In order to set breakpoints on labels and list source code around
labels, we need to collect debug information for labels: the label
name, the function the label belongs to, the line number in the file,
and the address where the label is located. To keep this information
in LLVM IR and to allow the backend to generate debug information
correctly, we create a new kind of metadata for labels, DILabel. The
format of DILabel is
We create a new kind of metadata for labels, DILabel. The format
of DILabel is

!DILabel(scope: !1, name: "foo", file: !2, line: 3)

We hope to keep as much debug information as possible even when the
code is optimized. So, we create a new kind of intrinsic for label
metadata to prevent the metadata from being eliminated along with its
basic block. The intrinsic persists as long as it is not optimized
out. The format of the intrinsic is

llvm.dbg.label(metadata !1)

It has only one argument: the DILabel metadata. The intrinsic
immediately follows the label. The backend can retrieve the label
metadata through the intrinsic's parameter.

We also create a DIBuilder API for labels to be used by frontends. A
frontend can use createLabel() to allocate DILabel objects, and use
insertLabel() to insert the llvm.dbg.label intrinsic into LLVM IR.

Differential Revision: https://reviews.llvm.org/D45024

Patch by Hsiangkai Wang.

llvm-svn: 331841
2018-05-09 02:40:45 +00:00
Guozhi Wei
1aea95a9ea [CodeGenPrepare] Move Extension Instructions Through Logical And Shift Instructions
The CodeGenPrepare pass moves extension instructions close to load instructions
in a different BB, so they can be combined later. But in the current
implementation the extension instructions can't move through logical and shift
instructions. This patch enables that enhancement, so we can eliminate more
extension instructions.

Differential Revision: https://reviews.llvm.org/D45537

llvm-svn: 331783
2018-05-08 17:58:32 +00:00
Serguei Katkov
a20e05bb94 [CGP] Fix the remove of matched phis in complex addressing mode
When we replace the Phi we created with matched ones, it is possible that
there are two identical phi nodes in the IR, and the matcher is smart enough
to find that the newly created phi matches both of them. So we try to replace
our phi node with the matched ones twice, and, worse, we delete our phi node
twice, causing a crash.

As soon as we find that we have two identical Phi nodes, it makes sense to
clean up and replace one phi node with the other. The patch implements that.
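
A hedged sketch (hypothetical incoming values) of the duplicates involved:

  %p1 = phi i32* [ %a, %bb1 ], [ %b, %bb2 ]
  %p2 = phi i32* [ %a, %bb1 ], [ %b, %bb2 ]
  ; a newly created identical phi matches both %p1 and %p2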

Reviewers: john.brawn, reames
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43758

llvm-svn: 327250
2018-03-12 03:50:07 +00:00
Simon Pilgrim
073f089c6e [X86][XOP] Update isVectorShiftByScalarCheap with cases covered by XOP
Similar to D42437, XOP supports variable shift for v16i8/v8i16/v4i32/v2i64 types.

Differential Revision: https://reviews.llvm.org/D42526

llvm-svn: 323797
2018-01-30 18:10:21 +00:00
Simon Pilgrim
f15886eb30 Regenerate shuffle sink test
llvm-svn: 323328
2018-01-24 14:59:02 +00:00
Zvi Rackover
b5447b1e7c X86: Update isVectorShiftByScalarCheap with cases covered by AVX512BW
Summary:
AVX512BW adds support for variable shift amount for 16-bit element
vectors.

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: rengolin, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D42437

llvm-svn: 323292
2018-01-24 01:36:40 +00:00
Simon Pilgrim
67b21313ae Add bdver shuffle sink tests.
llvm-svn: 323268
2018-01-23 22:03:57 +00:00
Simon Pilgrim
bf3c39877e Regenerate select test. NFCI.
llvm-svn: 323265
2018-01-23 21:50:46 +00:00
Simon Pilgrim
a075ce04d8 Regenerate shuffle sink test. NFCI.
llvm-svn: 323264
2018-01-23 21:50:11 +00:00
Zvi Rackover
87937e8ed2 X86 Tests: Add AVX512BW config to CodeGenPrepare test. NFC
The case points out that we don't consider shifts supported by AVX512BW
in isVectorShiftByScalarCheap().

llvm-svn: 323242
2018-01-23 19:20:39 +00:00
Serguei Katkov
17e5794f11 [CGP] Fix the GV handling in complex addressing mode
If, in complex addressing mode, the difference is in the GV, then the
base register should not be installed, because we plan to use the base
register as the merge point of the different GVs.
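
A hedged sketch (hypothetical globals): the merge point of the GVs becomes the base register.

  %base = phi i32* [ @g1, %bb1 ], [ @g2, %bb2 ]
  ; %base then stands in for the base register in the common addressing mode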

This is a fix for PR35980.

Reviewers: reames, john.brawn, santosh
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42230

llvm-svn: 323192
2018-01-23 12:07:49 +00:00
Daniel Neilson
1e68724d24 Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Summary:
 This is a resurrection of work first proposed and discussed in Aug 2015:
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
and initially landed (but then backed out) in Nov 2015:
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

 The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.

 This change is the first in a series that allows source and dest to each
have their own alignments by using the alignment attribute on their arguments.

 In this change we:
1) Remove the alignment argument.
2) Add alignment attributes to the source & dest arguments. We, temporarily,
   require that the alignments for source & dest be equal.

 For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)

 Downstream users may have to update their lit tests that check for
@llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script
may help with updating the majority of your tests, but it does not catch all possible
patterns so some manual checking and updating will be required.

s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g

 The remaining changes in the series will:
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
   source and dest alignments.
Step 3) Update Clang to use the new IRBuilder API.
Step 4) Update Polly to use the new IRBuilder API.
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
        and those that use MemIntrinsicInst::[get|set]Alignment() to use
        getDestAlignment() and getSourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
        MemIntrinsicInst::[get|set]Alignment() methods.

Reviewers: pete, hfinkel, lhames, reames, bollu

Reviewed By: reames

Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits

Differential Revision: https://reviews.llvm.org/D41675

llvm-svn: 322965
2018-01-19 17:13:12 +00:00
Serguei Katkov
4d1dd6b53a [CGP] Fix Complex addressing mode for offset
If the offsets differ between two addressing modes, we can continue only if
ScaleReg is not set, because we will use it as the merge point of the
different offsets.

It should fix PR35799 and PR35805.

Reviewers: john.brawn, reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41227

llvm-svn: 322056
2018-01-09 04:37:06 +00:00
Serguei Katkov
b0b67a8d38 [CGP] Fix the handling select inst in complex addressing mode
When we put a value into the select placeholder, we must pass the value
through the simplification tracker, because the value might already have
been simplified and erased.

This is a fix for PR35658.

Reviewers: john.brawn, uabelho
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41251

llvm-svn: 320956
2017-12-18 04:25:07 +00:00
Serguei Katkov
5036459ae3 [CGP] Fix common type handling in optimizeMemoryInst
If the common type is different, we should bail out, because we will not
be able to create a select or Phi of these values.

Basically this is done in ExtAddrMode::compare; however, it does not work
if we handle the null first and then two values of different types, so add
a check in initializeMap as well. The check in ExtAddrMode::compare serves
as an earlier bail out.

Reviewers: reames, john.brawn
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40479

llvm-svn: 319292
2017-11-29 05:51:26 +00:00
Serguei Katkov
505359f705 [CGP] Fix the crash caused by enable of complex addr mode
We must collect all AddrModes even if they are the same.
This is because the Original values differ, and we need all of the original
values collected, as they are used as anchors in common phi finding.

Reviewers: john.brawn, reames
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40166

llvm-svn: 318638
2017-11-20 05:42:36 +00:00
John Brawn
c3347d4247 Remove stray comma in sink-addrmode test
The extra comma meant it wasn't correctly checking that we weren't getting an
extra getelementptr.

llvm-svn: 318406
2017-11-16 15:15:00 +00:00
Serguei Katkov
365200295a [CGP] Disable Select instruction handling in optimizeMemoryInst. NFC
This patch disables the handling of select instructions in the optimization
extending the scope of optimizeMemoryInst.

The optimization itself is disabled by default.
The idea here is just to enable the optimization step by step.

Specifically, the optimization will first be enabled only for Phi nodes,
then select instructions will be added.

If someone complains about performance, it will be easier to
detect which part of the optimization is responsible.

Differential Revision: https://reviews.llvm.org/D36073

llvm-svn: 317555
2017-11-07 09:43:08 +00:00