Commit Graph

25784 Commits

Author SHA1 Message Date
Matt Arsenault
72d27f5525 AMDGPU: Fix tests using old number for constant address space
llvm-svn: 341770
2018-09-10 02:54:25 +00:00
Matt Arsenault
d77fcc2a92 AMDGPU: Use GOT PSV since it has an address space now
llvm-svn: 341768
2018-09-10 02:23:39 +00:00
Matt Arsenault
b998674610 AMDGPU: Don't abort on unknown addrspace argument
llvm-svn: 341767
2018-09-10 02:23:30 +00:00
Craig Topper
3823516103 [X86] Custom type legalize (v2i32 (fp_to_uint v2f64))) without avx512vl by widening to v4i32 and v4f64 instead of v8i32 and v8f64. Make it aware of x86-experimental-vector-widening-legalization
We have isel patterns for v4i32/v4f64 that artificially widen to v8i32/v8f64 so just use that.

If x86-experimental-vector-widening-legalization is enabled, we don't need any custom legalization and can just return. I've modified the test RUN lines to cover this case.

llvm-svn: 341765
2018-09-09 20:36:36 +00:00
Sanjay Patel
6ebf218e4c [SelectionDAG] enhance vector demanded elements to look at a vector select condition operand
This is the DAG equivalent of D51433.
If we know we're not using all vector lanes, use that knowledge to potentially simplify a vselect condition.

The reduction/horizontal tests show that we are eliminating AVX1 operations on the upper half of 256-bit 
vectors because we don't need those anyway.
I'm not sure what the pr34592 test is showing. That's run with -O0; is SimplifyDemandedVectorElts supposed 
to be running there?

Differential Revision: https://reviews.llvm.org/D51696

llvm-svn: 341762
2018-09-09 14:13:22 +00:00
Craig Topper
7af5e333e7 [X86] Create paddus/psubus from narrower vectors with i8/i16 element types.
Summary:
This patch allows vectors with a power of 2 number of elements and i8/i16 element type to select paddus/psubus instructions. ReplaceNodeResults has been updated to custom widen these operations up to 128 bits like we already do for PAVG.

Another step towards fixing PR38691

Reviewers: RKSimon, spatel

Reviewed By: RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51818

llvm-svn: 341753
2018-09-08 19:32:58 +00:00
Craig Topper
a2c9694bc8 [X86] Mark the ADCX and ADOX instruction as commutable.
llvm-svn: 341752
2018-09-08 18:47:56 +00:00
Craig Topper
4677110348 [X86] Add test cases for commuting ADCX/ADOX instruction to avoid copies.
This is a MIR test so we can test ADOX which we have no isel patterns for. I also plan to remove ADCX isel patterns in the near future so this will help maintain coverage.

llvm-svn: 341751
2018-09-08 18:47:54 +00:00
Craig Topper
c96305970d [X86] Add commuted isel pattern for the load form of ADCX instructions.
This prevents the legacy ADC instruction from being favored over ADCX when the load is in the operand 0.

llvm-svn: 341745
2018-09-08 06:31:43 +00:00
Craig Topper
22a6f51646 [X86] Add load folding test cases for the addcarryx intrinsic.
We are currently only able to fold a load in operand 1 to ADCX. A load in operand 0 will use the legacy ADC instruction.

Ultimately I want to remove isel support for ADCX, but first I'm going to fix the shortcomings I know of so I can write proper MIR tests to maintain coverage later.

llvm-svn: 341744
2018-09-08 06:31:41 +00:00
Craig Topper
761e88d1d4 [X86] Add stack folding MIR test for ADCX/ADOX.
We currently have no way to isel ADOX and I plan to remove isel patterns for ADCX. This test will ensure we still have stack folding support for these instructions if we need them in the future.

llvm-svn: 341743
2018-09-08 05:08:18 +00:00
Reid Kleckner
f803b23879 [COFF] Implement llvm.global_ctors priorities for MSVC COFF targets
Summary:
MSVC and LLD sort sections ASCII-betically, so we need to use section
names that sort between .CRT$XCA (the start) and .CRT$XCU (the default
priority).

In the general case, use .CRT$XCT12345 as the section name, and let the
linker sort the zero-padded digits.

Users with low priorities typically want to initialize as early as
possible, so use .CRT$XCA00199 for prioties less than 200. This number
is arbitrary.

Implements PR38552.

Reviewers: majnemer, mstorsjo

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D51820

llvm-svn: 341727
2018-09-07 23:07:55 +00:00
Thomas Lively
a0d25815a0 [WebAssembly] v8x16.shuffle
Summary:
Since the shuffle mask is not exposed as an operand in the native ISel
DAG, create a new WebAssembly ISD node exposing the mask. The mask is
lowered as sixteen immediate byte indices no matter what type the
original vector shuffle was operating on.

This CL depends on D51656

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D51659

llvm-svn: 341718
2018-09-07 21:54:46 +00:00
Craig Topper
fa535c027e [X86] Add codegen tests for narrow PADDUS/PSUBUS patterns for PR38691.
llvm-svn: 341711
2018-09-07 21:28:46 +00:00
Nick Desaulniers
287a3be379 [AArch64] Support reserving x1-7 registers.
Summary:
Reserving registers x1-7 is used to support CONFIG_ARM64_LSE_ATOMICS in Linux kernel. This change adds support for reserving registers x1 through x7.

Reviewers: javed.absar, phosek, srhines, nickdesaulniers, efriedma

Reviewed By: nickdesaulniers, efriedma

Subscribers: niravd, jfb, manojgupta, nickdesaulniers, jyknight, efriedma, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D48580

llvm-svn: 341706
2018-09-07 20:58:57 +00:00
Craig Topper
5cbce81c91 [X86] Don't create ZERO_EXTEND_INREG/SIGN_EXTEND_INREG for v1iX vectors.
The generic type legalizer will scalarize vXi1 instructions getting rid of the vector entirely. Creating wider vector instructions is just going to prevent that.

llvm-svn: 341705
2018-09-07 20:56:03 +00:00
Craig Topper
39f48fdcbc [X86] Don't create X86ISD::AVG nodes from v1iX vectors.
The type legalizer will try to scalarize this and fail.

It looks like there's some other v1iX oddities out there too since we still generated some vector instructions.

llvm-svn: 341704
2018-09-07 20:56:01 +00:00
Craig Topper
4863313b35 [X86] Modify the the rdtscp intrinsic to return values instead of taking a pointer argument
Similar to what was recently done for addcarry/subborrow and has been done for rdrand/rdseed for a while. It's better to use two results and an explicit store in IR when the store isn't part of the semantics of the instruction. This allows store->load forwarding to happen in the middle end. Or the store to be removed if its never loaded.

Differential Revision: https://reviews.llvm.org/D51803

llvm-svn: 341698
2018-09-07 19:14:15 +00:00
Craig Topper
72964ae99e [X86] Change the addcarry and subborrow intrinsics to return 2 results and remove the pointer argument.
We should represent the store directly in IR instead. This gives the middle end a chance to remove it if it can see a load from the same address.

Differential Revision: https://reviews.llvm.org/D51769

llvm-svn: 341677
2018-09-07 16:58:39 +00:00
Craig Topper
51e11788a4 [X86] Use regular expressions to make test immune to register allocation changes.
llvm-svn: 341676
2018-09-07 16:58:36 +00:00
Craig Topper
313d09af51 [X86] Teach X86DAGToDAGISel::foldLoadStoreIntoMemOperand to handle loads in operand 1 of commutable operations.
Previously we only handled loads in operand 0, but nothing guarantees the load will be operand 0 for commutable operations.

Differential Revision: https://reviews.llvm.org/D51768

llvm-svn: 341675
2018-09-07 16:27:55 +00:00
Sid Manning
9ad0f02749 Add support for getRegisterByName.
Support required to build the Hexagon Linux kernel.

Differential Revision: https://reviews.llvm.org/D51363

llvm-svn: 341658
2018-09-07 13:36:21 +00:00
Simon Pilgrim
04d0748417 [X86][SSE] Add additional fadd/fsub(x, bitcast_fneg(y)) tests with different integer bitwidths
llvm-svn: 341657
2018-09-07 13:27:07 +00:00
Simon Pilgrim
96d6b9c2e2 [DAGCombiner] foldBitcastedFPLogic - Add basic vector support
Add support for bitcasts from float type to an integer type of the same element bitwidth.

There maybe cases where we need to support different widths (e.g. as SSE __m128i is treated as v2i64) - but I haven't seen cases of this in the wild yet.

llvm-svn: 341652
2018-09-07 12:13:45 +00:00
Simon Pilgrim
a2aef22a72 [X86][SSE] Add fadd/fsub(x, bitcast_fneg(y)) tests
Show missing vector support

llvm-svn: 341650
2018-09-07 11:24:43 +00:00
Tim Northover
bb7d7b3d33 ARM: fix Thumb2 CodeGen for ldrex with folded frame-index.
Because t2LDREX (& t2STREX) were marked as AddrModeNone, but did allow a
FrameIndex operand, rewriteT2FrameIndex asserted. This gives them a
proper addressing-mode and tells the rewriter about it so that encodable
offsets are exploited and others are rejected.

Should fix PR38828.

llvm-svn: 341642
2018-09-07 09:21:25 +00:00
Alexander Timofeev
a805c96c65 [AMDGPU] Preliminary patch for divergence driven instruction selection. Fold immediate SMRD offset.
Differential revision: https://reviews.llvm.org/D51610

Reviewer: rampitec
llvm-svn: 341636
2018-09-07 09:05:34 +00:00
QingShan Zhang
abbb894ff5 [PowerPC] Combine ADD to ADDZE
On the ppc64le platform, if ir has the following form,

define i64 @addze1(i64 %x, i64 %z) local_unnamed_addr #0 {
entry:
  %cmp = icmp ne i64 %z, CONSTANT      (-32767 <= CONSTANT <= 32768)
  %conv1 = zext i1 %cmp to i64
  %add = add nsw i64 %conv1, %x
  ret i64 %add
}
we can optimize it to the form below.

                                when C == 0
                            --> addze X, (addic Z, -1))
                           /
add X, (zext(setne Z, C))--
                           \    when -32768 <= -C <= 32767 && C != 0
                            --> addze X, (addic (addi Z, -C), -1)

Patch By: HLJ2009 (Li Jia He)
Differential Revision: https://reviews.llvm.org/D51403
Reviewed By: Nemanjai 

llvm-svn: 341634
2018-09-07 07:56:05 +00:00
Craig Topper
30e129f256 [X86] Add more test cases for missed opportunities for using RMW form of ADC.
llvm-svn: 341630
2018-09-07 02:39:56 +00:00
Craig Topper
2c9dede9cb [X86] Add RMW ADC patterns with load in operand 1.
ADC is commutable and the load could be in either operand, but we were only checking operand 0.

Ideally we'd mark X86adc_flag as commutable and tablegen would automatically do this, but the EFLAGS register mention is preventing it.

llvm-svn: 341606
2018-09-06 23:55:36 +00:00
Craig Topper
37d68e4599 [X86] Add a test case showing failure to use the RMW form of ADC when the load is in operand 1 going into isel.
The ADC instruction is commutable, but we only have RMW isel patterns with a load on the left hand side. Nothing will canonicalize loads to the LHS on these ops. So we need two patterns.

llvm-svn: 341605
2018-09-06 23:55:34 +00:00
Eric Christopher
fe83270ee9 The initial .text section generated in object files was missing the
SHF_ARM_PURECODE flag when being built with the -mexecute-only flag.
All code sections of an ELF must have the flag set for the final .text
section to be execute-only, otherwise the flag gets removed.

A HasData flag is added to MCSection to aid in the determination that
the section is empty. A virtual setTargetSectionFlags is added to
MCELFObjectTargetWriter to allow subclasses to set target specific
section flags to be added to sections which we then use in the ARM
backend to set SHF_ARM_PURECODE.

Patch by Ivan Lozano!

Reviewed By: echristo

Differential Revision: https://reviews.llvm.org/D48792

llvm-svn: 341593
2018-09-06 22:09:31 +00:00
Scott Linder
834cbc645c Revert r341413
Causes a regression in expensive checks.

llvm-svn: 341589
2018-09-06 21:38:56 +00:00
Sanjay Patel
9e5c163154 [x86] add tests for pow --> cbrt; NFC
llvm-svn: 341575
2018-09-06 18:42:55 +00:00
Michael Berg
1b34b01a8e [NFC] - in preparation for adding nsw, nuw and exact as flags to MI
llvm-svn: 341565
2018-09-06 17:07:29 +00:00
JF Bastien
2920061105 ARM64: improve non-zero memset isel by ~2x
Summary:
I added a few ARM64 memset codegen tests in r341406 and r341493, and annotated
where the generated code was bad. This patch fixes the majority of the issues by
requesting that a 2xi64 vector be used for memset of 32 bytes and above.

The patch leaves the former request for f128 unchanged, despite f128
materialization being suboptimal: doing otherwise runs into other asserts in
isel and makes this patch too broad.

This patch hides the issue that was present in bzero_40_stack and bzero_72_stack
because the code now generates in a better order which doesn't have the store
offset issue. I'm not aware of that issue appearing elsewhere at the moment.

<rdar://problem/44157755>

Reviewers: t.p.northover, MatzeB, javed.absar

Subscribers: eraman, kristof.beyls, chrib, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D51706

llvm-svn: 341558
2018-09-06 16:03:32 +00:00
Craig Topper
5a53760f65 [X86][Assembler] Allow %eip as a register in 32-bit mode for .cfi directives.
This basically reverts a change made in r336217, but improves the text of the error message for not allowing IP-relative addressing in 32-bit mode.

Fixes PR38826.

Patch by Iain Sandoe.

llvm-svn: 341512
2018-09-06 02:03:14 +00:00
JF Bastien
ec812ce3d6 NFC: more memset inline arm64 coverage
I'm looking at some codegen optimization in this area and want to make sure I understand the current codegen and don't regress it. This patch further expands the tests (which I already expanded in r341406) to capture more of the current code generation when it comes to stack-based small non-zero memset on arm64. This patch annotates some potential fixes.

llvm-svn: 341493
2018-09-05 20:35:06 +00:00
Sanjay Patel
dbf52837fe [DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x))
This was proposed as an IR transform in D49306, but it was not clearly justifiable as a canonicalization. 
Here, we only do the transform when the target tells us that sqrt can be lowered with inline code.

This is the basic case. Some potential enhancements are in the TODO comments:

1. Generalize the transform for other exponents (allow more than 2 sqrt calcs if that's really cheaper).
2. If we have less fast-math-flags, generate code to avoid -0.0 and/or INF.
3. Allow the transform when optimizing/minimizing size (might require a target hook to get that right).

Note that by default, x86 converts single-precision sqrt calcs into sqrt reciprocal estimate with 
refinement. That codegen is controlled by CPU attributes and can be manually overridden. We have plenty 
of test coverage for that already, so I didn't bother to include extra testing for that here. AArch uses 
its full-precision ops in all cases (not sure if that's the intended behavior or not, but that should 
also be covered by existing tests).

Differential Revision: https://reviews.llvm.org/D51630 

llvm-svn: 341481
2018-09-05 17:01:56 +00:00
Zhaoshi Zheng
a0aa41d793 Revert "Revert r341269: [Constant Hoisting] Hoisting Constant GEP Expressions"
Reland r341269. Use std::stable_sort when sorting constant condidates.

Reverting commit, r341365:

  Revert r341269: [Constant Hoisting] Hoisting Constant GEP Expressions

  One of the tests is failing 50% of the time when expensive checks are
  enabled. Not sure how deep the problem is so just reverting while the
  author can investigate so that the bots stop repeatedly failing and
  blaming things incorrectly. Will respond with details on the original
  commit.

Original commit, r341269:

  [Constant Hoisting] Hoisting Constant GEP Expressions

  Leverage existing logic in constant hoisting pass to transform constant GEP
  expressions sharing the same base global variable. Multi-dimensional GEPs are
  rewritten into single-dimensional GEPs.

  https://reviews.llvm.org/D51396

Differential Revision: https://reviews.llvm.org/D51654

llvm-svn: 341417
2018-09-04 22:17:03 +00:00
Thomas Lively
cfab8b4b76 [WebAssembly][NFC] Add colon to label in test
llvm-svn: 341414
2018-09-04 21:51:32 +00:00
Scott Linder
dfe089dfd1 [AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructions
Emit a waterfall loop in the general case for a potentially-divergent Rsrc
operand. When practical, avoid this by using Addr64 instructions.

Differential Revision: https://reviews.llvm.org/D50982

llvm-svn: 341413
2018-09-04 21:50:47 +00:00
Thomas Lively
1b55b2be7e [WebAssembly][NFC] Fix formatting and tests
Summary: Small fixes

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D51656

llvm-svn: 341411
2018-09-04 21:26:17 +00:00
Krzysztof Parzyszek
f4ad2cb24f [Hexagon] Don't packetize new-value stores with any other stores
llvm-svn: 341409
2018-09-04 21:07:27 +00:00
JF Bastien
fd458fe205 NFC: expand memset inline arm64 coverage
I'm looking at some codegen optimization in this area and want to make sure I understand the current codegen and don't regress it. This patch simply expands the two existing tests to capture more of the current code generation when it comes to heap-based and stack-based small memset on arm64. The tested code is already pretty good, notably when it comes to using STP, FP stores, FP immediate generation, and folding one of the stores into a stack spill when possible. The uses of STUR could be improved, and some more pairing could occur. Straying from bzero patterns currently yield suboptimal code, and I expect a variety of small changes could make things way better.

llvm-svn: 341406
2018-09-04 21:02:00 +00:00
Martin Storsjo
fed420d6b6 [MinGW] [AArch64] Add stubs for potential automatic dllimported variables
The runtime pseudo relocations can't handle the AArch64 format PC
relative addressing in adrp+add/ldr pairs. By using stubs, the potentially
dllimported addresses can be touched up by the runtime pseudo relocation
framework.

Differential Revision: https://reviews.llvm.org/D51452

llvm-svn: 341401
2018-09-04 20:56:21 +00:00
Matt Arsenault
813613c494 AMDGPU: Fix DAG divergence not reporting flat loads
Match behavior in DAG of r340343

llvm-svn: 341393
2018-09-04 18:58:19 +00:00
Dan Gohman
045a217bee [WebAssembly] Fix operand rewriting in inline asm lowering.
Use MachineOperand::ChangeToImmediate rather than reassigning
MachineOperands to new values created from MachineOperand::CreateImm,
so that their parent pointers are preserved.

This fixes "Instruction has operand with wrong parent set" errors
reported by the MachineVerifier.

llvm-svn: 341389
2018-09-04 17:46:12 +00:00
Chandler Carruth
6cb12444cc Revert r341269: [Constant Hoisting] Hoisting Constant GEP Expressions
One of the tests is failing 50% of the time when expensive checks are
enabled. Not sure how deep the problem is so just reverting while the
author can investigate so that the bots stop repeatedly failing and
blaming things incorrectly. Will respond with details on the original
commit.

llvm-svn: 341365
2018-09-04 13:36:44 +00:00
Chandler Carruth
664aa868f5 [x86/SLH] Add a real Clang flag and LLVM IR attribute for Speculative
Load Hardening.

Wires up the existing pass to work with a proper IR attribute rather
than just a hidden/internal flag. The internal flag continues to work
for now, but I'll likely remove it soon.

Most of the churn here is adding the IR attribute. I talked about this
Kristof Beyls and he seemed at least initially OK with this direction.
The idea of using a full attribute here is that we *do* expect at least
some forms of this for other architectures. There isn't anything
*inherently* x86-specific about this technique, just that we only have
an implementation for x86 at the moment.

While we could potentially expose this as a Clang-level attribute as
well, that seems like a good question to defer for the moment as it
isn't 100% clear whether that or some other programmer interface (or
both?) would be best. We'll defer the programmer interface side of this
for now, but at least get to the point where the feature can be enabled
without relying on implementation details.

This also allows us to do something that was really hard before: we can
enable *just* the indirect call retpolines when using SLH. For x86, we
don't have any other way to mitigate indirect calls. Other architectures
may take a different approach of course, and none of this is surfaced to
user-level flags.

Differential Revision: https://reviews.llvm.org/D51157

llvm-svn: 341363
2018-09-04 12:38:00 +00:00