Commit Graph

24465 Commits

Author SHA1 Message Date
Sanjay Patel
8652c53d29 [DAG] propagate FMF for all FPMathOperators
This is a simple hack based on what's proposed in D37686, but we can extend it if needed in follow-ups. 
It gets us most of the FMF functionality that we want without adding any state bits to the flags. It 
also intentionally leaves out non-FMF flags (nsw, etc) to minimize the patch.

It should provide a superset of the functionality from D46563 - the extra tests show propagation and 
codegen diffs for fcmp, vecreduce, and FP libcalls.

The PPC log2() test shows the limits of this most basic approach - we only applied 'afn' to the last 
node created for the call. AFAIK, there aren't any libcall optimizations based on the flags currently, 
so that shouldn't make any difference.

Differential Revision: https://reviews.llvm.org/D46854

llvm-svn: 332358
2018-05-15 14:16:24 +00:00
Simon Pilgrim
891ebcdbaa [X86] Split off F16C WriteCvtPH2PS/WriteCvtPS2PH scheduler classes
Btver2 - VCVTPH2PSYrm needs to double pump the AGU
Broadwell - missing VCVTPS2PH*mr stores extra latency

Allows us to remove the WriteCvtF2FSt conversion store class

llvm-svn: 332357
2018-05-15 14:12:32 +00:00
Artur Gainullin
243a3d56d8 [X86] Improve unsigned saturation downconvert detection.
Summary:
New unsigned saturation downconvert patterns detection was implemented in
X86 Codegen:

(truncate (smin (smax (x, C1), C2)) to dest_type),
where C1 >= 0 and C2 is unsigned max of destination type.

(truncate (smax (smin (x, C2), C1)) to dest_type)
where C1 >= 0, C2 is unsigned max of destination type and C1 <= C2.
These two patterns are equivalent to:

(truncate (umin (smax(x, C1), unsigned_max_of_dest_type)) to dest_type)

Reviewers: RKSimon

Subscribers: llvm-commits, a.elovikov

Differential Revision: https://reviews.llvm.org/D45315

llvm-svn: 332336
2018-05-15 10:24:12 +00:00
Craig Topper
fadf8b8dec [X86] Add fast isel tests for some of the avx512 truncate intrinsics to match current clang codegen.
llvm-svn: 332326
2018-05-15 04:26:27 +00:00
Sanjay Patel
165587b424 [AArch64] enhance test to show FMF loss; NFC
llvm-svn: 332301
2018-05-14 21:53:21 +00:00
Martin Storsjo
ace7ae935f [ARM] Back up R4 and LR if calling the stack probe function
Differential Revision: https://reviews.llvm.org/D46777

llvm-svn: 332298
2018-05-14 21:32:52 +00:00
Sanjay Patel
4c8a67a229 [PowerPC] add more tests for FMF propagation; NFC
llvm-svn: 332295
2018-05-14 21:17:49 +00:00
Krzysztof Parzyszek
771f2422d0 [Hexagon] Add a target feature for memop generation
llvm-svn: 332285
2018-05-14 20:09:07 +00:00
Sid Manning
d9f2873511 Hexagon: Put relocations after instructions not packets.
Change relocation output so that relocation information follows
individual instructions rather than clustering them at the end
of packets.

This change required shifting block of code but the actual change
is in HexagonPrettyPrinter's PrintInst.

Differential Revision: https://reviews.llvm.org/D46728

llvm-svn: 332283
2018-05-14 19:46:08 +00:00
Craig Topper
53ceb4805f [X86] Remove and autoupgrade avx512.vbroadcast.ss/avx512.vbroadcast.sd intrinsics.
llvm-svn: 332271
2018-05-14 18:21:22 +00:00
Simon Pilgrim
228d24a2d6 [X86][BtVer2] Fix MMX/YMM integer vector nt store schedules
MMX was missing and YMM was tagged as a fp nt store

llvm-svn: 332269
2018-05-14 18:07:28 +00:00
Geoff Berry
64a2ea41ea [BranchFolding] Allow hoisting to block with a single conditional branch.
Summary:
The BranchFolding pass is currently missing opportunities to hoist
common code if the hoisted-to block contains a single conditional branch
that has register uses.  This occurs somewhat frequently on AArch64 with
CBZ/TBZ opcodes.

This change also eliminates some code differences when debug info is
present since the presence of e.g. DBG_VALUE instructions in the
hoisted-to block can enable hoisting that wouldn't have occurred without
them.

Reviewers: MatzeB, rnk, kparzysz, twoh, aprantl, javed.absar

Subscribers: kristof.beyls, JDevlieghere, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46324

llvm-svn: 332265
2018-05-14 17:31:18 +00:00
Krzysztof Parzyszek
329c3e9a5f [Hexagon] Avoid predicate copies to integer registers from store-locked
llvm-svn: 332260
2018-05-14 16:41:40 +00:00
Evandro Menezes
14fa2e4fa5 [AArch64] Improve single vector lane stores
When storing the 0th lane of a vector, use a simpler and usually more efficient scalar store instead.

Differential revision: https://reviews.llvm.org/D46655

llvm-svn: 332251
2018-05-14 15:26:35 +00:00
Craig Topper
f633f3eb67 [X86] Add fast isel test cases for the clang output for 512-bit cvtps2pd related intrinsics.
llvm-svn: 332214
2018-05-14 05:09:41 +00:00
Craig Topper
0e71c6d5ca [X86] Remove and autoupgrade the cvtusi2sd intrinsic. Use uitofp+insertelement instead.
llvm-svn: 332206
2018-05-14 00:06:49 +00:00
Craig Topper
97e74b05ef [X86] Add patterns for combining movss+uint_to_fp into the intrinsic instructions under AVX512.
This matches what we do for sint_to_fp.

llvm-svn: 332205
2018-05-13 23:24:21 +00:00
Craig Topper
12067185d4 [X86] Add fast-isel test cases for _mm_cvtu32_sd, _mm_cvtu64_sd, _mm_cvtu32_ss, and _mm_cvtu64_ss.
llvm-svn: 332204
2018-05-13 23:24:19 +00:00
Craig Topper
85906cf041 [X86] Remove and autoupgrade masked vpermd/vpermps intrinsics.
llvm-svn: 332198
2018-05-13 18:03:59 +00:00
Dimitry Andric
a39c409619 Follow-up to rL332176 by adding a test case for PR37264.
Noticed by Simon Pilgrim.

llvm-svn: 332197
2018-05-13 14:32:23 +00:00
Matt Arsenault
dfb88dfe30 AMDGPU: Make undef legal for v2i16/v2f16
This is apparently necessary to stop undef from being
turned into a build_vector of 0s.

llvm-svn: 332195
2018-05-13 10:04:38 +00:00
Puyan Lotfi
380a6f55ff [NFC] MIR-Canon: switching to a stable string sorting of instructions.
llvm-svn: 332191
2018-05-13 06:07:20 +00:00
Craig Topper
38b713d4a7 [X86] Add some load folding patterns for cvtsi2ss/sd into intrinsic instructions.
llvm-svn: 332189
2018-05-13 01:54:33 +00:00
Craig Topper
28b85caea8 [X86] Remove some unused CHECK lines from tests.
llvm-svn: 332188
2018-05-13 00:58:23 +00:00
Craig Topper
df3a9cedff [X86] Remove an autoupgrade legacy cvtss2sd intrinsics.
llvm-svn: 332187
2018-05-13 00:29:40 +00:00
Craig Topper
38ad7ddabc [X86] Remove and autoupgrade cvtsi2ss/cvtsi2sd intrinsics to match what clang has used for a very long time.
llvm-svn: 332186
2018-05-12 23:14:39 +00:00
Craig Topper
a288f241cd [X86] Remove some unused masked conversion intrinsics that can be replaced with an older intrinsic and a select.
This is what clang already uses.

llvm-svn: 332170
2018-05-12 02:34:28 +00:00
Stanislav Mekhanoshin
7012c246c1 [AMDGPU] Fix amdgpu-waves-per-eu accounting in scheduler
We cannot query this attribute from a subtarget given a machine function.
At this point attribute itself is already unavailable and can only be
obtained through MFI.

Differential Revision: https://reviews.llvm.org/D46781

llvm-svn: 332166
2018-05-12 01:41:56 +00:00
Tom Stellard
655fdd3f82 AMDGPU/GlobalISel: Implement select() for >32-bit G_STORE
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D46153

llvm-svn: 332154
2018-05-11 23:12:49 +00:00
Changpeng Fang
f094885a9e AMDGPU/SI: Don't promote alloca to vector for AddrSpaceCast instruction.
Summary:
  We have no logic to promote alloca to vector for an AddrSpaceCast instruction.

Reviewer:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D45993

llvm-svn: 332147
2018-05-11 22:17:57 +00:00
Craig Topper
a17d627abb [X86] Remove and autoupgrade a bunch of FMA instrinsics that are no longer used by clang.
llvm-svn: 332146
2018-05-11 21:59:34 +00:00
Yaxun Liu
deba150c27 [AMDGPU] Fix compilation failure when IR contains comdat
Remove a useless SwitchSection which also causes compilation failure
when IR contains comdat.

The SwitchSection is useless because the current section is already
correct text section for the function therefore no need to switch.

It causes compilation failure for comdat because functions with comdat
has specific text section, not the default .text section.

Since HIP uses comdat, this bug caused failures for HIP.

Differential Revision: https://reviews.llvm.org/D46770

llvm-svn: 332137
2018-05-11 20:40:14 +00:00
Vedant Kumar
99d5c072f0 [DAGCombiner] Set the right SDLoc on extended SETCC uses (7/N)
ExtendSetCCUses updates SETCC nodes which use a load (OriginalLoad) to
reflect a simplification to the load (ExtLoad).

Based on my reading, ExtendSetCCUses may create new nodes to extend a
constant attached to a SETCC. It also creates fresh SETCC nodes which
refer to any updated operands.

ISTM that the location applied to the new constant and SETCC nodes
should be the same as the location of the ExtLoad.

This was suggested by Adrian in https://reviews.llvm.org/D45995.

Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D46216

llvm-svn: 332119
2018-05-11 18:40:10 +00:00
Vedant Kumar
fd340a4047 [DAGCombiner] Set the right SDLoc on a newly-created sextload (6/N)
This teaches tryToFoldExtOfLoad to set the right location on a
newly-created extload. With that in place, the logic for performing a
certain ([s|z]ext (load ...)) combine becomes identical for sexts and
zexts, and we can get rid of one copy of the logic.

The test case churn is due to dependencies on IROrders inherited from
the wrong SDLoc.

Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D46158

llvm-svn: 332118
2018-05-11 18:40:08 +00:00
Simon Pilgrim
661ae7778d [X86][BtVer2] Model ymm move as double pumped instructions
We still need to handle mmx/xmm moves as 'decode-only' no-pipe instructions

llvm-svn: 332109
2018-05-11 17:38:36 +00:00
Alex Bradbury
bca0c3cdb6 [RISCV] Support .option rvc and norvc assembler directives
These directives allow the 'C' (compressed) extension to be enabled/disabled 
within a single file.

Differential Revision: https://reviews.llvm.org/D45864
Patch by Kito Cheng

llvm-svn: 332107
2018-05-11 17:30:28 +00:00
Geoff Berry
60460268c0 [AArch64] Fix performPostLD1Combine to check for constant lane index.
Summary:
performPostLD1Combine in AArch64ISelLowering looks for vector
insert_vector_elt of a loaded value which it can optimize into a single
LD1LANE instruction.  The code checking for the pattern was not checking
if the lane index was a constant which could cause two problems:

- an assert when lowering the LD1LANE ISD node since it assumes an
  constant operand

- an assert in isel if the lane index value depends on the
  post-incremented base register

Both of these issues are avoided by simply checking that the lane index
is a constant.

Fixes bug 35822.

Reviewers: t.p.northover, javed.absar

Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46591

llvm-svn: 332103
2018-05-11 16:25:06 +00:00
Sanjoy Das
82105e2a7d Use iteration instead of recursion in CFIInserter
Summary: This recursive step can overflow the stack.

Reviewers: djokov, petarj

Subscribers: mcrosier, jlebar, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D46671

llvm-svn: 332101
2018-05-11 15:54:46 +00:00
Simon Pilgrim
22dd72b995 [X86] Split WriteF/WriteVec Move/Load/Store scheduler classes by vector width
Fixes a SNB issue that was missing vlddqu/vmovntdqa ymm instructions

llvm-svn: 332094
2018-05-11 14:30:54 +00:00
Tom Stellard
dcc95e9385 AMDGPU/GlobalISel: Implement select() for 32-bit G_FPTOUI
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45883

llvm-svn: 332082
2018-05-11 05:44:16 +00:00
Craig Topper
1ee19ae126 [X86] Add new patterns for masked scalar load/store to match clang's codegen from r331958.
Clang's codegen now uses 128-bit masked load/store intrinsics in IR. The backend will widen to 512-bits on AVX512F targets.

So this patch adds patterns to detect codegen's widening and patterns for AVX512VL that don't get widened.

We may be able to drop some of the old patterns, but I leave that for a future patch.

llvm-svn: 332049
2018-05-10 21:49:16 +00:00
Tom Stellard
1e0edad4bb AMDGPU/GlobalISel: Implement select() for G_BITCAST s32 <--> <2 x s16>
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45881

llvm-svn: 332042
2018-05-10 21:20:10 +00:00
Tom Stellard
1dc90204bf AMDGPU/GlobalISel: Enable TableGen'd instruction selector
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, mgorny, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45994

llvm-svn: 332039
2018-05-10 20:53:06 +00:00
Sam Clegg
a5908009cd [WebAsembly] Update default triple in test files to wasm32-unknown-unkown.
Summary: The final -wasm component has been the default for some time now.

Subscribers: jfb, dschuff, jgravelle-google, eraman, aheejin, JDevlieghere, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D46342

llvm-svn: 332007
2018-05-10 17:49:11 +00:00
Simon Pilgrim
38ac0e9c6b [X86] Split WriteVecALU/WriteVecLogic/WriteShuffle/WriteVarShuffle/WritePSADBW/WritePHAdd scheduler classes
Split off XMM classes from the default (MMX) classes.

llvm-svn: 331999
2018-05-10 17:06:09 +00:00
Sanjay Patel
b4e7893ba8 [x86] fix fmaxnum/fminnum with nnan
With nnan, there's no need for the masked merge / blend
sequence (that probably costs much more than the min/max
instruction).

Somewhere between clang 5.0 and 6.0, we started producing
these intrinsics for fmax()/fmin() in C source instead of
libcalls or fcmp/select. The backend wasn't prepared for
that, so we regressed perf in those cases.

Note: it's possible that other targets have similar problems
as seen here. 

Noticed while investigating PR37403 and related bugs:
https://bugs.llvm.org/show_bug.cgi?id=37403

The IR FMF propagation cases still don't work. There's
a proposal that might fix those cases in D46563.

llvm-svn: 331992
2018-05-10 15:40:49 +00:00
Sanjay Patel
ec9b6be26f [x86] fix test names; NFC
llvm-svn: 331989
2018-05-10 14:58:47 +00:00
Sanjay Patel
be97134955 [x86] add tests for maxnum/minnum intrinsics with nnan; NFC
Clang 6.0 was updated to create these intrinsics rather than 
libcalls or fcmp/select, but the backend wasn't prepared to 
handle that optimally. 

This bug is not the primary reason for PR37403:
https://bugs.llvm.org/show_bug.cgi?id=37403
...but it's probably more important for x86 perf.

llvm-svn: 331988
2018-05-10 14:48:42 +00:00
Nirav Dave
a5ad417589 [DAG] Avoid using deleted node in rebuildSetCC
Summary:
The combine in rebuildSetCC may be combined to another
node leaving our references stale. Keep a handle on
it to avoid stale references.

Fixes PR36602.

Reviewers: dbabokin, RKSimon, eli.friedman, davide

Subscribers: hiraditya, uabelho, JesperAntonsson, qcolombet, llvm-commits

Differential Revision: https://reviews.llvm.org/D46404

llvm-svn: 331985
2018-05-10 14:28:54 +00:00
Gabor Buella
a832b22bae [X86] ptwrite intrinsic
Reviewers: craig.topper, RKSimon

Reviewed By: craig.topper, RKSimon

Differential Revision: https://reviews.llvm.org/D46539

llvm-svn: 331961
2018-05-10 07:26:05 +00:00