clang-p2996

Author	SHA1	Message	Date
Sander de Smalen	2d77e788f2	[AArch64] Implement aarch64_vector_pcs codegen support. This patch adds codegen support for the saving/restoring V8-V23 for functions specified with the aarch64_vector_pcs calling convention attribute, as added in patch D51477. Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D51479 llvm-svn: 342049	2018-09-12 12:10:22 +00:00
Jessica Paquette	2386eab360	[MachineOutliner] Add codegen size remarks to the MachineOutliner Since the outliner is a module pass, it doesn't get codegen size remarks like the other codegen passes do. This adds size remarks to the outliner. This is kind of a workaround, so it's peppered with FIXMEs; size remarks really ought to not ever be handled by the pass itself. However, since the outliner is the only "MachineModulePass", this works for now. Since the entire purpose of the MachineOutliner is to produce code size savings, it really ought to be included in codegen size remarks. If we ever go ahead and make a MachineModulePass (say, something similar to MachineFunctionPass), then all of this ought to be moved there. llvm-svn: 342009	2018-09-11 23:05:34 +00:00
Josh Stone	f446facab0	[GlobalISel] Lower dbg.declare into indirect DBG_VALUE Summary: D31439 changed the semantics of dbg.declare to take the address of a variable as the first argument, making it indirect. It specifically updated FastISel for this change here: https://reviews.llvm.org/D31439#change-WVArzi177jPl GlobalISel needs to follow suit, or else it will be missing a level of indirection in the generated debuginfo. This problem was seen in a Rust debuginfo test on aarch64, since GlobalISel is used at -O0 for aarch64. https://github.com/rust-lang/rust/issues/49807 https://bugzilla.redhat.com/show_bug.cgi?id=1611597 https://bugzilla.redhat.com/show_bug.cgi?id=1625768 Reviewers: dblaikie, aprantl, t.p.northover, javed.absar, rnk Reviewed By: rnk Subscribers: #debug-info, rovka, kristof.beyls, JDevlieghere, llvm-commits, tstellar Differential Revision: https://reviews.llvm.org/D51749 llvm-svn: 341969	2018-09-11 17:52:01 +00:00
Roman Lebedev	baf2628043	[DagCombine][NFC] Some more tests for X % C == 0 (UREM case) transform For https://reviews.llvm.org/D50222 Patch by: hermord (Dmytro Shynkevych)! llvm-svn: 341953	2018-09-11 15:34:26 +00:00
Sanjay Patel	e368f46788	[AArch64] test codegen for unsigned saturated add; NFC This is identical to the tests added for x86 at rL341845. A semi-generic DAGCombine should improve things universally. llvm-svn: 341935	2018-09-11 13:21:28 +00:00
Nick Desaulniers	287a3be379	[AArch64] Support reserving x1-7 registers. Summary: Reserving registers x1-7 is used to support CONFIG_ARM64_LSE_ATOMICS in Linux kernel. This change adds support for reserving registers x1 through x7. Reviewers: javed.absar, phosek, srhines, nickdesaulniers, efriedma Reviewed By: nickdesaulniers, efriedma Subscribers: niravd, jfb, manojgupta, nickdesaulniers, jyknight, efriedma, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D48580 llvm-svn: 341706	2018-09-07 20:58:57 +00:00
JF Bastien	2920061105	ARM64: improve non-zero memset isel by ~2x Summary: I added a few ARM64 memset codegen tests in r341406 and r341493, and annotated where the generated code was bad. This patch fixes the majority of the issues by requesting that a 2xi64 vector be used for memset of 32 bytes and above. The patch leaves the former request for f128 unchanged, despite f128 materialization being suboptimal: doing otherwise runs into other asserts in isel and makes this patch too broad. This patch hides the issue that was present in bzero_40_stack and bzero_72_stack because the code now generates in a better order which doesn't have the store offset issue. I'm not aware of that issue appearing elsewhere at the moment. <rdar://problem/44157755> Reviewers: t.p.northover, MatzeB, javed.absar Subscribers: eraman, kristof.beyls, chrib, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51706 llvm-svn: 341558	2018-09-06 16:03:32 +00:00
JF Bastien	ec812ce3d6	NFC: more memset inline arm64 coverage I'm looking at some codegen optimization in this area and want to make sure I understand the current codegen and don't regress it. This patch further expands the tests (which I already expanded in r341406) to capture more of the current code generation when it comes to stack-based small non-zero memset on arm64. This patch annotates some potential fixes. llvm-svn: 341493	2018-09-05 20:35:06 +00:00
Sanjay Patel	dbf52837fe	[DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x)) This was proposed as an IR transform in D49306, but it was not clearly justifiable as a canonicalization. Here, we only do the transform when the target tells us that sqrt can be lowered with inline code. This is the basic case. Some potential enhancements are in the TODO comments: 1. Generalize the transform for other exponents (allow more than 2 sqrt calcs if that's really cheaper). 2. If we have less fast-math-flags, generate code to avoid -0.0 and/or INF. 3. Allow the transform when optimizing/minimizing size (might require a target hook to get that right). Note that by default, x86 converts single-precision sqrt calcs into sqrt reciprocal estimate with refinement. That codegen is controlled by CPU attributes and can be manually overridden. We have plenty of test coverage for that already, so I didn't bother to include extra testing for that here. AArch uses its full-precision ops in all cases (not sure if that's the intended behavior or not, but that should also be covered by existing tests). Differential Revision: https://reviews.llvm.org/D51630 llvm-svn: 341481	2018-09-05 17:01:56 +00:00
Zhaoshi Zheng	a0aa41d793	Revert "Revert r341269: [Constant Hoisting] Hoisting Constant GEP Expressions" Reland r341269. Use std::stable_sort when sorting constant candidates. Reverting commit, r341365: Revert r341269: [Constant Hoisting] Hoisting Constant GEP Expressions One of the tests is failing 50% of the time when expensive checks are enabled. Not sure how deep the problem is so just reverting while the author can investigate so that the bots stop repeatedly failing and blaming things incorrectly. Will respond with details on the original commit. Original commit, r341269: [Constant Hoisting] Hoisting Constant GEP Expressions Leverage existing logic in constant hoisting pass to transform constant GEP expressions sharing the same base global variable. Multi-dimensional GEPs are rewritten into single-dimensional GEPs. https://reviews.llvm.org/D51396 Differential Revision: https://reviews.llvm.org/D51654 llvm-svn: 341417	2018-09-04 22:17:03 +00:00
JF Bastien	fd458fe205	NFC: expand memset inline arm64 coverage I'm looking at some codegen optimization in this area and want to make sure I understand the current codegen and don't regress it. This patch simply expands the two existing tests to capture more of the current code generation when it comes to heap-based and stack-based small memset on arm64. The tested code is already pretty good, notably when it comes to using STP, FP stores, FP immediate generation, and folding one of the stores into a stack spill when possible. The uses of STUR could be improved, and some more pairing could occur. Straying from bzero patterns currently yields suboptimal code, and I expect a variety of small changes could make things way better. llvm-svn: 341406	2018-09-04 21:02:00 +00:00
Martin Storsjo	fed420d6b6	[MinGW] [AArch64] Add stubs for potential automatic dllimported variables The runtime pseudo relocations can't handle the AArch64 format PC relative addressing in adrp+add/ldr pairs. By using stubs, the potentially dllimported addresses can be touched up by the runtime pseudo relocation framework. Differential Revision: https://reviews.llvm.org/D51452 llvm-svn: 341401	2018-09-04 20:56:21 +00:00
Chandler Carruth	6cb12444cc	Revert r341269: [Constant Hoisting] Hoisting Constant GEP Expressions One of the tests is failing 50% of the time when expensive checks are enabled. Not sure how deep the problem is so just reverting while the author can investigate so that the bots stop repeatedly failing and blaming things incorrectly. Will respond with details on the original commit. llvm-svn: 341365	2018-09-04 13:36:44 +00:00
Sanjay Patel	0945959869	[AArch64][x86] add tests for pow(x, 0.25); NFC Folds for this were proposed in D49306, but we decided the transform is better suited for the backend. llvm-svn: 341341	2018-09-03 22:11:47 +00:00
Sander de Smalen	6cab60fa06	Extend hasStoreToStackSlot with list of FI accesses. For instructions that spill/fill to and from multiple frame-indices in a single instruction, hasStoreToStackSlot and hasLoadFromStackSlot should return an array of accesses, rather than just the first encounter of such an access. This better describes FI accesses for AArch64 (paired) LDP/STP instructions. Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D51537 llvm-svn: 341301	2018-09-03 09:15:58 +00:00
Roman Lebedev	d7a6244475	[DAGCombine] optimizeSetCCOfSignedTruncationCheck(): handle inverted pattern Summary: A follow-up for D49266 / rL337166 + D49497 / rL338044. This is still the same pattern to check for the [lack of] signed truncation, but in this case the constants and the predicate are negated. https://rise4fun.com/Alive/BDV https://rise4fun.com/Alive/n7Z Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma, dmgreen Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51532 llvm-svn: 341287	2018-09-02 13:56:22 +00:00
Zhaoshi Zheng	f5297fb24b	[Constant Hoisting] Hoisting Constant GEP Expressions Leverage existing logic in constant hoisting pass to transform constant GEP expressions sharing the same base global variable. Multi-dimensional GEPs are rewritten into single-dimensional GEPs. Differential Revision: https://reviews.llvm.org/D51396 llvm-svn: 341269	2018-09-01 00:04:56 +00:00
Roman Lebedev	75c2961b76	[NFC][X86][AArch64] A few more patterns for [lack of] signed truncation check pattern. llvm-svn: 341188	2018-08-31 08:52:03 +00:00
Ties Stuij	9c16d809d2	[CodeGen] emit inline asm clobber list warnings for reserved (cont) Summary: This is a continuation of https://reviews.llvm.org/D49727 Below the original text, current changes in the comments: Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results. For example: extern int bar(int[]); int foo(int i) { int a[i]; // VLA asm volatile( "mov r7, #1" : : : "r7" ); return 1 + bar(a); } Compiled for thumb, this gives: $ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb ... foo: .fnstart @ %bb.0: @ %entry .save {r4, r5, r6, r7, lr} push {r4, r5, r6, r7, lr} .setfp r7, sp, #12 add r7, sp, #12 .pad #4 sub sp, #4 movs r1, #7 add.w r0, r1, r0, lsl #2 bic r0, r0, #7 sub.w r0, sp, r0 mov sp, r0 @APP mov.w r7, #1 @NO_APP bl bar adds r0, #1 sub.w r4, r7, #12 mov sp, r4 pop {r4, r5, r6, r7, pc} ... r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block. This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now. 
The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter. If we find a reserved register, we print a warning: repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm] "mov r7, #1" ^ Reviewers: efriedma, olista01, javed.absar Reviewed By: efriedma Subscribers: eraman, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D51165 llvm-svn: 341062	2018-08-30 12:52:35 +00:00
David Green	1f203bcd75	[AArch64] Optimise load(adr address) to ldr address Providing that the load is known to be 4 byte aligned, we can optimise a ldr(adr address) to just ldr address. Differential Revision: https://reviews.llvm.org/D51030 llvm-svn: 341058	2018-08-30 11:55:16 +00:00
Roman Lebedev	26a1836757	[NFC][CodeGen][SelectionDAG] Tests for X % C == 0 codegen improvement. Hacker's Delight 10-17: when C is constant, the result of X % C == 0 can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479. Patch by: hermord (Dmytro Shynkevych)! For https://reviews.llvm.org/D50222 llvm-svn: 341047	2018-08-30 09:32:21 +00:00
Huihui Zhang	2f4106592d	[GlobalMerge] Fix GlobalMerge on bss external global variables. Summary: Global variables that are external and zero initialized are supposed to be merged with global variables in the bss section rather than the data section. Reviewers: efriedma, rengolin, t.p.northover, javed.absar, asl, john.brawn, pcc Reviewed By: efriedma Subscribers: dmgreen, llvm-commits Differential Revision: https://reviews.llvm.org/D51379 llvm-svn: 341008	2018-08-30 00:49:50 +00:00
Peter Collingbourne	9c9c8b22d2	Start reserving x18 by default on Android targets. Differential Revision: https://reviews.llvm.org/D45588 llvm-svn: 340889	2018-08-29 01:38:47 +00:00
Aditya Nandakumar	6b4d343e13	[GISel]: Add missing opcodes for overflow intrinsics https://reviews.llvm.org/D51197 Currently, IRTranslator (and GISel) seems to be arbitrarily picking which overflow intrinsics get mapped into opcodes which either have a carry as an input or not. For intrinsics such as Intrinsic::uadd_with_overflow, translate it to an opcode (G_UADDO) which doesn't have any carry inputs (similar to LLVM IR). This patch adds 4 missing opcodes for completeness - G_UADDO, G_USUBO, G_SSUBE and G_SADDE. llvm-svn: 340865	2018-08-28 18:54:10 +00:00
Eli Friedman	071203bbf2	[AArch64] Reject inline asm with FP registers when FP is disabled. Otherwise, we would crash trying to deal with an illegal input. Differential Revision: https://reviews.llvm.org/D51202 llvm-svn: 340637	2018-08-24 19:12:13 +00:00
Sanjay Patel	ed1b9695ee	[SelectionDAG] unroll unsupported vector FP ops earlier to avoid libcalls on undef elements (PR38527) This solves the motivating case from: https://bugs.llvm.org/show_bug.cgi?id=38527 If we are legalizing an FP vector op that maps to 1 of the LLVM intrinsics that mimic libm calls, but we're going to end up with scalar libcalls for that vector type anyway, then we should unroll the vector op into scalars before widening. This avoids libcalls because we've lost the knowledge that some of the scalar elements are undef. Differential Revision: https://reviews.llvm.org/D50791 llvm-svn: 340469	2018-08-22 22:52:05 +00:00
David Green	9dd1d451d9	[AArch64] Add Tiny Code Model for AArch64 This adds the plumbing for the Tiny code model for the AArch64 backend. This, instead of loading addresses through the normal ADRP;ADD pair used in the Small model, uses a single ADR. The 21 bit range of an ADR means that the code and its statically defined symbols need to be within 1MB of each other. This makes it mostly interesting for embedded applications where we want to fit as much as we can in as small a space as possible. Differential Revision: https://reviews.llvm.org/D49673 llvm-svn: 340397	2018-08-22 11:31:39 +00:00
Aditya Nandakumar	2a08285cf3	Revert "Revert r339977: [GISel]: Add Opcodes for a few LLVM Intrinsics" This reverts commit 7debc334e6421bb5251ef8f18e97166dfc7dd787. I missed updating legalizer-info-validation.mir as I had assertions turned off in my build and that specific test requires asserts. Fixed it now. llvm-svn: 340197	2018-08-20 18:43:19 +00:00
Matt Arsenault	25e51540e1	DAG: Fix isKnownNeverNaN for basic non-sNaN cases fadd/fsub/fmul need to worry about infinities as well as fdiv. llvm-svn: 340085	2018-08-17 21:19:22 +00:00
Luke Cheeseman	64dcdec60c	[AArch64] - Generate pointer authentication instructions - The functions instrumented depend on function attributes: all (all functions instrumented) non-leaf (only those that spill LR) none - Function epilogues sign the LR before spilling to the stack and authenticate the LR once restored - If the target is v8.3a or greater then it can use the combined authenticate and return instruction Differential revision: https://reviews.llvm.org/D49793 llvm-svn: 340018	2018-08-17 12:53:22 +00:00
Chandler Carruth	b898b86f49	Revert r339977: [GISel]: Add Opcodes for a few LLVM Intrinsics This is breaking ~all the bots. llvm-svn: 339982	2018-08-17 04:47:16 +00:00
Aditya Nandakumar	973a557338	[GISel]: Add Opcodes for a few LLVM Intrinsics https://reviews.llvm.org/D50401 Add opcodes for llvm.intrinsic.trunc, round, and update the IRTranslator for the same. Reviewed by: dsanders. llvm-svn: 339977	2018-08-17 01:41:56 +00:00
Eli Friedman	73e8a784e6	[SelectionDAG] Improve the legalisation lowering of UMULO. There is no way in the universe, that doing a full-width division in software will be faster than doing overflowing multiplication in software in the first place, especially given that this same full-width multiplication needs to be done anyway. This patch replaces the previous implementation with a direct lowering into an overflowing multiplication algorithm based on half-width operations. Correctness of the algorithm was verified by exhaustively checking the output of this algorithm for overflowing multiplication of 16 bit integers against an obviously correct widening multiplication. Barring any oversights introduced by porting the algorithm to DAG, confidence in correctness of this algorithm is extremely high. Following table shows the change in both t = runtime and s = space. The change is expressed as a multiplier of original, so anything under 1 is “better” and anything above 1 is worse. +-------+-----------+-----------+-------------+-------------+ \| Arch \| u64u64 t \| u64u64 s \| u128u128 t \| u128u128 s \| +-------+-----------+-----------+-------------+-------------+ \| X64 \| - \| - \| ~0.5 \| ~0.64 \| \| i686 \| ~0.5 \| ~0.6666 \| ~0.05 \| ~0.9 \| \| armv7 \| - \| ~0.75 \| - \| ~1.4 \| +-------+-----------+-----------+-------------+-------------+ Performance numbers have been collected by running overflowing multiplication in a loop under `perf` on two x86_64 (one Intel Haswell, other AMD Ryzen) based machines. Size numbers have been collected by looking at the size of function containing an overflowing multiply in a loop. All in all, it can be seen that both performance and size have improved except in the case of armv7 where code size has regressed for 128-bit multiply. u128*u128 overflowing multiply on 32-bit platforms seems to benefit from this change a lot, taking only 5% of the time compared to original algorithm to calculate the same thing. 
The final benefit of this change is that LLVM is now capable of lowering the overflowing unsigned multiply for integers of any bit-width as long as the target is capable of lowering regular multiplication for the same bit-width. Previously, 128-bit overflowing multiply was the widest possible. Patch by Simonas Kazlauskas! Differential Revision: https://reviews.llvm.org/D50310 llvm-svn: 339922	2018-08-16 18:39:39 +00:00
Sanjay Patel	49a8280f43	[AArch64] add tests for poor vector intrinsic lowering via legalization (PR38527); NFC These correspond to the x86 tests added with rL339790 / rL339791, but I widened the non-fsin tests to v3f32 to show the problem because AArch supports v2f32 ops. llvm-svn: 339793	2018-08-15 17:06:21 +00:00
Amara Emerson	30e61404a8	[GlobalISel][IRTranslator] Fix a bug in handling repeating struct types during argument lowering. Differential Revision: https://reviews.llvm.org/D49442 llvm-svn: 339674	2018-08-14 12:04:25 +00:00
Sanjay Patel	15d1501aae	[SelectionDAG] try harder to convert funnel shift to rotate Similar to rL337966 - if the DAGCombiner's rotate matching was working as expected, I don't think we'd see any test diffs here. AArch only goes right, and PPC only goes left. x86 has both, so no diffs there. Differential Revision: https://reviews.llvm.org/D50091 llvm-svn: 339359	2018-08-09 17:26:22 +00:00
Ties Stuij	0244aa67d6	revert tests of '[CodeGen] emit inline asm clobber list warnings for reserved' llvm-svn: 339276	2018-08-08 17:19:32 +00:00
Ties Stuij	52f3631f4b	[CodeGen] emit inline asm clobber list warnings for reserved Summary: Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results. For example: ``` extern int bar(int[]); int foo(int i) { int a[i]; // VLA asm volatile( "mov r7, #1" : : : "r7" ); return 1 + bar(a); } ``` Compiled for thumb, this gives: ``` $ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb ... foo: .fnstart @ %bb.0: @ %entry .save {r4, r5, r6, r7, lr} push {r4, r5, r6, r7, lr} .setfp r7, sp, #12 add r7, sp, #12 .pad #4 sub sp, #4 movs r1, #7 add.w r0, r1, r0, lsl #2 bic r0, r0, #7 sub.w r0, sp, r0 mov sp, r0 @APP mov.w r7, #1 @NO_APP bl bar adds r0, #1 sub.w r4, r7, #12 mov sp, r4 pop {r4, r5, r6, r7, pc} ... ``` r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block. This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now. The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter. 
If we find a reserved register, we print a warning: ``` repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm] "mov r7, #1" ^ ``` Reviewers: eli.friedman, olista01, javed.absar, efriedma Reviewed By: efriedma Subscribers: efriedma, eraman, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D49727 llvm-svn: 339257	2018-08-08 15:15:59 +00:00
Bryan Chan	e023706471	[AArch64] Fix assertion failure on widened f16 BUILD_VECTOR Summary: Ensure that NormalizedBuildVector returns a BUILD_VECTOR with operands of the same type. This fixes an assertion failure in VerifySDNode. Reviewers: SjoerdMeijer, t.p.northover, javed.absar Reviewed By: SjoerdMeijer Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D50202 llvm-svn: 339013	2018-08-06 14:14:41 +00:00
Aditya Nandakumar	e07b3b737b	[GISel]: Add Opcodes for CTLZ/CTTZ/CTPOP https://reviews.llvm.org/D48600 Added IRTranslator support to translate these known intrinsics into GISel opcodes. llvm-svn: 338944	2018-08-04 01:22:12 +00:00
Alexander Ivchenko	49168f6778	[GlobalISel] Rewrite CallLowering::lowerReturn to accept multiple VRegs per Value This is logical continuation of https://reviews.llvm.org/D46018 (r332449) Differential Revision: https://reviews.llvm.org/D49660 llvm-svn: 338685	2018-08-02 08:33:31 +00:00
Lei Liu	b9a7b7a84d	Fix FCOPYSIGN expansion In expansion of FCOPYSIGN, the shift node is missing when the two operands of FCOPYSIGN are of the same size. We should always generate shift node (if the required shift bit is not zero) to put the sign bit into the right position, regardless of the size of underlying types. Differential Revision: https://reviews.llvm.org/D49973 llvm-svn: 338665	2018-08-02 01:54:12 +00:00
Sanjay Patel	8aac22e06a	[SelectionDAG] fix bug in translating funnel shift with non-power-of-2 type The bug is visible in the constant-folded x86 tests. We can't use the negated shift amount when the type is not power-of-2: https://rise4fun.com/Alive/US1r ...so in that case, use the regular lowering that includes a select to guard against a shift-by-bitwidth. This path is improved by only calculating the modulo shift amount once now. Also, improve the rotate (with power-of-2 size) lowering to use a negate rather than subtract from bitwidth. This improves the codegen whether we have a rotate instruction or not (although we can still see that we're not matching to a legal rotate in all cases). llvm-svn: 338592	2018-08-01 17:17:08 +00:00
Bryan Chan	67106b5e08	[AArch64] Fix FCCMP with FP16 operands Summary: This patch adds support for FCCMP instruction with FP16 operands, avoiding an assertion during instruction selection. Reviewers: olista01, SjoerdMeijer, t.p.northover, javed.absar Reviewed By: SjoerdMeijer Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D50115 llvm-svn: 338554	2018-08-01 13:50:29 +00:00
Amara Emerson	6cdfe29d8e	[GlobalISel][IRTranslator] Use RPO traversal when visiting blocks to translate. Previously we were just visiting the blocks in the function in IR order, which is rather arbitrary. Therefore we wouldn't always visit defs before uses, but the translation code relies on this assumption in some places. Only codegen change seen in tests is an elision of a redundant copy. Fixes PR38396 llvm-svn: 338476	2018-08-01 02:17:42 +00:00
Amara Emerson	1e8c164c63	[AArch64][GlobalISel] Add isel support for G_BLOCK_ADDR. Also refactors some existing code to materialize addresses for the large code model so it can be shared between G_GLOBAL_VALUE and G_BLOCK_ADDR. This implements PR36390. Differential Revision: https://reviews.llvm.org/D49903 llvm-svn: 338337	2018-07-31 00:09:02 +00:00
Amara Emerson	0e86c07077	[AArch64][GlobalISel] Make G_BLOCK_ADDR legal. Differential Revision: https://reviews.llvm.org/D49902 llvm-svn: 338336	2018-07-31 00:08:56 +00:00
Amara Emerson	6aff5a7810	[GlobalISel] Add a G_BLOCK_ADDR opcode to handle IR blockaddress constants. Differential Revision: https://reviews.llvm.org/D49900 llvm-svn: 338335	2018-07-31 00:08:50 +00:00
Sanjay Patel	9f807f44b1	[DAGCombiner] transform sub-of-shifted-signbit to add This is exchanging a sub-of-1 with add-of-minus-1: https://rise4fun.com/Alive/plKAH This is another step towards improving select-of-constants codegen (see D48970). x86 is the motivating target, and those diffs all appear to be wins. PPC and AArch64 look neutral. I've limited this to early combining (!LegalOperations) in case a target wants to reverse it, but I think canonicalizing to 'add' is more likely to produce further transforms because we have more folds for 'add'. Differential Revision: https://reviews.llvm.org/D49924 llvm-svn: 338317	2018-07-30 22:21:37 +00:00
Jessica Paquette	fa3bee4756	[MachineOutliner][AArch64] Add support for saving LR to a register This teaches the outliner to save LR to a register rather than the stack when possible. This allows us to avoid bumping the stack in outlined functions in some cases. By doing this, in a later patch, we can teach the outliner to do something like this: f1: ... bl OUTLINED_FUNCTION ... f2: ... move LR's contents to a register bl OUTLINED_FUNCTION move the register's contents back instead of falling back to saving LR in both cases. llvm-svn: 338278	2018-07-30 17:45:28 +00:00


2243 Commits