This adds the following combines:
```
x = ... 0 or 1
c = icmp eq x, 1
->
c = x
```
and
```
x = ... 0 or 1
c = icmp ne x, 0
->
c = x
```
These combines apply when the target's true value for the relevant types is 1.
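Schematically, in GISel MIR (a sketch only; the G_AND source and the trunc back to the compare's result type are illustrative, not necessarily what the combine emits):
```
%one:_(s32) = G_CONSTANT i32 1
%x:_(s32) = G_AND %a, %one          ; known bits say %x is 0 or 1
%c:_(s1) = G_ICMP intpred(eq), %x(s32), %one
->
%c:_(s1) = G_TRUNC %x(s32)
```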
This showed up in the following situation:
https://godbolt.org/z/M5jKexWTW
SDAG currently supports the `ne` case, but not the `eq` case. This can probably
be generalized further; that is left for a follow-up.
This gives some minor code size improvements across the board on CTMark at
-Os for AArch64. (0.1% for 7zip and pairlocalalign in particular.)
Differential Revision: https://reviews.llvm.org/D109130
When lowering a fixed-length gather/scatter, the index type is assumed to
be the same as the memory type. This is incorrect in cases where the
extension of the index has been folded into the addressing mode.
For now, add a temporary workaround that fixes the resulting codegen faults
by preventing the removal of this extension. At a later date, the
lowering for SVE gather/scatters will be redesigned to improve the way
addressing modes are handled.
As a short-term side effect of this change, the addressing modes
generated for fixed-length gather/scatters will not be optimal.
Differential Revision: https://reviews.llvm.org/D109145
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)
TL;DR: the original patch had no prior RFC, yet it contained changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards the already-committed
approach, and the tree is potentially in a broken state in the meantime.
While the end result of the discussion may lead back to the current design,
it may also not. Therefore I take it upon myself
to revert the tree back to the last known good state.
This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
I believe the profitability reasoning here is correct:
the "sub" register is already located within the 0'th subreg of the wider register,
so if we have a subvector insertion at index 0 into undef,
then it's always free to do.
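Schematically (a pseudo-IR sketch; types are illustrative):
```
wide = insert_subvector undef, sub, 0
->
wide = sub   ; sub already occupies subreg 0 of the wider register, so no code is needed
```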
After this, D109065 finally avoids the regression in D108382.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D109074
When we have an any-extending FPR bank load, none of the tablegen patterns
match and we fall back to the C++ selector. As with the truncating stores
that were fixed recently, the C++ code wasn't able to handle it and ended up
generating invalid copies between register classes of different sizes.
This change adds handling for this case, splitting the load into a regular
load and a SUBREG_TO_REG to extend it into the original wide destination reg.
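For example (a sketch; the opcode, offset, and subregister names are illustrative):
```
%val:fpr(s64) = G_LOAD %addr(p0) :: (load (s32))
->
%tmp:fpr32 = LDRSui %addr, 0
%val:fpr64 = SUBREG_TO_REG 0, %tmp, %subreg.ssub
```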
As noted in the comments in D108227, using G_FPTOSI produces wrong results for
G_ISNAN. Drop the G_FPTOSI and perform the operation on integer types.
Elsewhere in LLVM, a bitcast would be the appropriate choice (as it is in SDAG).
GlobalISel does not distinguish between integer and FP types, so a bitcast would
be meaningless here.
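For reference, the integer form of the check looks roughly like this for f32 (a sketch; the constants are the usual IEEE-754 sign/exponent masks, and the exact instructions emitted may differ):
```
%mask:_(s32) = G_CONSTANT i32 2147483647    ; 0x7fffffff
%abs:_(s32) = G_AND %x, %mask               ; %x is the f32 value, used directly as bits
%inf:_(s32) = G_CONSTANT i32 2139095040     ; 0x7f800000
%isnan:_(s1) = G_ICMP intpred(ugt), %abs(s32), %inf
```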
It looks like this array was missed in 4276d4a8d0.
Fixed tests that expected `elements` to be empty or depended on the order of the empty DINode.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D107024
For vectors whose size exactly equals getMaxSVEVectorSizeInBits, just use
AArch64SVEPredPattern::all, which can enable the use of an unpredicated ptrue when available.
TestPlan: check-llvm
Differential Revision: https://reviews.llvm.org/D108706
Consider this pattern:
```
%elt = ... something ...
%undef = G_IMPLICIT_DEF
%vec = G_BUILD_VECTOR %elt, %undef, %undef, ... %undef
```
It can be selected to a SUBREG_TO_REG, assuming `%elt` and `%vec` have the same
register bank. We don't care about any of the bits in `%vec` aside from those
in `%elt`, which just happens to be the 0th element.
This is preferable to emitting `mov` instructions for every index.
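The selected form then looks roughly like this (a sketch for a 32-bit element going into a 128-bit vector; other element sizes would use the corresponding register class and subregister index):
```
%vec:fpr128 = SUBREG_TO_REG 0, %elt:fpr32, %subreg.ssub
```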
This gives minor code size improvements on the test suite at -Os.
Differential Revision: https://reviews.llvm.org/D108773
A post-commit review comment on https://reviews.llvm.org/D107452 pointed out that
https://llvm.org/docs/LangRef.html
says:
"In a function that uses the constrained intrinsics the strictfp attribute is required on all function calls."
Although there are several files across several test directories which don't follow this guidance, it is straightforward to provide this attribute.
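For example (a sketch; the specific intrinsic is just for illustration, and the attribute can equally be applied via an attribute group):
```
%res = call double @llvm.experimental.constrained.fadd.f64(
                double %a, double %b,
                metadata !"round.dynamic",
                metadata !"fpexcept.strict") strictfp
```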
Reviewed By: kpn
Differential Revision: https://reviews.llvm.org/D107567
A single smov instruction can move an element out of a vector register and perform
the sign-extend as part of the same move, rather than performing each step with a separate instruction.
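Schematically (a sketch; the target opcode name and the lane index are illustrative):
```
%idx:_(s64) = G_CONSTANT i64 1
%el:_(s8) = G_EXTRACT_VECTOR_ELT %vec:_(<16 x s8>), %idx(s64)
%ext:_(s32) = G_SEXT %el(s8)
->
%ext:gpr32 = SMOVvi8to32 %vec:fpr128, 1
```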
Differential Revision: https://reviews.llvm.org/D108633
For ISD::EXTRACT_SUBVECTOR, its second operand must be a constant
multiple of the known-minimum vector length of the result type.
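For example (a schematic sketch; types chosen only for illustration):
```
v2i64 = extract_subvector nxv4i64:t1, 2   ; ok: 2 is a multiple of v2i64's known-minimum length (2)
v2i64 = extract_subvector nxv4i64:t1, 1   ; invalid: 1 is not a multiple of 2
```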
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D107795
This reverts commit 67bf3ac744.
The reason is that this change is now superseded by 04fb9b729a which fixes the
underlying problem in the selector. Now it's fine to generate truncating FP stores
since the selector code will just generate subreg copies to handle them.
When the tablegen patterns fail to select a truncating scalar FPR store,
our manual selection code also fails to handle it, silently trying to
generate an invalid copy. Fix this by adding support in the manual code
to generate a proper subreg copy before selecting a non-truncating store.
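Schematically (a sketch; the opcode and subregister names are illustrative):
```
G_STORE %val:fpr(s64), %addr:gpr(p0) :: (store (s32))
->
%tmp:fpr32 = COPY %val.ssub
STRSui %tmp, %addr, 0
```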
1) Just mark this case as legal because it can just be a copy.
2) Ensure the copy in the existing code actually gets selected. Without doing
this, we'll crash because the destination won't have a register class.
This fell back 35 times in a build of clang with GISel for AArch64.
Differential Revision: https://reviews.llvm.org/D108610
Reuse the selection code from the ld2 case. This is similar to how SDAG handles
things in AArch64ISelDAGToDAG. (See SelectLoad)
This fell back ~100 times while building clang with GISel enabled for AArch64.
Factoring out the gross subreg copy part ought to make selecting the rest of
this family fairly easy.
Differential Revision: https://reviews.llvm.org/D108600
When the Src and Dst registers used in buildAnyExtOrTrunc or buildSExtOrTrunc
have the same type (which would create a COPY), use the Src register directly,
or use replaceRegOrBuildCopy instead.
Differential Revision: https://reviews.llvm.org/D108306
This is pretty similar to the ST2 selection code in
`AArch64InstructionSelector::selectIntrinsicWithSideEffects`.
This is a GISel equivalent of the ld2 case in `AArch64DAGToDAGISel::Select`.
There's some weirdness there that appears here too (e.g. using ld1 for scalar
cases, which are 1-element vectors in SDAG.)
It's a little gross that we have to create the copy and then select it right
after, but I think we'd need to refactor the existing copy selection code
quite a bit to do better.
This was falling back while building llvm-project with GISel for AArch64.
Differential Revision: https://reviews.llvm.org/D108590
D106408 enables rematerialization of instructions with virtual
register uses. That has uncovered the bug in the allUsesAvailableAt
implementation: https://bugs.llvm.org/show_bug.cgi?id=51516.
In the majority of cases, canRematerializeAt() is called to check if
an instruction can be rematerialized before the given UseIdx.
However, SplitEditor::enterIntvAtEnd() calls it to rematerialize
an instruction at the end of a block, passing LIS.getMBBEndIdx()
into the check. In the testcase from the bug, it attempted to
rematerialize ADDXri after STRXui in bb.17. The use operand %55
of the ADD is killed by the STRX, but that goes undetected by the check
because it adjusts the passed UseIdx to its register slot, which is before
the kill. However, the value is dead at the index that was actually passed to the check.
This change uses the later of the passed UseIdx and its register slot. This
should be correct because if we are checking the availability of operands
before an instruction, that instruction cannot be the one defining
those operands. If we are checking for late rematerialization, we
are really interested in whether the operands live past the instruction.
The bug is not exploitable without D106408, but the fix is needed to reland
the reverted D106408.
Differential Revision: https://reviews.llvm.org/D108475
Truncating stores with GPR bank sources shouldn't be mutated into using FPR bank
sources, since those aren't supported.
Ideally this should be a selection failure in the tablegen patterns, but for now
avoid generating them.
SDAG lowers 32-bit and 64-bit G_SMULO + G_UMULO. We were missing the 32-bit
case.
For other sizes, make the 0th type a power of 2 and clamp it to either 32 bits
or 64 bits.
Right now, this will allow us to handle narrow types (e.g. s4, s24, etc.). The
LegalizerHelper doesn't currently support narrowing G_SMULO or G_UMULO. I think
we want the clamping behaviour either way, so we might as well include it now to
be explicit.
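Roughly (a sketch that glosses over the exact sequence the LegalizerHelper emits to preserve the overflow bit):
```
%r:_(s24), %o:_(s1) = G_SMULO %x:_(s24), %y:_(s24)
->
; %x32/%y32 are %x/%y widened to the next power of 2 and clamped to s32; the
; LegalizerHelper also emits the extra compare needed so the overflow bit still
; reflects 24-bit overflow
%r32:_(s32), %o32:_(s1) = G_SMULO %x32:_(s32), %y32:_(s32)
%r:_(s24) = G_TRUNC %r32(s32)
```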
Differential Revision: https://reviews.llvm.org/D108240
We had a rule for <n x s64> but not one for <n x p0>. As a result, we'd fall
back on things like <5 x p0>.
Differential Revision: https://reviews.llvm.org/D108484
The destination is always a GPR, since the result is always an integer.
The source is always an FPR, since the input is always floating point.
Differential Revision: https://reviews.llvm.org/D108419
Basically the same as G_LROUND. Handles the llvm.llround family of intrinsics.
Also add a helper function to the MachineVerifier for checking if all of the
(virtual register) operands of an instruction are scalars. Seems like a useful
thing to have.
Differential Revision: https://reviews.llvm.org/D108429
Translate the `@llvm.lround.*` family to G_LROUND via
`IRTranslator::translateSimpleIntrinsic`.
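For example (a sketch; the i64/f64 overload is just for illustration):
```
%r = call i64 @llvm.lround.i64.f64(double %x)
->
%r:_(s64) = G_LROUND %x:_(s64)
```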
Differential Revision: https://reviews.llvm.org/D108418
For some reductions like G_VECREDUCE_OR on AArch64, we need to scalarize
completely if the source is <= 64b. This change adds support for that in
the legalizer. If the source has a power-of-2 number of elements, then we can do
a tree reduction using the scalar operation on the individual elements.
Otherwise, we just create a sequential chain of operations.
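For example, a G_VECREDUCE_OR of a power-of-2 vector might be scalarized roughly like this (a sketch; the exact instructions the legalizer emits may differ):
```
%v:_(<4 x s16>) = ...
%e0:_(s16), %e1:_(s16), %e2:_(s16), %e3:_(s16) = G_UNMERGE_VALUES %v(<4 x s16>)
%or0:_(s16) = G_OR %e0, %e1
%or1:_(s16) = G_OR %e2, %e3
%res:_(s16) = G_OR %or0, %or1
```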
For AArch64, we only need to scalarize if the input is <64b. If it's greater than
64b then we can first do a fewElements step to 64b, taking advantage of vector
instructions until we reach the point of scalarization.
I also had to relax the verifier checks for reductions because the intrinsics
support <1 x EltTy> types, which we lower to scalars for GlobalISel.
Differential Revision: https://reviews.llvm.org/D108276
When support for copying vector s8 lanes was added recently, this also
had the side effect of fixing a fallback for <16 x s8> extracts since
both used the same helper. However, there was a bug in another helper
to get the regclass for a specific FPR-native type, which was assigning
FPR16 to s8 instead of FPR8.
This allows the instruction selector to realize that it can directly
broadcast the low byte of the memset value, rather than replicating
it to a 64-bit GPR before broadcasting.
This fixes PR50985.
Differential Revision: https://reviews.llvm.org/D108354
This changes the lowering of saddsat and ssubsat so that instead of:
```
r,o = saddo x, y
c = setcc r < 0
s = c ? INTMAX : INTMIN
ret o ? s : r
```
we use asr and xor to materialize the INTMAX/INTMIN constants:
```
r,o = saddo x, y
s = ashr r, BW-1
v = xor s, INTMIN
ret o ? v : r
```
https://alive2.llvm.org/ce/z/TYufgD
This seems to reduce the instruction count in most testcases across most
architectures. X86 has some custom lowering added to compensate for
cases where it can increase instruction count.
Differential Revision: https://reviews.llvm.org/D105853
We need to ensure that these end up on FPR to allow imported patterns to
select them.
This will also ensure that we get good regbank selection when dealing with
instructions like G_PHI/G_LOAD/G_STORE which deduce their banks from their
uses/users.
Differential Revision: https://reviews.llvm.org/D108260
For subtargets with full FP16, this is legal for s16, s32, and s64. Without
full FP16, it's legal for s32 and s64.
For s128, this is a libcall.
We also support some vector types, but for now, let's just support scalars.
Differential Revision: https://reviews.llvm.org/D108259