This is used by the Linux kernel built with CONFIG_THUMB2_KERNEL.
Because different operands are not permitted for `movs`, the
diagnostics now provide multiple suggestions, along the lines of using
a non-pc destination operand or an lr source operand.
Forked from D95586.
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D96304
This was taking the calling convention from the parent function,
instead of the callee. Avoids regressions in a future patch when the
caller and callee have different type breakdowns.
For some reason AArch64's lowerFormalArguments seems to intentionally
ignore the parent isVarArg.
This reverts commit 502a67dd7f.
This exposed a failure in the test-suite build on PowerPC, so revert
to unblock the buildbot first. Dave will re-commit in
https://reviews.llvm.org/D96287.
Thanks Dave.
A One-Off Identity mask is a shuffle that is mostly an identity mask
from a single source but contains a single element out-of-place, either
from a different vector or from another position in the same vector. As
opposed to lowering this via an ARMISD::BUILD_VECTOR, we can generate an
extract/insert pair directly. Under ARM with individually accessible
lane elements this often becomes a simple lane move.
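For illustration, a minimal hypothetical example of such a mask, where
the out-of-place lane 0 is taken from the second source:
```
define <4 x i32> @oneoff(<4 x i32> %a, <4 x i32> %b) {
  ; identity mask except lane 0, which comes from %b (element 4)
  %s = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 4, i32 1, i32 2, i32 3>
  ret <4 x i32> %s
}
```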
This also alters the LowerVECTOR_SHUFFLEUsingMovs code to use v4f32 (not
v4i32), a more natural type for lane moves.
Differential Revision: https://reviews.llvm.org/D95551
Because we mark all operations as expand for v2f64, scalar_to_vector
would end up lowering through a stack store/reload. But it is pretty
simple to implement, only inserting a D reg into an undef vector. This
helps clear up some inefficient codegen from soft calling conventions.
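Roughly the shape involved, as a hypothetical IR example; the insert of
a scalar into lane 0 of an undef vector is what becomes the
scalar_to_vector node:
```
define <2 x double> @scalar_to_vec(double %d) {
  ; lowers to inserting the D register into an undef Q register
  %v = insertelement <2 x double> undef, double %d, i32 0
  ret <2 x double> %v
}
```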
Differential Revision: https://reviews.llvm.org/D96153
This adds another tablegen fold that converts an i16 odd-lane-insert of
an even-lane-extract into a VINS. We extract the existing f32 value from
the destination register and VINS the new value into it. The rest of the
backend then is able to optimize the INSERT_SUBREG / COPY_TO_REGCLASS /
EXTRACT_SUBREG.
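A hypothetical example of the insert/extract pair this matches (even
source lane, odd destination lane):
```
define <8 x i16> @vins_fold(<8 x i16> %src, <8 x i16> %dst) {
  %e = extractelement <8 x i16> %src, i32 2  ; even lane extract
  %v = insertelement <8 x i16> %dst, i16 %e, i32 3  ; odd lane insert
  ret <8 x i16> %v
}
```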
Differential Revision: https://reviews.llvm.org/D95456
getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing, many backends end up getting it wrong.
Instead of trying to work with those two values separately, this removes
the VF parameter, widening the RetTy/ArgTys by VF when called from the
vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.
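As a sketch of what the widening means (intrinsic and values chosen
purely for illustration):
```
; previously costed by the vectorizer as RetTy==float with VF==4
%r = call float @llvm.fabs.f32(float %x)
; now queried directly with the widened types and no VF parameter
%v = call <4 x float> @llvm.fabs.v4f32(<4 x float> %xs)
```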
For most backends this looks like it will be an improvement (or they
were not using getIntrinsicInstrCost). AMDGPU needed the most changes to
keep the code from c230965ccf working. ARM removed the fix in
dfac521da1, WebAssembly happens to get a fixup for an SLP cost issue,
and both X86 and AArch64 seem to now be using better costs from the
vectorizer.
Differential Revision: https://reviews.llvm.org/D95291
As mentioned in the TODO comment, casting double to float causes NaNs
to change bits. To avoid the change, this patch adds support for
single-precision floating-point immediate values in MachineCode.
Patch by Yuta Saito.
Differential Revision: https://reviews.llvm.org/D77384
This new f16 shuffle under Neon would hit an assert in
GeneratePerfectShuffle as it would try to treat an f16 vector as an i8.
Add f16 handling, treating them like an i16.
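A hypothetical shuffle of roughly this shape illustrates the case:
```
define <4 x half> @f16_shuffle(<4 x half> %a, <4 x half> %b) {
  %s = shufflevector <4 x half> %a, <4 x half> %b, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
  ret <4 x half> %s
}
```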
Differential Revision: https://reviews.llvm.org/D95446
This allows the peephole optimizer to know that a MVE_VMOV_to_lane_32 is
the same as an insert subreg, allowing it to optimize some redundant
lane moves.
Differential Revision: https://reviews.llvm.org/D95433
A v4i32 insert of an extract can become a simple lane move, as opposed
to round-tripping via a GPR. This adds a pattern that turns a v4i32
insert-extract pair into an EXTRACT_SUBREG/INSERT_SUBREG, with the
required COPY_TO_REGCLASS. These get better optimized into a simple
lane move by the rest of the backend.
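For example (lane numbers hypothetical):
```
define <4 x i32> @lane_move(<4 x i32> %a, <4 x i32> %b) {
  ; previously round-tripped through a GPR; now a lane-to-lane move
  %e = extractelement <4 x i32> %b, i32 2
  %v = insertelement <4 x i32> %a, i32 %e, i32 0
  ret <4 x i32> %v
}
```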
Differential Revision: https://reviews.llvm.org/D95428
This patch adds tablegen patterns for pairs of i16/f16 inserts and
extracts. If we are inserting into two adjacent vector lanes (0 and 1,
for example), we can use either a vmov;vins or vmovx;vins to insert the
pair together, avoiding a round-trip through GPR registers. These are
quite large patterns with a number of EXTRACT_SUBREG/INSERT_SUBREG/
COPY_TO_REGCLASS nodes, but hopefully, as most of those become copies,
all of that will be cleaned up by further optimizations.
The VINS pattern was also adjusted to allow it to represent that it is
inserting into the top half of an existing register.
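A sketch of the adjacent-lane pair the patterns match (lanes 0 and 1
here, names hypothetical):
```
define <8 x i16> @pair_insert(<8 x i16> %v, i16 %a, i16 %b) {
  %i0 = insertelement <8 x i16> %v, i16 %a, i32 0
  %i1 = insertelement <8 x i16> %i0, i16 %b, i32 1
  ret <8 x i16> %i1
}
```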
Differential Revision: https://reviews.llvm.org/D95381
A DLS lr, lr instruction only moves lr to itself. It need not be
emitted on its own, saving an instruction in the loop preheader.
Differential Revision: https://reviews.llvm.org/D78916
Given a shuffle(vqdmulh(shuffle, shuffle)), we can flatten the shuffles
out if they become an identity mask. This can come up during lane
interleaving, once we do that better.
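A sketch of the pattern, with masks chosen so that composing the outer
shuffle with the inner ones gives an identity (the MVE vqdmulh
intrinsic name here is an assumption):
```
define <4 x i32> @flatten(<4 x i32> %a, <4 x i32> %b) {
  %s1 = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %s2 = shufflevector <4 x i32> %b, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %m = call <4 x i32> @llvm.arm.mve.vqdmulh.v4i32(<4 x i32> %s1, <4 x i32> %s2)
  ; the shuffles cancel out, so this can become vqdmulh(%a, %b) directly
  %r = shufflevector <4 x i32> %m, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ret <4 x i32> %r
}
declare <4 x i32> @llvm.arm.mve.vqdmulh.v4i32(<4 x i32>, <4 x i32>)
```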
Differential Revision: https://reviews.llvm.org/D94034
Under the softfp calling convention, we are often left with
VMOVRRD(extract(bitcast(build_vector(a, b, c, d)))) for the return value
of the function. These can be simplified to a,b or c,d directly,
depending on the value of the extract.
Big endian is a little different because the bitcast switches the lanes
around, meaning we end up with b,a or d,c.
Differential Revision: https://reviews.llvm.org/D94989
This adds a DAG combine for converting a sext_inreg of a VGetLaneu into
a VGetLanes, provided the types match correctly.
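In IR terms, a pattern along these lines can produce those DAG nodes (a
hypothetical example; the zero-extending lane read plus the later
sign-extension is what becomes the sext_inreg of a VGetLaneu):
```
define i32 @get_lane_signed(<8 x i16> %v) {
  %e = extractelement <8 x i16> %v, i32 0
  %s = sext i16 %e to i32
  ret i32 %s
}
```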
Differential Revision: https://reviews.llvm.org/D95073
Under SoftFP calling conventions, we can be left with
extract(bitcast(BUILD_VECTOR(VMOVDRR(a, b), ..))) patterns that can
simplify to a or b, depending on the extract lane.
Differential Revision: https://reviews.llvm.org/D94990
This patch allows targets to define multiple cost
values for each register so that the cost model
can be more flexible and better used during
register allocation, as per the target requirements.
For AMDGPU the VGPR allocation will be more efficient
if the register cost can be associated dynamically
based on the calling convention.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D86836
The MVE VLD2/4 and VST2/4 instructions require the pointer to be aligned
to at least the size of the element type. This adds a check for that
into the ARM lowerInterleavedStore and lowerInterleavedLoad functions,
not creating the intrinsics if they are invalid for the alignment of
the load/store.
Unfortunately this is one of those bug fixes that does affect some
useful codegen, as we were sometimes able to do some nice lowering of
q15 types. But they can cause problems with low-aligned pointers.
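A sketch of the kind of access this affects (types and alignment chosen
for illustration); an under-aligned vld2-style deinterleaving load will
no longer be turned into the intrinsic:
```
define void @deinterleave(<8 x i16>* %p) {
  ; align 1 is below the i16 element alignment required by MVE VLD2
  %wide = load <8 x i16>, <8 x i16>* %p, align 1
  %even = shufflevector <8 x i16> %wide, <8 x i16> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %odd = shufflevector <8 x i16> %wide, <8 x i16> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
  ret void
}
```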
Differential Revision: https://reviews.llvm.org/D95319
This adds some simple fp16 scalar_to_vector patterns, preventing a
selection failure if this came up.
Differential Revision: https://reviews.llvm.org/D95427
STRT, STRHT, and STRBT are store instructions and their source register
$Rt should be treated as an input operand instead of an output operand.
This should fix things (e.g., liveness tracking in LivePhysRegs) if
these instructions were used in CodeGen.
Differential Revision: https://reviews.llvm.org/D95074
Recent shouldAssumeDSOLocal changes (introduced by 961f31d8ad) no
longer take the relocation model into consideration. The ARM fast-isel
pass uses the function's return value to set whether a global symbol is
loaded indirectly or not, and without the expected information llvm now
generates an extra load for the following code:
```
$ cat test.ll
@__asan_option_detect_stack_use_after_return = external global i32
define dso_local i32 @main(i32 %argc, i8** %argv) #0 {
entry:
%0 = load i32, i32* @__asan_option_detect_stack_use_after_return, align 4
%1 = icmp ne i32 %0, 0
br i1 %1, label %2, label %3
2:
ret i32 0
3:
ret i32 1
}
attributes #0 = { noinline optnone }
$ llc test.ll -o -
[...]
main:
.fnstart
[...]
movw r0, :lower16:__asan_option_detect_stack_use_after_return
movt r0, :upper16:__asan_option_detect_stack_use_after_return
ldr r0, [r0]
ldr r0, [r0]
cmp r0, #0
[...]
```
And without 'optnone' it produces:
```
[...]
main:
.fnstart
[...]
movw r0, :lower16:__asan_option_detect_stack_use_after_return
movt r0, :upper16:__asan_option_detect_stack_use_after_return
ldr r0, [r0]
clz r0, r0
lsr r0, r0, #5
bx lr
[...]
```
This triggered a lot of invalid memory accesses in sanitizers for
arm-linux-gnueabihf. I checked this patch with both a stage1 built with
gcc and a stage2 bootstrap, and it fixes all the Linux sanitizer
issues.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D95379
The only caller of this function is in LocalStackSlotAllocation, and
it creates a base register of the class returned by the target's
getPointerRegClass(). AMDGPU wants to use a different register class
here, so let materializeFrameBaseRegister just create and return
whatever it wants.
Differential Revision: https://reviews.llvm.org/D95268
I may have given bad advice, and skipping sext_inreg when matching SSAT
patterns is not valid on its own. It at least needs to sext_inreg the
input again, but as far as I can tell is still only valid based on
demanded bits. For the moment disable that part of the combine,
hopefully reimplementing it more correctly in the future.
This replaces the isSaturatingConditional function with
LowerSaturatingConditional that directly returns a new SSAT or
USAT SDValue, instead of returning true and the components of it.
This adds cost modelling for the inloop vectorization added in
745bf6cf44. Up until now they have been modelled as the original
underlying instruction, usually an add. This happens to work OK for MVE
with instructions that are reducing into the same type as they are
working on. But MVE's instructions can perform the equivalent of an
extended MLA as a single instruction:
```
%sa = sext <16 x i8> A to <16 x i32>
%sb = sext <16 x i8> B to <16 x i32>
%m = mul <16 x i32> %sa, %sb
%r = vecreduce.add(%m)
->
R = VMLADAV A, B
```
There are other instructions for performing add reductions of
v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64
(VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV).
The i64 variants are particularly interesting as there are no native
i64 add/mul instructions, leading to the i64 add and mul naturally
getting very high costs.
Also worth mentioning, under NEON there is the concept of a sdot/udot
instruction which performs a partial reduction from a v16i8 to a v4i32.
They extend and mul/sum the first four elements from the inputs into the
first element of the output, repeating for each of the four output
lanes. They could possibly be represented in the same way as above in
llvm, so long as a vecreduce.add could perform a partial reduction. The
vectorizer would then produce a combination of in and outer loop
reductions to efficiently use the sdot and udot instructions. Although
this patch does not do that yet, it does suggest that separating the
input reduction type from the produced result type is a useful concept
to model. It also shows that a MLA reduction as a single instruction is
fairly common.
This patch attempts to improve the cost modelling of in-loop reductions
by:
- Adding some pattern matching in the loop vectorizer cost model to
match reduction patterns that are optionally extended and/or MLA
patterns. This marks the cost of the reduction instruction correctly
and the sext/zext/mul leading up to it as free, which is otherwise
difficult to tell and may get a very high cost. (In the long run this
can hopefully be replaced by vplan producing a single node and costing
it correctly, but that is not yet something that vplan can do.)
- getExtendedAddReductionCost is added to query the cost of these
extended reduction patterns.
- Expanded the ARM costs to account for these expanded sizes, which is a
fairly simple change in itself.
- Some minor alterations to allow inloop reductions larger than the
highest vector width and i64 MVE reductions.
- An extra InLoopReductionImmediateChains map was added to the vectorizer
for it to efficiently detect which instructions are reductions in the
cost model.
- The tests have some updates to show what I believe is optimal
vectorization and where we are now.
Put together, this can greatly improve performance for reduction loops
under MVE.
Differential Revision: https://reviews.llvm.org/D93476
It turns out the vectorizer calls the getIntrinsicInstrCost functions
with a scalar return type and vector VF. This updates the cost model to
handle that, still producing the correct vector costs.
A vectorizer test is added to show it vectorizing at the correct factor
again.
We have no lowering for VSELECT vXi1, vXi1, vXi1, so mark them as
expanded to turn them into a series of logical operations.
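The expansion is the usual select-to-logic lowering; roughly, as a
sketch:
```
define <4 x i1> @vselect_i1(<4 x i1> %c, <4 x i1> %a, <4 x i1> %b) {
  ; select %c, %a, %b == (%c & %a) | (~%c & %b)
  %t = and <4 x i1> %c, %a
  %notc = xor <4 x i1> %c, <i1 true, i1 true, i1 true, i1 true>
  %f = and <4 x i1> %notc, %b
  %r = or <4 x i1> %t, %f
  ret <4 x i1> %r
}
```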
Differential Revision: https://reviews.llvm.org/D94946
This adds some basic MVE sadd_sat/ssub_sat/uadd_sat/usub_sat costs,
based on when the instruction is legal. With smaller-than-legal types
that are promoted we generate shr(qadd(shl, shl)), so a cost of 4 is
appropriate.
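A sketch of that promoted expansion for a scalar i8, with the ARM qadd
intrinsic standing in for the saturating-add node and the shift amounts
assumed; the four marked operations give the cost of 4:
```
define i8 @promoted_sadd_sat(i8 %a, i8 %b) {
  %wa = sext i8 %a to i32
  %wb = sext i8 %b to i32
  %sa = shl i32 %wa, 24                             ; shl
  %sb = shl i32 %wb, 24                             ; shl
  %q = call i32 @llvm.arm.qadd(i32 %sa, i32 %sb)    ; qadd
  %sr = ashr i32 %q, 24                             ; shr
  %t = trunc i32 %sr to i8
  ret i8 %t
}
declare i32 @llvm.arm.qadd(i32, i32)
```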
Differential Revision: https://reviews.llvm.org/D94958
This patch handles cases where we have to save/restore the link
register into the stack and load/store instructions which use the stack
are part of the outlined region. It checks that there will be no
overflow introduced by the new offset and fixes up these instructions
accordingly.
Differential Revision: https://reviews.llvm.org/D92934
It turns out that the BranchFolder and IfCvt do not like unanalyzable
branches that fall through. This means that removing the unconditional
branches from the end of tail predicated blocks can run into asserts
and verifier issues.
This effectively reverts 372eb2bbb6, but adds handling for
t2DoLoopEndDec instructions, which are not branches and so can be
safely skipped.
If the previous block in a function does not fall through, nops added
to align the next block will never be executed. This means we can
freely (except for codesize) align more branches. This happens in the
constant islands pass (as it cannot happen later) and only at
aggressive optimization levels, as it does increase codesize.
Differential Revision: https://reviews.llvm.org/D94394
This treats low overhead loop branches the same as jump tables and
indirect branches in analyzeBranch - they cannot be analyzed but the
direct branches on the end of the block may be removed. This helps
remove the unnecessary branches earlier, which can help produce better
codegen (and change block layout in a number of cases).
Differential Revision: https://reviews.llvm.org/D94392
The TripCount for a predicated vector loop body will be
ceil(ElementCount/Width). This alters the conversion of an
active.lane.mask to a VCTP intrinsic to match.
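Sketched for a width of 4 (names hypothetical):
```
; trip count = ceil(%N / 4)
%n.rnd = add i32 %N, 3
%tripcount = lshr i32 %n.rnd, 2
; the per-iteration lane mask
%mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %index, i32 %N)
; is converted to the MVE VCTP intrinsic on the remaining element count
%vctp = call <4 x i1> @llvm.arm.mve.vctp32(i32 %remaining)
```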
Differential Revision: https://reviews.llvm.org/D94608
Not all machine loops will have a predecessor, so the pass needs to
check this before continuing.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D94780
For the ARM hard-float calling convention, calls to variadic functions
need to be treated differently, even if only the fixed arguments are
provided.
This fixes GCC-C-execute-pr68390 in the test-suite, which is failing on
the ARM GlobalISel bot.
Blocks can be laid out such that a t2WhileLoopStart branches backwards.
This is forbidden by the architecture and so it fails to be converted
into a low-overhead loop. This new pass checks for these cases and
moves the target block, fixing any fall-through that would then be
broken.
Differential Revision: https://reviews.llvm.org/D92385