clang-p2996

Author	SHA1	Message	Date
Matt Arsenault	b14e83d1a4	IR: Add llvm.exp10 intrinsic We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10 to fix this asymmetry. AMDGPU already has most of the code for f32 exp10 expansion implemented alongside exp, so the current implementation is duplicating nearly identical effort between the compiler and library which is inconvenient. https://reviews.llvm.org/D157871	2023-09-01 19:45:03 -04:00
Nikita Popov	98cf20f890	Revert "[Verifier] Sanity check alloca size against DILocalVariable fragment size" This reverts commit `183f49c3e0`. The lang/cpp/trivial_abi/TestTrivialABI.py lldb test fails on buildbots.	2023-08-28 09:44:51 +02:00
Nikita Popov	183f49c3e0	[Verifier] Sanity check alloca size against DILocalVariable fragment size Add a check that the DILocalVariable fragment size in dbg.declare does not exceed the size of the alloca. This would have caught the invalid debuginfo regenerated by rustc in https://github.com/llvm/llvm-project/issues/64149. Differential Revision: https://reviews.llvm.org/D158743	2023-08-28 09:16:33 +02:00
Oliver Stannard	40614e1c14	[ARM] Save and restore CPSR around tMOVimm32 When resolving a frame index with a large offset for v6M execute-only, we emit a tMOVimm32 pseudo-instruction, which later gets lowered to a sequence of instructions, all of which are flag-setting. However, a frame index may be generated for a register spill or reload instruction, which can be inserted at a point where CPSR is live. This patch inserts MRS and MSR instructions around the tMOVimm32 to save and restore the value of CPSR, if CPSR is live at that point. This may need up to two virtual registers (one to build the immediate value, one to save CPSR) during frame index lowering, which happens after register allocation, so we need to ensure two spill slots are avilable to the register scavenger to ensure it can free up enough registers for this. There is no test for the emission (or not) of the MRS/MSR pair, because it requires a spill or reload to be inserted at a point where CPSR is live, which requires a large, complex function and is fragile enough that any optimisation changes will break the test. This bug was easily found by csmith with -verify-machineinstrs, which I now run regularly on v6M execute-only (and many other combinations). Patch by John Brawn and myself. Reviewed By: stuij Differential Revision: https://reviews.llvm.org/D158404	2023-08-24 14:15:02 +01:00
Nikita Popov	69bd66b3ce	[Tests] Remove some and/or constant expressions in tests (NFC) In preparation for their removal in D158081.	2023-08-21 12:05:32 +02:00
Keith Walker	2d9c6e699a	[Thumb1] Use callee-saved register to adjust stack pointer When adjusting the Stack Pointer at the end of the function epilogue, use a callee-saved register, rather than explicitly using R4 which may not have been saved. Differential Revision: https://reviews.llvm.org/D157500	2023-08-17 18:29:50 +01:00
Nicholas Guy	d65feccb12	[ARM] Set preferred function alignment Aligning functions yields small performance gains on embedded cores, moreso with numerous small function calls. Similar to aligning loops, if the function can fit within a single cache line then the performance overhead of fetching more instructions can be limited. Differential Revision: https://reviews.llvm.org/D157514	2023-08-16 17:31:21 +01:00
Matt Arsenault	c8cac15613	PreISelIntrinsicLowering: Check RuntimeLibcalls instead of TLI for memory functions We need a better mechanism for expressing which calls you are allowed to emit and which calls are recognized. This should be applied to the 17 branch.	2023-08-10 16:40:04 -04:00
John Brawn	f83ab2b3be	[ARM] Improve generation of thumb stack accesses Currently when a stack access is out of range of an sp-relative ldr or str then we jump straight to generating the offset with a literal pool load or mov32 pseudo-instruction. This patch improves that in two ways: * If the offset is within range of sp-relative add plus an ldr then use that. * When we use the mov32 pseudo-instruction, if putting part of the offset into the ldr will simplify the expansion of the mov32 then do so. Differential Revision: https://reviews.llvm.org/D156875	2023-08-07 17:53:32 +01:00
Francesco Petrogalli	cd921e0fd7	[MISched] Do not erase resource booking history for subunits. When dealing with the subunits of a resource group, we should reset the subunits availability at the first avaiable cycle of the resource that contains the subunits. Previously, the reset operation was returning cycle 0, effectively erasing the booking history of the subunits. Without this change, when using intervals for models have make use of subunits, the erasing of resource booking for subunits can raise the assertion "A resource is being overwritten" in `ResourceSegments::add`. The test added in the patch is one of such cases. Reviewed By: andreadb Differential Revision: https://reviews.llvm.org/D156530	2023-08-01 14:00:37 +02:00
John Brawn	8336d38be9	[ARM] Correctly handle combining segmented stacks with execute-only Using segmented stacks with execute-only mostly works, but we need to use the correct movi32 opcode in 6-M, and there's one place where for thumb1 (i.e. 6-M and 8-M.base) a constant pool was unconditionally used which needed to be fixed. Differential Revision: https://reviews.llvm.org/D156339	2023-07-28 10:37:40 +01:00
Fangrui Song	845d83d85f	[test] Add --show-all-symbols to some llvm-objdump -d commands llvm-objdump -d will be changed to not display mapping symbols by default (D156190). Add --show-all-symbols to make the intent clearer and prevent test adjustment with the new behavior.	2023-07-27 19:33:51 -07:00
Jay Foad	2dcf051259	[CodeGen] Store call frame size in MachineBasicBlock Record the call frame size on entry to each basic block. This is usually zero except when a basic block has been split in the middle of a call sequence. This simplifies PEI::replaceFrameIndices which previously had to visit basic blocks in a specific order and had special handling for unreachable blocks. More importantly it paves the way for an equally simple implementation of a backwards version of replaceFrameIndices, which is required to fully convert PrologEpilogInserter to backwards register scavenging, which is preferred because it does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D156113	2023-07-27 10:32:00 +01:00
Jay Foad	6c8f4472b4	[ARM] Extend regression test for D154281 Add a test case with a larger call frame which does not satisfy ARMFrameLowering::hasReservedCallFrame.	2023-07-21 15:48:45 +01:00
Momchil Velikov	4c95f79cce	[CodeGenPrepare] Refactor optimizeSelectInst (NFC) Refactor to use BasicBlockUtils functions and make life easier for a subsequent patch for updating the dominator tree. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D154053	2023-07-19 18:56:44 +01:00
John Brawn	cee7e7b245	[ARM] Correctly handle execute-only in EmitStructByval Currently when compiling for an execute-only target without movt then EmitStructByval will generate a constant pool load which isn't compatible with execute-only. Handle this by emitting tMOVi32imm, and also simplify the existing movt handling by emitting t2MOVi32imm or MOVi32imm. Differential Revision: https://reviews.llvm.org/D154944	2023-07-19 13:56:36 +01:00
John Brawn	1b12b1a335	[ARM] Restructure MOVi32imm expansion to not do pointless instructions The expansion of the various MOVi32imm pseudo-instructions works by splitting the operand into components (either halfwords or bytes) and emitting instructions to combine those components into the final result. When the operand is an immediate with some components being zero this can result in pointless instructions that just add zero. Avoid this by restructuring things so that a separate function handles splitting the operand into components, then don't emit the component if it is a zero immediate. This is straightforward for movw/movt, where we just don't emit the movt if it's zero, but the thumb1 expansion using mov/add/lsl is more complex, as even when we don't emit a given byte we still need to get the shift correct. Differential Revision: https://reviews.llvm.org/D154943	2023-07-19 13:56:36 +01:00
Jay Foad	496766840f	[ARM] Add a regression test for D154281 This is a reduced version of one of the tests that was broken by the original commit of D154281 "[CodeGen] Store SP adjustment in MachineBasicBlock. NFCI.". Differential Revision: https://reviews.llvm.org/D155471	2023-07-19 10:32:21 +01:00
John Brawn	343e204a52	[ARM] Replace TransferImpOps with copyImplicitOps In most places where TransferImpOps is currently used we just have one machine instruction, so it's doing the same thing as copyImplicitOps anyway. In those cases where we have more than one machine instruction the destination is written to in each instruction so any implicit defs should appear on all of them (and we shouldn't see any implicit refs as these pseudo-instruction don't have any register inputs), meaning the current use of TransferImpOps is incorrect and we should be using copyImplicitOps on all of the generated instructions. Differential Revision: https://reviews.llvm.org/D155301	2023-07-18 14:01:04 +01:00
Maurice Heumann	a1cdb323e2	[ARM] Adjust strd/ldrd codegen alignment requirements In change https://reviews.llvm.org/D152790, it was discovered that the alignment requirement calculation for LDRD/STRD codegen was suboptimal and the calculation for volatile loads and stores was adjusted. This change here adopts the calculation for the remaining non-volatile occurances. Recommitting after undefined behavior fix in D155093. Differential Revision: https://reviews.llvm.org/D153800	2023-07-14 12:54:18 -07:00
Oliver Stannard	aea8db8eb9	Revert "[CodeGen] Store SP adjustment in MachineBasicBlock. NFCI." This reverts commit `58d1eaa3b6`.	2023-07-13 14:25:39 +01:00
Caslyn Tonelli	b11559122e	Revert "[ARM] Restructure MOVi32imm expansion to not do pointless instructions" This reverts commit `647aff2855`. Differential Revision: https://reviews.llvm.org/D155122	2023-07-12 23:29:15 +00:00
Jay Foad	58d1eaa3b6	[CodeGen] Store SP adjustment in MachineBasicBlock. NFCI. Record the SP adjustment on entry to each basic block. This is almost always zero except on targets like ARM which can split a basic block in the middle of a call sequence. This simplifies PEI::replaceFrameIndices which previously had to visit basic blocks in a specific order and had special handling for unreachable blocks. More importantly it paves the way for an equally simple implementation of a backwards version of replaceFrameIndices, which is required to fully convert PrologEpilogInserter to backwards register scavenging, which is preferred because it does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D154281	2023-07-12 14:29:26 +01:00
Nikita Popov	edb2fc6dab	[llvm] Remove explicit -opaque-pointers flag from tests (NFC) Opaque pointers mode is enabled by default, no need to explicitly enable it.	2023-07-12 14:35:55 +02:00
John Brawn	210f61cbdd	[ARM] Correctly handle execute-only in EmitStructByval Currently when compiling for an execute-only target without movt then EmitStructByval will generate a constant pool load which isn't compatible with execute-only. Handle this by emitting tMOVi32imm, and also simplify the existing movt handling by emitting t2MOVi32imm or MOVi32imm. Differential Revision: https://reviews.llvm.org/D154944	2023-07-12 11:48:01 +01:00
John Brawn	647aff2855	[ARM] Restructure MOVi32imm expansion to not do pointless instructions The expansion of the various MOVi32imm pseudo-instructions works by splitting the operand into components (either halfwords or bytes) and emitting instructions to combine those components into the final result. When the operand is an immediate with some components being zero this can result in pointless instructions that just add zero. Avoid this by restructuring things so that a separate function handles splitting the operand into components, then don't emit the component if it is a zero immediate. This is straightforward for movw/movt, where we just don't emit the movt if it's zero, but the thumb1 expansion using mov/add/lsl is more complex, as even when we don't emit a given byte we still need to get the shift correct. Differential Revision: https://reviews.llvm.org/D154943	2023-07-12 11:48:01 +01:00
Simon Wallis	82458ce69e	[ARM] mark tMOVi32imm as killing flags Mark the tMOVi32imm pseudo instr as killing the flags register. The pseudo instruction expands to a sequence of 7 movs/lsls/adds instructions, which are all Thumb-1 flag setting instructions. For a test case, take an existing arm test which checks for "Don't CSE a cmp across a call that clobbers CPSR." and retarget it at thumbv6m execute-only. Reviewed By: stuij Differential Revision: https://reviews.llvm.org/D154845 Change-Id: I8f8209fbc40a833f8875629937b9606c1e2c021d	2023-07-11 14:42:07 +01:00
Ties Stuij	f0ae3c23b5	[ARM] in LowerConstantFP, make sure we cover armv6-m execute-only Currently in LowerConstantFP, when we compile for execute-only (XO) we don't check what architecture we're compiling for (v6m=< or >v6m). We shouldn't get here for v6m, so put in an assert. Reviewed By: simonwallis2, dmgreen Differential Revision: https://reviews.llvm.org/D154506	2023-07-11 10:42:15 +01:00
Ties Stuij	d145abcfb3	[ARM] fix typo in large-stack.ll introduced when fixing another typo	2023-07-04 11:23:24 +01:00
Ties Stuij	61bcaae7ab	[ARM] fix typo in large-stack.ll test In llvm/test/CodeGen/ARM/large-stack.ll, the C in FileCheck wasn't uppercased. This wasn't spotted in development as MacOS's HFS+ fs is apparently often configured case-insensitive.	2023-07-04 11:18:25 +01:00
Ties Stuij	112d769e5e	[ARM] generate correct code for armv6-m XO big stack operations The ARM backend codebase is dotted with places where armv6-m will generate constant pools. Now that we can generate execute-only code for armv6-m, we need to make sure we use the movs/lsls/adds/lsls/adds/lsls/adds pattern instead of these. Big stacks is one of the obvious places. In this patch we take care of two sites: 1. take care of big stacks in prologue/epilogue 2. take care of save/tSTRspi nodes, which implicitly fixes emitThumbRegPlusImmInReg which is used in several frame lowering fns Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D154233	2023-07-04 10:40:06 +01:00
David Spickett	ab3bb86d44	Revert "[ARM] Adjust strd/ldrd codegen alignment requirements" This reverts commit `92a9c30c61`. This has caused a test failure in the 2nd stage of Linaro's Arm 32 bit buildbots. LLVM::simplified-template-names.s 7: error: Simplified template DW_AT_name could not be reconstituted: check:10'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 8: original: f3<unsigned char, (unsigned char)'\x00'> check:10'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 9: reconstituted: f3<unsigned char, (unsigned char)'\x7f'> check:10'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ I suspect a load/store is slightly off.	2023-07-03 14:05:49 +00:00
Maurice Heumann	92a9c30c61	[ARM] Adjust strd/ldrd codegen alignment requirements In change https://reviews.llvm.org/D152790, it was discovered that the alignment requirement calculation for LDRD/STRD codegen was suboptimal and the calculation for volatile loads and stores was adjusted. This change here adopts the calculation for the remaining non-volatile occurances. Differential Revision: https://reviews.llvm.org/D153800	2023-07-02 14:25:25 -07:00
Fangrui Song	afd20587f9	MachineFunction: -fsanitize={function,kcfi}: ensure 4-byte alignment Fix https://github.com/llvm/llvm-project/issues/63579 ``` % cat a.c void foo() {} % clang --target=arm-none-eabi -mthumb -mno-unaligned-access -fsanitize=kcfi a.c -S -o - \| grep p2align .p2align 1 % clang --target=armv6m-none-eabi -fsanitize=function a.c -S -o - \| grep p2align .p2align 1 ``` Ensure that -fsanitize={function,kcfi} instrumented functions are aligned by at least 4, so that loading the type hash before the function label will not cause a misaligned access. This is especially important for -mno-unaligned-access configurations that don't set `setMinFunctionAlignment` to 4 or greater. With this patch, the generated assembly for the examples above will contain `.p2align 2` before the type hash. If `__attribute__((aligned(N)))` or `-falign-functions=N` is specified, the larger alignment will be used. Reviewed By: simon_tatham, samitolvanen Differential Revision: https://reviews.llvm.org/D154125	2023-06-30 09:13:19 -07:00
Matt Arsenault	160d7227e0	DAG: Fix libcall expansion for frexp on ARM The ExpandLibcallResult result was a bitcast and not the direct call result, so we couldn't find the chain. Use the new separate chain return value instead.	2023-06-30 09:03:45 -04:00
Luke Lau	742fb8b5c7	[DAGCombine] Fold (store (insert_elt (load p)) x p) -> (store x) If we have a store of a load with no other uses in between it, it's considered dead and is removed. So sometimes when legalizing a fixed length vector store of an insert, we end up producing better code through scalarization than without. An example is the follow below: %a = load <4 x i64>, ptr %x %b = insertelement <4 x i64> %a, i64 %y, i32 2 store <4 x i64> %b, ptr %x If this is scalarized, then DAGCombine successfully removes 3 of the 4 stores which are considered dead, and on RISC-V we get: sd a1, 16(a0) However if we make the vector type legal (-mattr=+v), then we lose the optimisation because we don't scalarize it. This patch attempts to recover the optimisation for vectors by identifying patterns where we store a load with a single insert inbetween, replacing it with a scalar store of the inserted element. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D152276	2023-06-28 22:45:04 +01:00
John Brawn	4fb0e0114f	[ARM] Generate out-of-line jump tables for XO without 32-bit branch When we only have a 16-bit pc-relative branch instruction we generate a table of address for a jump table. Currently this is placed inline, but this won't work with execute-only memory. In this case generate the jump table out-of-line. Differential Revision: https://reviews.llvm.org/D153774	2023-06-28 13:30:39 +01:00
Ties Stuij	4f19c6a7c7	[ARM] allow long-call codegen for armv6-M eXecute Only (XO) Recently eXecute Only (XO) codegen was also allowed for armv6-M. Previously this was only implemented for ~armv7+, effectively if MOVW/MOVT is available. Regarding long calls, we remove the check for MOVW/MOVT when generating code for XO, which already was redundant as in the subtarget initialization we already check if XO is valid for the target. And targets that generate valid XO code should be able to handle the (wrapper globaladdress) node. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D153782	2023-06-28 10:50:24 +01:00
Ties Stuij	03db28edbb	[ARM] in ExpandTMOV32BitImm, CPSR register ops should be `Define`d The CPSR registers ops of the instructions constructed in ExpandTMOV32BitImm were marked as kill, instead of define. Best to use the pre-existing t1CondCodeOp fn to construct CPSRs. Reviewed By: simonwallis2 Differential Revision: https://reviews.llvm.org/D153763	2023-06-27 14:58:22 +01:00
Matthias Braun	02ba5b8c6b	Ignore load/store until stack address computation No longer conservatively assume a load/store accesses the stack when we can prove that we did not compute any stack-relative address up to this point in the program. We do this in a cheap not-quite-a-dataflow-analysis: Assume `NoStackAddressUsed` when all predecessors of a block already guarantee it. Process blocks in reverse post order to guarantee that except for loop headers we have processed all predecessors of a block before processing the block itself. For loops we accept the conservative answer as they are unlikely to be shrink-wrappable anyway. Differential Revision: https://reviews.llvm.org/D152213	2023-06-26 13:50:36 -07:00
Matthias Braun	759b217626	Switch tests to use update_llc_test_checks Switch and update some tests to use `update_llc_test_checks` to reduce clutter in upcoming change. Differential Revision: https://reviews.llvm.org/D152215	2023-06-26 13:50:36 -07:00
Maurice Heumann	249bd9eab0	[ARM] Fix codegen of unaligned volatile load/store of i64 Volatile loads/stores of i64 are lowered to LDRD/STRD on ARMv5TE. However, these instructions require the addresses to be aligned. Unaligned loads/stores therefore should be ignored by this handling. Differential Revision: https://reviews.llvm.org/D152790	2023-06-26 10:45:41 -07:00
Eli Friedman	bc7f11ccb0	[SelectionDAG] Improve expansion of wide min/max The current implementation tries to handle the high and low halves separately, but that's less efficient in most cases; use a wide SETCC instead. Differential Revision: https://reviews.llvm.org/D151358	2023-06-26 10:45:41 -07:00
Amaury Séchet	8412a17b79	[NFC] Autogenerate CodeGen/ARM/2013-07-29-vector-or-combine.ll	2023-06-25 01:05:21 +00:00
Amaury Séchet	7457acb842	[NFC] Autogenerate CodeGen/ARM/2011-03-15-LdStMultipleBug.ll	2023-06-25 01:02:49 +00:00
Amaury Séchet	e271a539c5	[NFC] Autogenerate CodeGen/ARM/and-sext-combine.ll	2023-06-25 00:55:03 +00:00
Amaury Séchet	78c1985f99	[NFC] Autogenerate CodeGen/ARM/machine-cse-cmp.ll	2023-06-25 00:44:30 +00:00
Amaury Séchet	2e8111d4c4	[NFC] Autogenerate CodeGen/ARM/pr35103.ll	2023-06-25 00:29:14 +00:00
Ties Stuij	5ddd561cb5	disable execute-only tests which are failing with expensive checks Temporarily disabling the execute-only tests. We recently added codegen for armv6-m, which is still in heavy development (D152795). Disabling the tests while we're figuring out what's going on is probably the least disruptive option, as a patch dependent on it also already landed.	2023-06-23 16:35:24 +01:00
Ties Stuij	2273741ea2	[ARM] generate armv6m eXecute Only (XO) code [ARM] generate armv6m eXecute Only (XO) code for immediates, globals Previously eXecute Only (XO) support was implemented for targets that support MOVW/MOVT (~armv7+). See: https://reviews.llvm.org/D27449 XO prevents the compiler from generating data accesses to code sections. This patch implements XO codegen for armv6-M, which does not support MOVW/MOVT, and must resort to the following general pattern to avoid loads: movs r3, :upper8_15:foo lsls r3, #8 adds r3, :upper0_7:foo lsls r3, #8 adds r3, :lower8_15:foo lsls r3, #8 adds r3, :lower0_7:foo ldr r3, [r3] This is equivalent to the code pattern generated by GCC. The above relocations are new to LLVM and have been implemented in a parent patch: https://reviews.llvm.org/D149443. This patch limits itself to implementing codegen for this pattern and enabling XO for armv6-M in the backend. Separate patches will follow for: - switch tables - replacing specific loads from constant islands which are spread out over the ARM backend codebase. Amongst others: FastISel, call lowering, stack frames. Reviewed By: john.brawn Differential Revision: https://reviews.llvm.org/D152795	2023-06-23 10:50:47 +01:00

1 2 3 4 5 ...

4790 Commits