clang-p2996

Author	SHA1	Message	Date
Serge Pavlov	2f81788067	[ARM][FPEnv] Lowering of fpmode intrinsics (#74054 ) LLVM intrinsics `get_fpmode`, `set_fpmode` and `reset_fpmode` operate control modes, the bits of FP environment that affect FP operations. On ARM these bits are in FPSCR together with the status bits. The implementation of these intrinsics produces code close to that of functions `fegetmode` and `fesetmode` from GLIBC. Pull request: https://github.com/llvm/llvm-project/pull/74054	2023-12-18 18:57:36 +07:00
ostannard	4888218d03	[ARM] Do not emit unwind tables when saving LR around outlined call (#69611 ) In some cases, the machine outliner needs to preserve LR across an outlined call by pushing it onto the stack. Previously, this also generated unwind table instructions, which is incorrect because EHABI unwind tables cannot represent different stack frames at different points in the function, so the extra unwind info applied to the entire function. The outliner code already avoided generating CFI instructions, but EHABI unwind data is generated later from the actual instructions, so we need to avoid using the FrameSetup and FrameDestroy flags to prevent unwind data being generated.	2023-12-14 14:46:13 +00:00
Shih-Po Hung	b97c5a9554	[VPlan] Add a test for testing unused interleave recipes (#75026 ) - Precommit of tests from #71360. - Replace `undef` pointer operands and add stores to avoid the loads being optimized away.	2023-12-14 21:16:11 +08:00
Simon Pilgrim	b7fc78255e	Revert rG2047ab00eaf0a17e71ce5e8a5b27a8c90f034c3d "[VPlan] Add a test for testing unused interleave recipes (#75026 )" vplan-unused-interleave-group.ll is causing buildbot failures	2023-12-14 10:25:41 +00:00
Shih-Po Hung	2047ab00ea	[VPlan] Add a test for testing unused interleave recipes (#75026 ) - Precommit of tests from #71360. - Replace `undef` pointer operands and add stores to avoid the loads being optimized away.	2023-12-14 17:36:58 +08:00
paperchalice	b0cc42ae0f	[CodeGen] Port `SjLjEHPrepare` to new pass manager (#75023 ) `doInitialization` in `SjLjEHPrepare` is trivial. This is the last pass suffix with `ehprepare`.	2023-12-12 16:07:26 +08:00
Simon Pilgrim	faecc736e2	[DAG] isSplatValue - node is a splat if all demanded elts have the same whole constant value (#74443 )	2023-12-08 10:53:51 +00:00
Simon Pilgrim	22df0886a1	[DAG] Don't split f64 constant stores if the fp imm is legal (#74622 ) If the target can generate a specific fp immediate constant, then don't split the store into 2 x i32 stores Another cleanup step for #74304	2023-12-07 10:33:03 +00:00
Simon Pilgrim	609d980b3f	[ARM] Regenerate aapcs-hfa-code.ll	2023-12-06 12:09:30 +00:00
Nikita Popov	eecb99c5f6	[Tests] Add disjoint flag to some tests (NFC) These tests rely on SCEV recognizing an "or" with no common bits as an "add". Add the disjoint flag to relevant or instructions in preparation for switching SCEV to use the flag instead of the ValueTracking query. The IR with disjoint flag matches what InstCombine would produce.	2023-12-05 14:09:36 +01:00
simpal01	74cdb8e6f8	[llvm][ARM] Emit MVE .arch_extension after .fpu directive if it does not include MVE features (#71545 ) The floating-point and MVE features together specify the MVE functionality that is supported on the Cortex-M85 processor. But the FPU extension for the underlying architecture (armv8.1-m.main) is FPV5, which does not include MVE-F. So the compiler's -S output and `-save-temps=obj` lose the MVE feature, which leads to an assembler error. What is happening here is that the .fpu directive overrides any features previously set by the .cpu directive. Since the corresponding .fpu generated (.fpu fpv5-d16) does not include MVE-F, it overrides those features even though they are supported and set by the .cpu directive. Looks like .fpu is supposed to do this. In this case, there should be an .arch_extension directive re-enabling the relevant extensions after .fpu if the goal is to keep these extensions enabled. GCC also does the same. So this patch enables the MVE features by emitting the below arch extension: .fpu fpv5-d16 .arch_extension mve.fp --------- Co-authored-by: Simi Pallipurath <simi.pallipurath.com>	2023-11-22 09:16:58 +00:00
Serge Pavlov	a2e1de1934	[ARM][FPEnv] Lowering of fpenv intrinsics The change implements lowering of `get_fpenv`, `set_fpenv` and `reset_fpenv`. Differential Revision: https://reviews.llvm.org/D81843	2023-11-20 15:08:25 +07:00
Tavian Barnes	75cf672b12	[SDAG] Simplify is-power-of-2 codegen (#72275 ) When x is not known to be nonzero, ctpop(x) == 1 is expanded to x != 0 && (x & (x - 1)) == 0 resulting in codegen like leal -1(%rdi), %eax testl %eax, %edi sete %cl testl %edi, %edi setne %al andb %cl, %al But another expression that works is (x ^ (x - 1)) > x - 1 which has nicer codegen: leal -1(%rdi), %eax xorl %eax, %edi cmpl %eax, %edi seta %al	2023-11-15 22:26:34 +09:00
Serge Pavlov	5b0f703918	Revert "[ARM][FPEnv] Lowering of fpenv intrinsics" This reverts commit `d62f040418`. Some cuda buildbots started failing.	2023-11-10 16:24:51 +07:00
Serge Pavlov	d62f040418	[ARM][FPEnv] Lowering of fpenv intrinsics The change implements lowering of `get_fpenv`, `set_fpenv` and `reset_fpenv`. Differential Revision: https://reviews.llvm.org/D81843	2023-11-10 16:06:33 +07:00
Nikita Popov	e4a4122eb6	[IR] Remove zext and sext constant expressions (#71040 ) Remove support for zext and sext constant expressions. All places creating them have been removed beforehand, so this just removes the APIs and uses of these constant expressions in tests. There is some additional cleanup that can be done on top of this, e.g. we can remove the ZExtInst vs ZExtOperator footgun. This is part of https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.	2023-11-03 10:46:07 +01:00
Tobias Stadler	373c343a77	Reland: [GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND Reland `3686a0b` after fixing an exposed miscompile in #68840 Differential Revision: https://reviews.llvm.org/D159140	2023-11-02 00:18:19 +01:00
Fangrui Song	5888dee7d0	[ARM,ELF] Fix access to dso_preemptable __stack_chk_guard with static relocation model (#70014 ) The ELF code from https://reviews.llvm.org/D112811 emits LDRLIT_ga_pcrel when `TM.isPositionIndependent()` but uses a different condition `Subtarget.isGVIndirectSymbol(GV)` (aka dso_preemptable on ELF targets). This would cause incorrect access for dso_preemptable `__stack_chk_guard` with the static relocation model. Regarding whether `__stack_chk_guard` gets the dso_local specifier, https://reviews.llvm.org/D150841 switched to `M.getDirectAccessExternalData()` (implied by "PIC Level") instead of `TM.getRelocationModel() == Reloc::Static`. The result is that when non-zero "PIC Level" is used with static relocation model (e.g. -fPIE/-fPIC LTO compiles with -no-pie linking), `__stack_chk_guard` accesses are incorrect. ``` ldr r0, .LCPI0_0 ldr r0, [r0] ldr r0, [r0] // incorrectly dereferences __stack_chk_guard ... .LCPI0_0: .long __stack_chk_guard ``` To fix this, for dso_preemptable `__stack_chk_guard`, emit a GOT PIC code sequence like for -fpic using `LDRLIT_ga_pcrel`: ``` ldr r0, .LCPI0_0 .LPC0_0: add r0, pc, r0 ldr r0, [r0] ldr r0, [r0] ... LCPI0_0: .Ltmp0: .long __stack_chk_guard(GOT_PREL)-((.LPC0_0+8)-.Ltmp0) ``` Technically, `LDRLIT_ga_abs` with `R_ARM_GOT_ABS` could be used, but `R_ARM_GOT_ABS` does not have GNU or integrated assembler support. (Note, `.LCPI0_0: .long __stack_chk_guard@GOT` produces an `R_ARM_GOT_BREL`, which is not desired). 
This patch fixes #6499 while not changing behavior for the following configurations: ``` run arm.linux.nopic --target=arm-linux-gnueabi -fno-pic run arm.linux.pie --target=arm-linux-gnueabi -fpie run arm.linux.pic --target=arm-linux-gnueabi -fpic run armv6.darwin.nopic --target=armv6-apple-darwin -fno-pic run armv6.darwin.dynamicnopic --target=armv6-apple-darwin -mdynamic-no-pic run armv6.darwin.pic --target=armv6-apple-darwin -fpic run armv7.darwin.nopic --target=armv7-apple-darwin -mcpu=cortex-a8 -fno-pic run armv7.darwin.dynamicnopic --target=armv7-apple-darwin -mcpu=cortex-a8 -mdynamic-no-pic run armv7.darwin.pic --target=armv7-apple-darwin -mcpu=cortex-a8 -fpic run arm64.darwin.pic --target=arm64-apple-darwin ```	2023-10-31 15:37:26 -07:00
Fangrui Song	6ae7b735db	[ARM][test] Improve stack-protector tests llvm/test/LTO/ARM/ssp-static-reloc.ll is more about using the static relocation model with "PIC Level" and unrelated to the LTO infrastructure. Move the test. Update stack_guard_remat.ll to clearly test "PIC Level" with the relevant relocation models.	2023-10-31 15:30:08 -07:00
XChy	fc6bdb8549	[SimplifyCFG] Reland transform for redirecting phis between unmergeable BB and SuccBB (#68473 ) Reland #67275 with #68953 resolved.	2023-10-28 17:10:20 +08:00
Matthias Braun	e3cf80c5c1	BlockFrequencyInfoImpl: Avoid big numbers, increase precision for small spreads BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that: * Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiple frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room. * Spread the difference between hottest/coldest block as much as possible to increase precision. * If the hot/cold spread cannot be represented, lose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.	2023-10-24 20:27:39 -07:00
David Green	8a701024f3	[ARM] Lower i1 concat via MVETRUNC The MVETRUNC operation can perform the same truncate of two vectors, without requiring lane inserts/extracts from every vector lane. This moves the concat i1 lowering to use it for v8i1 and v16i1 result types, trading a bit of extra stack space for less instructions.	2023-10-18 19:40:11 +01:00
Nikita Popov	a72d88fb4f	Revert "Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size" This reverts commit `8840da2db2`. This results in verifier failures during LTO, see #68929.	2023-10-16 12:17:24 +02:00
weiguozhi	b6043f9867	[RA] Disable split around hint register if optimize for size (#68619 ) Splitting a virtual register with a hint may generate COPY instructions in multiple cold basic blocks and increase code size. So disable this split when the function is optimized for size.	2023-10-11 14:57:15 -07:00
Nikita Popov	8840da2db2	Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size Reapply now that generation of incorrect debuginfo for FnDef in rustc has been fixed. ----- Add a check that the DILocalVariable fragment size in dbg.declare does not exceed the size of the alloca. This would have caught the invalid debuginfo regenerated by rustc in https://github.com/llvm/llvm-project/issues/64149. Differential Revision: https://reviews.llvm.org/D158743	2023-10-09 14:22:12 +02:00
Jay Foad	7b3bbd83c0	Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038 )" This reverts commit `2501ae58e3`. Reverted due to various buildbot failures.	2023-10-09 12:31:32 +01:00
Jay Foad	2501ae58e3	[CodeGen] Really renumber slot indexes before register allocation (#67038 ) PR #66334 tried to renumber slot indexes before register allocation, but the numbering was still affected by list entries for instructions which had been erased. Fix this to make the register allocator's live range length heuristics even less dependent on the history of how instructions have been added to and removed from SlotIndexes's maps.	2023-10-09 11:44:41 +01:00
Fangrui Song	d20190e684	[test] Change llc -march=aarch64\|arm64 to -mtriple=aarch64\|arm64 Similar to commit `806761a762` to avoid issues due to object file format differences. These tests are currently benign.	2023-09-29 10:13:06 -07:00
Tobias Stadler	305fbc1b32	Revert "[GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND" This reverts commit `3686a0b611`. This seems to have broken some sanitizer tests: https://lab.llvm.org/buildbot/#/builders/184/builds/7721	2023-09-29 03:35:40 +02:00
Tobias Stadler	3686a0b611	[GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND The legalizer currently generates lots of G_AND artifacts. For example between boolean uses and defs there is always a G_AND with a mask of 1, but when the target uses ZeroOrOneBooleanContents, this is unnecessary. Currently these artifacts have to be removed using post-legalize combines. Omitting these artifacts at their source in the artifact combiner has a few advantages: - We know that the emitted G_AND is very likely to be useless, so our KnownBits call is likely worth it. - The G_AND and G_CONSTANT can interrupt e.g. G_UADDE/... sequences generated during legalization of wide adds which makes it harder to detect these sequences in the instruction selector (e.g. useful to prevent unnecessary reloading of AArch64 NZCV register). - This cleans up a lot of legalizer output and even improves compilation-times. AArch64 CTMark geomean: `O0` -5.6% size..text; `O0` and `O3` ~-0.9% compilation-time (instruction count). Since this introduces KnownBits into code-paths used by `O0`, I reduced the default recursion depth. This doesn't seem to make a difference in CTMark, but should prevent excessive recursive calls in the worst case. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D159140	2023-09-29 02:11:57 +02:00
Jay Foad	fb32baf0ec	[ARM] Make some test checks more robust This makes some tests robust against minor codegen differences that will be caused by PR #67038.	2023-09-28 14:26:13 +01:00
Douglas Yung	6716d3dd77	Move test split-deadloop.mir that was added in `e3d714f` to AArch64 directory instead of ARM.	2023-09-26 09:51:47 -07:00
weiguozhi	31f81e96a4	[RA] Don't split a register generated from another split (#67351 ) Splitting a register generated from another split usually doesn't bring much benefit. It may also cause a dead loop, as pr67188 shows, if the heuristic cost always satisfies the split condition. So prevent such splitting. This fixes pr67188.	2023-09-26 08:38:18 -07:00
Muhammad Omair Javaid	431969ede1	Revert "[SimplifyCFG] Transform for redirecting phis between unmergeable BB and SuccBB (#67275 )" This reverts commit `fc86d031fe`. This change breaks LLVM buildbot clang-aarch64-sve-vls-2stage https://lab.llvm.org/buildbot/#/builders/176/builds/5474 I am going to revert this patch as the bot has been failing for more than a day without a fix.	2023-09-26 15:47:16 +05:00
XChy	fc86d031fe	[SimplifyCFG] Transform for redirecting phis between unmergeable BB and SuccBB (#67275 ) This patch extends function TryToSimplifyUncondBranchFromEmptyBlock to handle the similar cases below. ```llvm define i8 @src(i8 noundef %arg) { start: switch i8 %arg, label %unreachable [ i8 0, label %case012 i8 1, label %case1 i8 2, label %case2 i8 3, label %end ] unreachable: unreachable case1: br label %case012 case2: br label %case012 case012: %phi1 = phi i8 [ 3, %case2 ], [ 2, %case1 ], [ 1, %start ] br label %end end: %phi2 = phi i8 [ %phi1, %case012 ], [ 4, %start ] ret i8 %phi2 } ``` The phis here should be merged into one phi, so that we can better optimize it: ```llvm define i8 @tgt(i8 noundef %arg) { start: switch i8 %arg, label %unreachable [ i8 0, label %end i8 1, label %case1 i8 2, label %case2 i8 3, label %case3 ] unreachable: unreachable case1: br label %end case2: br label %end case3: br label %end end: %phi = phi i8 [ 4, %case3 ], [ 3, %case2 ], [ 2, %case1 ], [ 1, %start ] ret i8 %phi } ``` Proof: [normal](https://alive2.llvm.org/ce/z/vAWi88) [multiple stages](https://alive2.llvm.org/ce/z/DDBQqp) [multiple stages 2](https://alive2.llvm.org/ce/z/nGkeqN) [multiple phi combinations](https://alive2.llvm.org/ce/z/VQeEdp) And lookup table optimization should convert it into add %arg 1. This patch just matches similar CFG structures and merges the phis in different cases. Maybe such a transform can be applied to other situations besides switch, but I'm not sure whether it's better than not merging. Therefore, I only try it in switch. Related issue: #63876 [Migrated](https://reviews.llvm.org/D155940)	2023-09-25 10:13:45 +08:00
Matt Harding	64d1ceaa38	Add command line option --no-trap-after-noreturn (#67051 ) Add the command line option --no-trap-after-noreturn, which exposes the pre-existing TargetOption `NoTrapAfterNoreturn`. This pull request was split off from this one: https://github.com/llvm/llvm-project/pull/65876	2023-09-22 22:03:21 +02:00
Jon Roelofs	83e6d2edfc	Revert "[ARM] Always lower direct calls as direct when the outliner is enabled (#66434 )" This reverts commit `003bcad9a8`. ARM folks say it regresses some of their benchmarks: https://github.com/llvm/llvm-project/pull/66434#issuecomment-1722424162	2023-09-18 09:45:46 -07:00
Nikita Popov	38c59b9f53	Revert "Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size" This reverts commit `47324cfd7d`. This exposed incorrect debuginfo in rustc. Revert the verification until this has been fixed.	2023-09-18 17:24:53 +02:00
Guozhi Wei	cbdccb30c2	[RA] Split a virtual register in cold blocks if it is not assigned preferred physical register If a virtual register is not assigned preferred physical register, it means some COPY instructions will be changed to real register move instructions. In this case we can try to split the virtual register in colder blocks, if success, the original COPY instructions can be deleted, and the new COPY instructions in colder blocks will be generated as register move instructions. It results in fewer dynamic register move instructions executed. The new test case split-reg-with-hint.ll gives an example, the hot path contains 24 instructions without this patch, now it is only 4 instructions with this patch. Differential Revision: https://reviews.llvm.org/D156491	2023-09-15 19:52:50 +00:00
Jon Roelofs	003bcad9a8	[ARM] Always lower direct calls as direct when the outliner is enabled (#66434 ) The indirect lowering hinders the outliner's ability to see that sequences are in fact common, since the sequence similarity is rendered opaque by the register callee. The size savings from making them indirect seems to be dwarfed by the outliner's savings from de-duplication. rdar://115178034 rdar://115459865	2023-09-15 10:04:56 -07:00
Nikita Popov	47324cfd7d	Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size Reapply after fixing a clang bug this exposed in D158972 and adjusting a number of tests that failed for 32-bit targets. ----- Add a check that the DILocalVariable fragment size in dbg.declare does not exceed the size of the alloca. This would have caught the invalid debuginfo regenerated by rustc in https://github.com/llvm/llvm-project/issues/64149. Differential Revision: https://reviews.llvm.org/D158743	2023-09-15 14:51:50 +02:00
Allen	347b3f1209	[ARM][ISel] Fix crash of ISD::FMINNUM/FMAXNUM (#65849 ) The instruction of ISD::FMINNUM/FMAXNUM should be legal if HasFPARMv8 && HasNEON. For the combination of armv7+fp-armv8, armv7 implies the HasNEON feature, and fp-armv8 matches the HasFPARMv8 feature, so it is legal. Fixes https://github.com/llvm/llvm-project/issues/65820	2023-09-14 10:35:07 +08:00
David Green	a82c106e57	[ARM] Change CRC predicate to just HasCRC This removes the backend requirement for crc instructions on HasV8, relying on just HasCRC instead. This should allow them to be selected with ArmV7 + crc, making them more usable whilst hopefully not making them incorrectly generated (they only come from intrinsics, and HasCRC usually requires HasV8). This is how most other instructions are specified.	2023-09-08 09:02:15 +01:00
Matt Arsenault	b14e83d1a4	IR: Add llvm.exp10 intrinsic We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10 to fix this asymmetry. AMDGPU already has most of the code for f32 exp10 expansion implemented alongside exp, so the current implementation is duplicating nearly identical effort between the compiler and library which is inconvenient. https://reviews.llvm.org/D157871	2023-09-01 19:45:03 -04:00
Nikita Popov	98cf20f890	Revert "[Verifier] Sanity check alloca size against DILocalVariable fragment size" This reverts commit `183f49c3e0`. The lang/cpp/trivial_abi/TestTrivialABI.py lldb test fails on buildbots.	2023-08-28 09:44:51 +02:00
Nikita Popov	183f49c3e0	[Verifier] Sanity check alloca size against DILocalVariable fragment size Add a check that the DILocalVariable fragment size in dbg.declare does not exceed the size of the alloca. This would have caught the invalid debuginfo regenerated by rustc in https://github.com/llvm/llvm-project/issues/64149. Differential Revision: https://reviews.llvm.org/D158743	2023-08-28 09:16:33 +02:00
Oliver Stannard	40614e1c14	[ARM] Save and restore CPSR around tMOVimm32 When resolving a frame index with a large offset for v6M execute-only, we emit a tMOVimm32 pseudo-instruction, which later gets lowered to a sequence of instructions, all of which are flag-setting. However, a frame index may be generated for a register spill or reload instruction, which can be inserted at a point where CPSR is live. This patch inserts MRS and MSR instructions around the tMOVimm32 to save and restore the value of CPSR, if CPSR is live at that point. This may need up to two virtual registers (one to build the immediate value, one to save CPSR) during frame index lowering, which happens after register allocation, so we need to ensure two spill slots are available to the register scavenger to ensure it can free up enough registers for this. There is no test for the emission (or not) of the MRS/MSR pair, because it requires a spill or reload to be inserted at a point where CPSR is live, which requires a large, complex function and is fragile enough that any optimisation changes will break the test. This bug was easily found by csmith with -verify-machineinstrs, which I now run regularly on v6M execute-only (and many other combinations). Patch by John Brawn and myself. Reviewed By: stuij Differential Revision: https://reviews.llvm.org/D158404	2023-08-24 14:15:02 +01:00
Nikita Popov	69bd66b3ce	[Tests] Remove some and/or constant expressions in tests (NFC) In preparation for their removal in D158081.	2023-08-21 12:05:32 +02:00
Keith Walker	2d9c6e699a	[Thumb1] Use callee-saved register to adjust stack pointer When adjusting the Stack Pointer at the end of the function epilogue, use a callee-saved register, rather than explicitly using R4 which may not have been saved. Differential Revision: https://reviews.llvm.org/D157500	2023-08-17 18:29:50 +01:00
Nicholas Guy	d65feccb12	[ARM] Set preferred function alignment Aligning functions yields small performance gains on embedded cores, moreso with numerous small function calls. Similar to aligning loops, if the function can fit within a single cache line then the performance overhead of fetching more instructions can be limited. Differential Revision: https://reviews.llvm.org/D157514	2023-08-16 17:31:21 +01:00
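
Note on commit `75cf672b12` ([SDAG] Simplify is-power-of-2 codegen): the message states that `ctpop(x) == 1` can be expanded either as `x != 0 && (x & (x - 1)) == 0` or as `(x ^ (x - 1)) > x - 1`. The sketch below is not LLVM code; it is a small Python model of the two expressions under the assumption of 32-bit unsigned wraparound (matching the `%edi` registers in the commit's example). The function names and the `MASK32` constant are hypothetical, introduced only for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: model 32-bit unsigned arithmetic


def is_pow2_ctpop(x: int) -> bool:
    """Reference definition: exactly one bit is set (ctpop(x) == 1)."""
    return bin(x & MASK32).count("1") == 1


def is_pow2_old(x: int) -> bool:
    """Older expansion from the commit message: x != 0 && (x & (x - 1)) == 0."""
    x &= MASK32
    return x != 0 and (x & ((x - 1) & MASK32)) == 0


def is_pow2_new(x: int) -> bool:
    """Newer expansion from the commit message: (x ^ (x - 1)) > x - 1, unsigned."""
    x &= MASK32
    y = (x - 1) & MASK32  # wraps to 0xFFFFFFFF when x == 0
    return (x ^ y) > y


if __name__ == "__main__":
    # Compare all three forms on a few sample inputs, including the x == 0 edge case.
    for value in (0, 1, 2, 3, 6, 8, 0x80000000, 0xFFFFFFFF):
        print(value, is_pow2_ctpop(value), is_pow2_old(value), is_pow2_new(value))
```

The `x == 0` case is what makes the single comparison sufficient: `x - 1` wraps to the maximum value, so `x ^ (x - 1)` equals `x - 1` and the strict unsigned comparison is false without a separate nonzero test.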


4833 Commits