clang-p2996

Author	SHA1	Message	Date
Nikita Popov	33a3139f80	[CGP] Avoid branch on poison UB in test (NFC)	2023-01-03 14:52:48 +01:00
Nikita Popov	02b02cd050	[CodeGenPrepare] Avoid branch on undef UB in tests (NFC)	2023-01-03 13:51:00 +01:00
Matt Arsenault	d9e51e7552	CodeGenPrepare: Convert most tests to opaque pointers NVPTX/dont-introduce-addrspacecast.ll required manually removing a check for a bitcast. AArch64/combine-address-mode.ll required rerunning update_test_checks Mips required some manual updates due to a CHECK-NEXT coming after a deleted bitcast. ARM/sink-addrmode.ll needed one small manual fix. Excludes one X86 function which needs more attention.	2022-11-28 09:21:59 -05:00
Matt Arsenault	e446d1d93f	CodeGenPrepare: Don't use anonymous values in some tests These are always an obstacle to test updates, and often break after running opaquify scripts on them.	2022-11-27 10:30:37 -05:00
Matt Arsenault	8824318512	X86: Make test check more precise This is really checking an i8*, not an i8.	2022-11-27 10:17:38 -05:00
Matt Arsenault	ffb20958cd	CodeGenPrepare: Don't use undef base pointers in addressing mode test This broke after the opaquify script.	2022-11-27 10:15:31 -05:00
Alex Richardson	16f9c5577d	[SimplifyLibCalls] Retain attributes added by Builder.CreateMem* This currently does not make much of a difference (only one test is affected), but it is helpful e.g. for the out-of-tree CHERI target where Builder.CreateMemCpy() can add attributes other than parameter alignment. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D135075	2022-10-04 13:11:34 +00:00
Florian Hahn	b2c195da6d	[CGP] Update failing test missed in `81a11da762`.	2022-09-15 19:35:25 +01:00
Florian Hahn	81a11da762	[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl. This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops using a wide shuffle creating a v64i8 vector, selecting groups of 3 zero elements and an element from the input. This is profitable on AArch64 where such shuffles can be lowered to tbl instructions, but only in loops, because it requires materializing 4 masks, which can be done in the loop preheader. This is the only reason the transform is part of CGP. If there's a better alternative I missed, please let me know. The same goes for the shouldReplaceZExtWithShuffle hook which guards this. I am not sure if this transform will be beneficial on other targets, but it seems like there is no other convenient way. This improves the generated code for loops like the one below in combination with D96522. int foo(uint8_t *p, int N) { unsigned long long sum = 0; for (int i = 0; i < N ; i++, p++) { unsigned int v = *p; sum += (v < 127) ? v : 256 - v; } return sum; } https://clang.godbolt.org/z/Wco866MjY Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D120571	2022-09-15 19:18:13 +01:00
Alex Bradbury	c44c1e9d3e	[RISCV] Implement isMaskAndCmp0FoldingBeneficial hook This hook is currently only used by CodeGenPrepare, which will sink and duplicate an 'and' into a block that has an 'icmp 0' user of it if the hook returns true. This hook is less useful for RISC-V than for targets like AArch64 that have a TBZ (test bit and branch if zero instruction), but may still be profitable if Zbs is available and a BEXTI can be selected. Conservatively, we return false even if Zbs is enabled for any masks that fit in the ANDI immediate because it's possible the only use is a branch on the result, and ANDI+BNEZ => BEXTI+BNEZ isn't a profitable transformation. Differential Revision: https://reviews.llvm.org/D131492	2022-09-13 18:54:00 +01:00
Xiang1 Zhang	16743c9534	[CodeGen] Limit building time in CodeGenPrepare for huge function Details: Currently CodeGenPrepare is very time-consuming when handling big functions. Old algorithm: It iterates over each BB in the function and handles every instruction in the BB. Because some instruction optimizations may affect the BBs' dominator tree, the old logic re-iterates and tries to optimize each BB again. Suppose we have a big function with 20000 BBs: if handling the last BB adjusts the dominator tree, we need to re-iterate and try to optimize all 20000 BBs from the beginning. The complexity is near N! And we really do encounter some big tests (> 20000 BBs) that take more than 30 minutes in this pass. (A debug build of the compiler takes 2 hours here.) What does this patch do for huge functions? It mainly changes how the optimization iterates. 1 We run optimizeBlock for each BB (the same as the old way). In the meantime, if a BB is changed/updated during the optimization, it is put into FreshBBs (to try optimizeBlock again). BBs newly created in the previous iteration are also put into FreshBBs. 2 BBs that were not updated in the previous iteration are skipped directly. Strictly speaking, this may miss some opportunities, but the probability is very small. 3 For the instructions in a single BB, we run optimizeInst on each instruction. If optimizeInst changes the instruction dominator in this BB, rather than breaking and going back to optimize the first BB (the old way), we directly iterate over the instructions (to run optimizeInst) in this updated BB again (the new way). What does this patch do for small/normal (not huge) functions? It is the same as the old algorithm. (NFC) Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D129352	2022-09-07 10:05:40 +08:00
Arthur Eubanks	19d4f5e649	[test] Add missing REQUIRES: arm-registered-target	2022-07-20 10:59:07 -07:00
Ruobing Han	2b98b8e8fb	Fix bug for useless malloc elimination in CodeGenPrepare Putting the AllocationFn check before I->willReturn allows CodeGenPrepare to remove useless malloc instructions. Differential Revision: https://reviews.llvm.org/D130126	2022-07-20 16:29:51 +00:00
Tim Besard	a323dfc015	Don't sink ptrtoint/inttoptr sequences into non-noop addrspacecasts. In https://reviews.llvm.org/D30114, support for mismatching address spaces was introduced to CodeGenPrepare's optimizeMemoryInst, using addrspacecast as it was argued that only no-op addrspacecasts would be considered when constructing the address mode. However, by doing inttoptr/ptrtoint, it's possible to get CGP to emit an addrspace that's not actually no-op, introducing a miscompilation: define void @kernel(i8* %julia_ptr) { %intptr = ptrtoint i8* %julia_ptr to i64 %ptr = inttoptr i64 %intptr to i32 addrspace(3)* br label %end end: store atomic i32 1, i32 addrspace(3)* %ptr unordered, align 4 ret void } Gets compiled to: define void @kernel(i8* %julia_ptr) { end: %0 = addrspacecast i8* %julia_ptr to i32 addrspace(3)* store atomic i32 1, i32 addrspace(3)* %0 unordered, align 4 ret void } In the case of NVPTX, this introduces a cvta.to.shared, whereas leaving out the %end block and branch doesn't trigger this optimization. This results in illegal memory accesses as seen in https://github.com/JuliaGPU/CUDA.jl/issues/558 In this change, I introduced a check before doing the pointer cast that verifies address spaces are the same. If not, it emits a ptrtoint/inttoptr combination to get a no-op cast between address spaces. I decided against disallowing ptrtoint/inttoptr with non-default AS in matchOperationAddr, because now it's still possible to look through multiple sequences of them that ultimately do not result in an address space mismatch (i.e. the second lit test).	2022-07-16 10:56:42 -04:00
Nikita Popov	c10921fa1a	[CGP] Also freeze ctlz/cttz operand when despeculating D125887 changed the ctlz/cttz despeculation transform to insert a freeze for the introduced branch on zero. While this does fix the "branch on poison" issue, we may still get in trouble if we pick a different value for the branch and for the ctz argument (i.e. non-zero for the branch, but zero for the ctz). To avoid this, we should use the same frozen value in both positions. This does cause a regression in RISCV codegen by introducing an additional sext. The DAG looks like this: t0: ch = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %3 t4: i64 = AssertSext t2, ValueType:ch:i32 t23: i64 = freeze t4 t9: ch = CopyToReg t0, Register:i64 %0, t23 t16: ch = CopyToReg t0, Register:i64 %4, Constant:i64<32> t18: ch = TokenFactor t9, t16 t25: i64 = sign_extend_inreg t23, ValueType:ch:i32 t24: i64 = setcc t25, Constant:i64<0>, seteq:ch t28: i64 = and t24, Constant:i64<1> t19: ch = brcond t18, t28, BasicBlock:ch<cond.end 0x8311f68> t21: ch = br t19, BasicBlock:ch<cond.false 0x8311e80> I don't see a really obvious way to improve this, as we can't push the freeze past the AssertSext (which may produce poison). Differential Revision: https://reviews.llvm.org/D126638	2022-06-10 09:46:10 +02:00
Florian Hahn	786c687810	[AArch64] Add support for FMA intrinsics to shouldSinkOperands. If the fma operates on a legal vector type, the indexed variants can be used, if the second operand is a splat of a valid index. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D126234	2022-05-27 10:37:03 +01:00
Florian Hahn	a9a012086a	[AArch64] Add additional tests for sinking free shuffles for FMAs.	2022-05-26 10:35:38 +01:00
Florian Hahn	8661725686	[AArch64] Add tests with free shuffles for indexed fma variants. The new tests contain examples where shuffles are free, because indexed fma instructions can be used.	2022-05-23 20:27:42 +01:00
Nikita Popov	5126c38012	[CGP] Freeze condition when despeculating ctlz/cttz Freeze the condition of the newly introduced conditional branch, to avoid immediate undefined behavior if the input to ctlz/cttz was originally poison. Differential Revision: https://reviews.llvm.org/D125887	2022-05-23 11:01:18 +02:00
Matthias Braun	8d03c49f49	Extend switch condition in optimizeSwitchPhiConst when free In a case like: switch((i32)x) { case 42: phi((i64)42, ...); } replace `(i64)42` with `zext(x)` when we can do so for free. This fixes a part of https://github.com/llvm/llvm-project/issues/55153 Differential Revision: https://reviews.llvm.org/D124897	2022-05-18 16:23:53 -07:00
Nikita Popov	8e4c5d9902	[CGP] Regenerate test checks (NFC)	2022-05-18 15:35:21 +02:00
Matthias Braun	de9ad98d2d	Fix endless loop in optimizePhiConst with integer constant switch condition Avoid endless loop in degenerate case with an integer constant as switch condition as reported in https://reviews.llvm.org/D124552	2022-05-11 08:49:01 -07:00
Matthias Braun	f0ea9c9cec	CodeGenPrepare: Replace constant PHI arguments with switch condition value We often see code like the following after running SCCP: switch (x) { case 42: phi(42, ...); } This tends to produce bad code as we currently materialize the constant phi-argument in the switch-block. This increases register pressure and if the pattern repeats for `n` case statements, we end up generating `n` constant values. This changes CodeGenPrepare to catch this pattern and revert it back to: switch (x) { case 42: phi(x, ...); } Differential Revision: https://reviews.llvm.org/D124552	2022-05-10 10:00:10 -07:00
Matthias Braun	cd19af74c0	Avoid 8 and 16bit switch conditions on x86 This adds a `TargetLoweringBase::getSwitchConditionType` callback to give targets a chance to control the type used in `CodeGenPrepare::optimizeSwitchInst`. Implement callback for X86 to avoid i8 and i16 types where possible as they often incur extra zero-extensions. This is NFC for non-X86 targets. Differential Revision: https://reviews.llvm.org/D124894	2022-05-10 10:00:10 -07:00
Momchil Velikov	d0ea42a7c1	[AArch64] Async unwind - function epilogues Reviewed By: MaskRay, chill Differential Revision: https://reviews.llvm.org/D112330	2022-04-12 16:50:50 +01:00
Momchil Velikov	50a97aacac	[AArch64] Async unwind - function prologues Re-commit of `32e8b550e5` This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction. The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill mov x29, sp .cfi_def_cfa w29, 16 ... ``` after the `stp` is executed the (initial) rule for the CFA still says the CFA is in the `sp`, even though it's already offset by 16 bytes A correct unwind info could look like: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill .cfi_def_cfa_offset 16 mov x29, sp .cfi_def_cfa w29, 16 ... ``` Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (towards this patch is just a step) is to have fully working `-fasynchronous-unwind-tables`. Reviewed By: danielkiss, MaskRay Differential Revision: https://reviews.llvm.org/D111411	2022-03-24 16:16:44 +00:00
Hans Wennborg	85c53c7092	Revert "[AArch64] Async unwind - function prologues" It caused builds to assert with: (StackSize == 0 && "We already have the CFA offset!"), function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624. when targeting iOS. See comment on the code review for reproducer. > This patch rearranges emission of CFI instructions, so the resulting > DWARF and `.eh_frame` information is precise at every instruction. > > The current state is that the unwind info is emitted only after the > function prologue. This is fine for synchronous (e.g. C++) exceptions, > but the information is generally incorrect when the program counter is > at an instruction in the prologue or the epilogue, for example: > > ``` > stp x29, x30, [sp, #-16]! // 16-byte Folded Spill > mov x29, sp > .cfi_def_cfa w29, 16 > ... > ``` > > after the `stp` is executed the (initial) rule for the CFA still says > the CFA is in the `sp`, even though it's already offset by 16 bytes > > A correct unwind info could look like: > ``` > stp x29, x30, [sp, #-16]! // 16-byte Folded Spill > .cfi_def_cfa_offset 16 > mov x29, sp > .cfi_def_cfa w29, 16 > ... > ``` > > Having this information precise up to an instruction is useful for > sampling profilers that would like to get a stack backtrace. The end > goal (towards this patch is just a step) is to have fully working > `-fasynchronous-unwind-tables`. > > Reviewed By: danielkiss, MaskRay > > Differential Revision: https://reviews.llvm.org/D111411 This reverts commit `32e8b550e5`.	2022-03-04 17:36:26 +01:00
Momchil Velikov	63c9aca12a	Revert "[AArch64] Async unwind - function epilogues" This reverts commit `74319d6794`. It causes test failures that look like infinite loop in asan/hwasan unwinding.	2022-03-02 15:01:57 +00:00
Momchil Velikov	74319d6794	[AArch64] Async unwind - function epilogues Counterpart of https://reviews.llvm.org/D111411 this change makes the unwind information instruction precise in function epilogues. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D112330	2022-03-02 13:15:11 +00:00
Momchil Velikov	32e8b550e5	[AArch64] Async unwind - function prologues This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction. The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill mov x29, sp .cfi_def_cfa w29, 16 ... ``` after the `stp` is executed the (initial) rule for the CFA still says the CFA is in the `sp`, even though it's already offset by 16 bytes A correct unwind info could look like: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill .cfi_def_cfa_offset 16 mov x29, sp .cfi_def_cfa w29, 16 ... ``` Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (towards this patch is just a step) is to have fully working `-fasynchronous-unwind-tables`. Reviewed By: danielkiss, MaskRay Differential Revision: https://reviews.llvm.org/D111411	2022-02-28 13:37:57 +00:00
Florian Hahn	166968a892	[AArch64] Add test cases where zext can be lowered to series of tbl. Add a set of tests for upcoming patches that allow lowering vector zext using AArch64 tbl instructions instead of shifts.	2022-02-25 15:36:32 +00:00
Arthur Eubanks	687263183b	[test] Test domtree validity with -verify-dom-info instead of -analyze Verified that the test properly crashes without D16893's fix.	2022-02-09 16:00:18 -08:00
Nikita Popov	46f9e45ef0	[Statepoint] Update gc.statepoint calls in tests with elementtype (NFC) This updates tests for the LangRef change in D117890.	2022-02-04 14:15:41 +01:00
Sunho Kim	44601f4956	[AARCH64][NEON] Allow to sink operands for aarch64_neon_pmull This teaches AArch64TargetLowering::shouldSinkOperands to sink the operands of aarch64_neon_pmull intrinsic. Differential Revision: https://reviews.llvm.org/D117944	2022-02-03 16:46:49 +00:00
Micah Weston	93deac2e2b	[AArch64] Optimize add/sub with immediate through MIPeepholeOpt Fixes the build issue with D111034, whose goal was to optimize add/sub with long immediates. Optimize ([add|sub] r, imm) -> ([ADD|SUB] ([ADD|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. Optimize ([add|sub] r, imm) -> ([SUB|ADD] ([SUB|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. The change which fixed the build issue in D111034 was the use of new virtual registers so that SSA form is maintained until deleting MI. Differential Revision: https://reviews.llvm.org/D117429	2022-01-22 12:39:22 +00:00
Florian Hahn	62476c7c14	Revert "[AArch64] Revive optimize add/sub with immediate through MIPeepholeOpt" This reverts commit `e6698f0992`. This commit appears to introduce new machine verifier failures when building the llvm-test-suite with `-mllvm -verify-machineinstrs` enabled: https://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-aarch64-O3/11061/ FAILED: MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite-build/tools/timeit --summary MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o.time /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/compiler/bin/clang -DNDEBUG -B /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin -Wno-unused-command-line-argument -mllvm -verify-machineinstrs -O3 -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk -w -Werror=date-time -DTORONTO -MD -MT MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o -MF MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o.d -o MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o -c /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite/MultiSource/Benchmarks/Olden/health/health.c * Bad machine code: Illegal virtual register for instruction * - function: alloc_tree - basic block: %bb.1 if.else (0x7fc0db8f8bb0) - instruction: %31:gpr64 = nsw MADDXrrr killed %39:gpr64sp, killed %25:gpr64, $xzr - operand 1: killed %39:gpr64sp Expected a GPR64 register, but got a GPR64sp register fatal error: error in backend: Found 1 machine code errors. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. 
Program arguments: /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/compiler/bin/clang -DNDEBUG -B /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin -Wno-unused-command-line-argument -mllvm -verify-machineinstrs -O3 -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk -w -Werror=date-time -DTORONTO -MD -MT MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o -MF MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o.d -o MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o -c /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite/MultiSource/Benchmarks/Olden/health/health.c 1. <eof> parser at end of file 2. Code generation 3. Running pass 'Function Pass Manager' on module '/Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite/MultiSource/Benchmarks/Olden/health/health.c'. 4. 
Running pass 'Verify generated machine code' on function '@alloc_tree' Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it): 0 clang 0x000000011191896b llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 43 1 clang 0x00000001119179b5 llvm::sys::RunSignalHandlers() + 85 2 clang 0x00000001119180e2 llvm::sys::CleanupOnSignal(unsigned long) + 210 3 clang 0x0000000111849f6a (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) + 106 4 clang 0x0000000111849ee8 llvm::CrashRecoveryContext::HandleExit(int) + 24 5 clang 0x0000000111914acc llvm::sys::Process::Exit(int, bool) + 44 6 clang 0x000000010f4e9be9 LLVMErrorHandler(void, char const, bool) + 89 7 clang 0x0000000114eba333 llvm::report_fatal_error(llvm::Twine const&, bool) + 323 8 clang 0x0000000110d8c620 (anonymous namespace)::MachineVerifier::BBInfo::~BBInfo() + 0 9 clang 0x0000000110cdddca llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 378 10 clang 0x00000001110b0154 llvm::FPPassManager::runOnFunction(llvm::Function&) + 1092 11 clang 0x00000001110b6268 llvm::FPPassManager::runOnModule(llvm::Module&) + 72 12 clang 0x00000001110b074a llvm::legacy::PassManagerImpl::run(llvm::Module&) + 986 13 clang 0x0000000111c20ad4 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) + 3764 14 clang 0x0000000111f6dd31 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) + 1905 15 clang 0x00000001131a28b3 clang::ParseAST(clang::Sema&, bool, bool) + 643 16 clang 0x00000001122b02a4 clang::FrontendAction::Execute() + 84 17 clang 0x000000011222d6a9 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 873 18 clang 
0x000000011232faf5 clang::ExecuteCompilerInvocation(clang::CompilerInstance) + 661 19 clang 0x000000010f4e9860 cc1_main(llvm::ArrayRef<char const>, char const, void) + 2544 20 clang 0x000000010f4e7168 ExecuteCC1Tool(llvm::SmallVectorImpl<char const>&) + 312 21 clang 0x00000001120ab187 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool) const::$_1>(long) + 23 22 clang 0x0000000111849eb4 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) + 228 23 clang 0x00000001120aac24 clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool) const + 324 24 clang 0x000000011207b85d clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const&) const + 221 25 clang 0x000000011207bdad clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const> >&) const + 125 26 clang 0x0000000112092f7c clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const> >&) + 204 27 clang 0x000000010f4e6977 main + 10375 28 libdyld.dylib 0x00007fff6be90cc9 start + 1 29 libdyld.dylib 0x0000000000000018 start + 18446603338705728336 clang-14: error: clang frontend command failed with exit code 70 (use -v to see invocation) clang version 14.0.0 (https://github.com/llvm/llvm-project.git `c90d136be4`) Target: arm64-apple-darwin19.5.0 Thread model: posix InstalledDir: /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/compiler/bin clang-14: note: diagnostic msg: *******************	2022-01-18 13:17:02 +00:00
Micah Weston	e6698f0992	[AArch64] Revive optimize add/sub with immediate through MIPeepholeOpt Fixes the build issue with D111034, whose goal was to optimize add/sub with long immediates. Optimize ([add|sub] r, imm) -> ([ADD|SUB] ([ADD|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. Optimize ([add|sub] r, imm) -> ([SUB|ADD] ([SUB|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. The change which fixed the build issue in D111034 was the use of new virtual registers so that SSA form is maintained until deleting MI. Differential Revision: https://reviews.llvm.org/D117429	2022-01-17 17:17:15 +00:00
Simon Pilgrim	a3f50fb06d	[X86] isVectorShiftByScalarCheap - vXi8 select(shift(x,splat0),shift(x,splat1)) is better than shift(x,select(splat0,splat1)) Even though we don't have vXi8 vector shifts (apart from XOP), it is still better to prefer shift (or funnel-shift/rotate) by scalar where possible. https://llvm.godbolt.org/z/6ss6ffTxv Differential Revision: https://reviews.llvm.org/D116191	2021-12-23 14:30:02 +00:00
David Green	760d4d03d5	[AArch64] Sink splat shuffles to lane index intrinsics This teaches AArch64TargetLowering::shouldSinkOperands to sink splat shuffles to certain neon intrinsics, so that they can make use of the lane variants of the instructions that are available. Differential Revision: https://reviews.llvm.org/D112994	2021-11-22 08:11:35 +00:00
Ben Shi	59c3b48d99	Revert "[AArch64] Optimize add/sub with immediate" This reverts commit `3de3ca3137`.	2021-11-03 14:15:21 +08:00
Ben Shi	3de3ca3137	[AArch64] Optimize add/sub with immediate Optimize ([add|sub] r, imm) -> ([ADD|SUB] ([ADD|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. Optimize ([add|sub] r, imm) -> ([SUB|ADD] ([SUB|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. Reviewed By: jaykang10, dmgreen Differential Revision: https://reviews.llvm.org/D111034	2021-11-03 03:06:43 +00:00
Fraser Cormack	eabf11f9ea	[CodeGenPrepare] Avoid a scalable-vector crash in ctlz/cttz This patch fixes a crash when despeculating ctlz/cttz intrinsics with scalable-vector types. It is not safe to speculatively get the size of the vector type in bits in case the vector type is not a fixed-length type. As it happens this isn't required as vector types are skipped anyway. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112141	2021-10-20 16:45:55 +01:00
Ben Shi	d0dbc991c0	Revert "[AArch64] Optimize add/sub with immediate" This reverts commit `9bf6bef995`.	2021-10-16 22:17:18 +00:00
Ben Shi	9bf6bef995	[AArch64] Optimize add/sub with immediate Optimize ([add|sub] r, imm) -> ([ADD|SUB] ([ADD|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. Optimize ([add|sub] r, imm) -> ([SUB|ADD] ([SUB|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. Reviewed By: jaykang10, dmgreen Differential Revision: https://reviews.llvm.org/D111034	2021-10-16 08:50:39 +00:00
David Green	adec922361	[AArch64] Make -mcpu=generic schedule for an in-order core We would like to start pushing -mcpu=generic towards enabling the set of features that improves performance for some CPUs, without hurting any others. A blend of the performance options hopefully beneficial to all CPUs. The largest part of that is enabling in-order scheduling using the Cortex-A55 schedule model. This is similar to the Arm backend change from `eecb353d0e` which made -mcpu=generic perform in-order scheduling using the cortex-a8 schedule model. The idea is that in-order cpu's require the most help in instruction scheduling, whereas out-of-order cpus can for the most part out-of-order schedule around different codegen. Our benchmarking suggests that hypothesis holds. When running on an in-order core this improved performance by 3.8% geomean on a set of DSP workloads, 2% geomean on some other embedded benchmark and between 1% and 1.8% on a set of singlecore and multicore workloads, all running on a Cortex-A55 cluster. On an out-of-order cpu the results are a lot more noisy but show flat performance or an improvement. On the set of DSP and embedded benchmarks, run on a Cortex-A78 there was a very noisy 1% speed improvement. Using the most detailed results I could find, SPEC2006 runs on a Neoverse N1 show a small increase in instruction count (+0.127%), but a decrease in cycle counts (-0.155%, on average). The instruction count is very low noise, the cycle count is more noisy with a 0.15% decrease not being significant. SPEC2k17 shows a small decrease (-0.2%) in instruction count leading to a -0.296% decrease in cycle count. These results are within noise margins but tend to show a small improvement in general. When specifying an Apple target, clang will set "-target-cpu apple-a7" on the command line, so should not be affected by this change when running from clang. This also doesn't enable more runtime unrolling like -mcpu=cortex-a55 does, only changing the schedule used. 
A lot of existing tests have been updated. This is a summary of the important differences: - Most changes are the same instructions in a different order. - Sometimes this leads to very minor inefficiencies, such as requiring an extra mov to move variables into r0/v0 for the return value of a test function. - misched-fusion.ll was no longer fusing the pairs of instructions it should, as per D110561. I've changed the schedule used in the test for now. - neon-mla-mls.ll now uses "mul; sub" as opposed to "neg; mla" due to the different latencies. This seems fine to me. - Some SVE tests do not always remove movprfx where they did before due to different register allocation giving different destructive forms. - The tests argument-blocks-array-of-struct.ll and arm64-windows-calls.ll produce two LDR where they previously produced an LDP due to store-pair-suppress kicking in. - arm64-ldp.ll and arm64-neon-copy.ll are missing pre/postinc on LDP. - Some tests such as arm64-neon-mul-div.ll and ragreedy-local-interval-cost.ll have more, less or just different spilling. - In aarch64_generated_funcs.ll.generated.expected one part of the function is no longer outlined. Interestingly if I switch this to use any other schedule even less is outlined. Some of these are expected to happen, such as differences in outlining or register spilling. There will be places where these result in worse codegen, places where they are better, with the SPEC instruction counts suggesting it is not a decrease overall, on average. Differential Revision: https://reviews.llvm.org/D110830	2021-10-09 15:58:31 +01:00
David Green	92128b7801	[AArch64] Regenerate even more tests This updates a few more check lines, in some mte tests that were close to auto generated already and some CodeGenPrepare/consthoist tests where being able to see the entire code sequence is useful for determining whether code differences are improvements or not.	2021-10-06 14:32:01 +01:00
Andrew Wei	c9066c5d37	[CGP] Fix the crash for combining address mode when having cyclic dependency In the combination of addressing modes, when replacing the matched phi nodes, sometimes the phi node to be replaced has been modified. For example, there’s matcher set [A, B] and [C, A], which will have cyclic dependency: A is replaced by B and C will be replaced by A. Because we tried to match new phi node to another new phi node, we should ignore new phi nodes when mapping new phi node to old one. Reviewed By: skatkov Differential Revision: https://reviews.llvm.org/D108635	2021-08-26 22:52:42 +08:00
Tiehu Zhang	9cfa9b44a5	[CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block In the current implementation, the instruction to be sunk is inserted before the target instruction without considering the def-use tree, which may cause an "Instruction does not dominate all uses" error. We need to choose a suitable insertion location according to the use chain. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D107262	2021-08-17 18:58:15 +08:00
David Green	013030a0b2	[AArch64] Correct sinking of shuffles to adds/subs This was checking extends as shuffles, whereas we should be checking the operands. This helps sink the shuffles, creating more addl/subl instructions. Differential Revision: https://reviews.llvm.org/D107623	2021-08-10 13:25:42 +01:00
David Green	3f74a68c35	[AArch64] Regenerate sink-free-instructions.ll. NFC	2021-08-10 13:25:42 +01:00
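
The "[AArch64] Optimize add/sub with immediate" commits listed above state an arithmetic condition: an immediate can be split when imm == (imm0<<12)+imm1 (or imm == -(imm0<<12)-imm1, with the operations swapped) and both imm0 and imm1 are non-zero 12-bit unsigned integers. The sketch below only illustrates that condition. The helper name `split_add_sub_imm` and its return shape are hypothetical, and this is not the MIPeepholeOpt implementation, which also has to deal with register widths and machine-instruction details not modeled here.

```python
from typing import Optional, Tuple

MASK12 = 0xFFF  # largest value that fits in a 12-bit unsigned field


def split_add_sub_imm(imm: int) -> Optional[Tuple[bool, int, int]]:
    """Hypothetical helper illustrating the condition quoted in the commit messages.

    Returns (negated, imm0, imm1) such that
        imm ==  (imm0 << 12) + imm1   when negated is False
        imm == -(imm0 << 12) - imm1   when negated is True
    with imm0 and imm1 both non-zero 12-bit unsigned integers,
    or None when no such split exists.
    """
    magnitude = abs(imm)
    imm0 = magnitude >> 12     # upper part, applied with "lsl #12"
    imm1 = magnitude & MASK12  # lower 12 bits
    # Both halves must be non-zero, and the upper half must itself fit in 12 bits.
    if imm0 == 0 or imm1 == 0 or imm0 > MASK12:
        return None
    return (imm < 0, imm0, imm1)
```

For example, `split_add_sub_imm(0x123456)` yields the halves `0x123` and `0x456`, while `0x1000` (zero low half) and `0xFFF` (zero high half) are rejected. A `negated` result of `True` corresponds to the second rule in the commit message, where an add becomes a pair of SUBs and a sub becomes a pair of ADDs.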


367 Commits