clang-p2996

Author	SHA1	Message	Date
Momchil Velikov	d0ea42a7c1	[AArch64] Async unwind - function epilogues Reviewed By: MaskRay, chill Differential Revision: https://reviews.llvm.org/D112330	2022-04-12 16:50:50 +01:00
Momchil Velikov	50a97aacac	[AArch64] Async unwind - function prologues Re-commit of `32e8b550e5` This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction. The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill mov x29, sp .cfi_def_cfa w29, 16 ... ``` after the `stp` is executed the (initial) rule for the CFA still says the CFA is in the `sp`, even though it's already offset by 16 bytes A correct unwind info could look like: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill .cfi_def_cfa_offset 16 mov x29, sp .cfi_def_cfa w29, 16 ... ``` Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (towards this patch is just a step) is to have fully working `-fasynchronous-unwind-tables`. Reviewed By: danielkiss, MaskRay Differential Revision: https://reviews.llvm.org/D111411	2022-03-24 16:16:44 +00:00
Hans Wennborg	85c53c7092	Revert "[AArch64] Async unwind - function prologues" It caused builds to assert with: (StackSize == 0 && "We already have the CFA offset!"), function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624. when targeting iOS. See comment on the code review for reproducer. > This patch rearranges emission of CFI instructions, so the resulting > DWARF and `.eh_frame` information is precise at every instruction. > > The current state is that the unwind info is emitted only after the > function prologue. This is fine for synchronous (e.g. C++) exceptions, > but the information is generally incorrect when the program counter is > at an instruction in the prologue or the epilogue, for example: > > ``` > stp x29, x30, [sp, #-16]! // 16-byte Folded Spill > mov x29, sp > .cfi_def_cfa w29, 16 > ... > ``` > > after the `stp` is executed the (initial) rule for the CFA still says > the CFA is in the `sp`, even though it's already offset by 16 bytes > > A correct unwind info could look like: > ``` > stp x29, x30, [sp, #-16]! // 16-byte Folded Spill > .cfi_def_cfa_offset 16 > mov x29, sp > .cfi_def_cfa w29, 16 > ... > ``` > > Having this information precise up to an instruction is useful for > sampling profilers that would like to get a stack backtrace. The end > goal (towards this patch is just a step) is to have fully working > `-fasynchronous-unwind-tables`. > > Reviewed By: danielkiss, MaskRay > > Differential Revision: https://reviews.llvm.org/D111411 This reverts commit `32e8b550e5`.	2022-03-04 17:36:26 +01:00
Momchil Velikov	63c9aca12a	Revert "[AArch64] Async unwind - function epilogues" This reverts commit `74319d6794`. It causes test failures that look like infinite loop in asan/hwasan unwinding.	2022-03-02 15:01:57 +00:00
Momchil Velikov	74319d6794	[AArch64] Async unwind - function epilogues Counterpart of https://reviews.llvm.org/D111411 this change makes the unwind information instruction precise in function epilogues. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D112330	2022-03-02 13:15:11 +00:00
Momchil Velikov	32e8b550e5	[AArch64] Async unwind - function prologues This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction. The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill mov x29, sp .cfi_def_cfa w29, 16 ... ``` after the `stp` is executed the (initial) rule for the CFA still says the CFA is in the `sp`, even though it's already offset by 16 bytes A correct unwind info could look like: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill .cfi_def_cfa_offset 16 mov x29, sp .cfi_def_cfa w29, 16 ... ``` Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (towards this patch is just a step) is to have fully working `-fasynchronous-unwind-tables`. Reviewed By: danielkiss, MaskRay Differential Revision: https://reviews.llvm.org/D111411	2022-02-28 13:37:57 +00:00
Florian Hahn	166968a892	[AArch64] Add test cases where zext can be lowered to series of tbl. Add a set of tests for upcoming patches that allow lowering vector zext using AArch64 tbl instructions instead of shifts.	2022-02-25 15:36:32 +00:00
Arthur Eubanks	687263183b	[test] Test domtree validity with -verify-dom-info instead of -analyze Verified that the test properly crashes without D16893's fix.	2022-02-09 16:00:18 -08:00
Nikita Popov	46f9e45ef0	[Statepoint] Update gc.statepoint calls in tests with elementtype (NFC) This updates tests for the LangRef change in D117890.	2022-02-04 14:15:41 +01:00
Sunho Kim	44601f4956	[AARCH64][NEON] Allow to sink operands for aarch64_neon_pmull This teaches AArch64TargetLowering::shouldSinkOperands to sink the operands of aarch64_neon_pmull intrinsic. Differential Revision: https://reviews.llvm.org/D117944	2022-02-03 16:46:49 +00:00
Micah Weston	93deac2e2b	[AArch64] Optimize add/sub with immediate through MIPeepholeOpt Fixes the build issue with D111034, whose goal was to optimize add/sub with long immediates. Optimize ([add\|sub] r, imm) -> ([ADD\|SUB] ([ADD\|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned integers. Optimize ([add\|sub] r, imm) -> ([SUB\|ADD] ([SUB\|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. The change which fixed the build issue in D111034 was the use of new virtual registers so that SSA form is maintained until deleting MI. Differential Revision: https://reviews.llvm.org/D117429	2022-01-22 12:39:22 +00:00
Florian Hahn	62476c7c14	Revert "[AArch64] Revive optimize add/sub with immediate through MIPeepholeOpt" This reverts commit `e6698f0992`. This commit appears to introduce new machine verifier failures when building the llvm-test-suite with `-mllvm -verify-machineinstrs` enabled: https://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-aarch64-O3/11061/ FAILED: MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite-build/tools/timeit --summary MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o.time /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/compiler/bin/clang -DNDEBUG -B /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin -Wno-unused-command-line-argument -mllvm -verify-machineinstrs -O3 -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk -w -Werror=date-time -DTORONTO -MD -MT MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o -MF MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o.d -o MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o -c /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite/MultiSource/Benchmarks/Olden/health/health.c * Bad machine code: Illegal virtual register for instruction * - function: alloc_tree - basic block: %bb.1 if.else (0x7fc0db8f8bb0) - instruction: %31:gpr64 = nsw MADDXrrr killed %39:gpr64sp, killed %25:gpr64, $xzr - operand 1: killed %39:gpr64sp Expected a GPR64 register, but got a GPR64sp register fatal error: error in backend: Found 1 machine code errors. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. 
Program arguments: /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/compiler/bin/clang -DNDEBUG -B /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin -Wno-unused-command-line-argument -mllvm -verify-machineinstrs -O3 -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk -w -Werror=date-time -DTORONTO -MD -MT MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o -MF MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o.d -o MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o -c /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite/MultiSource/Benchmarks/Olden/health/health.c 1. <eof> parser at end of file 2. Code generation 3. Running pass 'Function Pass Manager' on module '/Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite/MultiSource/Benchmarks/Olden/health/health.c'. 4. 
Running pass 'Verify generated machine code' on function '@alloc_tree' Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it): 0 clang 0x000000011191896b llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 43 1 clang 0x00000001119179b5 llvm::sys::RunSignalHandlers() + 85 2 clang 0x00000001119180e2 llvm::sys::CleanupOnSignal(unsigned long) + 210 3 clang 0x0000000111849f6a (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) + 106 4 clang 0x0000000111849ee8 llvm::CrashRecoveryContext::HandleExit(int) + 24 5 clang 0x0000000111914acc llvm::sys::Process::Exit(int, bool) + 44 6 clang 0x000000010f4e9be9 LLVMErrorHandler(void, char const, bool) + 89 7 clang 0x0000000114eba333 llvm::report_fatal_error(llvm::Twine const&, bool) + 323 8 clang 0x0000000110d8c620 (anonymous namespace)::MachineVerifier::BBInfo::~BBInfo() + 0 9 clang 0x0000000110cdddca llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 378 10 clang 0x00000001110b0154 llvm::FPPassManager::runOnFunction(llvm::Function&) + 1092 11 clang 0x00000001110b6268 llvm::FPPassManager::runOnModule(llvm::Module&) + 72 12 clang 0x00000001110b074a llvm::legacy::PassManagerImpl::run(llvm::Module&) + 986 13 clang 0x0000000111c20ad4 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) + 3764 14 clang 0x0000000111f6dd31 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) + 1905 15 clang 0x00000001131a28b3 clang::ParseAST(clang::Sema&, bool, bool) + 643 16 clang 0x00000001122b02a4 clang::FrontendAction::Execute() + 84 17 clang 0x000000011222d6a9 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 873 18 clang 
0x000000011232faf5 clang::ExecuteCompilerInvocation(clang::CompilerInstance) + 661 19 clang 0x000000010f4e9860 cc1_main(llvm::ArrayRef<char const>, char const, void) + 2544 20 clang 0x000000010f4e7168 ExecuteCC1Tool(llvm::SmallVectorImpl<char const>&) + 312 21 clang 0x00000001120ab187 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool) const::$_1>(long) + 23 22 clang 0x0000000111849eb4 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) + 228 23 clang 0x00000001120aac24 clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool) const + 324 24 clang 0x000000011207b85d clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const&) const + 221 25 clang 0x000000011207bdad clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const> >&) const + 125 26 clang 0x0000000112092f7c clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const> >&) + 204 27 clang 0x000000010f4e6977 main + 10375 28 libdyld.dylib 0x00007fff6be90cc9 start + 1 29 libdyld.dylib 0x0000000000000018 start + 18446603338705728336 clang-14: error: clang frontend command failed with exit code 70 (use -v to see invocation) clang version 14.0.0 (https://github.com/llvm/llvm-project.git `c90d136be4`) Target: arm64-apple-darwin19.5.0 Thread model: posix InstalledDir: /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/compiler/bin clang-14: note: diagnostic msg: *******************	2022-01-18 13:17:02 +00:00
Micah Weston	e6698f0992	[AArch64] Revive optimize add/sub with immediate through MIPeepholeOpt Fixes the build issue with D111034, whose goal was to optimize add/sub with long immediates. Optimize ([add\|sub] r, imm) -> ([ADD\|SUB] ([ADD\|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned integers. Optimize ([add\|sub] r, imm) -> ([SUB\|ADD] ([SUB\|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. The change which fixed the build issue in D111034 was the use of new virtual registers so that SSA form is maintained until deleting MI. Differential Revision: https://reviews.llvm.org/D117429	2022-01-17 17:17:15 +00:00
Simon Pilgrim	a3f50fb06d	[X86] isVectorShiftByScalarCheap - vXi8 select(shift(x,splat0),shift(x,splat1)) is better than shift(x,select(splat0,splat1)) Even though we don't have vXi8 vector shifts (apart from XOP), it is still better to prefer shift (or funnel-shift/rotate) by scalar where possible. https://llvm.godbolt.org/z/6ss6ffTxv Differential Revision: https://reviews.llvm.org/D116191	2021-12-23 14:30:02 +00:00
David Green	760d4d03d5	[AArch64] Sink splat shuffles to lane index intrinsics This teaches AArch64TargetLowering::shouldSinkOperands to sink splat shuffles to certain neon intrinsics, so that they can make use of the lane variants of the instructions that are available. Differential Revision: https://reviews.llvm.org/D112994	2021-11-22 08:11:35 +00:00
Ben Shi	59c3b48d99	Revert "[AArch64] Optimize add/sub with immediate" This reverts commit `3de3ca3137`.	2021-11-03 14:15:21 +08:00
Ben Shi	3de3ca3137	[AArch64] Optimize add/sub with immediate Optimize ([add\|sub] r, imm) -> ([ADD\|SUB] ([ADD\|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned integers. Optimize ([add\|sub] r, imm) -> ([SUB\|ADD] ([SUB\|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. Reviewed By: jaykang10, dmgreen Differential Revision: https://reviews.llvm.org/D111034	2021-11-03 03:06:43 +00:00
Fraser Cormack	eabf11f9ea	[CodeGenPrepare] Avoid a scalable-vector crash in ctlz/cttz This patch fixes a crash when despeculating ctlz/cttz intrinsics with scalable-vector types. It is not safe to speculatively get the size of the vector type in bits in case the vector type is not a fixed-length type. As it happens this isn't required as vector types are skipped anyway. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112141	2021-10-20 16:45:55 +01:00
Ben Shi	d0dbc991c0	Revert "[AArch64] Optimize add/sub with immediate" This reverts commit `9bf6bef995`.	2021-10-16 22:17:18 +00:00
Ben Shi	9bf6bef995	[AArch64] Optimize add/sub with immediate Optimize ([add\|sub] r, imm) -> ([ADD\|SUB] ([ADD\|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned integers. Optimize ([add\|sub] r, imm) -> ([SUB\|ADD] ([SUB\|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. Reviewed By: jaykang10, dmgreen Differential Revision: https://reviews.llvm.org/D111034	2021-10-16 08:50:39 +00:00
David Green	adec922361	[AArch64] Make -mcpu=generic schedule for an in-order core We would like to start pushing -mcpu=generic towards enabling the set of features that improves performance for some CPUs, without hurting any others. A blend of the performance options hopefully beneficial to all CPUs. The largest part of that is enabling in-order scheduling using the Cortex-A55 schedule model. This is similar to the Arm backend change from `eecb353d0e` which made -mcpu=generic perform in-order scheduling using the cortex-a8 schedule model. The idea is that in-order cpu's require the most help in instruction scheduling, whereas out-of-order cpus can for the most part out-of-order schedule around different codegen. Our benchmarking suggests that hypothesis holds. When running on an in-order core this improved performance by 3.8% geomean on a set of DSP workloads, 2% geomean on some other embedded benchmark and between 1% and 1.8% on a set of singlecore and multicore workloads, all running on a Cortex-A55 cluster. On an out-of-order cpu the results are a lot more noisy but show flat performance or an improvement. On the set of DSP and embedded benchmarks, run on a Cortex-A78 there was a very noisy 1% speed improvement. Using the most detailed results I could find, SPEC2006 runs on a Neoverse N1 show a small increase in instruction count (+0.127%), but a decrease in cycle counts (-0.155%, on average). The instruction count is very low noise, the cycle count is more noisy with a 0.15% decrease not being significant. SPEC2k17 shows a small decrease (-0.2%) in instruction count leading to a -0.296% decrease in cycle count. These results are within noise margins but tend to show a small improvement in general. When specifying an Apple target, clang will set "-target-cpu apple-a7" on the command line, so should not be affected by this change when running from clang. This also doesn't enable more runtime unrolling like -mcpu=cortex-a55 does, only changing the schedule used. 
A lot of existing tests have been updated. This is a summary of the important differences: - Most changes are the same instructions in a different order. - Sometimes this leads to very minor inefficiencies, such as requiring an extra mov to move variables into r0/v0 for the return value of a test function. - misched-fusion.ll was no longer fusing the pairs of instructions it should, as per D110561. I've changed the schedule used in the test for now. - neon-mla-mls.ll now uses "mul; sub" as opposed to "neg; mla" due to the different latencies. This seems fine to me. - Some SVE tests do not always remove movprfx where they did before due to different register allocation giving different destructive forms. - The tests argument-blocks-array-of-struct.ll and arm64-windows-calls.ll produce two LDR where they previously produced an LDP due to store-pair-suppress kicking in. - arm64-ldp.ll and arm64-neon-copy.ll are missing pre/postinc on LDP. - Some tests such as arm64-neon-mul-div.ll and ragreedy-local-interval-cost.ll have more, less or just different spilling. - In aarch64_generated_funcs.ll.generated.expected one part of the function is no longer outlined. Interestingly if I switch this to use any other schedule even less is outlined. Some of these are expected to happen, such as differences in outlining or register spilling. There will be places where these result in worse codegen, places where they are better, with the SPEC instruction counts suggesting it is not a decrease overall, on average. Differential Revision: https://reviews.llvm.org/D110830	2021-10-09 15:58:31 +01:00
David Green	92128b7801	[AArch64] Regenerate even more tests This updates a few more check lines, in some mte tests that were close to auto generated already and some CodeGenPrepare/consthoist tests where being able to see the entire code sequence is useful for determining whether code differences are improvements or not.	2021-10-06 14:32:01 +01:00
Andrew Wei	c9066c5d37	[CGP] Fix the crash for combining address mode when having cyclic dependency In the combination of addressing modes, when replacing the matched phi nodes, sometimes the phi node to be replaced has been modified. For example, the matcher sets [A, B] and [C, A] have a cyclic dependency: A is replaced by B and C will be replaced by A. Because we tried to match a new phi node to another new phi node, we should ignore new phi nodes when mapping a new phi node to an old one. Reviewed By: skatkov Differential Revision: https://reviews.llvm.org/D108635	2021-08-26 22:52:42 +08:00
Tiehu Zhang	9cfa9b44a5	[CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block In the current implementation, the instruction to be sunk will be inserted before the target instruction without considering the def-use tree, which may cause an "Instruction does not dominate all uses" error. We need to choose a suitable location to insert according to the use chain. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D107262	2021-08-17 18:58:15 +08:00
David Green	013030a0b2	[AArch64] Correct sinking of shuffles to adds/subs This was checking extends as shuffles, whereas we should be checking the operands. This helps sink the shuffles, creating more addl/subl instructions. Differential Revision: https://reviews.llvm.org/D107623	2021-08-10 13:25:42 +01:00
David Green	3f74a68c35	[AArch64] Regenerate sink-free-instructions.ll. NFC	2021-08-10 13:25:42 +01:00
David Sherwood	8439415333	[IR] Let ConstantVector::getSplat use poison instead of undef This patch updates ConstantVector::getSplat to use poison instead of undef when using insertelement/shufflevector to splat. This follows on from D93793. Differential Revision: https://reviews.llvm.org/D107751	2021-08-10 08:27:43 +01:00
serge-sans-paille	4ab3041acb	Revert "[NFC] remove explicit default value for strboolattr attribute in tests" This reverts commit `bda6e5bee0`. See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance	2021-05-24 19:43:40 +02:00
serge-sans-paille	bda6e5bee0	[NFC] remove explicit default value for strboolattr attribute in tests Since `d6de1e1a71`, having no attribute is equivalent to setting the attribute to false. This is a preliminary commit for https://reviews.llvm.org/D99080	2021-05-24 19:31:04 +02:00
David Green	dd5c52029d	[CGP][ARM] Optimize towards branch on zero in codegenprepare This adds a simple fold into codegenprepare that converts comparison of branches towards comparison with zero if possible. For example: %c = icmp ult %x, 8 br %c, bla, blb %tc = lshr %x, 3 becomes %tc = lshr %x, 3 %c = icmp eq %tc, 0 br %c, bla, blb As a first order approximation, this can reduce the number of instructions needed to perform the branch as the shift is (often) needed anyway. At the moment this does not affect very much, as llvm tends to prefer the opposite form. But it can protect against regressions from commits like rG9423f78240a2. Simple cases of Add and Sub are added along with Shift, equally as the comparison to zero can often be folded with cpsr flags. Differential Revision: https://reviews.llvm.org/D101778	2021-05-16 17:54:06 +01:00
David Green	d539357e1b	[ARM] Extra branch on zero tests. NFC	2021-05-16 17:22:52 +01:00
Simon Pilgrim	2bb41851a1	[Utils] recognizeBSwapOrBitReverseIdiom - support matching from funnel shift roots (PR40058) We were missing bitreverse matches in cases where InstCombine had seen a byte-level rotation at the end of a bitreverse sequence (replacing or() with fshl()), hindering the exhaustive bitreverse matching in CodeGenPrepare later on.	2021-05-04 13:46:45 +01:00
Simon Pilgrim	e0dd708f40	[CodeGenPrepare][X86] Add bitreverse detection tests Initially only test for XOP which is the only thing that supports scalar bitreverse - we can add vector tests later.	2021-05-04 13:29:19 +01:00
Thomas Preud'homme	a6950c33e8	[test, ARM] Fix use of var defined in CHECK-NOT LLVM test CodeGenPrepare/ARM/sink-add-mul-shufflevector.ll tries to check for the absence of a sequence of instructions with several CHECK-NOT with one of those directives using a variable defined in another. However, CHECK-NOT are checked independently so that is using a variable defined in a pattern that should not occur in the input. The bug was then copied over in Transforms/CodeGenPrepare/ARM/sink-add-mul-shufflevector-inseltpoison.ll. This commit removes the definition and uses of the variable to check each line independently, making the check stronger than the current one. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D99597	2021-03-30 16:28:13 +01:00
Jann Horn	202ae987d3	[test] Fix new CodeGenPrepare test for non-X86 systems The new test llvm/test/Transforms/CodeGenPrepare/remove-assume-block.ll breaks on non-X86 machines. Change it to look like the existing test llvm/test/Transforms/CodeGenPrepare/X86/delete-assume-dead-code.ll to fix it. Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D97952	2021-03-05 11:48:38 +01:00
Jann Horn	91c9dee3fb	[CodeGenPrepare] Eliminate llvm.expect before removing empty blocks CodeGenPrepare currently first removes empty blocks, then in a loop performs other optimizations. One of those optimizations is the removal of call instructions that invoke @llvm.assume, which can create new empty blocks. This means that when a branch only contains a call to __builtin_assume(), the empty branch will survive into MIR, and will then only be half-removed by MIR-level optimizations (e.g. removing the branch but leaving the condition intact). Fix it by eliminating @llvm.expect builtin calls before removing empty blocks. Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D97848	2021-03-04 14:48:26 +01:00
Jun Ma	54842fa0bb	[CodeGenPrepare] Also skip lifetime.end intrinsic when check return block in dupRetToEnableTailCallOpts. Differential Revision: https://reviews.llvm.org/D95424	2021-02-01 08:18:44 +08:00
Florian Hahn	292077072e	[Local] Treat calls that may not return as being alive. With the addition of the `willreturn` attribute, functions that may not return (e.g. due to an infinite loop) are well defined, if they are not marked as `willreturn`. This patch updates `wouldInstructionBeTriviallyDead` to not consider calls that may not return as dead. This patch still provides an escape hatch for intrinsics, which are still assumed as willreturn unconditionally. It will be removed once all intrinsics definitions have been reviewed and updated. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94106	2021-01-23 16:05:14 +00:00
Juneyoung Lee	ae6e89327b	Precommit tests that have poison as shufflevector's placeholder This commit copies existing tests at llvm/Transforms containing 'shufflevector X, undef' and replaces them with 'shufflevector X, poison'. The new copied tests have -inseltpoison.ll suffix at its file name (as `db7a2f347f` did) See https://reviews.llvm.org/D93793 Test files listed using grep -R -E "^[^;]shufflevector <.> ., <.> undef" \| cut -d":" -f1 \| uniq Test files copied & updated using file_org=llvm/test/Transforms/$1 if [[ "$file_org" = -inseltpoison.ll ]]; then file=$file_org else file=${file_org%.ll}-inseltpoison.ll if [ ! -f $file ]; then cp $file_org $file fi fi sed -i -E 's/^([^;])shufflevector <(.)> (.), <(.)> undef/\1shufflevector <\2> \3, <\4> poison/g' $file head -1 $file \| grep "Assertions have been autogenerated by utils/update_test_checks.py" -q if [ "$?" == 1 ]; then echo "$file : should be manually updated" # The test is manually updated exit 1 fi python3 ./llvm/utils/update_test_checks.py --opt-binary=./build-releaseassert/bin/opt $file	2020-12-29 17:09:31 +09:00
Juneyoung Lee	db7a2f347f	Precommit transform tests that have poison as insertelement's placeholder This commit copies existing tests at llvm/Transforms and replaces 'insertelement undef' in those files with 'insertelement poison'. (see https://reviews.llvm.org/D93586) Tests listed using this script: grep -R -E '^[^;]insertelement <.> undef,' . \| cut -d":" -f1 \| uniq \| wc -l Tests updated: file_org=llvm/test/Transforms/$1 file=${file_org%.ll}-inseltpoison.ll cp $file_org $file sed -i -E 's/^([^;])insertelement <(.)> undef/\1insertelement <\2> poison/g' $file head -1 $file \| grep "Assertions have been autogenerated by utils/update_test_checks.py" -q if [ "$?" == 1 ]; then echo "$file : should be manually updated" # I manually updated the script exit 1 fi python3 ./llvm/utils/update_test_checks.py --opt-binary=./build-releaseassert/bin/opt $file	2020-12-24 11:46:17 +09:00
Paul Walker	6d35bd1d48	[CodeGenPrepare] Update optimizeGatherScatterInst for scalable vectors. optimizeGatherScatterInst does nothing specific to fixed length vectors but uses FixedVectorType to extract the number of elements. This patch simply updates the code to use VectorType and getElementCount instead. For testing I just copied Transforms/CodeGenPrepare/X86/gather-scatter-opt.ll replacing `<4 x ` with `<vscale x 4`. Differential Revision: https://reviews.llvm.org/D92572	2020-12-15 10:57:51 +00:00
Pan, Tao	7af802994e	[CodeGen] Add text section prefix for COFF object file The text section prefix is created in CodeGenPrepare, which is file-format independent; the text section name is written into the object file in TargetLoweringObjectFile, which is file-format dependent. This ports the code that adds the text section prefix to the text section name from ELF to COFF. Unlike ELF, which uses '.' as the concatenation character, COFF uses '$'. Because the concatenation character varies, it is split out of the text section prefix. The text section prefix is an existing ELF feature: it can help reduce icache and itlb misses, and it also makes it possible to aggregate sections with the same prefix created by other compilers, e.g. v8. Furthermore, the recent Machine Function Splitter feature (basic block level text prefix section) is based on the text section prefix. Reviewed By: pengfei, rnk Differential Revision: https://reviews.llvm.org/D92073	2020-12-08 18:56:21 +08:00
Fangrui Song	2262b04cab	[test] Add explicit dso_local to constant/global variable declarations They are currently implicit because TargetMachine::shouldAssumeDSOLocal implies dso_local. For external data, clang -fno-pic emits the dso_local specifier for ELF and non-MinGW COFF. Adding explicit dso_local aligns these tests with the clang behavior and helps implement an option to use GOT indirection for external data access in -fno-pic mode (to avoid copy relocations).	2020-12-04 13:51:01 -08:00
Simon Pilgrim	8c4a86f790	[CodeGenPrepare] Remove unused check-prefixes	2020-11-09 13:12:39 +00:00
Yevgeny Rouban	88690a9658	[CodeGenPrepare] Fix zapping dead operands of assume This patch fixes a problem of the commit `52cc97a0`. A test case is created to demonstrate the crash caused by the instruction iterator invalidated by the recursive removal of dead operands of assume. The solution restarts from the block's first instruction in case CurInstIterator is invalidated by RecursivelyDeleteTriviallyDeadInstructions(). Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D87434	2020-09-14 11:46:34 +07:00
Craig Topper	b16e8687ab	[CodeGenPrepare][X86] Teach optimizeGatherScatterInst to turn a splat pointer into GEP with scalar base and 0 index This helps SelectionDAGBuilder recognize the splat can be used as a uniform base. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D86371	2020-09-02 20:44:12 -07:00
Craig Topper	ab83348a63	[X86][CGP] Add gather test cases for D86371.	2020-08-31 13:12:53 -07:00
Craig Topper	44133d9a08	[X86][CGP] Pre-commit test cases for D86371.	2020-08-31 10:48:56 -07:00
Benjamin Kramer	52cc97a0db	[CodeGenPrepare] Zap the argument of llvm.assume when deleting it We know that the argument is most likely dead, so we can purge it early. Otherwise it would make it to codegen, and can block further optimizations.	2020-08-28 20:52:22 +02:00
Philip Reames	1621c004da	[Tests] Be consistent w/definition of statepoint-example These tests use the statepoint-example builtin gc which expects address space #1 to be the only non-integral address space. The fact the test used as=0 happened to work, but was caught by a downstream assert. (Literally years ago, I just happened to notice the XFAIL and fix it now.)	2020-08-14 20:45:48 -07:00
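
The add/sub immediate split described in the `9bf6bef995`, `3de3ca3137`, `e6698f0992` and `93deac2e2b` messages comes down to simple arithmetic. The sketch below only illustrates the stated conditions (`imm == (imm0<<12)+imm1` or `imm == -(imm0<<12)-imm1`, with both parts non-zero 12-bit unsigned integers). It is not the MIPeepholeOpt code; `split_addsub_imm` and its return shape are hypothetical names chosen for this example, and Python is used purely for illustration.

```python
def split_addsub_imm(imm):
    """Illustrative only: split an add/sub immediate as the commit messages describe.

    Returns (negated, imm0, imm1) when imm == +/-((imm0 << 12) + imm1) and both
    imm0 and imm1 are non-zero 12-bit unsigned integers, otherwise None.
    negated=True means the opposite opcode would be used (ADD <-> SUB).
    """
    def split(value):
        # High part goes into the "lsl #12" instruction, low part into the second one.
        imm0, imm1 = value >> 12, value & 0xFFF
        if 0 < imm0 <= 0xFFF and imm1 != 0:
            return imm0, imm1
        return None

    if imm > 0:
        parts = split(imm)
        return (False, *parts) if parts else None
    if imm < 0:
        parts = split(-imm)
        return (True, *parts) if parts else None
    return None
```

For instance, `split_addsub_imm(0x123456)` yields the parts `0x123` and `0x456`, while `0x1000` is rejected because its low part is zero and a single shifted immediate already covers it.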


343 Commits
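
The branch-on-zero fold in the `dd5c52029d` message rewrites `icmp ult %x, 8` as `icmp eq (lshr %x, 3), 0`. A quick way to convince oneself that the two forms agree is to compare them over unsigned values. The helpers below are hypothetical names for this sketch, with a 32-bit width assumed; they model the two IR forms and are not taken from CodeGenPrepare.

```python
def ult_pow2(x, k, bits=32):
    """Model of `icmp ult %x, 2**k` on an unsigned value of the given width."""
    mask = (1 << bits) - 1
    return (x & mask) < (1 << k)


def shifted_is_zero(x, k, bits=32):
    """Model of `icmp eq (lshr %x, k), 0` on the same unsigned value."""
    mask = (1 << bits) - 1
    return ((x & mask) >> k) == 0


def forms_agree(k, samples, bits=32):
    """Check that both forms give the same branch condition for every sample."""
    return all(ult_pow2(x, k, bits) == shifted_is_zero(x, k, bits) for x in samples)
```

Calling `forms_agree(3, range(64))` covers the example from the message (a comparison against 8), where the shift result is needed anyway and the comparison against zero comes almost for free.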