clang-p2996

Author	SHA1	Message	Date
Mircea Trofin	fa99cb64ff	[mlgo][regalloc] Add score calculation for training Add the calculation of a score, which will be used during ML training. The score qualifies the quality of a regalloc policy, and is independent of what we train (currently, just eviction), or the regalloc algo itself. We can then use scores to guide training (which happens offline), by formulating a reward based on score variation - the goal being lowering scores (currently, that reward is percentage reduction relative to Greedy's heuristic) Currently, we compute the score by factoring different instruction counts (loads, stores, etc) with the machine basic block frequency, regardless of the instructions' provenance - i.e. they could be due to the regalloc policy or be introduced previously. This is different from RAGreedy::reportStats, which accummulates the effects of the allocator alone. We explored this alternative but found (at least currently) that the more naive alternative introduced here produces better policies. We do intend to consolidate the two, however, as we are actively investigating improvements to our reward function, and will likely want to re-explore scoring just the effects of the allocator. In either case, we want to decouple score calculation from allocation algorighm, as we currently evaluate it after a few more passes after allocation (also, because score calculation should be reusable regardless of allocation algorithm). We intentionally accummulate counts independently because it facilitates per-block reporting, which we found useful for debugging - for instance, we can easily report the counts indepdently, and then cross-reference with perf counter measurements. Differential Revision: https://reviews.llvm.org/D115195	2021-12-07 09:00:27 -08:00
spupyrev	f573f6866e	ext-tsp basic block layout A new basic block ordering improving existing MachineBlockPlacement. The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality and thus processor I-cache utilization. This is achieved via increasing the number of fall-through jumps and co-locating frequently executed nodes together. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of classical (maximum) Traveling Salesmen Problem. The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order. An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast. Differential Revision: https://reviews.llvm.org/D113424	2021-12-07 07:31:10 -08:00
Paulo Matos	2fd634a5e3	[WebAssembly] Implement table instruction intrinsics This change implements intrinsics for table.grow, table.fill, table.size, and table.copy. Differential Revision: https://reviews.llvm.org/D113420	2021-12-07 13:25:59 +01:00
Fraser Cormack	40d51de5cb	[SelectionDAG] Use UnknownSize for VP memory ops In the style of D113888, this patch updates the various VP memory operations (load, store, gather, scatter) to use UnknownSize. This is for the same reason as for masked loads and stores: the number of elements accessed is not generally known at compile time. This is somewhat pessimistic in the sense that we may still find un-canonicalized intrinsics featuring both an all-true mask and an EVL equal to the vector size. Arguably those should be canonicalized before the SelectionDAG, so those have been left for future work. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D115036	2021-12-07 10:51:02 +00:00
Fraser Cormack	3460cc2585	[VP] Propagate align parameter attr on VP load/store to ISel This patch fixes a case where the 'align' parameter attribute on the pointer operands to llvm.vp.load and llvm.vp.store was being dropped during the conversion to the SelectionDAG. The default alignment equal to the ABI type alignment of the vector type was kept. It also updates the documentation to reflect the fact that the parameter attribute is now properly supported. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D114422	2021-12-07 10:16:16 +00:00
Sameer Sahasrabuddhe	0fe61ecc2c	CycleInfo: Introduce cycles as a generalization of loops LLVM loops cannot represent irreducible structures in the CFG. This change introduce the concept of cycles as a generalization of loops, along with a CycleInfo analysis that discovers a nested hierarchy of such cycles. This is based on Havlak (1997), Nesting of Reducible and Irreducible Loops. The cycle analysis is implemented as a generic template and then instatiated for LLVM IR and Machine IR. The template relies on a new GenericSSAContext template which must be specialized when used for each IR. This review is a restart of an older review request: https://reviews.llvm.org/D83094 Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>, with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> Differential Revision: https://reviews.llvm.org/D112696	2021-12-07 12:02:34 +05:30
Mircea Trofin	2bd7384d3a	[NFC][MachineInstr] Pass-by-value DebugLoc in CreateMachineInstr DebugLoc is cheap to move, passing it by-val rather than const ref to take advantage of the fact that it is consumed that way by the MachineInstr ctor, which creates some optimization oportunities. Differential Revision: https://reviews.llvm.org/D115208	2021-12-06 19:42:19 -08:00
Kai Luo	b206ee6906	[MachineVerifier] Make TiedOpsRewritten computable in MIRParser This patch is to address post-commit comment https://reviews.llvm.org/D80538#anchor-inline-1091625, which make the constraint stronger based on what https://reviews.llvm.org/D80538 does, i.e., "TiedOpsRewritten is set iff leave-ssa and all tied operands share the same register". Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D114573	2021-12-07 02:25:15 +00:00
Mircea Trofin	615e374252	[NFC][MachineInstr] Rename some vars to conform to coding style	2021-12-06 17:19:11 -08:00
Nico Weber	3678326d28	Revert "ext-tsp basic block layout" This reverts commit `c68f71eb37`. Breaks tests on arm hosts, see comments on https://reviews.llvm.org/D113424	2021-12-06 19:08:20 -05:00
James Nagurne	cc3bb85580	[llvm][Hexagon] Generalize VLIWResourceModel, VLIWMachineScheduler, and ConvergingVLIWScheduler The Pre-RA VLIWMachineScheduler used by Hexagon is a relatively generic implementation that would make sense to use on other VLIW targets. This commit lifts those classes into their own header/source file with the root VLIWMachineScheduler. I chose this path rather than adding the strategy et al. into MachineScheduler to avoid bloating the file with other implementations. Target-specific behaviors have been captured and replicated through function overloads. - Added an overloadable DFAPacketizer creation member function. This is mainly done for our downstream, which has the capability to override the DFAPacketizer with custom implementations. This is an upstreamable TODO on our end. Currently, it always returns the result of TargetInstrInfo::CreateTargetScheduleState - Added an extra helper which returns the number of instructions in the current packet. This is used in our downstream, and may be useful elsewhere. - Placed the priority heuristic values into the ConvergingVLIWscheduler class instead of defining them as local statics in the implementation - Added a overridable helper in ConvergingVLIWScheduler so that targets can create their own VLIWResourceModel Differential Revision: https://reviews.llvm.org/D113150	2021-12-06 16:23:48 -06:00
spupyrev	c68f71eb37	ext-tsp basic block layout A new basic block ordering improving existing MachineBlockPlacement. The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality and thus processor I-cache utilization. This is achieved via increasing the number of fall-through jumps and co-locating frequently executed nodes together. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of classical (maximum) Traveling Salesmen Problem. The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order. An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast. Differential Revision: https://reviews.llvm.org/D113424	2021-12-06 08:56:39 -08:00
Kazu Hirata	c4a8928b51	[CodeGen] Use range-based for loops (NFC)	2021-12-06 08:49:10 -08:00
Jack Andersen	f108c7f59d	[GlobalISel] Allow DBG_VALUE to use undefined vregs before LiveDebugValues. Expanding on D109750. Since `DBG_VALUE` instructions have final register validity determined in `LDVImpl::handleDebugValue`, there is no apparent reason to immediately prune unused register operands as their defs are erased. Consequently, this renders `MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval` moot; gaining a substantial performance improvement. The only necessary changes involve making relevant passes consider invalid DBG_VALUE vregs uses as valid. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D112852	2021-12-05 15:55:59 -05:00
Michael Liao	b6ccca217c	Fix `-Wunused-variable` warning. NFC.	2021-12-05 13:40:35 -05:00
Kazu Hirata	1457e78352	[llvm] Use range-based for loops (NFC)	2021-12-05 08:33:02 -08:00
Kristina Bessonova	75b622a795	Reland [DwarfDebug] Support emitting function-local declaration for a lexical block This is another attempt to make function-local declarations (like static variables, structs/classes and other) be correctly emitted within a lexical (bracketed) block. Fixes https://bugs.llvm.org/show_bug.cgi?id=19238. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D113741	2021-12-05 13:56:45 +02:00
Kristina Bessonova	0ac75e82ff	Reland [DwarfDebug] Move emission of global vars, types and imports to endModule() This patch proposes to move emission of global variables, types, imported entities, etc from DwarfDebug::beginModule() to DwarfDebug::endModule(). Effectively, this changes nothing but the order of debug entities which will be as follows: * subprograms (including related context, local variables/labels, local imported entities; related types can be created as a part of the emission of local entities of an abstract subprogram); * global variables (including related context and types); * retained types and enums; * non-local-scoped imported entities; * basic types; * other types left (as a part of local variables attributes emission). Note that the order of emitted compile units may also be changed as now we emit units that contain subprograms first and then all other non-empty units. The motivation behind this change is the following: (1) DwarfDebug::beginModule() is run at the very beginning of backend's pipeline, from this time IR can be significantly changed by target-specific passes. If it happens for debug metadata of global entities, those changes will not be reflected in the emitted DWARF. (2) imported subprogram names should refer to an abstract subprogram if it exists, but it isn't known in DwarfDebug::beginModule() (it's possible to make some guesses based on location info, but it's not quite reliable); (3) aforementioned entities if they are scoped within a bracketed block (subject of D113741) couldn't be emitted in DwarfDebug::beginModule() (they need parent emitted first). Another problem is if to try to gather some information about local entities and defer their emission (till subprogram's processing or DwarfDebug::endModule()) all the gathered details might be irrelevant / invalid by the time the entities are being emitted (because of (1)). Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D114705	2021-12-05 13:56:45 +02:00
David Green	57ff805a6d	[DAG] Create fptoui.sat from clamped fptosi As an extension to D111976, this converts clamp fptosi, clamped between 0 and (2^n)-1 to a fptoui.sat. This can greatly help on targets with conversions that naturally saturate, such as Arm. X86 disables the transform as some of the test cases increases in size. A fptoui.sat necessitates a fp clamp without native support, so there is little use in converting if the instruction is just going to be expanded. Differential Revision: https://reviews.llvm.org/D112428	2021-12-05 09:25:52 +00:00
Kazu Hirata	ca2f53897a	[CodeGen] Use range-based for loops (NFC)	2021-12-04 08:48:05 -08:00
Kristina Bessonova	a961604819	Revert "[DwarfDebug] Support emitting function-local declaration for a lexical block" This reverts commits * `ee691970a9` (D113741), * `79d3132998` (D114705) due to lldb and dexter test failures.	2021-12-04 18:06:57 +02:00
Kristina Bessonova	ee691970a9	[DwarfDebug] Support emitting function-local declaration for a lexical block This is another attempt to make function-local declarations (like static variables, structs/classes and other) be correctly emitted within a lexical (bracketed) block. Fixes https://bugs.llvm.org/show_bug.cgi?id=19238. Differential Revision: https://reviews.llvm.org/D113741	2021-12-04 17:12:47 +02:00
Kristina Bessonova	79d3132998	[DwarfDebug] Move emission of global vars, types and imports to endModule() This patch proposes to move emission of global variables, types, imported entities, etc from DwarfDebug::beginModule() to DwarfDebug::endModule(). Effectively, this changes nothing but the order of debug entities which will be as follows: * subprograms (including related context, local variables/labels, local imported entities; related types can be created as a part of the emission of local entities of an abstract subprogram); * global variables (including related context and types); * retained types and enums; * non-local-scoped imported entities; * basic types; * other types left (as a part of local variables attributes emission). Note that the order of emitted compile units may also be changed as now we emit units that contain subprograms first and then all other non-empty units. The motivation behind this change is the following: (1) DwarfDebug::beginModule() is run at the very beginning of backend's pipeline, from this time IR can be significantly changed by target-specific passes. If it happens for debug metadata of global entities, those changes will not be reflected in the emitted DWARF. (2) imported subprogram names should refer to an abstract subprogram if it exists, but it isn't known in DwarfDebug::beginModule() (it's possible to make some guesses based on location info, but it's not quite reliable); (3) aforementioned entities if they are scoped within a bracketed block (subject of D113741) couldn't be emitted in DwarfDebug::beginModule() (they need parent emitted first). Another problem is if to try to gather some information about local entities and defer their emission (till subprogram's processing or DwarfDebug::endModule()) all the gathered details might be irrelevant / invalid by the time the entities are being emitted (because of (1)). Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D114705	2021-12-04 14:10:01 +02:00
Kazu Hirata	3aed282257	[CodeGen] Use range-based for loops (NFC)	2021-12-03 20:45:59 -08:00
Simon Pilgrim	ebf5271918	[DAG] PromoteIntRes_FunnelShift - rename shift Amount variable to Amt to prevent line overflow. NFC.	2021-12-03 17:24:45 +00:00
Stephen Tozer	98a021fcbf	[DebugInfo] Attempt to preserve more information during tail duplication Prior to this patch, tail duplication handled debug info poorly - specifically, debug instructions would be dropped instead of being set undef, potentially extending the lifetimes of prior debug values that should be killed. The pass was also very aggressive with dropping debug info, dropping debug info even when the SSA value it referred to was still present. This patch attempts to handle debug info more carefully, checking to see whether each affected debug value can still be live, setting it undef if not. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D106875	2021-12-03 15:30:05 +00:00
David Green	255ad73424	[ARM] Make MVE v2i1 predicates legal MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the two halves. This was never treated as a legal type in llvm in the past as there are not many 64bit instructions and no 64bit compares. There are a few instructions that could use it though, notably a VSELECT (as it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for similar reasons, some gathers/scatter and long multiplies and VCTP64 instructions. This patch goes through and makes v2i1 a legal type, handling all the cases that fall out of that. It also makes VSELECT legal for v2i64 as a side benefit. A lot of the codegen changes as a result - usually in way that is a little better or a little worse, but still expensive. Costs can change a little too in the process, again in a way that expensive things remain expensive. A lot of the tests that changed are mainly to ensure correctness - the code can hopefully be improved in the future where it comes up in practice. The intrinsics currently remain using the v4i1 they previously did to emulate a v2i1. This will be changed in a followup patch but this one was already large enough. Differential Revision: https://reviews.llvm.org/D114449	2021-12-03 14:05:41 +00:00
Jay Foad	d133a21b71	[SelectionDAG] Add newline to a debug message	2021-12-03 13:33:32 +00:00
Simon Pilgrim	6803d08c38	[DAG][PowerPC] Enable initial ISD::BITCAST SimplifyDemandedBits/SimplifyMultipleUseDemandedBits big-endian handling This patch begins extending handling for peeking through bitcast nodes to big-endian targets as well as the existing little-endian case. Differential Revision: https://reviews.llvm.org/D114676	2021-12-02 11:47:53 +00:00
Omer Aviram	617ad14060	[SelectionDAG] Add pattern to haveNoCommonBitsSet Correctly identify the following pattern, which has no common bits: (X & ~M) op (Y & M). Differential Revision: https://reviews.llvm.org/D113970	2021-12-01 12:04:04 -05:00
Simon Pilgrim	19d34f6e95	[X86] combinePMULH - recognise 'cheap' trunctions via PACKS/PACKUS as well as SEXT/ZEXT combinePMULH currently only truncates vXi32/vXi64 multiplies to PMULHW/PMULUW if the source operands are SEXT/ZEXT instructions for a 'free' truncation. But we can generalize this to any source operand with sufficient leading sign/zero bits that would allow PACKS/PACKUS to be used as a 'cheap' truncation. This helps us avoid the wider multiplies, in exchange for truncation on both source operands instead of the result. Differential Revision: https://reviews.llvm.org/D113371	2021-12-01 16:37:49 +00:00
Ties Stuij	f5f28d5b0c	[ARM] Implement BTI placement pass for PACBTI-M This patch implements a new MachineFunction in the ARM backend for placing BTI instructions. It is similar to the existing AArch64 aarch64-branch-targets pass. BTI instructions are inserted into basic blocks that: - Have their address taken - Are the entry block of a function, if the function has external linkage or has its address taken - Are mentioned in jump tables - Are exception/cleanup landing pads Each BTI instructions is placed in the beginning of a BB after the so-called meta instructions (e.g. exception handler labels). Each outlining candidate and the outlined function need to be in agreement about whether BTI placement is enabled or not. If branch target enforcement is disabled for a function, the outliner should not covertly enable it by emitting a call to an outlined function, which begins with BTI. The cost mode of the outliner is adjusted to account for the extra BTI instructions in the outlined function. The ARM Constant Islands pass will maintain the count of the jump tables, which reference a block. A `BTI` instruction is removed from a block only if the reference count reaches zero. PAC instructions in entry blocks are replaced with PACBTI instructions (tests for this case will be added in a later patch because the compiler currently does not generate PAC instructions). The ARM Constant Island pass is adjusted to handle BTI instructions correctly. Functions with static linkage that don't have their address taken can still be called indirectly by linker-generated veneers and thus their entry points need be marked with BTI or PACBTI. The changes are tested using "LLVM IR -> assembly" tests, jump tables also have a MIR test. Unfortunately it is not possible add MIR tests for exception handling and computed gotos because of MIR parser limitations. This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest The following people contributed to this patch: - Mikhail Maltsev - Momchil Velikov - Ties Stuij Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D112426	2021-12-01 12:54:05 +00:00
Bradley Smith	0eb1efb92c	[DAGCombiner] When combining REM ensure optimized div nodes are unique The REM DAG combine uses the visitDivLike functions to try and get an optimized DIV node to provide better codegen, however in some cases this visitDivLike call ends up in the BuildSDIVPow2 target hook, which in turn sometimes will return the same node passed in to indicate not to change it. The REM DAG combine does not anticipate this and creates a cycle in the DAG because of it. Fix this by ensuring any such optimized div node returned is distinct from the node being combined. Differential Revision: https://reviews.llvm.org/D114716	2021-12-01 11:24:26 +00:00
Simon Pilgrim	9981dd142f	[DAG] Apply clang-format to visitMSTORE + visitMLOAD. NFC. Reduce diff in D114582	2021-12-01 11:23:47 +00:00
Qiu Chaofan	15826eb437	[Legalizer] Avoid expansion to BR_CC if illegal Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110616	2021-12-01 12:22:21 +08:00
Mircea Trofin	a503cb00d1	[NFC][regalloc] Factor accesses to ExtraRegInfo We'll move ExtraRegInfo to the RegAllocEvictionAdvisor subsequently. This change prepares for that by factoring all accesses. RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html Differential Revision: https://reviews.llvm.org/D114759	2021-11-30 15:10:49 -08:00
David Green	9e8a71caf0	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 15:29:14 +00:00
Hans Wennborg	a87782c34d	Revert "[DAG] Create fptosi.sat from clamped fptosi" It causes builds to fail with this assert: llvm/include/llvm/ADT/APInt.h:990: bool llvm::APInt::operator==(const llvm::APInt &) const: Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed. See comment on the code review. > This adds a fold in DAGCombine to create fptosi_sat from sequences for > smin(smax(fptosi(x))) nodes, where the min/max saturate the output of > the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because > it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, > ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need > to be handled similarly. > > A shouldConvertFpToSat method was added to control when converting may > be profitable. The original fptosi will have a less strict semantics > than the fptosisat, with less values that need to produce defined > behaviour. > > This especially helps on ARM/AArch64 where the vcvt instructions > naturally saturate the result. > > Differential Revision: https://reviews.llvm.org/D111976 This reverts commit `52ff3b0093`.	2021-11-30 15:36:56 +01:00
Jeremy Morse	3c04507088	[DebugInfo] Turn instruction referencing on by default for x86 This patch is designed to be reverted -- it activates a reasonably large block of new-ish code, so some turbulence is likely. Instruction referencing is best summarised, and it being on-by-default, is discussed here: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153653.html Differential Revision: https://reviews.llvm.org/D114631	2021-11-30 13:44:07 +00:00
Jeremy Morse	651122fc4a	[DebugInfo][InstrRef] Pre-land on-by-default-for-x86 changes Over in D114631 and [0] there's a plan for turning instruction referencing on by default for x86. This patch adds / removes all the relevant bits of code, with the aim that the final patch is extremely small, for an easy revert. It should just be a condition in CommandFlags.cpp and removing the XFail on instr-ref-flag.ll. [0] https://lists.llvm.org/pipermail/llvm-dev/2021-November/153653.html	2021-11-30 12:40:59 +00:00
Jeremy Morse	8dda516b83	[DebugInfo][InstrRef] Avoid dropping fragment info during PHI elimination InstrRefBasedLDV used to crash on the added test -- the exit block is not in scope for the variable being propagated, but is still considered because it contains an assignment. The failure-mode was vlocJoin ignoring assign-only blocks and not updating DIExpressions, but pickVPHILoc would still find a variable location for it. That led to DBG_VALUEs created with the wrong fragment information. Fix this by removing a filter inherited from VarLocBasedLDV: vlocJoin will now consider assign-only blocks and will update their expressions. Differential Revision: https://reviews.llvm.org/D114727	2021-11-30 11:32:31 +00:00
David Green	52ff3b0093	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 11:05:32 +00:00
Abinav Puthan Purayil	bc5dbb0bae	[GlobalISel] Add matchers for constant splat. This change exposes isBuildVectorConstantSplat() to the llvm namespace and uses it to implement the constant splat versions of m_SpecificICst(). CombinerHelper::matchOrShiftToFunnelShift() can now work with vector types and CombinerHelper::matchMulOBy2()'s match for a constant splat is simplified. Differential Revision: https://reviews.llvm.org/D114625	2021-11-30 15:18:50 +05:30
Guozhi Wei	f1d8345a2a	[TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB Currently we create register mappings for registers used only once in current MBB. For registers with multiple uses, when all the uses are in the current MBB, we can also create mappings for them similarly according to the last use. For example %reg101 = ... = ... reg101 %reg103 = ADD %reg101, %reg102 We can create mapping between %reg101 and %reg103. Differential Revision: https://reviews.llvm.org/D113193	2021-11-29 19:01:59 -08:00
Mircea Trofin	e8b8304d76	[NFC][Regalloc] Split canEvictInterference into hint and general There are 2 eviction queries. One is made by tryAssign, when it attempts to free an interference occupying the hint of the candidate. The other is during 'regular' interference resolution, where we scan over all physical registers and try to see if we can evict live ranges in favor of the candidate. We currently use the same logic in both cases, just that the former never passes the cost to any subsequent query. Technically, the 2 decisions could be implemented with different policies. This patch splits the 2. RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html Differential Revision: https://reviews.llvm.org/D114019	2021-11-29 16:04:03 -08:00
Jeremy Morse	0eee844539	[DebugInfo][InstrRef] Terminate overlapping variable fragments If we have a variable where its fragments are split into overlapping segments: DBG_VALUE $ax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment_0, 16) ... DBG_VALUE $eax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment_0, 32) we should only propagate the most recently assigned fragment out of a block. LiveDebugValues only deals with live-in variable locations, as overlaps within blocks is DbgEntityHistoryCalculators domain. InstrRefBasedLDV has kept the accumulateFragmentMap method from VarLocBasedLDV, we just need it to recognise DBG_INSTR_REFs. Once it's produced a mapping of variable / fragments to the overlapped variable / fragments, VLocTracker uses it to identify when a debug instruction needs to terminate the other parts it overlaps with. The test is updated for some standard "InstrRef picks different registers" variation, and the order of some unrelated DBG_VALUEs changes. Differential Revision: https://reviews.llvm.org/D114603	2021-11-29 23:37:20 +00:00
Jeremy Morse	a20987adf4	[DebugInfo][InstrRef] Add indirection from dbg.declare in SelectionDAG Usually dbg.declares get translated into either entries in an MF side-table, or a DBG_VALUE on entry to the function with IsIndirect set (including in instruction referencing mode). Much rarer is a dbg.declare attached to a non-argument value, such as in the test added in this patch where there's a variable-length-array. Such dbg.declares become SDDbgValue nodes with InIndirect=true. As it happens, we weren't correctly emitting DBG_INSTR_REFs with the additional indirection. This patch adds the extra indirection, encoded as adding an additional DW_OP_deref to the expression. Differential Revision: https://reviews.llvm.org/D114440	2021-11-29 22:24:19 +00:00
Jeremy Morse	9cf31b8d39	[DebugInfo][InstrRef] Preserve properties of restored variables InstrRefBasedLDV observes when variable locations are clobbered, scans what values are available in the machine, and re-issues a DBG_VALUE for the variable if it can find another location. Unfortunately, I hadn't joined up the Indirectness flag, so if it did this to an Indirect Value, the indirectness would be dropped. Fix this, and add a test that if we clobber a variable value (on the stack in this case), then the recovered variable location keeps the Indirect flag. Differential Revision: https://reviews.llvm.org/D114378	2021-11-29 21:57:24 +00:00
Kazu Hirata	f240e528ce	[llvm] Use range-based for loops (NFC)	2021-11-29 09:04:44 -08:00
Mirko Brkusanin	0dd570ff56	[AMDGPU][GlobalISel] Transform (fsub (fpext (fneg (fmul x, y))), z) -> (fneg (fma (fpext x), (fpext y), z)) Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D98050	2021-11-29 16:27:22 +01:00

1 2 3 4 5 ...

31598 Commits