Commit Graph

34376 Commits

Author SHA1 Message Date
Jay Foad
2dcf051259 [CodeGen] Store call frame size in MachineBasicBlock
Record the call frame size on entry to each basic block. This is usually
zero except when a basic block has been split in the middle of a call
sequence.

This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.

Differential Revision: https://reviews.llvm.org/D156113
2023-07-27 10:32:00 +01:00
Vitaly Buka
a496c8be6e Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
And dependent commits.

Details in D150388.

This reverts commit 825b7f0ca5.
This reverts commit 7a98f084c4.
This reverts commit b4a62b1fa5.
This reverts commit b7836d8562.

No conflicts in the code, few tests had conflicts in autogenerated CHECKs:
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll

Reviewed By: alexfh

Differential Revision: https://reviews.llvm.org/D156381
2023-07-26 22:13:32 -07:00
Sameer Sahasrabuddhe
b14e30f10d [LLVM] refactor GenericSSAContext and its specializations
Fix the GenericSSAContext template so that it actually declares all the
necessary typenames and the methods that must be implemented by its
specializations SSAContext and MachineSSAContext.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156288
2023-07-27 09:54:50 +05:30
Pranav Kant
6f305e0658 [DAGCombiner] Limit graph traversal to cap compile times
hasPredecessorHelper method, that is used by DAGCombiner to combine to pre-indexed and post-indexed load/stores, is a major source of slowdown while compiling a large function with MSan enabled on Arm. This patch caps the DFS-graph traversal for this method to 8192 which cuts compile time by 50% (4m -> 2m compile time) at the cost of less overall nodes combined.

Here's the summary of pre-index DAG nodes created and time it took to compile the pathological case with different MaxDepth limit:
1. With MaxDepth = 0 (unlimited): 1800, took 4m
2. With MaxDepth = 32k, 560, took 2m31s
3. With MaxDepth = 8k, 139, took 2m.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D154885
2023-07-26 17:29:38 +00:00
DianQK
30f2170a78 Revert "[DebugInfo] Fix potential CU mismatch for attachRangesOrLowHighPC"
This reverts commit d20e4a1d68.
After committing 2ee4d0386c, We don't support subprogram definitions nested within `DICompositeType` when doing LTO builds.
For a detailed discussion, see https://reviews.llvm.org/D152095.
2023-07-26 19:58:00 +08:00
Zhongyunde
05aae0839f Reland [AArch64][NFC] Call the API getVScaleRange directly
Use the maximum 64 for BitWidth of getVScaleRange to avoid returning an empty range.

the previous changes bring in a Buildbot failure because MinSVEVectorSize = MinSVEVectorSize.
    error: explicitly assigning value of variable of type 'unsigned int' to itself [-Werror,-Wself-assign]

Reviewed By: sdesmalen, nikic, dmgreen
Differential Revision: https://reviews.llvm.org/D155708
2023-07-26 18:55:31 +08:00
Jay Foad
6fcad9cf93 [DAGCombiner] Simplify foldAndOrOfSETCC. NFC.
Pull out repeated hasOneUse checks. Simplify some conditions. Reduce
indentation.

Differential Revision: https://reviews.llvm.org/D156220
2023-07-26 10:22:55 +01:00
Zhongyunde
ebaac2b2d6 Revert "[AArch64][NFC] Call the API getVScaleRange directly"
This reverts commit 67005c8e6f.
2023-07-26 16:44:14 +08:00
Zhongyunde
67005c8e6f [AArch64][NFC] Call the API getVScaleRange directly
Use the maximum 64 for BitWidth of getVScaleRange to
avoid returning an empty range.

Reviewed By: sdesmalen, nikic, dmgreen
Differential Revision: https://reviews.llvm.org/D155708
2023-07-26 15:54:04 +08:00
esmeyi
e83b8a5e71 [XCOFF] Enable available_externally linkage for functions.
Summary: D80642 added support for emitting AvailableExternally Linkage on AIX. However, an assertion of "Trying to get csect representation of this symbol but none was set." occurred when a function is declared as available_externally. This is due to we missing to generate a csect for the function. This patch fixes it.

Reviewed By: hubert.reinterpretcast, shchenz

Differential Revision: https://reviews.llvm.org/D156213

Signed-off-by: Esme Yi <esme.yi@ibm.com>
2023-07-25 22:47:11 -04:00
Qi Hu
ddd7d35c6c [RegAlloc] Fix assertion failure caused by inline assembly
When inline assembly code requests more registers than available, the
MachineInstr::emitError function in the RegAllocFast pass emits an error
but doesn't stop the pass, and then the compiler crashes later with an
assertion failure. This commit, mimicking the RegAllocGreedy pass, assigns
a random physical register, and therefore avoids the crash after producing
the diagnostic. This problem has been observed for both rustc and clang,
while it doesn't occur in gcc.
2023-07-25 19:21:03 -04:00
Craig Topper
1f5a1b8952 [DAGCombiner] Minor improvements to foldAndOrOfSETCC. NFC
Reduce the scope of some variables.
Replace an if with an assertion.

Reviewed By: kmitropoulou

Differential Revision: https://reviews.llvm.org/D156140
2023-07-25 00:20:06 -07:00
Matt Arsenault
0d797b71eb RegisterCoaleser: Fix empty subrange verifier error
In this example an implicit def had live-out undef subrange
defs. After coalescing with the def from a previous block, the
undef-defed lanes are no longer live out of the block in the new
interval. An empty subrange was tenatively created for these lanes,
but it must be deleted.
2023-07-24 12:18:34 -04:00
Matt Arsenault
2a53b6c06b RegisterCoalescer: Fix verifier error on redef of subregister for live out implicit_defs
A live out implicit_def wasn't deleted, but the subranges weren't
correctly updated. The main range was correct but the def
corresponding to the initial main range def instruction was missing
from the lanes redefined in another block.

The written lanes are not quite the same as the valid lanes in the
case of an implicit_def.

Fixes verifier error in blender. There is an additional verifier in
some of the testcase variants where an empty subrange remains.
2023-07-24 12:18:34 -04:00
WANG Rui
595d5f36f4 [DAGCombine] Canonicalize operands for visitANDLike
During the construction of SelectionDAG, there are no explicit canonicalization rules to adjust the order of operands for AND nodes.  This may prevent the optimization in DAGCombiner::visitANDLike from being triggered. This patch canonicalizes the operands before matches, which can be observed to improve optimization on the RISC-V target architecture.

Canonicalize:
```
and(x, add) -> and(add, x)
```

Signed-off-by: WANG Rui <wangrui@loongson.cn>

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D154760
2023-07-24 16:52:04 +08:00
Antonio Frighetto
2dea969d83 [clang][CodeGen] Introduce -frecord-command-line for MachO
Allow clang driver command-line recording when
targeting MachO object files as well.

Reviewed-by: sgraenitz

Differential Revision: https://reviews.llvm.org/D155716
2023-07-24 09:24:59 +02:00
David Green
6edc9a7662 [AArch64][GISel] Additional FPExt vector lowering
Similar to D155311, this adds lowering for more vector cases for FPExt

Differential Revision: https://reviews.llvm.org/D155601
2023-07-23 16:58:13 +01:00
Amaury Séchet
88452508f3 [DAG] Improve carry reconstruction in combineCarryDiamond.
The gain is usually suffiscient to go the extra mile and reconstruct a carry in some cases.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D154533
2023-07-22 22:49:48 +00:00
Craig Topper
a815f03f9b [LegalizeTypes] Use report_fatal_error instead of llvm_unreachable in the default case of some type legalization handlers.
These can be triggered by in various ways when intrinsics are used wrong or a target doesn't correctly
not support something. Using a fatal error prevents strange behavior
like infinite loops.

We already do this for some of the vector type legalization handles.
2023-07-22 11:05:24 -07:00
Daniel Hoekwater
0315fca912 [AArch64] Move branch relaxation after bbsection assignment
Because branch relaxation needs to factor in if branches target
a block in the same section or a different one, it needs to run
after the Basic Block Sections / Machine Function Splitting passes.

Because Jump table compression relies on block offsets remaining
fixed after the table is compressed, we must also move the JT
compression pass.

The only tests affected are ones enforcing just the ordering and
the a few that have basic block ids changed because RenumberBlocks
hasn't run yet.

Differential Revision: https://reviews.llvm.org/D153829
2023-07-21 20:24:52 +00:00
Simon Pilgrim
ae60706da0 [DAG] SimplifyDemandedBits - call ComputeKnownBits for constant non-uniform ISD::SRL shift amounts
We only attempted to determine KnownBits for uniform constant shift amounts, but ComputeKnownBits is able to handle some non-uniform cases as well that we can use as a fallback.
2023-07-21 14:52:57 +01:00
Jingu Kang
351b4c17dd Revert "[MachineLICM] Handle Subloops"
This reverts commit 50dd383d08.
2023-07-20 17:12:25 +01:00
Jingu Kang
50dd383d08 [MachineLICM] Handle Subloops
Following discussion on https://reviews.llvm.org/D154205, make MachineLICM pass
handle subloops with only visiting outmost loop's blocks once.

Differential Revision: https://reviews.llvm.org/D154205
2023-07-20 16:39:13 +01:00
Danila Malyutin
e1aa4e7b38 [Statepoint] Use correct RegisterClass for spilling
Copy propagation might have changed the register class of the register

Differential Revision: https://reviews.llvm.org/D155792
2023-07-20 16:00:00 +03:00
Simon Pilgrim
7567b72f4d [DAG] ShrinkDemandedConstant - early-out for empty DemandedBits/Elts
Leave this to constant folding in SimplifyDemandedBits

Fixes #63975
2023-07-20 12:18:10 +01:00
Simon Pilgrim
697f60598e [DAG] hoistLogicOpWithSameOpcodeHands - ensure SIGN_EXTEND_INREG nodes have the same extension value type
Fix bug in the check for matching SIGN_EXTEND_INREG types
2023-07-20 10:44:46 +01:00
David Green
0c41c59dee [DAG][AArch64] Fix truncated vscale constant types
It appears that vscale values truncated to i1 causes mismatches in the constant
types when created in getNode. https://godbolt.org/z/TaaTo86ne.

Differential Revision: https://reviews.llvm.org/D155626
2023-07-20 09:12:05 +01:00
Fangrui Song
b215a2c885 .debug_gnu_pub{names,types}: Stabilize iteration order
StringMap iteration order is not guaranteed to be deterministic
(https://llvm.org/docs/ProgrammersManual.html#llvm-adt-stringmap-h).
Sort by DIE offset (which looks like a pre-order traversal order).
2023-07-19 23:30:30 -07:00
Fangrui Song
60a2b99125 AccelTable: Use MapVector to stabilize iteration order
Entries of the same DJB hash are in the hash lookup table/name table are
ordered by the iteration order of `Entries` (a StringMap). Change
`Entries` to a MapVector to stabilize the order and simplify future
changes like D142862 that change the StringMap hash function.
2023-07-19 19:50:36 -07:00
Fangrui Song
30e753dd07 [FSAFDO] Switch to xxh3_64bits
Following recent changes switching from xxh64 to xxh32 for better
hashing performance. This particular instance may or may not have
noticeable performance difference, but this change makes us toward
removing xxHash64.
2023-07-19 14:23:28 -07:00
Momchil Velikov
4c95f79cce [CodeGenPrepare] Refactor optimizeSelectInst (NFC)
Refactor to use BasicBlockUtils functions and make life easier for
a subsequent patch for updating the dominator tree.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D154053
2023-07-19 18:56:44 +01:00
Igor Kirillov
c15557d64e [CodeGen] Extend ComplexDeinterleaving pass to recognise patterns using integer types
AArch64 introduced CMLA and CADD instructions as part of SVE2. This
change allows to generate such instructions when this architecture
feature is available.

Differential Revision: https://reviews.llvm.org/D153808
2023-07-19 11:01:19 +00:00
Simon Pilgrim
98b0f1360d [DAG] hoistLogicOpWithSameOpcodeHands - add support for SIGN_EXTEND_INREG nodes.
This can reuse the existing *_EXTEND node handling (with special handling for the valuetype arg)
2023-07-19 11:56:32 +01:00
Simon Pilgrim
2167ae93c9 [DAG] hoistLogicOpWithSameOpcodeHands - add support for *_EXTEND_VECTOR_INREG nodes.
This can reuse the existing *_EXTEND node handling.
2023-07-19 10:50:23 +01:00
Jingu Kang
62ed3ff4bb Revert "[MachineLICM] Handle Subloops"
This reverts commit 33e60484d7.
2023-07-19 10:30:50 +01:00
Han Shen
f7f744a522 [CodeGen] Separate MachineFunctionSplitter logic for different profile types.
In D152577 @xur has a post-submit comment regarding to an awkward usage
of MFS for Autofdo - instead of just using -fsplit-machine-function, the
user needs to add "-mllvm -mfs-psi-cutoff=0" to choose the right logic
for AutoFDO. The compiler should choose the right default values for
such case.

This CL separate MFS logic for different profile types.

Reviewed By: xur, wenlei

Differential Revision: https://reviews.llvm.org/D155253
2023-07-18 11:21:35 -07:00
David Green
74c0bdff7d [AArch64][GISel] Additional FPTrunc vector lowering
I was attempting to add llvm.reduce.fminimum/fmaximum support for GlobalISel.
In the process I noticed that llvm.reduce.fmin/fmax was missing, and could do
with being added first. That led on to adding additional vector support for
minnum/maxnum, which in turn led to needing to handle fptrunc and fpext for
some of the fp16 types. So this patch extends the vector handling for fptrunc,
adding support for f16 types which are clamped to 4 elements, and scalarizing
the rest.

I went round in circles a little with how smaller than legal vectors should be
handled, but this seems simple and seems to work, if not always optimally yet.

Differential Revision: https://reviews.llvm.org/D155311
2023-07-18 18:52:19 +01:00
Simon Pilgrim
d7eb9240c0 [DAG] SimplifyDemandedBits - attempt to use SimplifyMultipleUseDemandedBits for bitcasts from larger element types
Attempt to avoid multi-use ops if the bitcast doesn't need anything from them.
2023-07-18 18:38:03 +01:00
Simon Pilgrim
3ad4f92f83 [DAG] More aggressively (extract_vector_elt (build_vector x, y), c) iff element is zero constant
We currently don't extract vector elements from multi-use build vectors unless TLI.aggressivelyPreferBuildVectorSources accepts them, which seems a little extreme for constant build vectors (especially as under some cases ComputeKnownBits will indirectly extract the data for us).

This is causing a few regressions in some upcoming SimplifyDemandedBits work I'm looking at, all of which just need to know that the element is zero, so I've tweaked the fold to accept zero elements as well, which will typically fold very easily.

Differential Revision: https://reviews.llvm.org/D155582
2023-07-18 17:31:34 +01:00
Matt Arsenault
3f8ef57bed MachineSink: Fix sinking VGPR def out of a divergent loop
This fixes sinking a VGPR def out of a loop past the reconvergence
point at the SI_END_CF. There was a prior fix which introduced
blockPrologueInterferes (D121277) to fix the same basic problem for
the post RA sink. This also had the special case isIgnorableUse case
which was incorrect, because in some contexts the exec use is not
ignorable.

I'm thinking about a new way to represent this which will avoid
needing hasIgnorableUse and isBasicBlockPrologue, which would function
more like the exception handling.

Fixes: SWDEV-407790

https://reviews.llvm.org/D155343
2023-07-18 06:15:50 -04:00
Sameer Sahasrabuddhe
ef7d53731b [llvm] minor cleanup in GenericSSAContext
- update comments to reflect actual state
- use (implicitly inline) constexpr for a const static member
2023-07-18 12:01:11 +05:30
Konstantina Mitropoulou
4c42ab1199 [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns
CMP(A,C)||CMP(B,C) => CMP(MIN/MAX(A,B), C)
CMP(A,C)&&CMP(B,C) => CMP(MIN/MAX(A,B), C)

This first patch handles integer types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153502
2023-07-17 17:13:47 -07:00
Matt Arsenault
825b7f0ca5 InlineSpiller: Fix copy identification bugs in isCopyOfBundle
Noticed by inspection of
b7836d8562. This was checking if the
first instruction was a copy, not the current MI. It should fully
respect the isCopyInstr result. Hopefully this fixes a reported
regression which we can extract a test from.
2023-07-17 20:05:56 -04:00
Matt Arsenault
296e24cd2e DAG: Constant fold frexp nodes
Special casing the nonfinite exponent value everywhere is kind of
annoying.
2023-07-17 17:34:29 -04:00
Simon Pilgrim
4f95821f58 [DAG] SelectionDAG::getNode() - consistently use N1 for first operand. NFCI.
This has been annoying me for years - rename Operand to N1 so it matches all the other getNode() calls, and simplifies my debug watch windows!
2023-07-17 17:17:40 +01:00
Simon Pilgrim
e9caa37e9c [DAG] Move lshr narrowing from visitANDLike to SimplifyDemandedBits
Inspired by some of the cases from D145468

Let SimplifyDemandedBits handle the narrowing of lshr to half-width if we don't require the upper bits, the narrowed shift is profitable and the zext/trunc are free.

A future patch will propose the equivalent shl narrowing combine.

Differential Revision: https://reviews.llvm.org/D146121
2023-07-17 15:50:09 +01:00
Jay Foad
a1a9c53ae7 [GlobalISel] Fix infinite loop in reassociation combine
Don't reassociate (C1+C2)+Y -> C1+(C2+Y).

Fixes https://github.com/llvm/llvm-project/issues/63849

Differential Revision: https://reviews.llvm.org/D155284
2023-07-16 14:15:24 +01:00
Matt Arsenault
c4ccd6e3d2 MachineSink: Remove unnecessary empty block check 2023-07-14 18:46:18 -04:00
Matt Arsenault
6d3027e3d1 MachineSink: Move helper function and use more const 2023-07-14 18:46:18 -04:00
Weining Lu
ef33d6cbfc [XRay] Add initial support for loongarch64
Only support patching FunctionEntry/FunctionExit/FunctionTailExit for now.

Reviewed By: MaskRay, xen0n
Co-Authored-By: zhanglimin <zhanglimin@loongson.cn>

Differential Revision: https://reviews.llvm.org/D140727
2023-07-14 09:27:13 +08:00