If the SRL for Hi constant folds, but we don't remoe those bits from
the Lo, we can end up with strange constant folding through DAGCombine later.
I've only seen this with constants being lowered to constant pools
during lowering on RISC-V.
The intrinsic uses ImmArg so TargetConstant would be consistent
with how other intrinsics are handled.
This hides the constants from type legalization so we can remove
the promotion support.
isel patterns are updated accordingly.
We'll need to generalize this fold to check for any zero upperbits to address some of the D155472 regressions, but this exposes a number of issues. For now, just use the general MaskedValueIsZero test instead of the assertzext.
Update LiveIntervals after rewriting:
%reg = INSERT_SUBREG undef %reg, %subreg, subidx
to:
undef %reg:subidx = COPY %subreg
D113044 implemented this for the non-undef case.
This patch adjusts the legality check for riscv to use `cpop/cpopw` since `isOperationLegal(ISD::CTPOP, MVT::i32)` returns false on rv64gc_zbb.
Clang vs gcc: https://godbolt.org/z/rc3s4hjPh
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D156390
If a virtual register is not assigned preferred physical register, it means some
COPY instructions will be changed to real register move instructions. In this
case we can try to split the virtual register in colder blocks, if success, the
original COPY instructions can be deleted, and the new COPY instructions in
colder blocks will be generated as register move instructions. It results in
fewer dynamic register move instructions executed.
The new test case split-reg-with-hint.ll gives an example, the hot path contains
24 instructions without this patch, now it is only 4 instructions with this
patch.
Differential Revision: https://reviews.llvm.org/D156491
Revision c383f4d655 enabled using variadic-form debug values to represent
single-location, non-stack-value debug values, and a further patch made all
DBG_INSTR_REFs use variadic form. Not all code paths were updated correctly to
handle the new syntax however, with entry values in still expecting an expression
that begins exactly DW_OP_LLVM_entry_value, 1.
A function already exists to select non-variadic-like expressions; this patch
adds an extra function to cheaply simplify such cases to non-variadic form, which
we use prior to any entry-value processing to put DBG_INSTR_REFs and DBG_VALUEs
down the same code path. We also use it for a few DIExpression functions that
check for whether the first element(s) of a DIExpression match a particular
pattern, so that they will return the same result for
DIExpression(DW_OP_LLVM_arg, 0, <ops>) as for DIExpression(<ops>).
Differential Revision: https://reviews.llvm.org/D158185
The indirect lowering hinders the outliner's ability to see that
sequences are in fact common, since the sequence similarity is rendered
opaque by the register callee. The size savings from making them
indirect seems to be dwarfed by the outliner's savings from
de-duplication.
rdar://115178034
rdar://115459865
This matches what is done on calls, since
cc981d285d (extended for another case in
5a751e747d).
Apply both those cases on invoke just like is done for call.
Also update the preexisting comment which was left without update in
5a751e747d.
This fixes github issue #61941.
Fixes#65004 by trimming assignments from out of bounds stores (out of bounds
of either the base variable or the backing alloca). If there's no overlap at
all or the out of bounds access starts at a negative offset from the alloca,
the assignment is simply skipped.
Currently clang's medium code model treats all data as large, putting them in a large data section and using more expensive instruction sequences to access them.
Following gcc's -mlarge-data-threshold, which allows putting data under a certain size in a normal data section as opposed to a large data section. This allows using cheaper code sequences to access some portion of data in the binary (which will be implemented in LLVM in a future patch).
And under the medium codel mode, only put data above the large data threshold into large data sections, not all data.
Reviewed By: MaskRay, rnk
Differential Revision: https://reviews.llvm.org/D149288
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
Multiplying raw block frequency with an integer carries a high risk
of overflow.
- Add `BlockFrequency::mul` return an std::optional with the product
or `nullopt` to indicate an overflow.
- Fix two instances where overflow was likely.
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
This reverts commit ee643b706b.
Fix up build failures in targets I missed in #66003
Kept as 3 commits for reviewers to see better what's changed. Will
squash when
merging.
- reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
- fix all the targets I missed in #66003
- fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll
No need to special case add 0, N. SelectionDAG::getNode contains the canonicalization and simplification for this case, so no need to duplicate it here.
This reverts commit 2ca4d13612.
Also revert the followup, "[InlineAsm] fix botched merge conflict resolution"
This reverts commit 8b9bf3a9f7.
There were SystemZ and Mips build errors, too many to fix forward.
A splat index means the operation is reading from (writing to) the same
memory location. Generally, zero is the cheapest value to splat. As
such, we'd prefer to add the splatted value to the base, and use a
constant zero as the index operand.
Similar to
commit 2fad6e6985 ("[InlineAsm] wrap Kind in enum class NFC")
Fix the TODOs added in
commit 93bd428742 ("[InlineAsm] refactor InlineAsm class NFC
(#65649)")
Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction
As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future.
Reapplied after reversion at e1e3c75c7d with a tweak to the pseudo-probe-peep.ll test
Differential Revision: https://reviews.llvm.org/D158068
Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction
As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future.
Differential Revision: https://reviews.llvm.org/D158068
Actually pass along the depth parameter of getKnownBits to computeKnownBitsImpl.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D159321
The AsmPrinter currently assumes that a Debug Variable will have all of its
fragments with the same "kind" of location (i.e. all in the stack or all in
entry values). This is not enforced by the verifier, so it needs to be handled
properly. Until we do so, we conservatively drop one of the fragments.
Differential Revision: https://reviews.llvm.org/D159468
By default, `DAGCombiner` folds `sext x` to `zext x` when `x` is
non-negative. It will generate redundant `zext` inst seq on riscv64
(typically `slli (srli x, 32), 32`).
godbolt: https://godbolt.org/z/osf6adP1o
This patch applies the transform iff `zext` is **cheaper** than `sext`.
A register's use with isUndef flags shouldn't be considered as a point where the register is live. LiveVariables::runOnInstr ignores such uses.
This was found when I tried to replace calls to
SIOptimizeVGPRLiveRange::updateLiveRangeInThenRegion
SIOptimizeVGPRLiveRange::updateLiveRangeInElseRegion
with LiveVariables::recomputeForSingleDefVirtReg.
In the testcase below %2 use is undef in the last REG_SEQUENCE.
CodeGen/AMDGPU/si-opt-vgpr-liverange-bug-deadlanes.mir failed:
Function Live Ins: $vgpr0 in %0
bb.0:
successors: %bb.1(0x40000000), %bb.2(0x40000000); %bb.1(50.00%), %bb.2(50.00%)
liveins: $vgpr0
%0:vgpr_32 = COPY killed $vgpr0
%1:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN killed %0:vgpr_32, undef %5:sgpr_128, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load (s32))
%3:sreg_64 = V_CMP_NE_U32_e64 0, %2:vgpr_32, implicit $exec
%7:sreg_64 = SI_IF killed %3:sreg_64, %bb.2, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
S_BRANCH %bb.1
bb.1:
; predecessors: %bb.0
successors: %bb.2(0x80000000); %bb.2(100.00%)
%8:vreg_128 = REG_SEQUENCE killed %1:vgpr_32, %subreg.sub0, %1:vgpr_32, %subreg.sub1, %1:vgpr_32, %subreg.sub2, undef %4.sub3:vreg_128, %subreg.sub3
bb.2:
; predecessors: %bb.0, %bb.1
successors: %bb.3(0x40000000), %bb.4(0x40000000); %bb.3(50.00%), %bb.4(50.00%)
%9:vreg_128 = PHI undef %10:vreg_128, %bb.0, %8:vreg_128, %bb.1
%14:vgpr_32 = PHI %2:vgpr_32, %bb.0, undef %15:vgpr_32, %bb.1
%11:sreg_64 = SI_ELSE killed %7:sreg_64, %bb.4, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
S_BRANCH %bb.3
bb.3:
; predecessors: %bb.2
successors: %bb.4(0x80000000); %bb.4(100.00%)
%12:vreg_128 = REG_SEQUENCE killed %14:vgpr_32, %subreg.sub0, %14:vgpr_32, %subreg.sub1, %14:vgpr_32, %subreg.sub2, undef %6:vgpr_32, %subreg.sub3
bb.4:
; predecessors: %bb.2, %bb.3
%13:vreg_128 = PHI %9:vreg_128, %bb.2, %12:vreg_128, %bb.3
SI_END_CF killed %11:sreg_64, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
dead %4:vreg_128 = REG_SEQUENCE killed %13.sub2:vreg_128, %subreg.sub0, %13.sub2:vreg_128, %subreg.sub1, %13.sub2:vreg_128, %subreg.sub2, **undef %2:vgpr_32**, %subreg.sub3
S_ENDPGM 0
*** Bad machine code: LiveVariables: Block should not be in AliveBlocks ***
- function: _amdgpu_ps_main
- basic block: %bb.1 (0x55e17ebd7100)
Virtual register %2 is not needed live through the block.
*** Bad machine code: LiveVariables: Block should not be in AliveBlocks ***
- function: _amdgpu_ps_main
- basic block: %bb.2 (0x55e17ebd7200)
Virtual register %2 is not needed live through the block.
*** Bad machine code: LiveVariables: Block should not be in AliveBlocks ***
- function: _amdgpu_ps_main
- basic block: %bb.3 (0x55e17ebd7300)
Virtual register %2 is not needed live through the block.
Differential Revision: https://reviews.llvm.org/D158167
Live out implicit_defs need to be kept, but the check for this only
checked if the block parent was the same. This doesn't work if the
parent blocks are the same but the value is live. Fixes verifier error
"Instruction ending live segment doesn't read the register", which
would appear at the coalesced non-implicit_def def.
Fixes#38788https://reviews.llvm.org/D158882
This fixes some verifier errors when live out implicit defs are
coalesced with identity copies. Fixes some reduced testcases from
issue #38788 but doesn't solve the original failure.
I was surprised this seems to obviate the special casing in
analyzeValue that's been there since the subregister liveness support
went in.
https://reviews.llvm.org/D158850
This potentially has a slightly positive performance impact, as
std::visit can be implemented as a `switch`-like jump rather than
a series of `if`s.
More importantly, the reader can be confident is no overlap between the
cases.
Differential Revision: https://reviews.llvm.org/D158678
Only a subset of the fields of DbgVariable are meaningful at any time,
and some fields are re-used for multiple purposes (for example
FrameIndexExprs is used with a throw-away frame-index of 0 to hold a
single DIExpression without needing to add another member). The exact
invariants must be reverse-engineered by inspecting the actual use of
the class, its imprecise/outdated doc-comment, and some asserts.
Refactor DbgVariable into a sum type by inheriting from std::variant.
This makes the active fields for any given state explicit and removes
the need to re-use fields in disparate contexts. As a bonus, it seems to
reduce the size on my x86_64 linux box from 144 bytes to 96 bytes.
There is some potential cost to `std::get` as it must check the active
alternative even when context or an assert obviates it. To try to help
ensure the compiler can optimize out the checks the patch also adds a
helper `get` method which uses the noexcept `std::get_if`.
Some of the extra cost would also be avoided more cleanly with a
refactor that exposes the alternative types in the public interface,
which will come in another patch.
Differential Revision: https://reviews.llvm.org/D158675
I would like to steal one of these bits to denote whether a kind may be
spilled by the register allocator or not, but I'm afraid to touch of any
this code using bitwise operands.
Make flags a first class type using bitfields, rather than launder data
around via `unsigned`.
Continuing the patch series to get rid of debug intrinsics [0], instruction
insertion needs to be done with iterators rather than instruction pointers,
so that we can communicate information in the iterator class. This patch
adds an iterator-taking insertBefore method and converts various call sites
to take iterators. These are all sites where such debug-info needs to be
preserved so that a stage2 clang can be built identically; it's likely that
many more will need to be changed in the future.
At this stage, this is just changing the spelling of a few operations,
which will eventually become signifiant once the debug-info bearing
iterator is used.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152537
Add handling for subrange updates in LiveInterval preservation.
This requires extending MachineBasicBlock::SplitCriticalEdge
to also update subrange intervals.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D158144