uses when looking for load/store users. This was a simple logic bug during translation
of the equivalent function in SelectionDAG:
```
for (SDNode *Node : N->uses()) {
  if (auto *LoadStore = dyn_cast<MemSDNode>(Node)) {
```
After D157690 we are seeing some crashes from GlobalISel which seem to be
related to the shift_of_shifted_logic_chain combine, which can remove too many
instructions if the shift amount is zero.
This limits the fold to non-zero shifts, on the assumption that a zero-amount
shift is better folded away to a COPY.
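A minimal sketch of the guard, assuming the matcher already has the constant shift amounts in hand (variable names hypothetical, not the exact upstream diff):
```
// A zero shift amount makes the "shift" a plain copy; folding the chain here
// could erase instructions that the shift-to-COPY combine should handle.
if (ShiftAmtVal == 0 || LogicShiftAmtVal == 0)
  return false;
```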
Differential Revision: https://reviews.llvm.org/D158596
Rewrites some simple rules as MIR patterns; this causes little to no codegen regressions.
I may have missed some easy cases, but some other rules have intentionally been left as-is because bigger
changes are needed to make them work.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D157690
This check was unnecessary/incorrect: it was already being done by the target
hook's default implementation, and the one in the matcher was checking for a
completely different thing. This change:
1) Removes the check and updates the affected tests, which now do some more reassociations.
2) Modifies the AMDGPU hooks, which were stubbed with "return true", to also do the
one-use check (see the sketch below). Not sure why I didn't do this the first time.
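A sketch of what the updated hook could look like, assuming a GISel isReassocProfitable overload that takes virtual registers (the exact AMDGPU body may differ):
```
bool SITargetLowering::isReassocProfitable(MachineRegisterInfo &MRI,
                                           Register N0, Register N1) const {
  // Instead of the old "return true" stub, only reassociate when the inner
  // operation has a single non-debug user, mirroring the removed matcher check.
  return MRI.hasOneNonDBGUse(N0);
}
```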
There is no case where those functions return false; they always return true.
Even if they were to return false, it's not really something we should rely on, I think.
With the current combiner implementation, it would just make `tryCombineAll` return false without retrying any more rules.
I also believe that if an apply function were to return false, it would mean that the match function is not strict enough. Asserting on failure in an apply function is a better idea, IMO.
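Illustrative only (hypothetical combine): with void-returning apply functions, a precondition the matcher should have established becomes an assert rather than a false return:
```
void CombinerHelper::applyFooBar(MachineInstr &MI, Register Reg) {
  // If this fires, the match function was not strict enough.
  assert(MI.getOpcode() == TargetOpcode::G_AND && "match should have rejected this");
  Builder.buildCopy(MI.getOperand(0).getReg(), Reg);
  MI.eraseFromParent();
}
```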
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D153619
- (op (op X, C1), C2) -> (op X, (op C1, C2))
- (op (op X, C1), Y) -> (op (op X, Y), C1)
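For example, with G_ADD the second pattern turns (add (add x, 4), y) into (add (add x, y), 4), keeping the constant outermost where it is more likely to fold. A condensed sketch of that rewrite (builder locals hypothetical):
```
// (op (op X, C1), Y) -> (op (op X, Y), C1), shown for G_ADD.
auto NewInner = Builder.buildAdd(Ty, X, Y);
auto C1Cst = Builder.buildConstant(Ty, C1);
Builder.buildAdd(Dst, NewInner, C1Cst);
```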
There is some unfortunate code duplication with the G_PTR_ADD reassociations,
but I don't see an easy way to avoid it.
Differential Revision: https://reviews.llvm.org/D150230
matchCombineShlOfExtend did not check whether the size of the new shift would
be wider than the size of the operand; the current condition did not work if
the value being shifted was zero. Also updated to support vector splats.
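A sketch of the missing bound, with hypothetical names: the fold shl (ext x), c -> ext (shl x, c) is only valid while the new shift still fits in the narrow source type.
```
unsigned SrcWidth = MRI.getType(ExtSrc).getScalarSizeInBits();
// Bail out if the shift would be as wide as (or wider than) the operand;
// relying on leading zeros alone breaks down when the value is zero.
if (ShiftAmt >= SrcWidth)
  return false;
```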
Patch by: Acim Maravic
Differential Revision: https://reviews.llvm.org/D151122
In some rare corner cases where, in between the div/rem pair, there's a def of
the second instruction's source (but a different vreg due to the combine's
equivalence checks), it will place the DIVREM at the first instruction's point,
causing a use-before-def. There wasn't an obvious fix that stood out to me
without doing more involved analysis than a combine should really be doing.
Fixes issue #60516
I'm open to new suggestions on how to approach this, as I'm not too happy
about bailing out here. It's not the first time we've run into issues with value
liveness that the DAG world isn't affected by.
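For reference, a condensed sketch of the scan the bail-out needs (helper and variable names hypothetical):
```
// Walk the instructions strictly between the two div/rem instructions; if
// any of them defines a register the second one reads, placing the combined
// G_UDIVREM at the first instruction's point would be a use-before-def.
for (MachineInstr &MI : make_range(std::next(FirstInst->getIterator()),
                                   SecondInst->getIterator()))
  for (MachineOperand &UseOp : SecondInst->uses())
    if (UseOp.isReg() && MI.definesRegister(UseOp.getReg()))
      return false; // bail out of the combine
```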
Differential Revision: https://reviews.llvm.org/D144336
I don't really understand what the point of wip_match_opcode is.
It doesn't seem to have any purpose other than listing opcodes so that
all the logic stays in pure C++; you can't seem to use it to
select multiple opcodes in the same way you use match.
Something is wrong with it, since the match emitter prints
"errors" if an opcode is covered by wip_match_opcode and
then appears in another pattern. For example, with this patch
you see this several times in the build:
error: Leaf constant_fold_fabs is unreachable
note: Leaf idempotent_prop will have already matched
The combines are actually produced and the tests for them
do pass, so this seems to just be a broken warning.
Without the fix, gcc complains with:
../lib/CodeGen/GlobalISel/CombinerHelper.cpp:1652:52: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
1652 | SrcDef->getOpcode() == TargetOpcode::G_OR && "Unexpected op");
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
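The fix is presumably just parenthesizing the disjunction so the message string binds to the whole condition, roughly:
```
assert((SrcDef->getOpcode() == TargetOpcode::G_AND ||
        SrcDef->getOpcode() == TargetOpcode::G_OR) &&
       "Unexpected op");
```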
There's a target hook that's called in DAGCombiner that we stub here; I'll
implement the equivalent override for AArch64 in a subsequent patch, since it's
used by a different shift combine.
This change by itself has minor code size improvements on arm64 -Os CTMark:
Program                            size.__text
                                   before      after       diff
consumer-typeset/consumer-typeset  410648.00   410648.00    0.0%
tramp3d-v4/tramp3d-v4              364176.00   364176.00    0.0%
kimwitu++/kc                       449216.00   449212.00   -0.0%
7zip/7zip-benchmark                576128.00   576120.00   -0.0%
sqlite3/sqlite3                    285108.00   285100.00   -0.0%
SPASS/SPASS                        411720.00   411688.00   -0.0%
ClamAV/clamscan                    379868.00   379764.00   -0.0%
Bullet/bullet                      452064.00   451928.00   -0.0%
mafft/pairlocalalign               246184.00   246108.00   -0.0%
lencod/lencod                      428524.00   428152.00   -0.1%
Geomean difference                                         -0.0%
Differential Revision: https://reviews.llvm.org/D150086
If we have a set of mergeable stores of shifts, but the original source value
being shifted is wider than the merged size, we should still be able to merge
if we truncate first. To do this, however, we need to search for stores
speculatively up the block, without knowing exactly how many stores we should
see before we stop. The old algorithm has to match an exact number of stores to
fit the wide type, or it dies. The new one will try to set the wide type to
however many stores we found in the upwards block traversal and use later
checks to verify whether they form a valid mergeable set.
The reason I need to move this to LoadStoreOpt is that the combiner works
top-down within a block, which means we end up doing partial merges because we
haven't seen all the possible stores before we mutate the MIR. In LoadStoreOpt
we can go bottom-up.
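A condensed sketch of the speculative upward search (locals hypothetical): gather candidates first, then let the later checks decide how wide a merge the set supports.
```
// Walk upwards from the anchor store, collecting adjacent candidate stores
// without committing to a target width in advance.
SmallVector<GStore *, 8> FoundStores;
for (MachineInstr &MI :
     llvm::reverse(make_range(MBB.begin(), AnchorStore.getIterator()))) {
  auto *St = dyn_cast<GStore>(&MI);
  if (!St)
    break; // conservative: stop at the first non-store
  FoundStores.push_back(St);
}
// Later checks pick the widest prefix of FoundStores that forms a valid merge.
```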
As a side effect of this change, we also end up doing better on an existing test case (missing_store)
since we manage to do a partial merge there.
We use this combine in the AArch64 post-legalizer combiner, which causes this
function to query the legalizer rules for the action of an invalid opcode/type
combination (G_AND and p0). Moving the legalizer query to after the validity
check in matchHoistLogicOpWithSameOpcodeHands() fixes this.
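Condensed, the reordering looks like this (the real validity check is more involved):
```
// Check that the opcode/type combination makes sense first...
if (HandOpcode == TargetOpcode::G_AND && DstTy.isPointer())
  return false; // ...so we never hand the legalizer an invalid query.
if (!isLegalOrBeforeLegalizer({HandOpcode, {DstTy}}))
  return false;
```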
The extending loads combine tries to prefer folding sign extends into loads
over zexts, but in cases where a G_ZEXTLOAD is first used by a G_ZEXT and then
by a G_SEXT, it would select the G_SEXT even though the load is already
zero-extending.
Fixes issue #59630
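A sketch of the corrected preference, condensed from the user-selection loop (names hypothetical):
```
// When scanning the extend users of a load for the best one to fold in, a
// G_SEXT can never fold into a load that already zero-extends, so it must
// not override a G_ZEXT user we have already picked.
if (LoadMI.getOpcode() == TargetOpcode::G_ZEXTLOAD &&
    UseMI.getOpcode() == TargetOpcode::G_SEXT)
  continue;
```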
RegBankSelect can insert G_UNMERGE_VALUES in a lot of places, which left us
with a lot of unmerge/merge pairs that could be simplified. These often got in
the way of pattern matching and made codegen worse.
This patch:
- Makes the necessary changes to the merge/unmerge combines so they can run post RegBankSelect
- Adds relevant unmerge combines to the list of RegBankSelect combines for AMDGPU
- Updates some tablegen patterns that were missing explicit cross-regbank copies (V_BFI patterns were causing constant bus violations with this change).
This seems to be mostly beneficial for code quality.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D142192
This patch replaces `GMergeLikeOp` with `GMergeLikeInstr` and
`MachineIRBuilder::buildAssertOp` with `buildAssertInstr` in order to
remove ambiguity. Discussed in: https://reviews.llvm.org/D141372
Instead of doing the adjustment in 3 different places in the code
base, do it inside UnsignedDivideUsingMagic::get.
Differential Revision: https://reviews.llvm.org/D141014
The magic algorithm sets the IsAdd indication for division by 1, which
the caller had to ignore.
I considered folding the ignore into UnsignedDivisionByConstantInfo,
but we only allow 1 for vectors with mixed divisors. And really what we
want to end up with is undef. Currently, we get to undef via
DemandedElts optimizations using the select instruction. We could
directly emit undef.
Differential Revision: https://reviews.llvm.org/D140940
This combine only handled left shifts, but now it can handle right shifts as well. It handles right shifts conservatively and only truncates them to the size returned by TLI.
AMDGPU benefits from always lowering shifts to 32 bits for instance, but AArch64 would rather keep them at 64 bits.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D136319
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
If LogicNonShiftReg is the same as Shift1Base, and the shift1 constant is the
same as the MatchInfo.Shift2 constant, CSEMIRBuilder will reuse the old shift1
when building shift2.
So if we erase MatchInfo.Shift2 at the end, we actually remove the old shift1,
and that causes a crash later.
The solution is simply to erase it earlier, avoiding the crash.
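Condensed, the fix looks like this (builder call simplified; Shift2 is the field from the combine's MatchInfo):
```
// Erase the old shift2 before building its replacement, so CSEMIRBuilder
// cannot hand back the very instruction we are about to delete.
MatchInfo.Shift2->eraseFromParent();
auto NewShift = Builder.buildInstr(Opcode, {DstTy}, {Shift1Base, ShiftConst});
```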
Fixes #58423
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D138187
A target can return whether a misaligned access is 'fast', as defined
by the target, or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.
A target can still define it however it wants, and the direct translation
of the current code uses 0 and 1 for the current false and true. This
makes the change an NFC.
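Schematically (signatures abbreviated, default arguments omitted), the out-parameter changes type while the direct translation keeps 0/1:
```
// Before: a misaligned access was simply fast or not.
virtual bool allowsMisalignedMemoryAccesses(EVT VT, unsigned AddrSpace,
                                            Align Alignment,
                                            MachineMemOperand::Flags Flags,
                                            bool *Fast) const;
// After: *Fast holds a relative speed; 0 and 1 mirror the old false/true,
// so existing targets behave identically (NFC).
virtual bool allowsMisalignedMemoryAccesses(EVT VT, unsigned AddrSpace,
                                            Align Alignment,
                                            MachineMemOperand::Flags Flags,
                                            unsigned *Fast) const;
```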
A subsequent patch will start using an actual speed value in
the load/store vectorizer, to check that a vectorized access is going
to be not just fast, but no slower than before.
Differential Revision: https://reviews.llvm.org/D124217
When we match a pattern from m_GCst, the register type can differ from that of
the original operand, so we can't replace the original operand with the vreg
directly. This change creates a new constant with the original operand's type,
then replaces the original operand.
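Condensed, the fix looks something like this (locals hypothetical):
```
// Build a fresh G_CONSTANT in the original operand's type, then substitute
// that instead of the possibly differently-typed matched vreg.
LLT OpTy = MRI.getType(OldReg);
auto NewCst = Builder.buildConstant(OpTy, MatchedVal);
replaceRegWith(MRI, OldReg, NewCst.getReg(0));
```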
Fixes #58906
Reviewed By: arsenm, aemerson
Differential Revision: https://reviews.llvm.org/D137778
Caused by the legacy min/max combines (select + cmp) asking for legalizer info
in the pre-legalizer combiner (D135047 added the combine to all_combines).
The combine still does not work for AMDGPU, since the destination opcode is
custom, not legal. The equivalent DAG combine works because it asks for legal
or custom.
Differential Revision: https://reviews.llvm.org/D137274