Non-throwing inline asm gets the nounwind attribute inferred in
instcombine, so it can be handled the same way non-throwing target
functions are handled in general. Further special casing is
unnecessary complexity.
Use deduction guides instead of helper functions.
The only non-automatic changes have been:
1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*)) (see the sketch after this list).
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler gets confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as a no-op is not supported (a constructor cannot achieve that).
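For illustration, a minimal sketch of the ambiguity from item 1, assuming the deduction guides introduced in D140896 (variable names are made up):
```cpp
#include "llvm/ADT/ArrayRef.h"
#include <cstddef>
#include <cstdint>

void example(uint8_t *Data) {
  // llvm::ArrayRef Bytes(Data, 0);       // ambiguous: 0 is also a null pointer
  //                                      // constant, so both the (pointer, length)
  //                                      // and (begin, end) guides match.
  llvm::ArrayRef Bytes(Data, (size_t)0);  // unambiguously the (pointer, length) guide
  (void)Bytes;
}
```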
Per reviewers' comments, some redundant makeArrayRef calls have been removed in the process.
This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.
Differential Revision: https://reviews.llvm.org/D140955
Add G_BUILD_VECTOR and G_BUILD_VECTOR_TRUNC to the list of opcodes in
`shouldCSEOpc`. This simplifies the code generated for vector splats.
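A hedged sketch of the kind of change this implies (shouldCSEOpc lives in CSEInfo.cpp; this is not the verbatim diff, just the opcodes that become CSE candidates):
```cpp
#include "llvm/CodeGen/TargetOpcodes.h"

// Illustration only: these build-vector opcodes are now treated as CSE-able.
bool shouldCSEBuildVectorOpc(unsigned Opc) {
  switch (Opc) {
  case llvm::TargetOpcode::G_BUILD_VECTOR:
  case llvm::TargetOpcode::G_BUILD_VECTOR_TRUNC:
    return true;
  default:
    return false;
  }
}
```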
Differential Revision: https://reviews.llvm.org/D140965
At the moment, `MachineIRBuilder::buildInstr` may build an instruction
with a different opcode than the one passed in as parameter. This may
cause confusion for its consumers, such as `CSEMIRBuilder`, which will
memoize the instruction based on the new opcode, but will search
through the memoized instructions based on the original one (resulting
in missed CSE opportunities). This is all the more unpleasant since
buildInstr is virtual and may call itself recursively both directly
and via buildCast, so it's not always easy to follow what's going on.
This patch simplifies the API of `MachineIRBuilder` so that the `buildInstr`
method does the least surprising thing (i.e. builds an instruction with
the specified opcode) and only the convenience `buildX` methods
(`buildMerge` etc) are allowed freedom over which opcode to use. This can
still be confusing (e.g. one might write a unit test using
`buildBuildVectorTrunc` but instead get a plain `G_BUILD_VECTOR`), but at
least it's explained in the comments.
In practice, this boils down to 3 changes:
* `buildInstr(G_MERGE_VALUES)` will no longer call itself with
`G_BUILD_VECTOR` or `G_CONCAT_VECTORS`; this functionality is moved to
`buildMerge` and replaced with an assert;
* `buildInstr(G_BUILD_VECTOR_TRUNC)` will no longer call itself with
`G_BUILD_VECTOR`; this functionality is moved to `buildBuildVectorTrunc`
and replaced with an assert;
* `buildInstr(G_MERGE_VALUES)` will no longer call `buildCast` and will
instead assert if we're trying to merge a single value; no change is
needed in `buildMerge` since it was already asserting that there is more
than one source operand.
This change is NFC for users of the `buildX` methods, but users that
call `buildInstr` with relaxed parameters will have to update their code
(such instances will hopefully be easy to find thanks to the asserts).
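A minimal sketch of the resulting contract, assuming MachineIRBuilder.h and using the method names as they appear in this message (not an upstream test):
```cpp
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
using namespace llvm;

void buildMerges(MachineIRBuilder &B, Register Dst, Register Lo, Register Hi) {
  // Always emits G_MERGE_VALUES; asserts if the operand types call for another opcode.
  B.buildInstr(TargetOpcode::G_MERGE_VALUES, {Dst}, {Lo, Hi});

  // The convenience wrapper may still pick G_MERGE_VALUES, G_BUILD_VECTOR or
  // G_CONCAT_VECTORS depending on the types involved.
  Register Ops[] = {Lo, Hi};
  B.buildMerge(Dst, Ops);
}
```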
Differential Revision: https://reviews.llvm.org/D140964
Instead of doing the adjustment in 3 different places in the code
base, do it inside UnsignedDivideUsingMagic::get.
Differential Revision: https://reviews.llvm.org/D141014
The magic algorithm sets the IsAdd indication for division by 1, which
the caller had to ignore.
I considered folding the ignore into UnsignedDivisionByConstantInfo,
but we only allow a divisor of 1 for vectors with mixed divisors. And really what we
want to end up with is undef. Currently, we get to undef via
DemandedElts optimizations using the select instruction. We could
directly emit undef.
Differential Revision: https://reviews.llvm.org/D140940
This combine only handled left shifts, but now it can handle right shifts as well. It handles right shifts conservatively and only truncates them to the size returned by TLI.
AMDGPU, for instance, benefits from always lowering shifts to 32 bits, but AArch64 would rather keep them at 64 bits.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D136319
This is not a correctness fix because the set is only used for debug
output. However, it helps avoid noise when looking at diffs between
compiler runs.
The set is only maintained with debug output enabled, so the added cost
should be acceptable.
Differential Revision: https://reviews.llvm.org/D139465
We currently have a bug where the legalizer, when dealing with phi operands,
may create instructions in the phi's incoming blocks at points which are effectively
dead due to a possible exception throw.
Say we have:
```
throwbb:
  EH_LABEL
  x0 = %callarg1
  BL @may_throw_call
  EH_LABEL
  B returnbb

bb:
  %v = phi i1 %true, throwbb, %false....
```
When legalizing we may need to widen the i1 %true value, and to do that we need
to create new extension instructions in the incoming block. Our insertion point
currently is the MBB::getFirstTerminator() which puts the IP before the unconditional
branch terminator in throwbb. These extensions may never be executed if the call
throws, and therefore we need to emit them before the call (but not too early, since
our new instruction may need values defined within throwbb as well).
```
throwbb:
  EH_LABEL
  x0 = %callarg1
  BL @may_throw_call
  EH_LABEL
  %true = G_CONSTANT i32 1 ; <<<-- ruh'roh, this never executes if may_throw_call() throws!
  B returnbb

bb:
  %v = phi i32 %true, throwbb, %false....
```
To fix this, I've added two new instructions. The main idea is that G_INVOKE_REGION_START
is a terminator, which tries to model the fact that in the IR, the original invoke inst
is actually a terminator as well. By using that as the new insertion point, we
make sure to place new instructions on always executing paths.
Unfortunately we still need to make the legalizer use a new insertion point API
that I've added, since the existing `getFirstTerminator()` method does a reverse
walk up the block, and any non-terminator instructions cause it to bail out. To
avoid impacting compile time for all `getFirstTerminator()` uses, I've added a new
method that does a forward walk instead.
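A hedged sketch of such a forward walk (this is not the new MachineBasicBlock API itself, just an illustration of the idea):
```cpp
#include "llvm/CodeGen/MachineBasicBlock.h"
using namespace llvm;

// Finds the earliest terminator in the block even when non-terminator
// instructions follow it, which the reverse-walking getFirstTerminator()
// would not see.
MachineBasicBlock::iterator firstTerminatorForward(MachineBasicBlock &MBB) {
  for (auto It = MBB.begin(), End = MBB.end(); It != End; ++It)
    if (It->isTerminator())
      return It;
  return MBB.end();
}
```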
Differential Revision: https://reviews.llvm.org/D137905
After the IRTranslator pass, constants are deduplicated and translated into instructions in the entry block, losing their debug locations.
Localization of constants may cause emission of extra zero lines in the debug_line section, as in https://godbolt.org/z/ecvsxxfKn. In that example, the constant gets placed as
the first instruction in the entry block, and although it has no debug location, AsmPrinter emits a zero line for it.
If a localized constant has only one user, we can assume it has the same debug location as its user, since they are placed consecutively.
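A hedged sketch of the idea (the helper name is made up; this is not the upstream Localizer code):
```cpp
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
using namespace llvm;

// If the localized constant has exactly one non-debug user, borrow that user's
// debug location so no artificial line-0 entry is emitted for the constant.
void inheritDebugLocFromSingleUser(MachineRegisterInfo &MRI,
                                   MachineInstr &LocalizedConst) {
  Register Def = LocalizedConst.getOperand(0).getReg();
  if (!MRI.hasOneNonDBGUse(Def))
    return;
  MachineInstr &User = *MRI.use_instr_nodbg_begin(Def);
  LocalizedConst.setDebugLoc(User.getDebugLoc());
}
```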
Differential Revision: https://reviews.llvm.org/D128192
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
The use of a PSV for buffer intrinsics is misleading because it may be
misinterpreted as all buffer intrinsics accessing the same address in
memory, which is clearly not true.
Instead, build MachineMemOperands without a pointer value but with an
address space, so that address space-based alias analysis can still
work.
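A hedged sketch of the approach (not the exact upstream code): give the MachineMemOperand an address space but no pseudo source value.
```cpp
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineMemOperand.h"
using namespace llvm;

// Build an MMO for, e.g., a buffer load: the pointer value is null, so only
// the address space is available to alias analysis.
MachineMemOperand *buildBufferMMO(MachineFunction &MF, unsigned AddrSpace,
                                  uint64_t Size, Align Alignment) {
  return MF.getMachineMemOperand(MachinePointerInfo(AddrSpace),
                                 MachineMemOperand::MOLoad |
                                     MachineMemOperand::MODereferenceable,
                                 Size, Alignment);
}
```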
There is a lot of test churn because previously address space 4
(constant address space) was used as an address space for buffer
intrinsics. This doesn't make much sense and seems to have been an
accident -- see the change in
AMDGPUTargetMachine::getAddressSpaceForPseudoSourceKind.
Differential Revision: https://reviews.llvm.org/D138711
class support and introduce GlobalISel implementation for AMDGPU
Uses existing SelectionDAG lowering of the llvm.amdgcn.class intrinsic
for llvm.is.fpclass
If LogicNonShiftReg is the same as Shift1Base, and the shift1 constant is the same as MatchInfo.Shift2's constant, CSEMIRBuilder will reuse the old shift1 when building shift2.
So if we erase MatchInfo.Shift2 at the end, we actually remove the old shift1, which causes a crash later.
The solution is simply to erase it earlier to avoid the crash.
Fix #58423
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D138187
A target can report whether a misaligned access is 'fast', as defined
by the target, or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.
A target can still define it as it wants, and the direct translation
of the current code uses 0 and 1 for the current false and true. This
makes the change an NFC.
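A hedged sketch of what the interface looks like after this change, using a hypothetical target and a simplified signature:
```cpp
#include "llvm/CodeGen/MachineMemOperand.h"
#include "llvm/CodeGen/ValueTypes.h"
#include "llvm/Support/Alignment.h"

// Hypothetical target hook: the out-parameter is now an unsigned "speed"
// rather than a bool.
struct MyTargetLoweringSketch {
  bool allowsMisalignedMemoryAccesses(llvm::EVT VT, unsigned AddrSpace,
                                      llvm::Align Alignment,
                                      llvm::MachineMemOperand::Flags Flags,
                                      unsigned *Fast) const {
    if (Fast)
      *Fast = 1; // direct translation of the old 'true'
    return true;
  }
};
```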
A subsequent patch will start using an actual speed value in
the load/store vectorizer to check whether a vectorized access is going
to be not just fast, but no slower than before.
Differential Revision: https://reviews.llvm.org/D124217
When we match a pattern via m_GCst, the register type could be different from that of the original op, so we can't replace the original op with the vreg directly.
This code creates a new constant with the original op's type, then replaces the original op.
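A hedged sketch of that idea (the helper name is made up; this is not the upstream code):
```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
using namespace llvm;

// Build a fresh G_CONSTANT with the original operand's register type instead
// of substituting the matched constant's vreg directly.
Register buildConstantWithOrigType(MachineIRBuilder &B, MachineRegisterInfo &MRI,
                                   Register OrigReg, const APInt &Val) {
  LLT OrigTy = MRI.getType(OrigReg);
  return B.buildConstant(OrigTy, Val).getReg(0);
}
```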
Fix #58906
Reviewed By: arsenm, aemerson
Differential Revision: https://reviews.llvm.org/D137778
Caused by legacy min/max combines (select + cmp) asking for legalizer info
in the pre-legalizer (D135047 added the combine to all_combines).
The combine still does not work for AMDGPU since the destination opcode is custom,
not legal. A similar combine works on the DAG since it asks for legal or custom.
Differential Revision: https://reviews.llvm.org/D137274
Originally the loop did almost nothing, as the calculated location was
overwritten on the next iteration.
Differential Revision: https://reviews.llvm.org/D136937
In https://github.com/llvm/llvm-project/issues/57452, we found that IRTranslator is translating `i1 true` into `i32 -1`.
This is because IRTranslator uses SExt for indices.
In this fix, we change the expected behavior of extractelement's index from SExt to ZExt.
This change covers the documentation, SelectionDAG, and IRTranslator.
We also included a test for AMDGPU and updated tests for AArch64, Mips, PowerPC, RISCV, VE, WebAssembly, and X86.
This patch fixes issue #57452.
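A hedged sketch of the IRTranslator side of the change (the helper name is made up; the real change is in IRTranslator's extractelement translation):
```cpp
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
using namespace llvm;

// The extractelement index is now zero-extended rather than sign-extended when
// it has to be widened, so an 'i1 true' index becomes 1 instead of -1.
Register widenExtractIndex(MachineIRBuilder &B, Register Idx, LLT IdxTy) {
  // Previously a sign extension was used here, turning i1 true into -1.
  return B.buildZExtOrTrunc(IdxTy, Idx).getReg(0);
}
```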
Differential Revision: https://reviews.llvm.org/D132978
Similar to the existing "Trunc/BuildVector" folding, which folds low-element extracts of BuildVectors, this folds high-element extracts done using bit shifts.
For D134354
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D135148
As a result of making these legal, and tweaking the combine to allow vectors,
we generate vector G_SEXT_INREG during legalization.
The reason we want to make these legal in the first place is to allow for
more combine opportunities. Once those have been done, we can just lower them
back to shifts in the post-legalizer lowering.
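A minimal sketch of what lowering a G_SEXT_INREG back to shifts looks like (the helper name is made up):
```cpp
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
using namespace llvm;

// G_SEXT_INREG %src, FromBits  ==>  G_ASHR (G_SHL %src, W-FromBits), W-FromBits
Register lowerSExtInRegToShifts(MachineIRBuilder &B, Register Src,
                                unsigned FromBits) {
  LLT Ty = B.getMRI()->getType(Src);
  auto ShiftAmt = B.buildConstant(Ty, Ty.getScalarSizeInBits() - FromBits);
  auto Shl = B.buildShl(Ty, Src, ShiftAmt);
  return B.buildAShr(Ty, Shl, ShiftAmt).getReg(0);
}
```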
This needs to be one commit, otherwise we'd start causing tests to fail due to
incomplete support for selection etc.
Vector support seems to work immediately, as long as we run the combine before
legalization (so the vector SELECTs don't get lowered) and the legalizer rules
are there to enable generation.
Differential Revision: https://reviews.llvm.org/D135047
This adds a combine that handles
```
(x + y) - y -> x
(x + y) - x -> y
x - (y + x) -> 0 - y
x - (x + z) -> 0 - z
```
On AArch64, we get added benefit for `0 - y` because it can be selected to a
`neg` instruction.
Differential Revision: https://reviews.llvm.org/D135010
Before, the isPreLegalize() query in CombinerHelper only checked for the
presence of a LegalizerInfo object. This is problematic when we want to have
a combine actually check for legality in a pre-legalizer combine pass, since
if we pass a LegalizerInfo object to the constructor it causes the combines to
think that we're running *post* legalizer, which isn't true.
This change fixes it to instead check an explicit bool that is passed in to signal
whether the pass will be run before or after legalization.
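A hedged sketch of the distinction (the struct and member names here are made up, not the upstream CombinerHelper):
```cpp
#include "llvm/CodeGen/GlobalISel/LegalizerInfo.h"
#include <cassert>

// Whether we are running before legalization is now an explicit flag, rather
// than being inferred from whether a LegalizerInfo happens to be available.
struct CombinerHelperSketch {
  bool IsPreLegalize;
  const llvm::LegalizerInfo *LI; // may be non-null even pre-legalization

  bool isPreLegalize() const { return IsPreLegalize; }
  bool isLegal(const llvm::LegalityQuery &Query) const {
    assert(LI && "Must have LegalizerInfo to query legality!");
    return LI->getAction(Query).Action == llvm::LegalizeActions::Legal;
  }
};
```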
Doing so exposed a bug in the extending loads combine, which tried to check for
legality of candidate extending loads if LegalizerInfo was present. Since we
only ran it pre-legalizer and therefore with a null LegalizerInfo, it never
actually ran. Also fixes the legality checks to keep the tests passing.
Differential Revision: https://reviews.llvm.org/D135044
The function buildCopyToRegs did not properly handle the case where it should
produce a wider vector result. This happened, for example, in a function that
returns a value of type <2 x f32>, which should be widened to <4 x f32> to
fit an XMM register. The function eventually calls
MachineIRBuilder::buildUnmerge, which does not expect that only one
destination register is specified.
Now this case is treated specifically in buildCopyToRegs.
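A hedged sketch of the kind of widening involved (not necessarily the exact upstream fix):
```cpp
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
using namespace llvm;

// Widen a narrow vector value (e.g. <2 x s32>) into a wider destination
// (e.g. <4 x s32>) by padding the extra lanes with undef, rather than calling
// buildUnmerge with a single destination register.
MachineInstrBuilder widenVectorToPart(MachineIRBuilder &B, Register WideDst,
                                      Register NarrowSrc) {
  return B.buildPadVectorWithUndefElements(WideDst, NarrowSrc);
}
```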
Differential Revision: https://reviews.llvm.org/D128546