Inspired by some of the cases from D145468
Let SimplifyDemandedBits handle the narrowing of lshr to half-width if we don't require the upper bits, the narrowed shift is profitable and the zext/trunc are free.
A future patch will propose the equivalent shl narrowing combine.
Differential Revision: https://reviews.llvm.org/D146121
GCC and existing codebases allow the use of integral values to be used
with this constraint. A recent change D133914 in this area started causing asserts.
Removing the assert is enough as the rest of the code works fine.
rdar://109675485
Differential Revision: https://reviews.llvm.org/D155023
TargetLowering::SimplifySetCC wants to swap the operands of a SETCC to
canonicalize the constant to the RHS. The bug here was that it did so whether
or not the RHS was already a constant, leading to an infinite loop.
rdar://111847838
Divverential revision: https://reviews.llvm.org/D155095
This reverts commit cdc633e4bc.
This mostly copies cases that already exist in ValueTracking, although
it skips the more complex ones. Those can be filled in as needed.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D149199
TargetLowering::SimplifySetCC wants to swap the operands of a SETCC to
canonicalize the constant to the RHS. The bug here was that it did so whether
or not the RHS was already a constant, leading to an infinite loop.
rdar://111847838
Differential revision: https://reviews.llvm.org/D155095
For RISC-V, getRegisterType for fp16 returns i16. i16->fp64 extload
is considered legal because the LoadExtActions defaults to Legal
for all entries. Only fp/fp and int/int entries are changed to
Expand fore RISC-V.
This patch detects the FP-ness has changed and won't try to call
isLoadExtLegal.
Alternatively, we could add Expand for int/fp and fp/int, but that
seemed a little silly.
Fixes#63816
Reviewed By: asb, wangpc
Differential Revision: https://reviews.llvm.org/D155040
https://reviews.llvm.org/D130883 introduced MIMetadata to simplify
metadata propagation (DebugLoc and PCSections).
However, we're currently still permitting implicit conversion of
DebugLoc to MIMetadata, to allow for a gradual transition and let the
old code work as-is.
This manifests in lost !pcsections metadata for X86-specific lowerings.
For example, 128-bit atomics.
Fix the situation for X86ISelLowering by converting all BuildMI() calls
to use an explicitly constructed MIMetadata.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D154986
Documentation for TargetLowering::getShiftAmountTy says that LegalTypes
should generally be true during type legalization, so this patch does
that.
On AMDGPU the effect is that we use i32 (a sane type) instead of i64
(pointer sized type) for more shift amounts, which in turn allows more
formation of rotates and funnel shifts pre-legalization.
Differential Revision: https://reviews.llvm.org/D154960
InstCombine should have taken care of this, but I think
this is more useful in the future when the expansion
tries to handle multiple cases at a time with fcmp.
x87 looks worse to me but the only thing I know about it is that
I aggressively do not care about it.
https://reviews.llvm.org/D143198
Combine the two checks into a check if the exponent bits are 0. The
inverted case isn't reachable until a future change, and GlobalISel
currently doesn't attempt the inversion optimization.
https://reviews.llvm.org/D143182
The aarch64 backend will benefit from expanding 64vector sdiv/udiv into mulh
using shift(mul(ext, ext)), as the larger type size is legal and the mul(ext,
ext) can efficiently use smull/umull instructions. This extends the existing
code in GetMULHS to handle vector types for it.
Differential Revision: https://reviews.llvm.org/D154049
The ExpandLibcallResult result was a bitcast and not the direct call
result, so we couldn't find the chain. Use the new separate chain
return value instead.
If the libcall expansion requires use of the inserted call's result
chain, it's unreliable to query it from the main result. The call
lowering may have added additional casts or other obscuring operations
we don't want to parse through.
During legalization, we can end up with shuffles that are identity masks, so
act like extract_subvector, but do not simplify to extract_subvector. This
adjusts the profitability heuristic in foldExtractSubvectorFromShuffleVector to
allow identity vectors that do not start at element 0. Undef masks elements are
excluded as it can be more useful to keep the undef elements.
Differential Revision: https://reviews.llvm.org/D153504
If we have a store of a load with no other uses in between it, it's
considered dead and is removed. So sometimes when legalizing a fixed
length vector store of an insert, we end up producing better code
through scalarization than without.
An example is the follow below:
%a = load <4 x i64>, ptr %x
%b = insertelement <4 x i64> %a, i64 %y, i32 2
store <4 x i64> %b, ptr %x
If this is scalarized, then DAGCombine successfully removes 3 of the 4
stores which are considered dead, and on RISC-V we get:
sd a1, 16(a0)
However if we make the vector type legal (-mattr=+v), then we lose the
optimisation because we don't scalarize it.
This patch attempts to recover the optimisation for vectors by
identifying patterns where we store a load with a single insert
inbetween, replacing it with a scalar store of the inserted element.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D152276
Add an intrinsic which returns the two pieces as multiple return
values. Alternatively could introduce a pair of intrinsics to
separately return the fractional and exponent parts.
AMDGPU has native instructions to return the two halves, but could use
some generic legalization and optimization handling. For example, we
should be able to handle legalization of f16 on older targets, and for
bf16. Additionally antique targets need a hardware workaround which
would be better handled in the backend rather than in library code
where it is now.
The functions are identical except for the opcode of the node.
We can have a single function and use N->getOpcode().
Reviewed By: luke, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D153929
Legalizing in `AArch64TargetLowering::LowerCONCAT_VECTORS()` and combining in `DAGCombiner::visitCONCAT_VECTORS()` could cause an infinite loop.
This commit fixes that issue by conditionally skipping the combining.
Fix https://github.com/llvm/llvm-project/issues/63322
Reviewed By: RKSimon, MaskRay
Differential Revision: https://reviews.llvm.org/D153316
In NVPTX `ReplaceVectorLoad()`, i1 and i8 types are promoted to i16,
followed by a truncate operation. Thus, v2i8 (or v2i1) and v2i16 will
have the same VTList, which causes a collision in CSEMap.
To differentiate the original VTList, let's add the size in generating
an ID. Otherwise the compiler crashes in refineAlignment:
`MMO->getSize() == getSize() && "Size mismatch!"`
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D153712
If the index needs to be scaled by a scalable size, just give up.
Fixes#63459
Reviewed By: frasercrmck, RKSimon
Differential Revision: https://reviews.llvm.org/D153601
The current implementation tries to handle the high and low halves
separately, but that's less efficient in most cases; use a wide SETCC
instead.
Differential Revision: https://reviews.llvm.org/D151358
Partial progress towards replacing in-tree uses of
`Type::getPointerTo()`.
If `getPointerTo()` is used solely to support an unnecessary bitcast,
remove the bitcast.
Reviewed By: barannikov88, nikic
Differential Revision: https://reviews.llvm.org/D153307
`__xray_customevent` and `__xray_typedevent` are built-in functions in Clang.
With -fxray-instrument, they are lowered to intrinsics llvm.xray.customevent and
llvm.xray.typedevent, respectively. These intrinsics are then lowered to
TargetOpcode::{PATCHABLE_EVENT_CALL,PATCHABLE_TYPED_EVENT_CALL}. The target is
responsible for generating a code sequence that calls either
`__xray_CustomEvent` (with 2 arguments) or `__xray_TypedEvent` (with 3
arguments).
Before patching, the code sequence is prefixed by a branch instruction that
skips the rest of the code sequence. After patching
(compiler-rt/lib/xray/xray_AArch64.cpp), the branch instruction becomes a NOP
and the function call will take effects.
This patch implements the lowering process for
{PATCHABLE_EVENT_CALL,PATCHABLE_TYPED_EVENT_CALL} and implements the runtime.
```
// Lowering of PATCHABLE_EVENT_CALL
.Lxray_sled_N:
b #24
stp x0, x1, [sp, #-16]!
x0 = reg of op0
x1 = reg of op1
bl __xray_CustomEvent
ldrp x0, x1, [sp], #16
```
As a result, two updated tests in compiler-rt/test/xray/TestCases/Posix/ now
pass on AArch64.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D153320
If we have an excessive number of stores in a single chain then the candidate WideVT may exceed the maximum width of an EVT integer type (and will assert) - but since mergeTruncStores doesn't support anything wider than a i64 store we should just early-out if we've collected more than stores than that.
Fixes#63306
This call to reassociateReduction is used by both fminnum/fmaxnum and
fminimum/fmaximum. In adding support for fminimum/fmaximum we appear to be
fixing the use of an incorrect reduction type, which should have only applied
to minnum/maxnum.
I also believe that it doesn't need nsz and reassoc to perform the
reassociation. For float min/max it should always be valid.
Differential Revision: https://reviews.llvm.org/D153247
When the sign of either of the operands is known, it is possible to
determine what the saturating value will be without having to compute it
using the sign bits.
Differential Revision: https://reviews.llvm.org/D153575
When eliding an argument copy, we need to update the chain to ensure
the argument reads are performed before later writes. However, the
code doing this only handled this for the first part of the argument.
If the argument had multiple parts, the chains of the later parts were
dropped. Make sure we preserve all chains.
Fixes https://github.com/llvm/llvm-project/issues/63430.