Use the maximum BitWidth of 64 for getVScaleRange to avoid returning an empty range.
The previous change caused a Buildbot failure because of the self-assignment MinSVEVectorSize = MinSVEVectorSize:
error: explicitly assigning value of variable of type 'unsigned int' to itself [-Werror,-Wself-assign]
Reviewed By: sdesmalen, nikic, dmgreen
Differential Revision: https://reviews.llvm.org/D155708
This mostly copies cases that already exist in ValueTracking, although
it skips the more complex ones. Those can be filled in as needed.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D149199
Add an intrinsic which returns the two pieces as multiple return
values. Alternatively, we could introduce a pair of intrinsics to
separately return the fractional and exponent parts.
AMDGPU has native instructions to return the two halves, but they could
use some generic legalization and optimization handling. For example, we
should be able to handle legalization of f16 on older targets, and of
bf16. Additionally, antique targets need a hardware workaround which
would be better handled in the backend than in library code, where it
currently lives.
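To illustrate the "two pieces as multiple return values" shape, here is a minimal C++ sketch modeled on C's `std::frexp`, which splits a value into a fraction with magnitude in [0.5, 1) and a power-of-two exponent. The helper name `frexpPair` is hypothetical; this is an analogy for the intrinsic's results, not its lowering.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical helper: return the fractional and exponent parts together,
// instead of through an out-parameter as std::frexp does.
// For finite nonzero X: X == Frac * 2^Exp with 0.5 <= |Frac| < 1.
std::pair<double, int> frexpPair(double X) {
  int Exp = 0;
  double Frac = std::frexp(X, &Exp);
  return {Frac, Exp};
}
```

For example, 8.0 splits into 0.5 and 4, since 8.0 == 0.5 * 2^4.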
In NVPTX `ReplaceVectorLoad()`, i1 and i8 types are promoted to i16,
followed by a truncate operation. Thus, v2i8 (or v2i1) and v2i16 loads
end up with the same VTList, which causes a collision in CSEMap.
To tell the original types apart, include the size when generating the
node ID. Otherwise the compiler crashes in refineAlignment with the assertion:
`MMO->getSize() == getSize() && "Size mismatch!"`
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D153712
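The collision can be modeled with a toy CSE key. In this sketch, `LoadDesc` and `countDistinctNodes` are hypothetical and far simpler than LLVM's real node profiling; it only shows that two loads with an identical (promoted) VTList merge unless the memory size is part of the key.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified description of a vector load node.
// VTList is the result type list after promotion; MemSizeInBytes is the
// size of the original memory access.
struct LoadDesc {
  std::string VTList;
  uint64_t MemSizeInBytes;
};

// Count how many distinct nodes survive CSE. With IncludeSize == false only
// the VTList forms the key, so a promoted v2i8 load and a v2i16 load merge.
size_t countDistinctNodes(const std::vector<LoadDesc> &Loads, bool IncludeSize) {
  std::set<std::pair<std::string, uint64_t>> Keys;
  for (const LoadDesc &L : Loads)
    Keys.insert({L.VTList, IncludeSize ? L.MemSizeInBytes : 0});
  return Keys.size();
}
```

A v2i8 load (2 bytes in memory) and a v2i16 load (4 bytes) share the VTList `i16,i16,ch` after promotion, so they only stay distinct when the size is included.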
I have fixed an existing DAGCombiner bug that caused the previous assertion failure.
See 7163539466.
Original message:
We don't have VP_ANY_EXTEND or VP_SIGN_EXTEND_INREG yet so I've
deviated a little from the non-VP lowering.
My goal was to fix the crashes that occur on these test cases without this patch.
Reviewed By: fakepaper56
Differential Revision: https://reviews.llvm.org/D152854
Type legalization may need to promote the result to the same type
as the input. Instead of forming a vp_truncate with the same
source and dest type, don't create any vp_truncate.
Handle this in getNode, as is already done for ISD::TRUNCATE.
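The getNode-style fold amounts to "a truncate to the operand's own type is a no-op". Below is a minimal sketch with a hypothetical `Node` type standing in for SDValue; the real vp_truncate also carries mask and EVL operands, which are omitted here.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a DAG node: opcode, result type, one operand.
struct Node {
  std::string Opcode;
  std::string VT;
  std::shared_ptr<Node> Operand;
};
using NodePtr = std::shared_ptr<Node>;

// If the requested type equals the operand's type, return the operand
// itself rather than building a truncate with identical source and dest types.
NodePtr getTruncate(const NodePtr &Op, const std::string &VT) {
  if (Op->VT == VT)
    return Op;
  return std::make_shared<Node>(Node{"vp_truncate", VT, Op});
}
```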
This patch introduces reduction intrinsics for floating-point minimum
and maximum, which have the same semantics (for NaN and signed zero) as
llvm.minimum and llvm.maximum.
Reviewed-By: nikic
Differential Revision: https://reviews.llvm.org/D152370
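The NaN and signed-zero rules can be sketched in scalar C++. This is a minimal model of llvm.minimum semantics plus a sequential reduction over it; `minimum` and `reduceMinimum` are hypothetical names, and the sketch makes no claim about how backends order the reduction.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Modeled on llvm.minimum: any NaN operand yields NaN, and -0.0 is
// treated as smaller than +0.0.
double minimum(double A, double B) {
  if (std::isnan(A) || std::isnan(B))
    return std::numeric_limits<double>::quiet_NaN();
  if (A == 0.0 && B == 0.0)
    return std::signbit(A) ? A : B;
  return A < B ? A : B;
}

// Sequential reduction with the same semantics. +infinity is the identity:
// minimum(+inf, X) == X for every non-NaN X.
double reduceMinimum(const std::vector<double> &V) {
  double Acc = std::numeric_limits<double>::infinity();
  for (double X : V)
    Acc = minimum(Acc, X);
  return Acc;
}
```

The maximum variant is symmetric: NaN still propagates, +0.0 is treated as larger than -0.0, and -infinity is the identity.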
rG2eb7cbf987f21 added this code, which results in a crash for vector
nodes. This patch fixes it by skipping vector nodes.
Thanks to Steve for helping reduce the test case.
Co-authored-by: Steve Merritt <steve.merritt@intel.com>
Reviewed By: goldstein.w.n
Differential Revision: https://reviews.llvm.org/D152492
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.
Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
The change implements intrinsics 'get_fpenv', 'set_fpenv' and 'reset_fpenv'.
They are used to read the floating-point environment, set it, or reset it to
a default state. They do the same actions as the C library functions
'fegetenv' and 'fesetenv'. By default these intrinsics are lowered to calls
to these functions.
The new intrinsics specify the FP environment as a value of integer type, which
is convenient for most targets, where the FP state is the content of some
register. Some targets, however, use long representations. On X86 the size
of the FP environment is 256 bits, and even half of this size is not a legal
integer type. To facilitate legalization in such cases, two sets of DAG
nodes are used. Nodes GET_FPENV and SET_FPENV are used when the FP
environment may be represented by a legal integer type. Nodes
GET_FPENV_MEM and SET_FPENV_MEM treat the FP environment as a region in
memory, much like `fesetenv` and `fegetenv` do. They are used when the
target has a long representation for the floating-point state.
Differential Revision: https://reviews.llvm.org/D71742
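The C library analogy can be made concrete with `<cfenv>`: read the whole environment, modify it, write it back, then reset to the default. The comments map each call to the intrinsic it resembles; this is only an analogy, not the intrinsics' lowering. `FenvDemo` and `demoFenv` are hypothetical, and the sketch assumes a platform that defines `FE_UPWARD` (the C standard only defines rounding-mode macros the platform supports).

```cpp
#include <cassert>
#include <cfenv>

// Rounding modes observed at each step of the demo.
struct FenvDemo {
  int Before;   // on entry
  int Changed;  // after fesetround(FE_UPWARD)
  int Restored; // after writing the saved environment back
  int Reset;    // after installing the default environment
};

FenvDemo demoFenv() {
  FenvDemo D{};
  std::fenv_t Saved;
  std::fegetenv(&Saved);         // analogous to get_fpenv: read the whole state
  D.Before = std::fegetround();
  std::fesetround(FE_UPWARD);    // modify one part of the environment
  D.Changed = std::fegetround();
  std::fesetenv(&Saved);         // analogous to set_fpenv: write it back
  D.Restored = std::fegetround();
  std::fesetenv(FE_DFL_ENV);     // analogous to reset_fpenv: default state
  D.Reset = std::fegetround();
  return D;
}
```

On x86-64 with glibc, `sizeof(std::fenv_t)` is 32 bytes, which is the 256-bit size that motivates the memory-based GET_FPENV_MEM/SET_FPENV_MEM nodes.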
FoldSetCC() returns UNDEF in a number of cases. However, the SetCC
result must follow BooleanContents. Unless the type is a
pre-legalization i1 or we have UndefinedBooleanContents, the use of
UNDEF will not uphold the requirement that the top bits are either
zero or match the low bit. In such cases, return zero instead.
Fixes https://github.com/llvm/llvm-project/issues/63055.
Differential Revision: https://reviews.llvm.org/D151883
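Why zero is always a safe replacement for UNDEF can be checked with a small model of the boolean-content rules. The enum below is hypothetical, modeled on the categories a target can declare (undefined, zero-or-one, zero-or-negative-one); an arbitrary UNDEF bit pattern such as 0x06 breaks both strict categories, while 0 satisfies all of them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum modeled on the target's boolean-contents categories.
enum class BooleanContent { Undefined, ZeroOrOne, ZeroOrNegativeOne };

// True if the low Bits bits of V form a legal SetCC result for BC.
// Bits is assumed to be in [1, 64].
bool isValidBoolean(uint64_t V, unsigned Bits, BooleanContent BC) {
  uint64_t Mask = Bits == 64 ? ~0ULL : ((1ULL << Bits) - 1);
  V &= Mask;
  switch (BC) {
  case BooleanContent::Undefined:
    return true;                  // only the low bit is meaningful
  case BooleanContent::ZeroOrOne:
    return V == 0 || V == 1;      // top bits must be zero
  case BooleanContent::ZeroOrNegativeOne:
    return V == 0 || V == Mask;   // top bits must match the low bit
  }
  return false;
}
```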
Given an insert of a scalar load into a vector shuffle with mask
u,0,1,2,3,4,5,6 or 1,2,3,4,5,6,7,u (depending on the insert index),
it can be more profitable to convert to a single load and avoid the
shuffles. This adds a DAG combine for it, provided the new load is
still fast.
Differential Revision: https://reviews.llvm.org/D151029
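The mask matching can be sketched as follows. `combinedLoadOffset` is a hypothetical helper that uses -1 for an undef ('u') lane and reports whether the combined load would start one element before (-1) or after (+1) the shuffled vector's memory. It assumes, beyond what is stated above, that the shuffle source is a vector load and the scalar is loaded from the adjacent element, so the data is contiguous.

```cpp
#include <cassert>
#include <vector>

// Returns -1 for mask u,0,1,...,N-2 with the scalar inserted at lane 0,
// +1 for mask 1,2,...,N-1,u with the scalar inserted at lane N-1,
// and 0 if neither pattern matches.
int combinedLoadOffset(const std::vector<int> &Mask, int InsertIdx) {
  const int N = static_cast<int>(Mask.size());
  if (N < 2)
    return 0;
  bool ShiftUp = InsertIdx == 0;       // u,0,1,...
  bool ShiftDown = InsertIdx == N - 1; // 1,2,...,u
  for (int I = 0; I < N; ++I) {
    if (I == 0 ? Mask[I] != -1 : Mask[I] != I - 1)
      ShiftUp = false;
    if (I == N - 1 ? Mask[I] != -1 : Mask[I] != I + 1)
      ShiftDown = false;
  }
  if (ShiftUp)
    return -1;
  if (ShiftDown)
    return 1;
  return 0;
}
```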
This exposed a miscompile due to incorrect flag preservation in
integer type legalization, which has been fixed in D151472.
-----
This patch is a continuation of D150110. It separates the cases for
ADD and SUB into their own cases so that computeForAddSub can be
directly called and the NSW flag passed. This allows better
optimization when the NSW flag is enabled, and allows fixing up the
TODO that was there previously in SimplifyDemandedBits.
Differential Revision: https://reviews.llvm.org/D150769
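The extra precision NSW enables can be confirmed exhaustively at 8 bits: once overflowing inputs are excluded, adding two non-negative values never yields a negative result and adding two negative values never yields a non-negative one. Without that guarantee, wrapping breaks the rule (100 + 100 wraps to -56). `countSignViolations` is a hypothetical check, not `computeForAddSub` itself.

```cpp
#include <cassert>

// Count i8 pairs where the sign of the wrapped sum contradicts the
// operands' common sign. With AssumeNSW, overflowing pairs are skipped,
// since an nsw add that overflows is poison and imposes no constraint.
int countSignViolations(bool AssumeNSW) {
  int Violations = 0;
  for (int A = -128; A <= 127; ++A) {
    for (int B = -128; B <= 127; ++B) {
      int Exact = A + B;
      bool Overflows = Exact < -128 || Exact > 127;
      if (AssumeNSW && Overflows)
        continue;
      // Two's-complement wrap of Exact into the i8 range.
      int M = ((Exact % 256) + 256) % 256;
      int Wrapped = M >= 128 ? M - 256 : M;
      if (A >= 0 && B >= 0 && Wrapped < 0)
        ++Violations;
      if (A < 0 && B < 0 && Wrapped >= 0)
        ++Violations;
    }
  }
  return Violations;
}
```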
This reverts commit 9b92f70d47. The issue
with the re-applied change was an implicit truncation due to the
multiplication. Although the operations were converted to `APInt`, the
values were implicitly converted to `long` due to the typing rules.
Fixes: #59594
Differential Revision: https://reviews.llvm.org/D140347
The generic implementation is umin(TC, VF * vscale).
Lowering to vsetvli for RISC-V will come in a future patch.
This patch is a pre-requisite to be able to CodeGen vectorized code from
D99750.
Reviewed By: reames, frasercrmck
Differential Revision: https://reviews.llvm.org/D149916
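The generic expansion is a single unsigned minimum. Below is a scalar sketch with hypothetical names, treating TC, VF and vscale as unsigned integers; the product is formed in 64 bits so it cannot wrap for 32-bit inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// umin(TC, VF * vscale): process at most VF * vscale lanes, and never
// more than the remaining trip count.
uint64_t getVectorLength(uint32_t TC, uint32_t VF, uint32_t VScale) {
  uint64_t MaxLanes = static_cast<uint64_t>(VF) * VScale;
  return std::min<uint64_t>(TC, MaxLanes);
}
```

With VF = 4 and vscale = 2, a trip count of 10 yields 8 lanes, while a trip count of 5 yields 5.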
Define intersectWith and unionWith as two complementary ways of
combining KnownBits. The names are chosen for consistency with
ConstantRange.
Deprecate commonBits as a synonym for intersectWith.
Differential Revision: https://reviews.llvm.org/D150443
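The two operations are complementary on the Zero/One masks: intersecting keeps only facts true in both inputs (bitwise AND), while union combines facts about the same value (bitwise OR). `Known8` is a hypothetical fixed-width stand-in that uses plain 8-bit masks instead of APInt.

```cpp
#include <cassert>
#include <cstdint>

// A bit set in Zero is known to be 0; a bit set in One is known to be 1.
struct Known8 {
  uint8_t Zero = 0;
  uint8_t One = 0;

  // Facts that hold for both inputs, e.g. when a value may be either one.
  Known8 intersectWith(const Known8 &RHS) const {
    return {static_cast<uint8_t>(Zero & RHS.Zero),
            static_cast<uint8_t>(One & RHS.One)};
  }
  // All facts from two descriptions of the same value.
  Known8 unionWith(const Known8 &RHS) const {
    return {static_cast<uint8_t>(Zero | RHS.Zero),
            static_cast<uint8_t>(One | RHS.One)};
  }
};
```

For example, intersecting "exactly 3" with "exactly 7" leaves bit 2 unknown, which describes the set {3, 7}.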
The current logic is pretty limited unless `Op` is a
constant. This at least covers the more obvious cases.
Reviewed By: craig.topper, foad
Differential Revision: https://reviews.llvm.org/D149196
Both of these functions call themselves recursively, so it makes sense
to put an upper bound on the recursion depth.
Differential Revision: https://reviews.llvm.org/D149195
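A depth limit turns an unbounded walk into a bounded one that answers conservatively once the limit is hit. In this sketch the `Expr` tree, `isKnownPositive` and the `MaxDepth` value are all hypothetical, chosen only to illustrate the pattern.

```cpp
#include <cassert>
#include <memory>

// Hypothetical expression tree: a constant leaf or an add of two subtrees.
struct Expr {
  bool IsConst = false;
  long Value = 0;
  std::shared_ptr<Expr> LHS, RHS;
};

// Illustrative depth cap.
constexpr unsigned MaxDepth = 6;

// True only if every leaf reached within MaxDepth is a positive constant
// (ignoring overflow, the sum is then positive). Past the limit the query
// gives up and answers "don't know" (false), which bounds compile time.
bool isKnownPositive(const Expr &E, unsigned Depth = 0) {
  if (E.IsConst)
    return E.Value > 0;
  if (Depth >= MaxDepth)
    return false;
  return E.LHS && E.RHS && isKnownPositive(*E.LHS, Depth + 1) &&
         isKnownPositive(*E.RHS, Depth + 1);
}
```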
We were missing support for ISD::INTRINSIC_W_CHAIN/INTRINSIC_VOID
nodes used for memory operations.
For ISD::PREFETCH and target memory nodes, we didn't add the subclass
data.
This patch handles all MemIntrinsicSDNodes in one place and adds the
missing subclass data.
Note: unlike loads/stores, we don't add the memory VT in AddNodeIDCustom or getMemIntrinsicNode. Not sure why.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D150387
For constant BUILD_VECTORs the operands need to be legal types. This means
that, when computing the number of sign bits, it may look at the entire wide
constant and report fewer sign bits than it could. For example, i8 vectors
could use i32 operands, for which 0x000000ff would be unnecessarily limited to
1 sign bit, because the wide value has 24 sign bits. This patch makes it
look at the constant directly, truncated to the element type, so that it
correctly returns 8.
Differential Revision: https://reviews.llvm.org/D149956
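The 0x000000ff example can be reproduced with plain integers. The sketch below models the described old behaviour (count sign bits on the wide operand, subtract the extra width, clamp at 1) against truncating to the element width first; all three function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Leading bits of a Width-bit value (held in the low bits of V) that equal
// its sign bit. Width is assumed to be in [1, 64].
unsigned numSignBits(uint64_t V, unsigned Width) {
  uint64_t SignBit = (V >> (Width - 1)) & 1;
  unsigned Count = 1;
  for (int I = static_cast<int>(Width) - 2; I >= 0; --I) {
    if (((V >> I) & 1) != SignBit)
      break;
    ++Count;
  }
  return Count;
}

// Model of the old behaviour: count on the wide operand, subtract the
// extra bits, and clamp to at least 1.
unsigned signBitsFromWideOperand(uint64_t V, unsigned OpWidth, unsigned EltWidth) {
  unsigned Wide = numSignBits(V, OpWidth);
  unsigned Extra = OpWidth - EltWidth;
  return Wide > Extra ? Wide - Extra : 1;
}

// Model of the new behaviour: truncate to the element width first.
unsigned signBitsFromTruncatedConstant(uint64_t V, unsigned EltWidth) {
  uint64_t Mask = EltWidth == 64 ? ~0ULL : ((1ULL << EltWidth) - 1);
  return numSignBits(V & Mask, EltWidth);
}
```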
Add a basic computeOverflowForSignedAdd helper to recognise that sadd overflow can't occur if both operands have more than one sign bit.
Add a computeOverflowForAdd wrapper that calls computeOverflowForSignedAdd/computeOverflowForUnsignedAdd depending on the IsSigned argument, and use it in DAGCombiner::visitADDO.
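The sign-bit rule can be verified exhaustively for i8: more than one sign bit means the value lies in [-64, 63], so the sum of two such values lies in [-128, 126] and cannot overflow. Both helpers below are hypothetical illustrations, not the SelectionDAG functions.

```cpp
#include <cassert>
#include <cstdint>

// Number of leading bits of an i8 that equal bit 7.
unsigned numSignBits8(int8_t V) {
  uint8_t U = static_cast<uint8_t>(V);
  int Sign = U >> 7;
  unsigned Count = 1;
  for (int I = 6; I >= 0 && ((U >> I) & 1) == Sign; --I)
    ++Count;
  return Count;
}

// Check every pair: if both operands have more than one sign bit,
// the exact signed sum must fit in i8.
bool ruleHoldsForI8() {
  for (int A = -128; A <= 127; ++A)
    for (int B = -128; B <= 127; ++B)
      if (numSignBits8(static_cast<int8_t>(A)) > 1 &&
          numSignBits8(static_cast<int8_t>(B)) > 1) {
        int Sum = A + B;
        if (Sum < -128 || Sum > 127)
          return false;
      }
  return true;
}
```

With only one sign bit the rule fails, as 64 + 64 = 128 does not fit in i8.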
This is a rework of:
- rG13e77db2df94 (r328395; MVT)
Since `LowLevelType.h` has been restored to `CodeGen`, `MachineValueType.h`
can be restored as well.
Depends on D148767
Differential Revision: https://reviews.llvm.org/D149024
The only way known bits could help identify a known power of two is if
it knows exactly which power of two it is, i.e. if it is a known
constant. But in that case the value should have been simplified to a
constant already. So save some compile time by not calling
computeKnownBits.
Differential Revision: https://reviews.llvm.org/D149325
This patch replaces uses of PointerUnion::is with llvm::isa,
PointerUnion::get with llvm::cast, and PointerUnion::dyn_cast with
llvm::dyn_cast_if_present, as requested by the FIXME in
the definition of the PointerUnion class.
This patch does not remove the member functions, as they are still used in other
subprojects.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D148449
Similar to the existing SelectionDAG::SplitVector helper, this helper creates the EXTRACT_ELEMENT nodes for the LO/HI halves of the scalar source.
Differential Revision: https://reviews.llvm.org/D147264
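At the value level, the LO/HI split is just a truncation and a shift. This sketch computes on plain integers what the two EXTRACT_ELEMENT nodes (index 0 for the low half, index 1 for the high half) represent for an i64 source; the name `splitScalar` is hypothetical here, and the real helper builds DAG nodes rather than computing values.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Split an i64 into its two i32 halves.
std::pair<uint32_t, uint32_t> splitScalar(uint64_t V) {
  uint32_t Lo = static_cast<uint32_t>(V);       // element 0: low half
  uint32_t Hi = static_cast<uint32_t>(V >> 32); // element 1: high half
  return {Lo, Hi};
}
```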
This patch custom-lowers ISD::STRICT_FP_ROUND on vector types to RISCVISD::STRICT_FP_ROUND.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D147113
Extend the existing store(load()) removal code to account for intermediate truncates that some targets won't remove with canCombineTruncStore; we only care about the load/store MemoryVT.
Fixes a regression from D146121.