If the SRL for Hi constant folds, but we don't remoe those bits from
the Lo, we can end up with strange constant folding through DAGCombine later.
I've only seen this with constants being lowered to constant pools
during lowering on RISC-V.
The intrinsic uses ImmArg so TargetConstant would be consistent
with how other intrinsics are handled.
This hides the constants from type legalization so we can remove
the promotion support.
isel patterns are updated accordingly.
We'll need to generalize this fold to check for any zero upperbits to address some of the D155472 regressions, but this exposes a number of issues. For now, just use the general MaskedValueIsZero test instead of the assertzext.
This patch adjusts the legality check for riscv to use `cpop/cpopw` since `isOperationLegal(ISD::CTPOP, MVT::i32)` returns false on rv64gc_zbb.
Clang vs gcc: https://godbolt.org/z/rc3s4hjPh
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D156390
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
This reverts commit ee643b706b.
Fix up build failures in targets I missed in #66003
Kept as 3 commits for reviewers to see better what's changed. Will
squash when
merging.
- reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
- fix all the targets I missed in #66003
- fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll
No need to special case add 0, N. SelectionDAG::getNode contains the canonicalization and simplification for this case, so no need to duplicate it here.
This reverts commit 2ca4d13612.
Also revert the followup, "[InlineAsm] fix botched merge conflict resolution"
This reverts commit 8b9bf3a9f7.
There were SystemZ and Mips build errors, too many to fix forward.
A splat index means the operation is reading from (writing to) the same
memory location. Generally, zero is the cheapest value to splat. As
such, we'd prefer to add the splatted value to the base, and use a
constant zero as the index operand.
Similar to
commit 2fad6e6985 ("[InlineAsm] wrap Kind in enum class NFC")
Fix the TODOs added in
commit 93bd428742 ("[InlineAsm] refactor InlineAsm class NFC
(#65649)")
Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction
As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future.
Reapplied after reversion at e1e3c75c7d with a tweak to the pseudo-probe-peep.ll test
Differential Revision: https://reviews.llvm.org/D158068
Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction
As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future.
Differential Revision: https://reviews.llvm.org/D158068
By default, `DAGCombiner` folds `sext x` to `zext x` when `x` is
non-negative. It will generate redundant `zext` inst seq on riscv64
(typically `slli (srli x, 32), 32`).
godbolt: https://godbolt.org/z/osf6adP1o
This patch applies the transform iff `zext` is **cheaper** than `sext`.
I would like to steal one of these bits to denote whether a kind may be
spilled by the register allocator or not, but I'm afraid to touch of any
this code using bitwise operands.
Make flags a first class type using bitfields, rather than launder data
around via `unsigned`.
Assuming the ADD is nsw then it may be sign-extended to merge with a SHL op in a similar fold to the existing (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) fold.
This is most useful for helping to expose address math for X86, but has also touched several aarch64 test cases as well.
Alive2: https://alive2.llvm.org/ce/z/2UpSbJ
Differential Revision: https://reviews.llvm.org/D159198
Assuming the ADD is nsw then it may be sign-extended to merge with a SHL op in a similar fold to the existing (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) fold.
This is most useful for helping to expose address math for X86, but has also touched several aarch64 test cases as well.
Alive2: https://alive2.llvm.org/ce/z/2UpSbJ
Differential Revision: https://reviews.llvm.org/D159198
On PPC there are instructions to store element from vector(e.g.
stxsdx/stxsiwx), and these instructions can be leveraged to avoid tail
constant in memset and constant splat array initialization.
This patch tries to explore these opportunities.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D138883
In some cases where the same mask is used for multiple
extending masked loads it can be more efficient to combine
the zero- or sign-extend into the load even if it's not a
legal or custom operation. This leads to splitting up the
extending load into smaller parts, which also requires
splitting the mask. For SVE at least this improves the
performance of the SPEC benchmark x264 slightly on
neoverse-v1 (~0.3%), and at least one other benchmark
improves by around 30%. The uplift for SVE seems due to
removing the dependencies (vector unpacks) introduced
between the loads and the vector operations, since this
should increase the level of parallelism.
See tests:
CodeGen/AArch64/sve-masked-ldst-sext.ll
CodeGen/AArch64/sve-masked-ldst-zext.ll
https://reviews.llvm.org/D159191
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10
to fix this asymmetry. AMDGPU already has most of the code for f32
exp10 expansion implemented alongside exp, so the current
implementation is duplicating nearly identical effort between the
compiler and library which is inconvenient.
https://reviews.llvm.org/D157871
Extension to the signbit case, if the signbits extend down through all the demanded bits then SMIN/SMAX/UMIN/UMAX nodes can be simplified to a OR/AND/AND/OR.
Alive2: https://alive2.llvm.org/ce/z/mFVFAn (general case)
Differential Revision: https://reviews.llvm.org/D158364
Irritatingly, atomic_store had operands in the opposite order from
regular store. This made it difficult to share patterns between
regular and atomic stores.
There was a previous incomplete attempt to move atomic_store into the
regular StoreSDNode which would be better.
I think it was a mistake for all atomicrmw to swap the operand order,
so maybe it's better to take this one step further.
https://reviews.llvm.org/D123143
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table, it contains the following information:
* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).
Together this information can be used by debuggers and binary analysis tools to understand what an jump table indirect branch is doing and where it might jump to.
Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)
This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D149367
Should add some minor type safety to the use of this information, since
there's quite a bit of metadata being laundered through an `unsigned`.
I'm looking to potentially add more bitfields to that `unsigned`, but I
find InlineAsm's big ol' bag of enum values and usage of `unsigned`
confusing, type-unsafe, and un-ergonomic. These can probably be better
abstracted.
I think the lack of static_cast outside of InlineAsm indicates the prior
code smell fixed here.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D159242
From the discussion in https://reviews.llvm.org/D158853, moving the truncate
into the splat helps more splatted scalar operands get selected on RISC-V, and
also avoids the need for splat_vector_parts on RV32.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D159147
That matches how such a SPLAT_VECTOR would have been type legalized
so assume it is ok to use for creating constants after type legalization.
Still need some improvements to SPLAT_VECTOR lowering.
This overlaps with some of what D158742 was trying to fix.
Reviewed By: luke
Differential Revision: https://reviews.llvm.org/D158870
We can work out the known bits for a given lane by concatenating the known bits of each scalar operand.
In the description of ISD::SPLAT_VECTOR_PARTS in ISDOpcodes.h it says that the
total size of the scalar operands must cover the output element size, but I've
added a stricter assertion here that the total width of the scalar operands
must be exactly equal to the element size. It doesn't seem to trigger, and I'm
not sure if there any targets that use SPLAT_VECTOR_PARTS for anything other
than v4i32 -> v2i64 splats.
We also need to include it in isTargetCanonicalConstantNode, otherwise
returning the known bits introduces an infinite combine loop.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D158852
This improves some cases where a splat_vector uses a build_pair that can be
simplified, e.g:
(rotl x:i64, splat_vector (build_pair x1:i32, x2:i32))
rotl only demands the bottom 6 bits, so this patch allows it to simplify it to:
(rotl x:i64, splat_vector (build_pair x1:i32, undef:i32))
Which in turn improves some cases where a splat_vector_parts is lowered on
RV32.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D158839
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table, it contains the following information:
* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).
Together this information can be used by debuggers and binary analysis tools to understand what an jump table indirect branch is doing and where it might jump to.
Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)
This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D149367
There is no vp.fpclass after FCLASS_VL(D151176), try to support vp.fpclass.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D152993