The global constant arguments could be in a different address space
than the first argument, so we have to add another overloaded argument.
This patch was originally made for CHERI LLVM (where globals can be in
address space 200), but it also appears to be useful for in-tree targets
as can be seen from the test diffs.
Differential Revision: https://reviews.llvm.org/D138722
The old code didn't bother to memoize blocks for which the exact exit count is not
known. As a result, when the exact count wasn't known but the symbolic one was, this
info was lost. This patch fixes the situation: now we memoize whenever the symbolic count is
known (exact always implies symbolic, so this is a strict superset of what was memoized before).
Differential Revision: https://reviews.llvm.org/D139515
Reviewed By: nikic
This reverts commit 7883e5b061.
The original commit was reverted because it didn't update test files after D136263
landed. This recommit fixes those.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139509
The patch made VectorLegalizer expand ISD::VP_FSHL and ISD::VP_FSHR to
achieve the codegen.
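For reference, the funnel-shift semantics being expanded can be sketched in scalar C++ as below. This is only an illustration of the shift/or formula on 32-bit lanes, not the VectorLegalizer code; `fshl32`, `fshr32` and `vpFshl` are hypothetical names, and the handling of disabled lanes is an assumption made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference: treat Hi:Lo as one 64-bit value, shift left by Amt % 32,
// and return the upper 32 bits.
uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  uint32_t S = Amt % 32;
  // A shift by the full bit width is undefined in C++, so handle S == 0 apart.
  return S == 0 ? Hi : (Hi << S) | (Lo >> (32 - S));
}

// Same idea shifting right; the result is the lower 32 bits.
uint32_t fshr32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  uint32_t S = Amt % 32;
  return S == 0 ? Lo : (Hi << (32 - S)) | (Lo >> S);
}

// Hypothetical lane-wise model of a VP funnel shift. Lanes that are masked off
// or at/after EVL are left as 0 here; in the real VP ops they are poison.
std::vector<uint32_t> vpFshl(const std::vector<uint32_t> &Hi,
                             const std::vector<uint32_t> &Lo,
                             const std::vector<uint32_t> &Amt,
                             const std::vector<bool> &Mask, std::size_t EVL) {
  assert(Hi.size() == Lo.size() && Hi.size() == Amt.size() &&
         Hi.size() == Mask.size() && EVL <= Hi.size());
  std::vector<uint32_t> Out(Hi.size(), 0);
  for (std::size_t I = 0; I < EVL; ++I)
    if (Mask[I])
      Out[I] = fshl32(Hi[I], Lo[I], Amt[I]);
  return Out;
}
```

The special case for a zero shift amount exists only because C++ forbids shifting by the full width; a DAG-level expansion has to deal with the same corner in its own way.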
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D138379
This was added in 29e2d9461a and likely never worked in a useful
way.
The test added for it fails when converted to opaque pointers, since
the lifetime intrinsic now directly uses the address. The code was
only trying to handle a user indirectly through a bitcast
instruction. That would never have been useful; a bitcast of a global
value would be folded to a ConstantExpr cast.
I also don't understand why it was special casing use_empty on the
cast. Relax the check to be either BitCastOperator or
AddrSpaceCastOperator. In practice, BitCastOperator won't appear
today.
I believe the change in parallel_deletion_cg_update is a correct
improvement but I didn't fully follow it. .omp_outlined..0 is used in
a constant expression cast to a call which ends up getting deleted.
LAA analyzes cross-iteration memory dependencies. As such, AA should
not make assumptions about equality of values inside the loop, as
they may come from different iterations.
Fix this by exposing the MayBeCrossIteration AA flag and enabling
it for LAA.
Differential Revision: https://reviews.llvm.org/D137958
StackSafetyAnalysis/lifetime.ll had one bitcast removed that may have
mattered. The concluded lifetime is longer when based on the underlying
alloca instead of the bitcasted pointer, so I left that in as a pointless
cast.
local.ll and memintrin.ll needed some manual fixes.
This reuses the routine implemented in 0e6f0b7 to implement several existing TODOs. Many of the operations scale linearly with LMUL; this change represents that in the cost model.
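A minimal sketch of what "scales linearly with LMUL" means for a cost query, assuming the cost of one operation on a single vector register is known. The helper names and the ceil-division rule are assumptions for illustration and are not the in-tree routine.

```cpp
#include <cassert>

// Hypothetical helper: how many vector registers a type of TypeBits occupies
// when one register holds VLenBits. Fractional LMUL is rounded up to 1.
unsigned lmulFor(unsigned TypeBits, unsigned VLenBits) {
  assert(VLenBits > 0 && "register width must be nonzero");
  unsigned LMUL = (TypeBits + VLenBits - 1) / VLenBits; // ceiling division
  return LMUL == 0 ? 1 : LMUL;
}

// Hypothetical cost: the per-register cost multiplied by the register count.
unsigned scaledCost(unsigned PerRegisterCost, unsigned TypeBits,
                    unsigned VLenBits) {
  return PerRegisterCost * lmulFor(TypeBits, VLenBits);
}
```

With a 128-bit register, a 512-bit vector type occupies four registers, so an operation with a per-register cost of 1 is modeled as costing 4.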
Differential Revision: https://reviews.llvm.org/D139039
At the IR level, we generally assume that constants are free to materialize. However, for RISCV, due to some quirks of the ISA, materializing arbitrary constants can be rather expensive. We frequently fall back to constant pool loads.
We've been slowly moving in the direction of modeling the cost of the remat as part of the instruction cost. This has the effect of disincentivizing vectorization - mostly SLP - when we'd have to materialize an expensive constant.
We need better modeling of which constants are expensive and which are not, but for the moment let's be consistent with how we model arithmetic and memory instructions. The difference between the two is that arithmetic can sometimes fold a splat operation, which stores cannot.
Differential Revision: https://reviews.llvm.org/D138941
The logic in isWideningInstruction handles instructions like uaddw and
smull, where 'add(x, zext(y))' or 'mul(sext(x), sext(y))' can be
converted to single instructions, making the extends free. The same doesn't
apply to SVE instructions, though.
https://godbolt.org/z/695d3nhGd
(There are instructions like SMULLT/B, but they require top/bottom lane
interleaving. That is similar to MVE instructions, which required a
special pass to perform the lane interleaving).
This patch just bails out of the call to isWideningInstruction if the
vector is scalable, getting a more accurate cost.
Differential Revision: https://reviews.llvm.org/D138591
This patch adds basic broadcast shuffle costs in order to enable SLP vectorization.
It also adds `getLMULCost` to account for reciprocal throughput at different LMULs.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D137276
When an internal global is passed to a 'nocallback' call as
a 'nocapture' pointer, it cannot escape through this call and
be indirectly referenced in this module.
So it must not alias with any pointer in the module.
This may provide some remedy for Fortran module-private array descriptors
that are usually passed by address to some runtime functions
(e.g. to allocation/deallocation functions). In general, good aliasing
information derived from Fortran language rules would solve the same issue,
but I think this change may be beneficial as-is (given that nocapture,
nocallback attributes are properly set).
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D138336
This patch implements getArithmeticInstrCost for RISCV, adding cost-model
support for integer and floating-point vector arithmetic instructions.
Differential Revision: https://reviews.llvm.org/D133552 (Original patch by jacquesguan. Subset by me with todos added.)
Currently the model overestimates the cost of a udiv instruction with one constant. The correct cost for a udiv instruction is
insert_cost * extract_cost * num_elements
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D135991
Similar to 9f9e8ba114, add support for memcpy_chk to
MemoryLocation::getForArgument.
The size argument for memcpy_chk is an upper bound for the size of the
pointer argument. memcpy_chk may read/write less than the specified length
if that length exceeds the specified max size, in which case it aborts.
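To illustrate why the length is only an upper bound, here is a hypothetical model of a checked memcpy. `memcpyChkSketch` is an illustrative name, not the libc entry point, and the abort-before-copy behavior is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical model of a checked memcpy: Len is the requested copy length and
// DstSize is the known size of the destination object.
void *memcpyChkSketch(void *Dst, const void *Src, std::size_t Len,
                      std::size_t DstSize) {
  assert(Dst && Src && "sketch expects valid pointers");
  if (Len > DstSize)
    std::abort(); // overflow detected: stop without accessing all Len bytes
  return std::memcpy(Dst, Src, Len);
}
```

Either exactly Len bytes are accessed or the call aborts, so an analysis can treat Len as a bound on the accessed size but not as the precise size.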
Reviewed By: xbolva00, jdoerfert
Differential Revision: https://reviews.llvm.org/D138613
At the moment, getRangeRef may overflow the stack for very deeply nested
expressions.
This patch introduces a new getRangeRefIter function, which first builds
a worklist of N-ary expressions and phi nodes, followed by their
operands iteratively.
getRangeRef has been extended to also take a Depth argument and it
switches to use getRangeRefIter once the depth reaches a certain
threshold.
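The recursion-to-worklist switch can be sketched on a toy expression type as follows. This is not the ScalarEvolution code: `Expr`, `evalIter`, `eval` and the threshold value are hypothetical, the "range" is replaced by a plain sum, and the sketch assumes an acyclic expression graph.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an N-ary expression: a constant term plus operands.
struct Expr {
  long Value = 0;
  std::vector<const Expr *> Ops;
};

using Cache = std::unordered_map<const Expr *, long>;

// Iterative evaluation: an explicit stack replaces the call stack. A node is
// pushed once to expand its operands and once more to combine their results.
long evalIter(const Expr *Root, Cache &C) {
  assert(Root && "expected an expression");
  std::vector<std::pair<const Expr *, bool>> Stack;
  Stack.push_back({Root, false});
  while (!Stack.empty()) {
    const Expr *E = Stack.back().first;
    bool Expanded = Stack.back().second;
    Stack.pop_back();
    if (C.count(E))
      continue; // already memoized
    if (!Expanded && !E->Ops.empty()) {
      Stack.push_back({E, true});
      for (const Expr *Op : E->Ops)
        Stack.push_back({Op, false});
      continue;
    }
    long Sum = E->Value;
    for (const Expr *Op : E->Ops)
      Sum += C.at(Op); // operands were processed before this node
    C[E] = Sum;
  }
  return C.at(Root);
}

constexpr unsigned MaxRecursionDepth = 64; // hypothetical threshold

// Recursive evaluation that hands over to the worklist once it gets too deep.
long eval(const Expr *E, Cache &C, unsigned Depth = 0) {
  auto It = C.find(E);
  if (It != C.end())
    return It->second;
  if (Depth > MaxRecursionDepth)
    return evalIter(E, C);
  long Sum = E->Value;
  for (const Expr *Op : E->Ops)
    Sum += eval(Op, C, Depth + 1);
  C[E] = Sum;
  return Sum;
}
```

Shallow expressions keep the cheap recursive path, which is why the common case should see no compile-time change; only deep chains pay for the explicit stack.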
This ensures compile-time is not impacted in general. Note that
the iterative algorithm may lead to a slightly different evaluation
order, which could result in slightly worse ranges for cyclic phis.
https://llvm-compile-time-tracker.com/compare.php?from=23c3eb7cdf3478c9db86f6cb5115821a8f0f5f40&to=e0e09fa338e77e53242bfc846e1484350ad79773&stat=instructions
Fixes #49579.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D130728
nearbyint must execute without raising floating-point exceptions.
To avoid modifying fflags, the patch adds a new machine opcode,
PseudoVFROUND_NOEXCEPT_V, that expands to vfcvt.x.f.v and vfcvt.f.x.v between a pair
of frflags and fsflags.
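The save/restore pattern has a rough analogue in portable C++, sketched below. `roundNoExcept` is a hypothetical name, `std::rint` stands in for the convert-to-integer-and-back pair, and the sketch assumes the compiler does not move the rounding across the flag accesses (strictly, that needs FENV_ACCESS support).

```cpp
#include <cfenv>
#include <cmath>

// Round to integral in the current rounding mode without leaving any new
// floating-point exception flags behind.
double roundNoExcept(double X) {
  std::fexcept_t Saved;
  std::fegetexceptflag(&Saved, FE_ALL_EXCEPT); // plays the role of frflags
  double R = std::rint(X);                     // may raise FE_INEXACT
  std::fesetexceptflag(&Saved, FE_ALL_EXCEPT); // plays the role of fsflags
  return R;
}
```

Restoring the saved flags discards whatever the rounding raised, which is the property nearbyint needs.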
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137685
The patch also added function expandVPBSWAP to expand ISD::VP_BSWAP nodes.
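For reference, a byte swap can be expressed with shifts, masks and ors, shown here for a single 32-bit lane. This is a scalar sketch of the general idea with a hypothetical function name, not the body of expandVPBSWAP.

```cpp
#include <cstdint>

// Move each byte to its mirrored position and or the pieces together.
uint32_t bswap32(uint32_t X) {
  return (X << 24)               // byte 0 -> byte 3
         | ((X & 0xFF00u) << 8)  // byte 1 -> byte 2
         | ((X >> 8) & 0xFF00u)  // byte 2 -> byte 1
         | (X >> 24);            // byte 3 -> byte 0
}
```

A VP expansion would apply the same steps per lane using the VP forms of shift, and, and or, passing the mask and EVL through to each of them.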
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137928
All of our insert/extract ops work on 128-bit lanes.
For `Insert`, we need to extract affected 128-bit lane,
unless it's being fully overwritten (FIXME: do we need to be
careful about legalization-induced padding that we obviously don't demand?),
perform insertions, and then insert the 128-bit lane back.
But hold on. If we are operating on a 256-bit legal vector,
and thus have two 128-bit subvectors, and are fully overwriting them both,
we don't actually need to insert *both* subvectors,
only the second one, into the implicitly-widened first one.
Also, `Insert` wasn't actually querying the costs,
but just assuming them to be `1`.
`getShuffleCost(TTI::SK_ExtractSubvector)` notes:
```
// Note that in general, the insertion starting at the beginning of a vector
// isn't free, because we need to preserve the rest of the wide vector.
```
... so as far as I can tell, we didn't account for that.
I was hoping this would allow vectorization at a higher VF in one case I looked at,
but the subvector insertion cost still advises against that.
The change for `Extract` is NFC, and is for consistency only.
I wanted to get rid of that weird explicit discounting of insertion of the 0'th element,
since the general code should already deal with that.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D137913