It seems consistent to always return zero for known poison rather than
varying the value. We do the same elsewhere.
Differential Revision: https://reviews.llvm.org/D150922
Due to the nature of WebAssembly, it's always better to keep
rotates instead of trying to optimize it. Commit 9485d983
disabled the generation of fsh for rotates, however these
tests ensure that future changes don't change the behaviour for
the Wasm backend that tends to have different optimization
requirements than other architectures. Also see:
https://github.com/llvm/llvm-project/issues/62703
Differential Revision: https://reviews.llvm.org/D152126
We can compute a simpler expression for Lo for these cases. This
is an alternative for the test cases in D151180 that works for
more targets.
This is similar to some of the special cases we have for expanding
setcc operands.
Differential Revision: https://reviews.llvm.org/D151182
This is a follow-up to b71edfaa4e
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
This intrinsic is meant to lower directly to the i8x16.shuffle instruction,
which takes its lane index arguments as immmediates. The ISel for the intrinsic
assumed that the lane index arguments were constants, so bitcode that
"incorrectly" used this intrinsic with non-immediate arguments caused an
assertion failure in the backend.
Avoid the crash by defining the lane index arguments to be immediates, matching
the underlying instruction. Update ISel accordingly. This change means that the
bitcode that previously caused a crash will now fail to validate.
Fixes#55559.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D149898
The wasm64 versions of the v128.storeX_lane instructions was incorrectly defined
as returning a v128 value, which resulted in spurious drop instructions being
emitted and causing validation to fail. This was not caught earlier because
wasm64 has been experimental and not well tested. Update the relevant test file
to test both wasm32 and wasm64.
Fixes#62443.
Differential Revision: https://reviews.llvm.org/D149780
When selecting calls, currently we unconditionally remove `Wrapper`s of
the call target. But we are supposed to do that only when the target is
a function, an external symbol (= library function), or an alias of a
function. Otherwise we end up directly calling globals that are not
functions.
Fixes https://github.com/llvm/llvm-project/issues/60003.
Reviewed By: tlively, HerrCai0907
Differential Revision: https://reviews.llvm.org/D147397
Not sure the distinction between `call.ll` and `call-indirect.ll`,
because `call.ll` also seems to contain many `call_indirect` tests. Also
before D147033 `call-indirect.ll` only contained a single test and it
also tests it with `obj2yaml`, so I guess that file was created for
testing functionalities for object files as well.
We can probably merge these two someday. But anyway, this moves
`call_indirect_alloca` I added in D147033 to `call.ll`, given that that
file contains more `call_indirect` tests and I'm planning to add more
`call_indirect` tests in a followup CL.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D147396
Since clang started emitting roundeven intrinsics in a7d6593a0a, they would
cause a crash in the WebAssembly backend because it did not know the roundeven
library function signatures. Fix the crash by adding the signatures.
Differential Revision: https://reviews.llvm.org/D147476
WebAssembly tries to cast an `undef` to `CosntantSDNode` during `LowerAccessVectorElement`.
These operations will trigger an assertion error in cast.
To avoid this issue, we prevent casting, and abort the lowering operation.
A unit test is also included.
This patch fixes [pr61828](https://github.com/llvm/llvm-project/issues/61828)
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D147198
Constants in BUILD_VECTOR may be down cast into a smaller value that fits LaneBits, i.e., the bit width of elements in the vector.
This cast didn't consider 2^N where it would be cast into -2^N, which still doesn't fit into LaneBits after casting.
This will cause an assertion in later legalization.
2^N should be cast into 0, and this patch reflects such behavior.
This patch also includes a test to reflect the fix.
This patch fixes [issue 61780](https://github.com/llvm/llvm-project/issues/61780)
Related patch: https://reviews.llvm.org/D108669
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D147208
Currently calling stack locations is selected using `CALL` in ISel,
resulting in an invalid code and crashing in AsmPrinter. FastISel
correctly selects it will `CALL_INDIRECT`.
Fixes the problem reported in D146781.
Reviewed By: tlively, HerrCai0907
Differential Revision: https://reviews.llvm.org/D147033
Previously epilogues were incorrectly inserted after indirect tail calls because
they did not have the `isTerminator` property. Add that property and test that
they get correct epilogues. To be safe, also add other properties that were
defined for direct tail calls.
Differential Revision: https://reviews.llvm.org/D146569
A follow-on commit will add tests to this file and using the
update_llc_test_checks script will make that easier.
Differential Revision: https://reviews.llvm.org/D146568
fixed: #59095
Update libcall signatures to use multivalue return rather than returning via a pointer
when the multivalue features is enabled in the WebAssembly backend.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D146271
This patch skips redundant explicit masks of the shift count since
it is implied inside wasm shift instruction.
Differential Revision: https://reviews.llvm.org/D144619
After change with D144169, the codegen generates redundant instructions
like and and wrap. This fixes it.
Differential Revision: https://reviews.llvm.org/D144360
Recommit bbdf243579.
Original commit message:
If a chain of two selects share a true/false value and are controlled
by two setcc nodes, that are never both true, we can fold away one of
the selects. So, the following:
(select (setcc X, const0, eq), Y,
(select (setcc X, const1, eq), Z, Y))
Can be combined to:
select (setcc X, const1, eq) Z, Y
Differential Revision: https://reviews.llvm.org/D142535
Each operation was missing their inverted condition using olt or ogt.
Also, as we don't need to discern +/-0, I think we should also be
able to use ole and oge.
Differential Revision: https://reviews.llvm.org/D143581
Alignment of an alloca in IR can be lower than the preferred alignment
on purpose, but this override essentially treats the preferred
alignment as the minimum alignment.
The patch changes this behavior to always use the specified
alignment. If alignment is not set explicitly in LLVM IR, it is set to
DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign.
Tests are changed as well: explicit alignment is increased to match
the preferred alignment if it changes output, or omitted when it is
hard to determine the right value (e.g. for pointers, some structs, or
weird types).
Differential Revision: https://reviews.llvm.org/D135462
This function was added for ARM targets, but aligning global/stack pointer
arguments passed to memcpy/memmove/memset can improve code size and
performance for all targets that don't have fast unaligned accesses.
This adds a generic implementation that adjusts the alignment to pointer
size if unaligned accesses are slow.
Review D134168 suggests that this significantly improves performance on
synthetic benchmarks such as Dhrystone on RV32 as it avoids memcpy() calls.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134282
This is trickier to handle in some other representations of funcrefs
that are being explored, so it makes sense to ensure we have some
coverage of this requirement.
Currently, in TargetLowering, if the target does not support fminnum, we lower
to fminimum if neither operand could be a NaN. But this isn't quite correct
because fminnum and fminimum treat +/-0 differently; so, we need to prove that
one of the operands isn't a zero, or we don't have signed zeros.
Differential Revision: https://reviews.llvm.org/D143256
ValueTracking attempts to match compare+select patterns to FP min/max
operations, but it was created before the newer IEEE-754-2019
minimum/maximum ops were defined. Ie, matchSelectPattern() does not
account for the -0.0/+0.0 behavior that is specified in the newer
standard.
FMINIMUM/FMAXIMUM nodes were created to map to the newer standard:
/// FMINIMUM/FMAXIMUM - NaN-propagating minimum/maximum that also treat -0.0
/// as less than 0.0. While FMINNUM_IEEE/FMAXNUM_IEEE follow IEEE 754-2008
/// semantics, FMINIMUM/FMAXIMUM follow IEEE 754-2018 draft semantics.
We could adjust ValueTracking to deal with signed zero, but it seems like
a moot point given the divergent NaN behavior discussed in D143056, so just
delete this possibility to avoid bugs when converting IR to SDAG.
Differential Revision: https://reviews.llvm.org/D143106
If a chain of two selects share a true/false value and are controlled
by two setcc nodes, that are never both true, we can fold away one of
the selects. So, the following:
(select (setcc X, const0, eq), Y,
(select (setcc X, const1, eq), Z, Y))
Can be combined to:
select (setcc X, const1, eq) Z, Y
Differential Revision: https://reviews.llvm.org/D142535
Previously all indices into [0 x T] arrays were considered in
range, which resulted in us incorrectly inferring inbounds for
all GEPs of that form. We should not consider them in range here,
and instead bail out of the rewriting logic (which would divide
by zero).
Do continue to consider 0 always in range, to avoid changing
behavior for zero-index GEPs.
Recommitting after fixing scalable vector crash.
Check for single smax pattern against zero when converting from a
small enough float.
Differential Revision: https://reviews.llvm.org/D142481
These are essentially add/sub 1 with a clamping value.
AMDGPU has instructions for these. CUDA/HIP expose these as
atomicInc/atomicDec. Currently we use target intrinsics for these,
but those do no carry the ordering and syncscope. Add these to
atomicrmw so we can carry these and benefit from the regular
legalization processes.