In https://github.com/llvm/llvm-project/pull/143183 was mistakenly added
a default value to `python_root_dir` in lit tests of compiler-rt.
This is unused by the lit tests of compiler-rt, as it was meant to be
used by `lldb`.
This patch removes this change.
Currently the loop vectorizer can only vectorize interleave groups for
power-of-2 factors at scalable VFs by recursively interleaving
[de]interleave2 intrinsics.
However after https://github.com/llvm/llvm-project/pull/124825 and
#139893, we now have [de]interleave intrinsics for all factors up to 8,
which is enough to support all types of segmented loads and stores on
RISC-V.
Now that the interleaved access pass has been taught to lower these in
#139373 and #141512, this patch teaches the loop vectorizer to emit
these intrinsics for factors up to 8, which enables scalable
vectorization for non-power-of-2 factors.
As far as I'm aware, no in-tree target will vectorize a scalable
interelave group above factor 8 because the maximum interleave factor is
capped at 4 on AArch64 and 8 on RISC-V, and the
`-max-interleave-group-factor` CLI option defaults to 8, so the
recursive [de]interleaving code has been removed for now.
Factors of 3 with scalable VFs are also turned off in AArch64 since
there's no lowering for [de]interleave3 just yet either.
`narrow-switch.ll` test has been regenerated via latest UTC using
`--prefix-filecheck-ir-name _`, so as to avoid conflicts with
scripted variable names.
As pointed out in https://github.com/llvm/llvm-project/pull/143782,
these tests were specifying the size in bits instead of bytes.
In order to preserve the intent of the tests, add a use of %src,
which prevents stack-move optimization. These are supposed to test
the handling of scoped alias metadata in call slot optimization.
This patch extends the TMA G2S intrinsics with the
support for cta_group::1/2 available from Blackwell onwards.
The existing intrinsics are auto-upgraded with a default
value of '0' for the `cta_group` flag operand.
* lit tests are added for all combinations of the newer variants.
* Negative tests are added to validate the error-handling
when the value of the cta_group flag falls out-of-range.
* The generated PTX is verified with a 12.8 ptxas executable.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
The assertions in llvm/test/CodeGen/X86/tailcc-ssp.ll were outdated. The
initial comment indicated they were generated with
`utils/update_llc_test_checks.py UTC_ARGS: --version 5`, but this was
not accurate based on the file's content.
Running `utils/update_llc_test_checks.py` regenerated the assertions,
aligning them with the current `llc` output.
This commit ensures that the test's claimed behavior accurately reflects
the actual `llc` output, even though the tests were already passing.
This was identified by @efriedma-quic during review of #136290.
Submitting a separate PR to make sure these changes stay isolated.
Make use of targets that support BSR "pass through behaviour" on a zero input to remove a CMOV thats performing the same function
BSF will be a trickier patch as we need to make sure it works with the "REP BSF" hack in X86MCInstLower
Manage branch weights for the BranchOnCond in the middle block in VPlan.
This requires updating VPInstruction to inherit from VPIRMetadata, which
in general makes sense as there are a number of opcodes that could take
metadata.
There are other branches (part of the skeleton) that also need branch
weights adding.
PR: https://github.com/llvm/llvm-project/pull/143035
When sharing same compiler instance for multiple compilations, we reset
source manager's file id tables in between runs. Diagnostics engine
keeps a cache based on these file ids, that became dangling references
across compilations.
This patch makes sure we reset those whenever sourcemanager is trashing
its FileIDs.
We can get the `mvendorid/marchid/mimpid` via hwprobe and then we
can compare these IDs with those defined in processors to find the
CPU name.
With this change, `-mcpu/-mtune=native` can set the proper name.
This was defining aliases of the i32 divrem functions for the i8
and i16 cases. This is unnecessary and was unused. The divrem
candidate cases wouldn't have formed with illegal types in the
first place, so codegen wouldn't even query these.
These are the easy cases that do not really depend on the subtarget,
other than for the deceptive predicates on the subtarget class. Most
of the rest of the cases here also do not, but this is obscured by
going through helper predicates added onto the subtarget which hide
dependence on TargetOptions.
Add parentheses around the assert conditions.
Without this gcc warned like
../lib/Target/AMDGPU/GCNSchedStrategy.cpp:2250: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
2250 | NewMI != RegionBounds.second && "cannot remove at region end");
and
../../clang/lib/Sema/SemaOverload.cpp:11326:39: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
11326 | DeferredCandidatesCount == 0 &&
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
11327 | "Unexpected deferred template candidates");
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This depth limits a linear search (rather than the usual potentially
exponential one) and is not particularly important for compile-time in
practice.
The change in #137297 is going to increase the length of GEP chains, so
I'd like to increase this limit a bit to reduce the chance of
regressions (https://github.com/dtcxzyw/llvm-opt-benchmark/pull/2419
showed a 13% increase in SearchLimitReached). There is no particular
significance to the new value of 10.
Compile-time is neutral.
Close https://github.com/llvm/llvm-project/issues/119947
As discussed in the above thread, we shouldn't write specializations
with local args in reduced BMI. Since users can't find such
specializations any way.
This testcase was removed in 4cafd28b7d,
as a082f665f8 had made it no longer
trigger the error that it was supposed to do. (Because the latter of
those two commits makes the symbol "__rt_sdiv" be included among the
potential libcalls listed by lto::LTO::getRuntimeLibcallSymbols().)
Readd the test as a positive test, making sure that such libcalls can
get linked.
We do have preexisting test coverage for LTO libcalls overall in
libcall-archive.ll, but readd this test to cover specifically the ARM
division helper functions as well.
This happens because `getShadow(Value *V)` has a special case for fully undefined/poisoned values, but partially undefined values fall-through and are given a clean shadow. This leads to false negatives (no false positives).
Note: MSan correctly handles InsertElementInst, but the shadow of the initial constant vector may still be wrong and be propagated.
Showing that the same approximation happens for other composite types is left as an exercise for the reader.
Introduce MCAsmInfo::UsesSetToEquateSymbol to control the preferred
syntax for symbol equating. We now favor the more readable and common
`symbol = expression` syntax over `.set`. This aligns with pre- https://reviews.llvm.org/D44256 behavior.
On Apple platforms, this resolves a clang -S vs -c behavior difference (resolves#104623).
For targets whose = support is unconfirmed, UsesSetToEquateSymbol is set to false.
This also minimizes test updates.
Pull Request: https://github.com/llvm/llvm-project/pull/142289
`fir.local` ops are not supposed to have any uses at this point (i.e.
during lowering to LLVM). In case of serialization, the
`fir.do_concurrent` users are expected to have been lowered to
`fir.do_loop` nests. In case of parallelization, the `fir.do_concurrent`
users are expected to have been lowered to the target parallel model
(e.g. OpenMP).
This hopefully resolved a build issue introduced by
https://github.com/llvm/llvm-project/pull/142567 (see for example:
https://lab.llvm.org/buildbot/#/builders/199/builds/4009).
* Merge the special case into isStaticLinkTimeConstant
* Generalize isUndefWeak to isUndefined. undefined non-weak is an error
case. We choose to be general, which also brings us in line with GNU ld.
This attribute is unsupported in GCC, so far it worked because before
GCC15 did not define this macros in _CHKFEAT_GCS in arm_acle.h [1]
With gcc15 compiler libunwind's check for this macros is succeeding and
it ends up enabling 'gcs' by using function attribute, this works with
clang but not with gcc.
We can see this in rust compiler bootstrap for aarch64/musl when system
uses gcc15, it ends up with these errors
Building libunwind.a for aarch64-poky-linux-musl
```
cargo:warning=/mnt/b/yoe/master/sources/poky/build/tmp/work/cortexa57-poky-linux-musl/rust/1.85.1/rustc-1.85.1-src/src/llvm-project/libunwind/src/UnwindLevel1.c:191:1: error: arch extension 'gcs' should be prefixed by '+' cargo:warning= 191 | unwind_phase2(unw_context_t *uc, unw_cursor_t *cursor, _Unwind_Exception *exception_object) {
cargo:warning= | ^~~~~~~~~~~~~
cargo:warning=/mnt/b/yoe/master/sources/poky/build/tmp/work/cortexa57-poky-linux-musl/rust/1.85.1/rustc-1.85.1-src/src/llvm-project/libunwind/src/UnwindLevel1.c:337:22: error: arch extension 'gcs' should be prefixed by '+'
cargo:warning= 337 | _Unwind_Stop_Fn stop, void *stop_parameter) {
cargo:warning= | ^~~~~~~~~~~~~~~
```
[1] https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=5a6af707f0af
Signed-off-by: Khem Raj <raj.khem@gmail.com>
AAPCS64 reserves any of X9-X15 for a compiler to choose to use for this
purpose, and says not to use X16 or X18 like GCC (and the previous
implementation) chose to use. The X18 register may need to get used by
the kernel in some circumstances, as specified by the platform ABI, so
it is generally an unwise choice. Simply choosing a different register
fixes the problem of this being broken on any platform that actually
follows the platform ABI (which is all of them except EABI, if I am
reading this linux kernel bug correctly
https://lkml2.uits.iu.edu/hypermail/linux/kernel/2001.2/01502.html). As
a side benefit, also generate slightly better code and avoids needing
the compiler-rt to be present. I did that by following the XCore
implementation instead of PPC (although in hindsight, following the
RISCV might have been slightly more readable). That X18 is wrong to use
for this purpose has been known for many years (e.g.
https://www.mail-archive.com/gcc@gcc.gnu.org/msg76934.html) and also
known that fixing this to use one of the correct registers is not an ABI
break, since this only appears inside of a translation unit. Some of the
other temporary registers (e.g. X9) are already reserved inside llvm for
internal use as a generic temporary register in the prologue before
saving registers, while X15 was already used in rare cases as a scratch
register in the prologue as well, so I felt that seemed the most logical
choice to choose here.
The canonical pattern for bitmasked mul is currently
```
%val = and %x, %bitMask // where %bitMask is some constant
%cmp = icmp eq %val, 0
%sel = select %cmp, 0, %C // where %C is some constant = C' * %bitMask
```
In certain cases, where we are combining multiple of these bitmasked
muls with common factors, we are able to optimize into and->mul (see
https://github.com/llvm/llvm-project/pull/135274 )
This optimization lends itself to further optimizations. This PR
addresses one of such optimizations.
In cases where we have
`or-disjoint ( mul(and (X, C1), D) , mul (and (X, C2), D))`
we can combine into
`mul( and (X, (C1 + C2)), D) `
provided C1 and C2 are disjoint.
Generalized proof: https://alive2.llvm.org/ce/z/MQYMui
Allows forwarding memset to memcpy for mismatching unknown sizes if
overread has undef contents. In that case we can refine the undef bytes
to the memset value.
Refs #140954 which laid some of the groundwork for this.