Three-way splitting can create references between split fragments (warm
to cold or vice versa) that are not handled by
`isChildOf/isParentOf/isChildOrParentOf`. Generalize fragment
relationships to allow checking if two functions belong to one group,
potentially in presence of ICF which can join multiple groups.
Test Plan: NFC for existing tests
Reviewers: maksfb, ayermolo, rafaelauler, dcci
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/99979
FPBits::signaling_nan() defaults to setting the MSB of the payload to 1.
The tests also used signaling NaNs with a payload of 0x123. With
float16, the 1 in 0x123 aligns to the MSB of the payload, therefore
0x123 is greater than the default payload. However, that is not the case
with more precise floating-point types.
Since the `in_bounds` attribute is mandatory, there's no need for logic
like this (`readOp.getInBounds()` is guaranteed to return a non-empty
ArrayRef):
```cpp
ArrayAttr inBoundsAttr = readOp.getInBounds()
? rewriter.getArrayAttr( readOp.getInBoundsAttr().getValue().drop_back(dimsToDrop))
: ArrayAttr();
```
Instead, we can do this:
```cpp
ArrayAttr inBoundsAttr = rewriter.getArrayAttr(
readOp.getInBoundsAttr().getValue().drop_back(dimsToDrop));
```
This is a small follow-up for #97049 - this change should've been
included there.
This reverts commit 9ba5244273.
Remove the recently added temporary option vectorize-use-legacy-cost-model
as discussed on the PR adding it, now that we branched for 19.x.
The documentation for the __arm_get_current_vg support routine specifies
that the following registers are call-preserved:
- X1-X15, X19-X29 and SP
- Z0-Z31
- P0-P15
This patch rewrites the implementation of this routine in compiler-rt,
as the current version does not guarantee that these registers will be
preserved.
In PowerPC ABI, a few initial arguments are passed through registers,
but their places in parameter save area are reserved, arguments passed
by memory goes after the reserved location.
For debugging purpose, we may want to save copy of the pass-by-reg
arguments into correct places on stack. The new option achieves by
adding new function level attribute and make argument lowering part
aware of it.
This patch preserves `undef` SDNodes that are `volatile` qualified.
Previously, these nodes would be discarded. The motivation behind this
change is to adhere to the
[LangRef](https://llvm.org/docs/LangRef.html#volatile-memory-accesses),
even though that doc is mostly in terms of LLVM-IR, it seems reasonable
to imply that the volatile constraints also imply to SDNodes.
> Certain memory accesses, such as
[load](https://llvm.org/docs/LangRef.html#i-load)’s,
[store](https://llvm.org/docs/LangRef.html#i-store)’s, and
[llvm.memcpy](https://llvm.org/docs/LangRef.html#int-memcpy)’s may be
marked volatile. The optimizers must not change the number of volatile
operations or change their order of execution relative to other volatile
operations. The optimizers may change the order of volatile operations
relative to non-volatile operations. This is not Java’s “volatile” and
has no cross-thread synchronization behavior.
Source: https://llvm.org/docs/LangRef.html#volatile-memory-accesses
#98863 merged AMDGPUAsanInstrumentation module which missed
TransformUtils to be linked to AMDGPUUtils.
This PR moves AMDGPUAsanInstrumentation files outside utils folder and
adds them to AMDGPUCodegen lib.
Summary:
Previously, the GPU built the `libc` in a fat binary version that was
used to pass this to the link job in offloading languages like CUDA or
OpenMP. This was mostly required because NVIDIA couldn't consume the
standard static library version. Recent patches have now created the
`clang-nvlink-wrapper` which lets us do that. Now, the C library is just
included implicitly by the toolchain (or passed with -Xoffload-linker
-lc).
This code can be fully removed, which will heavily simplify the build
(and removed some bugs and garbage files I've encoutnered).
A lot of cases have differing AVLs which aren't foldable, update them so the peephole triggers on them and add explicit cases for non-foldable AVLs.
Also rename it to vmv.v.v-peephole.ll since it's not actually a DAG combine.
And remove a TODO, it's correct to fold if the two passthrus are the same.
Currently, the `mlir-tblgen -verify-openmp-ops` pseudo-backend, which
only performs an OpenMP dialect-specific set of checks and produces no
output, is prevented from being added as a dependency to the
`MLIROpenMPOpsIncGen` tablegen target.
However, a consequence of this is that it is not triggered with every
modification of the OpenMPOps.td file it's intended to check, although
it should. This patch fixes the issue by letting the empty output file
to be added to the `TABLEGEN_OUTPUT` CMake variable used by the
`add_public_tablegen_target` command below to set up dependencies.
PR #91843 changed the algorithm used to find the next unplaced block so
that it iterates through the blocks in BlockFilter instead of iterating
through the blocks in the function and checking if they are in the block
filter. Unfortunately this sometimes results in a different block
ordering being chosen, as the order of blocks in BlockFilter comes from
the order in MachineLoopInfo, and in some cases this differs from the
order they are in the function. This can also give an end result that
has worse performance.
Fix this by making collectLoopBlockSet place blocks in its output in the
order that they are in the function.
clang_modules_include.gen.py works now, and I added some background to
classic_table.pass.cpp.
Opened https://github.com/picolibc/picolibc/issues/778 to see if that
one is possible to fix.
Somewhat confusingly a `SCEVMulExpr` is a `SCEVNAryExpr`, so can have
> 2 operands. Previously, the vscale immediate matching did not check
the number of operands of the `SCEVMulExpr`, so would ignore any
operands after the first two.
This led to incorrect codegen (and results) for ArmSME in IREE
(https://github.com/iree-org/iree), which sometimes addresses things
that are a `vscale * vscale` multiple away. The test added with this
change shows an example reduced from IREE. The second write should
be offset from the first `16 * vscale * vscale` (* 4 bytes), however,
previously LSR dropped the second vscale and instead offset the write by
`#4, mul vl`, which is an offset of `16 * vscale` (* 4 bytes).
Mark these intrinsics as atomic loads within LLVM to prevent hoisting
out of loops in cases where
the load is considered invariant.
Similar to https://github.com/llvm/llvm-project/pull/97707, but for
struct buffer loads.
The builtin computes the discriminator for a type, which can be used to
sign/authenticate function pointers and member function pointers.
If the type passed to the builtin is a C++ member function pointer type,
the result is the discriminator used to signed member function pointers
of that type. If the type is a function, function pointer, or function
reference type, the result is the discriminator used to sign functions
of that type. It is ill-formed to use this builtin with any other type.
A call to this function is an integer constant expression.
Co-Authored-By: John McCall rjmccall@apple.com
Update collectValuesToIgnore to also ignore dead instructions in the
loop. Such instructions will be removed by VPlan-based DCE and won't be
considered by the VPlan-based cost model.
This closes a gap between the legacy and VPlan-based cost model. In
practice with the default pipelines, there shouldn't be any dead
instructions in loops reaching LoopVectorize, but it is easy to generate
such cases by hand or automatically via fuzzers.
Fixes https://github.com/llvm/llvm-project/issues/99701.
Functions returning C_PTR were lowered to function returning intptr (i64
on 64bit arch). This caused conflicts when these functions were defined
as returning !fir.ref<none>/llvm.ptr in other compiler generated
contexts (e.g., malloc).
Lower them to return !fir.ref<none>.
This should deal with https://github.com/llvm/llvm-project/issues/97325
and https://github.com/llvm/llvm-project/issues/98644.
With fast-math, the ordered setcc nodes are converted to setcc nodes
which do not care about NaNs, so add patterns that use setlt, setle,
setgt and setge.
As previously noted in nsan.cpp we can implement
optimized variants of `__nsan_copy_values` and
`__nsan_set_value_unknown` if a memory operation
size is known.
Now the instrumentation creates calls to optimized functions if there is
4, 8 or 16-byte memory operation like
`memset(X, value, 4/8/16)` or `memcpy(dst, src, 4/8/16)`
nsan.cpp provides definitions of the optimized functions.
Some of SIE's post-mortem analysis infrastructure currently makes use of
.debug_aranges, so we'd like to ensure the section's presence in
PlayStation binaries. The simplest way to do this is to force emission
when the debugger tuning is set to SCE (which is in turn typically
initialized from the target triple). This also simplifies the driver.
llvm/test/DebugInfo/debuglineinfo-path.ll has been marked as UNSUPPORTED
on PlayStation. When aranges are emitted, the DWARF in the test case is
such that relocations need to be applied to the aranges section in order
for symbolization to work. An alternative approach would be to implement
the application of relocations in DWARFDebugArangeSet. While experiments
show that this can be made to work with a modest patch, the test cases
would be rather contrived. Since I expect the only utility for such a
change would be to make this test case pass for PlayStation targets, and
few - if any - outside of PlayStation care about aranges, UNSUPPORTED
would seem to be a more practical option.
This was originally commited as 22eb290a96 (#99629) and later reverted
at 84658fb82b (#99711) due to test failures on SIE built bots. These
failures shouldn't recur due to 3b24e5d450 (#99897) and the
aforementioned change to debuglineinfo-path.ll.
SIE tracker: TOOLCHAIN-16951
In a `runtimes` build on Solaris/amd64, there are two failues:
```
AddressSanitizer-Unit :: ./Asan-i386-calls-Dynamic-Test/failed_to_discover_tests_from_gtest
AddressSanitizer-Unit :: ./Asan-i386-inline-Dynamic-Test/failed_to_discover_tests_from_gtest
```
This happens when `lit` enumerates the tests with `--gtest_list_tests
--gtest_filter=-*DISABLED_*`. The error is twofold:
- The `LD_LIBRARY_PATH*` variables point at the 64-bit directory
(`lib/clang/19/lib/x86_64-pc-solaris2.11`) for a 32-bit test:
```
ld.so.1: Asan-i386-calls-Dynamic-Test: fatal:
/var/llvm/local-amd64-release-stage2-A-flang-clang18-runtimes/tools/clang/stage2-bins/./lib/../lib/clang/19/lib/x86_64-pc-solaris2.11/libclang_rt.asan.so:
wrong ELF class: ELFCLASS64
```
- While the tests are linked with `-Wl,-rpath`, that path always is the
64-bit directory again.
Accordingly, the fix consists of two parts:
- The code in `compiler-rt/test/asan/Unit/lit.site.cfg.py.in` to adjust
the `LD_LIBRARY_PATH*` variables is guarded by a `config.target_arch !=
config.host_arch` condition. This is wrong in two ways:
- The adjustment is always needed independent of the host arch. This is
what `compiler-rt/test/lit.common.cfg.py` already does.
- Besides, `config.host_arch` is ultimately set from
`CMAKE_HOST_SYSTEM_PROCESSOR`. On Linux/x86_64, this is `x86_64` (`uname
-m`) while on Solaris/amd64 it's `i386` (`uname -p`), explaining why the
transformation is skipped on Solaris, but not on Linux.
- Besides, `RPATH` needs to be set to the correct subdirectory, so
instead of using the default arch in `compiler-rt/CMakeLists.txt`, this
patch moves the code to a function which takes the test's arch into
account.
Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.