When creating a standalone build of the runtimes sub-project, the
current CMake implementation looks for a lit executable that might
potentially exist in the build tree and unconditionally overrides the
value of `LLVM_EXTERNAL_LIT`. Due to this, any value passed via
`-DLLVM_EXTERNAL_LIT` when configuring the CMake project is ignored.
This change adds the `ALLOW_EXTERNAL` argument to the
`get_llvm_lit_path` call in the runtimes' CMakeLists.txt, allowing any
value previously set to be considered.
Sony PlayStation now supports C++20, and we wish to change the default
C++ mode to C++20 sometime in the future. As such, the !defined(__SCE__)
guards are redundant and we want to remove them. This in turn makes the
entire guard lines redundant (always true), so this patch removes them
entirely.
Commit 44e875ad5b introduced a change that
replaces `ReplaceInstWithInst` with `Instruction::replaceAllUsesWith`,
without subsequent instruction cleanup.
This results in TSan leaving behind useless `load atomic` instructions
after 'replacing' them.
This commit adds cleanup back, consistent with the context.
The traversal order in sharding propagation was hard-coded. This PR
provides options to the pass to select a suitable order
- forward-only
- backward-only
- forward-backward
- backward-forward
Default is the previous behavior (backward-forward).
Without this gcc warned like
/repo/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp:7559:68: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
7559 | NumStaleCIArgs == (OffloadingArraysToPrivatize.size() + 2) &&
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
7560 | "Wrong number of arguments for StaleCI when shareds are present");
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When lowering call arguments to stack, specify a stack MPI, as well as
the stack alignment, instead of using the defaults (which would be an
unknown location with ABI alignment).
I believe the asm diffs are just changes in scheduling.
We can convert non-power-of-2 types into extended value types
and then they will be widen.
Reviewers: lukel97
Reviewed By: lukel97
Pull Request: https://github.com/llvm/llvm-project/pull/114971
When the fixed-order recurrence phi is live-out from the loop, the
vectorizer uses VPInstruction::ExtractPenultimateElement to extract the
penultimate element from the recurrence vector. However, this is not
feasible when the VF is vscale x 1, since vscale could be 1, making the
vector contain only one element.
This patch changes the behavior for vscale x 1 by extracting the last
element from the vector produced by splicing the recurrence phi and the
previous value. This ensures we can still determine the correct live-out
value of the recurrence phi.
If you enable `LIBC_CONF_PRINTF_FLOAT_TO_STR_USE_FLOAT320` and use a
`%f` style printf format directive to print a nonzero number too small
to show up in the output digits, e.g. `printf("%.2f", 0.001)`, then the
output would be intermittently incorrect, because
`DyadicFloat::as_mantissa_type_rounded` would try to shift the 320-bit
mantissa right by more than 320 bits, invoking the 'undefined behavior'
clause commented in the `shift()` function in `big_int.h`.
There were already tests in the libc test suite exercising this case,
e.g. the subnormal tests in `LlvmLibcSPrintfTest.FloatDecimalConv` use
`%f` at the default precision of 6 decimal places on tiny numbers such
as 2^-1027. But because the behavior is undefined, they don't visibly
fail all the time, and in all previous test runs we'd tried with
USE_FLOAT320, they had got lucky.
The fix is simply to detect an out-of-range right shift before doing it,
and instead just set the output value to zero.
This patch introduces enhancements to the Baremetal toolchain to support
GCC toolchain detection.
- If the --gcc-install-dir or --gcc-toolchain options are provided and
point to valid paths, the sysroot is derived from those locations.
- If not, the logic falls back to the existing sysroot inference
mechanism already present in the Baremetal toolchain.
- Support for adding include paths for the libstdc++ library has also
been added.
Additionally, the restriction to always use the integrated assembler has
been removed. With a valid GCC installation, the GNU assembler can now
be used as well.
This patch currently updates and adds tests for the ARM target only.
RISC-V-specific tests will be introduced in a later patch, once the
RISCVToolChain is fully merged into the Baremetal toolchain. At this
stage, there is no way to test the RISC-V target within this PR.
RFC:
https://discourse.llvm.org/t/merging-riscvtoolchain-and-baremetal-toolchains/75524
This commit prevents building a G_UNMERGE_VALUES instruction with
different source and destination vector elements in
`LegalizationArtifactCombiner::ArtifactValueFinder::tryCombineMergeLike()`,
e.g.:
`%1:_(<2 x s8>), %2:_(<2 x s8>) = G_UNMERGE_VALUES %0:_(<2 x s16>)`
This LLVM defect was identified via the AMD Fuzzing project.
MSan should unpoison the parameters of extended signal handlers.
However, MSan unpoisoned the second parameter with the wrong size
`sizeof(__sanitizer_sigaction)`, inconsistent with its real type
`siginfo_t`.
This commit fixes this issue by correcting the size to
`sizeof(__sanitizer_siginfo)`.
By defining `CExpressionInterface`, we move the side effect detection
logic from `emitc.expression` into the individual operations
implementing the interface allowing operations to gradually tune the
side effect.
It also allows checking for side effects each operation individually.
These instructions count leading/trailing ones in the register.
Currently these are only generated when we have `Zbb` enabled (along
with `Xqcibm`) since it contains the `CTTZ/CTLZ` instructions.
Recently I was debugging a Minidump with a few thousand ranges, and came
across the (now deleted) comment:
```
// I don't have a sense of how frequently this is called or how many memory
// ranges a Minidump typically has, so I'm not sure if searching for the
// appropriate range linearly each time is stupid. Perhaps we should build
// an index for faster lookups.
```
blaming this comment, it's 9 years old! Much overdue for this simple fix
with a range data vector.
I had to add a default constructor to Range in order to implement the
RangeDataVector, but otherwise this just a replacement of look up logic.
The motivation for this is that it causes the jump table entry's symbol
to have an st_size equal to the jump table entry size, instead of being
equal to the size of the entire jump table, which is incorrect and can
lead to unexpected behavior in binary analysis tools that rely on the
size field such as Bloaty.
Reviewers: fmayer
Reviewed By: fmayer
Pull Request: https://github.com/llvm/llvm-project/pull/144462
Now we only support D16 folding for `image sample` instructions with a
single user: a `fptrunc` to half.
However, we can actually support D16 folding for image.sample
instructions with multiple users,
as long as each user follows the pattern of extractelement followed by
fptrunc to half.
For example:
```
%sample = call <4 x float> @llvm.amdgcn.image.sample
%e0 = extractelement <4 x float> %sample, i32 0
%h0 = fptrunc float %e0 to half
%e1 = extractelement <4 x float> %sample, i32 1
%h1 = fptrunc float %e1 to half
%e2 = extractelement <4 x float> %sample, i32 2
%h2 = fptrunc float %e2 to half
```
This change enables D16 folding for such cases and avoids generating
`v_cvt_f16_f32_e32` instructions.
Test cleanup:
1) separate layout.mlir from ops.mlir for layout related test
2) remove lane layout for ops working at work item scope.
3) remove redundant test in create_tdesc/update_tdesc/prefetch.
4) remove "test_" from all test function name.
July 14 2024 I landed a change to update progress reporting when
loading kernel/firmware binaries
https://github.com/llvm/llvm-project/pull/98845
In DynamicLoader::LoadBinaryWithUUIDAndAddress I removed code that
was setting the ModuleSpec to the provided name, if the name provided
is that of a file on disk. With this code missing, if a filepath
name is passed in, this code will fail to find that binary on the local
disk. There's nothing in the PR / intention that would lead to this
change, it was unintentional.
## Purpose
This patch makes a minor changes to LLVM and Clang so that LLVM can be
built as a Windows DLL with `clang-cl`. These changes were not required
for building a Windows DLL with MSVC.
## Background
The Windows DLL effort is tracked in #109483. Additional context is
provided in [this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
## Overview
Specific changes made in this patch:
- Remove `constexpr` fields that reference DLL exported symbols. These
symbols cannot be resolved at compile time when building a Windows DLL
using `clang-cl`, so they cannot be `constexpr`. Instead, they are made
`const` and initialized in the implementation file rather than at
declaration in the header.
- Annotate symbols now defined out-of-line with `LLVM_ABI` so they are
exported when building as a shared library.
- Explicitly add default copy assignment operator for `ELFFile` to
resolve a compiler warning.
## Validation
Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
Currently, the `EliminateAvailableExternallyPass` only converts certain
available externally functions to local if `avail-extern-to-local` is
set or in
contextual profiling mode. For global variables, it only drops their
initializers.
This PR adds an option to allow the pass to convert global variables in
a
specified address space to local. The motivation for this change is to
correctly
support lowering of LDS variables (`__shared__` variables, in more
generic
terminology) when ThinLTO is enabled for AMDGPU.
A `__shared__` variable is lowered to a hidden global variable in a
particular
address space by the frontend, which is roughly same as a `static` local
variable. To properly lower it in the backend, the compiler needs to
check all
its uses. Enabling ThinLTO currently breaks this when a function
containing a
`__shared__` variable is imported from another module. Even though the
global
variable is imported along with its associated function, and the
function is
privatized by the `EliminateAvailableExternallyPass`, the global
variable itself
is not.
It's safe to privatize such global variables, because they're _local_ to
their
associated functions. If the function itself is privatized, its
associated
global variables should also be privatized accordingly.
We already evaluate the initializers for all global variables, as
required by the standard. Leverage that evaluation instead of trying to
separately validate static class members.
This has a few benefits:
- Improved diagnostics; we now get notes explaining what failed to
evaluate.
- Improved correctness: is_constant_evaluated is handled correctly.
The behavior follows the proposed resolution for CWG1721.
Fixes#88462. Fixes#99680.
CFIPrograms' most common uses are within debug frames, but it is not
their only use. For example, some assembly writers encode them by hand
into .cfi_escape directives. This PR extracts printing code for them
into its own files, which avoids the need for the main class to depend
on DWARFUnit, sections, and similar.
One in a series of NFC DebugInfo/DWARF refactoring changes to layer it
more cleanly, so that binary CFI parsing can be used from low-level
code, (such as byte strings created via .cfi_escape) without circular
dependencies. The final goal is to make a more limited dwarf library
usable from lower-level code.
More information can be found at
https://discourse.llvm.org/t/rfc-debuginfo-dwarf-refactor-into-to-lower-and-higher-level-libraries/86665