This commit prevents building a G_UNMERGE_VALUES instruction whose
source and destination vectors have different element types in
`LegalizationArtifactCombiner::ArtifactValueFinder::tryCombineMergeLike()`,
e.g.:
`%1:_(<2 x s8>), %2:_(<2 x s8>) = G_UNMERGE_VALUES %0:_(<2 x s16>)`
This LLVM defect was identified via the AMD Fuzzing project.
MSan should unpoison the parameters of extended signal handlers.
However, MSan unpoisoned the second parameter using the wrong size,
`sizeof(__sanitizer_sigaction)`, which is inconsistent with its real
type, `siginfo_t`.
This commit fixes this issue by correcting the size to
`sizeof(__sanitizer_siginfo)`.
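For illustration, a minimal sketch of the corrected idea, using the
public MSan interface and the public `siginfo_t` type rather than the
sanitizer-internal `__sanitizer_siginfo` (the function name is
hypothetical):
```
#include <sanitizer/msan_interface.h>
#include <signal.h>

// Hedged sketch, not the actual sanitizer source: an extended
// (SA_SIGINFO) handler receives (int, siginfo_t *, void *). The second
// argument must be unpoisoned with the size of the siginfo structure it
// points to, not the size of a sigaction structure.
void unpoison_extended_handler_arg(siginfo_t *info) {
  __msan_unpoison(info, sizeof(siginfo_t)); // not sizeof(struct sigaction)
}
```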
By defining `CExpressionInterface`, we move the side-effect detection
logic from `emitc.expression` into the individual operations
implementing the interface, allowing each operation to gradually tune
its side-effect reporting. It also allows checking each operation
individually for side effects.
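As a rough sketch of the shape this takes (hypothetical C++, not the
actual MLIR tablegen definition):
```
// Hypothetical sketch only: each operation implementing the interface
// answers the side-effect query itself, so emitc.expression no longer
// needs op-specific logic.
struct CExpressionInterfaceSketch {
  virtual ~CExpressionInterfaceSketch() = default;
  // Conservative default; individual ops override this to gradually
  // tune their side-effect reporting.
  virtual bool hasSideEffects() const { return true; }
};
```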
These instructions count leading/trailing ones in a register.
Currently they are only generated when `Zbb` is enabled (along with
`Xqcibm`), since `Zbb` provides the `CTTZ`/`CTLZ` instructions.
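For reference, the identity that makes `Zbb`'s `CTLZ`/`CTTZ` relevant
here (my illustration, not the actual lowering code): counting
leading/trailing ones is counting leading/trailing zeros of the
complement.
```
#include <cstdint>

// Count leading ones: clz of the bitwise complement. The all-ones input
// is special-cased because __builtin_clz(0) is undefined.
unsigned clo32(uint32_t x) { return x == UINT32_MAX ? 32 : __builtin_clz(~x); }

// Count trailing ones: ctz of the complement, with the same caveat.
unsigned cto32(uint32_t x) { return x == UINT32_MAX ? 32 : __builtin_ctz(~x); }
```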
Recently I was debugging a Minidump with a few thousand ranges, and came
across the (now deleted) comment:
```
// I don't have a sense of how frequently this is called or how many memory
// ranges a Minidump typically has, so I'm not sure if searching for the
// appropriate range linearly each time is stupid. Perhaps we should build
// an index for faster lookups.
```
Blaming this comment shows it's 9 years old! Much overdue for this
simple fix with a range data vector.
I had to add a default constructor to Range in order to implement the
RangeDataVector, but otherwise this is just a replacement of the lookup
logic, as sketched below.
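A minimal sketch of the new lookup shape (hypothetical types, not
LLDB's actual `RangeDataVector`): sort the ranges by base address once,
then binary-search per query instead of scanning linearly.
```
#include <algorithm>
#include <cstdint>
#include <vector>

struct MemRange {
  uint64_t base = 0; // default constructor needed to build the vector
  uint64_t size = 0;
  uint32_t range_idx = 0; // payload, e.g. index of the minidump range
  bool Contains(uint64_t addr) const {
    return addr >= base && addr - base < size;
  }
};

// Assumes `ranges` is sorted by `base`; returns nullptr if no range
// contains `addr`. O(log n) per lookup instead of O(n).
const MemRange *FindRange(const std::vector<MemRange> &ranges,
                          uint64_t addr) {
  auto it = std::upper_bound(
      ranges.begin(), ranges.end(), addr,
      [](uint64_t a, const MemRange &r) { return a < r.base; });
  if (it == ranges.begin())
    return nullptr;
  --it;
  return it->Contains(addr) ? &*it : nullptr;
}
```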
The motivation for this is that it causes the jump table entry's symbol
to have an st_size equal to the jump table entry size, instead of being
equal to the size of the entire jump table, which is incorrect and can
lead to unexpected behavior in binary analysis tools that rely on the
size field, such as Bloaty.
Reviewers: fmayer
Reviewed By: fmayer
Pull Request: https://github.com/llvm/llvm-project/pull/144462
Currently, we only support D16 folding for `image.sample` instructions
with a single user: a `fptrunc` to half.
However, we can actually support D16 folding for `image.sample`
instructions with multiple users, as long as each user follows the
pattern of an `extractelement` followed by a `fptrunc` to half.
For example:
```
%sample = call <4 x float> @llvm.amdgcn.image.sample
%e0 = extractelement <4 x float> %sample, i32 0
%h0 = fptrunc float %e0 to half
%e1 = extractelement <4 x float> %sample, i32 1
%h1 = fptrunc float %e1 to half
%e2 = extractelement <4 x float> %sample, i32 2
%h2 = fptrunc float %e2 to half
```
This change enables D16 folding for such cases and avoids generating
`v_cvt_f16_f32_e32` instructions.
Test cleanup:
1) Separate layout.mlir from ops.mlir for layout-related tests.
2) Remove lane layout for ops working at work-item scope.
3) Remove redundant tests in create_tdesc/update_tdesc/prefetch.
4) Remove "test_" from all test function names.
On July 14, 2024, I landed a change to update progress reporting when
loading kernel/firmware binaries:
https://github.com/llvm/llvm-project/pull/98845
In DynamicLoader::LoadBinaryWithUUIDAndAddress, I removed code that set
the ModuleSpec to the provided name if that name is a file on disk.
With this code missing, if a filepath is passed in, we fail to find
that binary on the local disk. Nothing in the PR's intent would lead to
this change; it was unintentional.
## Purpose
This patch makes minor changes to LLVM and Clang so that LLVM can be
built as a Windows DLL with `clang-cl`. These changes were not required
for building a Windows DLL with MSVC.
## Background
The Windows DLL effort is tracked in #109483. Additional context is
provided in [this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
## Overview
Specific changes made in this patch:
- Remove `constexpr` fields that reference DLL-exported symbols. These
symbols cannot be resolved at compile time when building a Windows DLL
using `clang-cl`, so they cannot be `constexpr`. Instead, they are made
`const` and initialized in the implementation file rather than at their
declaration in the header (see the sketch after this list).
- Annotate symbols now defined out-of-line with `LLVM_ABI` so they are
exported when building as a shared library.
- Explicitly add a default copy assignment operator for `ELFFile` to
resolve a compiler warning.
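A hedged sketch of the `constexpr`-to-`const` pattern described in the
first bullet (hypothetical names, not the symbols this patch actually
touches):
```
// header.h -- a field that references a DLL-exported symbol cannot be
// constexpr under clang-cl, because the imported address is not a
// compile-time constant.
#include "llvm/Support/Compiler.h" // defines LLVM_ABI

LLVM_ABI extern const char TableStart[]; // hypothetical exported data

struct Widget {
  // Before: static constexpr const char *First = TableStart; // rejected
  LLVM_ABI static const char *const First; // after: declared only
};

// impl.cpp -- initialized in the implementation file instead.
const char *const Widget::First = TableStart;
```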
## Validation
I ran local builds and tests to validate cross-platform compatibility.
This included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
Currently, the `EliminateAvailableExternallyPass` only converts certain
available-externally functions to local if `avail-extern-to-local` is
set or in contextual profiling mode. For global variables, it only
drops their initializers.
This PR adds an option to allow the pass to convert global variables in
a specified address space to local. The motivation for this change is
to correctly support lowering of LDS variables (`__shared__` variables,
in more generic terminology) when ThinLTO is enabled for AMDGPU.
A `__shared__` variable is lowered to a hidden global variable in a
particular address space by the frontend, which is roughly the same as
a `static` local variable. To properly lower it in the backend, the
compiler needs to check all its uses. Enabling ThinLTO currently breaks
this when a function containing a `__shared__` variable is imported
from another module. Even though the global variable is imported along
with its associated function, and the function is privatized by the
`EliminateAvailableExternallyPass`, the global variable itself is not.
It's safe to privatize such global variables because they're _local_ to
their associated functions. If the function itself is privatized, its
associated global variables should also be privatized accordingly.
We already evaluate the initializers for all global variables, as
required by the standard. Leverage that evaluation instead of trying to
separately validate static class members.
This has a few benefits:
- Improved diagnostics; we now get notes explaining what failed to
evaluate.
- Improved correctness: `is_constant_evaluated` is handled correctly.
The behavior follows the proposed resolution for CWG1721.
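As a small illustration of the `is_constant_evaluated` point (my
example, not from the patch): the member's value can only be known by
actually evaluating its initializer, because the trial constant
initialization and a runtime initialization take different branches.
```
#include <type_traits>

struct S {
  static const int val;
};

// If the trial constant initialization succeeds, is_constant_evaluated()
// is true during it, so val is constant-initialized to 1. Validating the
// static member separately from this evaluation can get that wrong.
const int S::val = std::is_constant_evaluated() ? 1 : 2;
```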
Fixes #88462. Fixes #99680.
CFIProgram's most common use is within debug frames, but that is not
its only use. For example, some assembly writers encode CFI programs by
hand in `.cfi_escape` directives. This PR extracts their printing code
into its own files, which avoids the need for the main class to depend
on DWARFUnit, sections, and the like.
One in a series of NFC DebugInfo/DWARF refactoring changes to layer it
more cleanly, so that binary CFI parsing can be used from low-level
code (such as byte strings created via .cfi_escape) without circular
dependencies. The final goal is to make a more limited DWARF library
usable from lower-level code.
More information can be found at
https://discourse.llvm.org/t/rfc-debuginfo-dwarf-refactor-into-to-lower-and-higher-level-libraries/86665
The new interfaces provide getters and setters for the weight
information about the branches of BranchOpInterface and
RegionBranchOpInterface operations.
These interfaces are modeled after the LLVM dialect's
BranchWeightOpInterface.
The plan is to produce this information in Flang, e.g. to mark
most probably "cold" code as such and allow LLVM to order
basic blocks accordingly. An example of such code is the
copy loops generated for array repacking: we can mark them
as "cold", assuming that the copy will not happen dynamically.
If the copy actually happens, its overhead is probably high
enough that we may not care about the small extra overhead
of jumping to the "cold" code and fetching it.
Related to #143219.
Function specialization does not kick in if flang sets `noalias`
attributes on the function arguments of `digits_2`, because PRE
optimizes several `srem` instructions and other memory accesses
from the inner loops, causing the latency bonus to be lower than
the current 40% threshold.
While looking at this, I did not really get why we compute the latency
bonus as a ratio of the latency of the "eliminated" instructions
and the code-size of the whole function. It did not make much sense
to me.
I tried computing the total latency as a sum of latencies
of the instructions that belong to non-dead code (including
the instructions that would have been executed had they not been
"eliminated" due to the constant propagation). This total
latency should represent the total cost of executing the function
with the given argument being dynamically equal to the tried
constant value. Then the latency bonus would be computed
as the ratio between the latency of the "eliminated" instructions
and the total latency. Unfortunately, this did not give me a good
heuristic either. The bonus was close to 0% on some targets,
and as big as 3-5% on other targets. This does not match well
with the performance gain achieved by function specialization
for exchange2, so it seemed like another artificial heuristic,
no better than the current one.
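To restate the two scoring schemes above in symbols (notation mine, not
the source's): let $L_{elim}$ be the summed latency of the "eliminated"
instructions, $S_{func}$ the code size of the whole function, and
$L_{total}$ the summed latency of all non-dead instructions under the
tried constant. Then

$$\mathrm{bonus}_{current} = \frac{L_{elim}}{S_{func}}, \qquad \mathrm{bonus}_{tried} = \frac{L_{elim}}{L_{total}}$$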
It seems that GCC uses a set of different heuristics for function
specialization, but I am not an expert here and I cannot say
if we can match them in LLVM.
With all that said, I decided to try to lower the threshold
to avoid the regression and be able to re-enable the generally
good change for the `noalias` attribute.
With this patch, I was able to reduce the effect of `noalias`,
so that `-force-no-alias=true` is only ~10% slower than
`-force-no-alias=false` code on neoverse-v1 and neoverse-v2.
On neoverse-n1, `-force-no-alias=true` is >2x faster than
`-force-no-alias=false` regardless of this patch.
This threshold has been changed before, also due to improved
alias information:
2fb51fba8c (diff-066363256b7b4164e66b28a3028b2cb9e405c9136241baa33db76ebd2edb87cd)
Please let me know what testing I should run to make sure this change
is safe. As I understand it, it may affect compilation-time
performance, and I would appreciate it if someone pointed out which
benchmarks need to be checked before merging this.
We should only divide the number of pieces to fit the packed
instructions if we actually have packed (pk) instructions. This
increases the cost of copysign, but it is closer to the current codegen
output. It could be much cheaper than it is now.
Add support for load/store with chunk_size, which requires special
consideration for operand blocking, since offsets and masks are n-D
while the tensor is (n+1)-D. Supported operations include create_tdesc,
update_tdesc, load, store, and prefetch.
---------
Co-authored-by: Adam Siemieniuk <adam.siemieniuk@intel.com>
In DebugCommunication, we are currently using two threads to drive
lldb-dap. At the moment, only the `recv_packets` are synchronized
between the reader thread and the main test thread; other stateful
properties of the debug session are not guarded by a lock/mutex.
To mitigate this, I am moving any state updates to the main thread
inside the `_recv_packet` method, to ensure that the state does not
change out from under us in a test between calls to `_recv_packet`.
This does mean the precise timing of events has changed slightly as a
result, and I've updated the existing tests that fail for me locally
with this new behavior.
I think this should result in overall more predictable behavior, even if
the test is slow due to the host workload or architecture differences.
---------
Co-authored-by: Ebuka Ezike <yerimyah1@gmail.com>
Summary:
The logic here is flawed: it was only intended to apply to the CPU
case, where we use the linker passed in on the command line. It was
incorrectly being applied to SPIR-V, which caused issues.
This newline was originally added in https://reviews.llvm.org/D142184,
but I think updating `__libcpp_verbose_abort` to add the newline
instead is more consistent and works for other callers of
`_LIBCPP_VERBOSE_ABORT`. The `_LIBCPP_ASSERTION_HANDLER` calls through
to either the `_LIBCPP_VERBOSE_ABORT` macro or `__builtin_verbose_trap`.
From what I can tell, neither of these expects a trailing newline (at
least none of the uses of `_LIBCPP_VERBOSE_ABORT` or
`__builtin_verbose_trap` that I can find include a trailing newline,
except `_LIBCPP_ASSERTION_HANDLER`).
I noticed this discrepancy when working on
https://github.com/emscripten-core/emscripten/pull/24543
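A minimal sketch of the direction (my illustration, not the actual
libc++ diff): the newline is appended once inside the verbose-abort
path, so callers never need a trailing `\n` in their messages.
```
#include <cstdarg>
#include <cstdio>
#include <cstdlib>

[[noreturn]] void verbose_abort(const char *format, ...) {
  va_list args;
  va_start(args, format);
  std::vfprintf(stderr, format, args);
  va_end(args);
  std::fputc('\n', stderr); // newline added once here, not by each caller
  std::abort();
}
```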
…Baremetal toolchain (#121829)"
This reverts the following stack of commits, because they broke the
Fuchsia toolchain and the corresponding LLVM buildbot.
Revert "[Driver] Fix Arm/AArch64 Link Argument tests (#144582)" This
reverts commit a79186c1ea.
Revert "[Driver] Add option to force undefined symbols during linking in
BareMetal toolchain object. (#132807)" This reverts commit
9cb7545096.
Revert "[Driver] Fix link order of BareMetal toolchain object (#132806)"
This reverts commit 31523de4b0.
Revert "[Driver] Add support for crtbegin.o, crtend.o and libgloss lib
to BareMetal toolchain object (#121830)" This reverts commit
ec230aa7a7.
Revert "[Driver] Add support for GCC installation detection in Baremetal
toolchain (#121829)" This reverts commit
eb31c422d0.
## Overview
Fix a compilation error introduced by #143615. Build failure logs are
available
[here](https://lab.llvm.org/buildbot/#/builders/195/builds/10573).
## Background
On `extern` variable declarations, `LLVM_ABI` must appear before
`extern` because `LLVM_ABI` currently resolves to
`[[gnu::visibility("default")]]` when building with gcc.
In #134418 we added support to list/enable/disable `SystemRuntime` and
`InstrumentationRuntime` plugins. We limited it to those two plugin
types to flesh out the idea with a smaller change.
This PR adds support for the remaining plugin types. We now support all
the plugins that can be registered directly with the plugin manager.
Plugins that are added by loading shared objects are still not
supported.