Commit Graph

541377 Commits

Author SHA1 Message Date
Robert Imschweiler
4d71f20b28 [GlobalISel] prevent G_UNMERGE_VALUES for vectors with different elements (#133335)
This commit prevents building a G_UNMERGE_VALUES instruction with
different source and destination vector elements in
`LegalizationArtifactCombiner::ArtifactValueFinder::tryCombineMergeLike()`,
e.g.:
`%1:_(<2 x s8>), %2:_(<2 x s8>) = G_UNMERGE_VALUES %0:_(<2 x s16>)`

This LLVM defect was identified via the AMD Fuzzing project.
2025-06-18 09:07:08 +02:00
Kunqiu Chen
10f29a6072 [MSan] Fix wrong unpoison size in SignalAction (#144071)
MSan should unpoison the parameters of extended signal handlers. 
However, MSan unpoisoned the second parameter with the wrong size 
`sizeof(__sanitizer_sigaction)`, inconsistent with its real type 
`siginfo_t`.

This commit fixes this issue by correcting the size to 
`sizeof(__sanitizer_siginfo)`.
2025-06-18 14:53:33 +08:00
Kirill Chibisov
74687180dd [mlir][emitc] Make CExpression trait into interface (#142771)
By defining `CExpressionInterface`, we move the side effect detection
logic from `emitc.expression` into the individual operations
implementing the interface allowing operations to gradually tune the
side effect.

It also allows checking for side effects each operation individually.
2025-06-18 07:38:47 +02:00
Craig Topper
ad9e591fd5 [SelectionDAG][RISCV] Fold (add (vscale * C0), (vscale * C1)) to (vscale * (C0 + C1)) in getNode. (#144565)
We already have shl/mul vscale related folds in getNode.

This is an alternative to the DAGCombine proposed in #144507.
2025-06-17 21:33:50 -07:00
Matt Arsenault
7b9d10d2e6 PowerPC: Fix using long double libm functions for f128 intrinsics (#144382)
This wasn't setting the correct libcall names, which default to the
l suffixed libm names.
2025-06-18 13:26:15 +09:00
Matt Arsenault
af49a650e1 PowerPC: Add baseline tests for more f128 libcall handling (#144381)
Some of these incorrectly call the l suffixed version of libm
functions and others assert.
2025-06-18 13:23:17 +09:00
Liao Chunyu
e14f327d80 [RISCV] Pre-test for #144461 2025-06-17 23:32:01 -04:00
Sudharsan Veeravalli
a2ad65661a [RISCV] Add patterns for generating QC_CTO and QC_CLO (#144532)
These instructions count leading/trailing ones in the register.

Currently these are only generated when we have `Zbb` enabled (along
with `Xqcibm`) since it contains the `CTTZ/CTLZ` instructions.
2025-06-18 07:54:08 +05:30
Jacob Lalonde
a96a3f1b26 [lldb][Minidump Parser] Implement a range data vector for minidump memory ranges (#136040)
Recently I was debugging a Minidump with a few thousand ranges, and came
across the (now deleted) comment:

```
  // I don't have a sense of how frequently this is called or how many memory
  // ranges a Minidump typically has, so I'm not sure if searching for the
  // appropriate range linearly each time is stupid.  Perhaps we should build
  // an index for faster lookups.
```

blaming this comment, it's 9 years old! Much overdue for this simple fix
with a range data vector.

I had to add a default constructor to Range in order to implement the
RangeDataVector, but otherwise this just a replacement of look up logic.
2025-06-17 18:37:15 -07:00
Jim Lin
8ddada41df [RISCV] Add Andes XAndesVBFHCvt (Andes Vector BFLOAT16 Conversion) extension (#144320)
The spec can be found at:
https://github.com/andestech/andes-v5-isa/releases/tag/ast-v5_4_0-release.

This patch only supports assembler. The instructions are similar to
`Zvfbfmin` and the only difference with `Zvfbfmin` is that
`XAndesVBFHCvt` doesn't have mask variant.
2025-06-18 09:17:46 +08:00
Peter Collingbourne
9265b1f0cf LowerTypeTests: Use jump table entry type as value type of jump table alias.
The motivation for this is that it causes the jump table entry's symbol
to have an st_size equal to the jump table entry size, instead of being
equal to the size of the entire jump table, which is incorrect and can
lead to unexpected behavior in binary analysis tools that rely on the
size field such as Bloaty.

Reviewers: fmayer

Reviewed By: fmayer

Pull Request: https://github.com/llvm/llvm-project/pull/144462
2025-06-17 18:15:06 -07:00
Harrison Hao
0defde8e06 [AMDGPU] Support D16 folding for image.sample with multiple extractelement and fptrunc users (#141758)
Now we only support D16 folding for `image sample` instructions with a
single user: a `fptrunc` to half.
However, we can actually support D16 folding for image.sample
instructions with multiple users,
as long as each user follows the pattern of extractelement followed by
fptrunc to half.
For example:
```
  %sample = call <4 x float> @llvm.amdgcn.image.sample
  %e0 = extractelement <4 x float> %sample, i32 0
  %h0 = fptrunc float %e0 to half
  %e1 = extractelement <4 x float> %sample, i32 1
  %h1 = fptrunc float %e1 to half
  %e2 = extractelement <4 x float> %sample, i32 2
  %h2 = fptrunc float %e2 to half
```
This change enables D16 folding for such cases and avoids generating
`v_cvt_f16_f32_e32` instructions.
2025-06-18 09:00:07 +08:00
Jianhui Li
86a09f3615 [MLIR][XeGPU] Clean up xegpu op tests (#144592)
Test cleanup: 
1) separate layout.mlir from ops.mlir for layout related test 
2) remove lane layout for ops working at work item scope. 
3) remove redundant test in create_tdesc/update_tdesc/prefetch. 
4) remove "test_" from all test function name.
2025-06-17 19:48:09 -05:00
Jason Molenda
4e090b6e84 [lldb] Re-insert code to search for a binary by filepath if provided
July 14 2024 I landed a change to update progress reporting when
loading kernel/firmware binaries
https://github.com/llvm/llvm-project/pull/98845
In DynamicLoader::LoadBinaryWithUUIDAndAddress I removed code that
was setting the ModuleSpec to the provided name, if the name provided
is that of a file on disk.  With this code missing, if a filepath
name is passed in, this code will fail to find that binary on the local
disk.  There's nothing in the PR / intention that would lead to this
change, it was unintentional.
2025-06-17 17:41:31 -07:00
Matt Arsenault
99e263228f github: Add mips backend to PR autolabeler (#140909) 2025-06-18 09:28:24 +09:00
Andrew Rogers
abbdd1670d [llvm] minor fixes for clang-cl Windows DLL build (#144386)
## Purpose

This patch makes a minor changes to LLVM and Clang so that LLVM can be
built as a Windows DLL with `clang-cl`. These changes were not required
for building a Windows DLL with MSVC.

## Background

The Windows DLL effort is tracked in #109483. Additional context is
provided in [this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).

## Overview
Specific changes made in this patch:
- Remove `constexpr` fields that reference DLL exported symbols. These
symbols cannot be resolved at compile time when building a Windows DLL
using `clang-cl`, so they cannot be `constexpr`. Instead, they are made
`const` and initialized in the implementation file rather than at
declaration in the header.
- Annotate symbols now defined out-of-line with `LLVM_ABI` so they are
exported when building as a shared library.
- Explicitly add default copy assignment operator for `ELFFile` to
resolve a compiler warning.

## Validation

Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:

- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
2025-06-17 17:21:40 -07:00
Minding
64155a3229 Added clarifying comment to 'LLVMLinkInMCJIT' and 'LLVMLinkInInterpreter' (#92467)
Clarify that these functions are no-ops when linking to LLVM as a shared object.
2025-06-18 10:09:07 +10:00
Shilei Tian
15482c83aa [ElimAvailExtern] Add an option to allow to convert global variables in a specified address space to local (#144287)
Currently, the `EliminateAvailableExternallyPass` only converts certain
available externally functions to local if `avail-extern-to-local` is
set or in
contextual profiling mode. For global variables, it only drops their
initializers.

This PR adds an option to allow the pass to convert global variables in
a
specified address space to local. The motivation for this change is to
correctly
support lowering of LDS variables (`__shared__` variables, in more
generic
terminology) when ThinLTO is enabled for AMDGPU.

A `__shared__` variable is lowered to a hidden global variable in a
particular
address space by the frontend, which is roughly same as a `static` local
variable. To properly lower it in the backend, the compiler needs to
check all
its uses. Enabling ThinLTO currently breaks this when a function
containing a
`__shared__` variable is imported from another module. Even though the
global
variable is imported along with its associated function, and the
function is
privatized by the `EliminateAvailableExternallyPass`, the global
variable itself
is not.

It's safe to privatize such global variables, because they're _local_ to
their
associated functions. If the function itself is privatized, its
associated
global variables should also be privatized accordingly.
2025-06-17 19:58:24 -04:00
Andrei Safronov
c21a4c6c43 [Xtensa] Implement Xtensa Interrupt/Exception/Debug Options. (#143820)
Implement Xtensa Interrupt. HighInterrupts, Exception, Debug Options.
Also implement small Xtensa Options like PRID, Coprocessor and Timers.
2025-06-18 02:57:47 +03:00
Eli Friedman
f2d2c99866 [clang] Remove separate evaluation step for static class member init. (#142713)
We already evaluate the initializers for all global variables, as
required by the standard. Leverage that evaluation instead of trying to
separately validate static class members.

This has a few benefits:

- Improved diagnostics; we now get notes explaining what failed to
evaluate.
- Improved correctness: is_constant_evaluated is handled correctly.

The behavior follows the proposed resolution for CWG1721.

Fixes #88462. Fixes #99680.
2025-06-17 16:43:55 -07:00
Arthur Eubanks
b164d3613a [gn build] Port 628274dadf 2025-06-17 23:42:47 +00:00
Arthur Eubanks
6652961ae5 [gn build] Manually port 556e69b7 2025-06-17 23:41:29 +00:00
Arthur Eubanks
535291409c [gn build] Port 9ec75a50bc 2025-06-17 23:41:29 +00:00
Arthur Eubanks
a871b919ed [gn build] Port 9e0186d925 2025-06-17 23:41:28 +00:00
Sterling-Augustine
628274dadf [NFC] Extract Printing portions of DWARFCFIProgram to new files (#143762)
CFIPrograms' most common uses are within debug frames, but it is not
their only use. For example, some assembly writers encode them by hand
into .cfi_escape directives. This PR extracts printing code for them
into its own files, which avoids the need for the main class to depend
on DWARFUnit, sections, and similar.

One in a series of NFC DebugInfo/DWARF refactoring changes to layer it
more cleanly, so that binary CFI parsing can be used from low-level
code, (such as byte strings created via .cfi_escape) without circular
dependencies. The final goal is to make a more limited dwarf library
usable from lower-level code.

More information can be found at
https://discourse.llvm.org/t/rfc-debuginfo-dwarf-refactor-into-to-lower-and-higher-level-libraries/86665
2025-06-17 16:35:47 -07:00
Matt Arsenault
a9811340b7 AMDGPU: Report special input intrinsics as free (#141948) 2025-06-18 08:24:58 +09:00
Craig Topper
f3af1cd08c [RISCV] Set the exact flag on the SRL created for converting vscale to a read of vlenb. (#144571)
We know that vlenb is a multiple of RVVBytesPerBlock so we aren't
shifting out any non-zero bits.
2025-06-17 16:24:50 -07:00
Matt Arsenault
f08474ab1f AMDGPU: Add baseline cost model tests for special argument intrinsics (#141947) 2025-06-18 08:21:55 +09:00
Matt Arsenault
54015f36c6 AMDGPU: Cost model for minimumnum/maximumnum (#141946) 2025-06-18 08:19:06 +09:00
Slava Zakharin
70343c8d44 [mlir][flang] Added Weighted[Region]BranchOpInterface's. (#142079)
The new interfaces provide getters and setters for the weight
information about the branches of BranchOpInterface and
RegionBranchOpInterface operations.

These interfaces are done the same way as LLVM dialect's
BranchWeightOpInterface.

The plan is to produce this information in Flang, e.g. mark
most probably "cold" code as such and allow LLVM to order
basic blocks accordingly. An example of such a code is
copy loops generated for arrays repacking - we can mark it
as "cold" assuming that the copy will not happen dynamically.
If the copy actually happens the overhead of the copy is probably high
enough so that we may not care about the little overhead
of jumping to the "cold" code and fetching it.
2025-06-17 16:14:13 -07:00
Matt Arsenault
af65cb68f5 AMDGPU: Move fpenvIEEEMode into TTI (#141945) 2025-06-18 08:13:57 +09:00
Slava Zakharin
bec9ac2daf [llvm] Lower latency bonus threshold in function specialization. (#143954)
Related to #143219.

Function specialization does not kick in if flang sets `noalias`
attributes on the function arguments of `digits_2`, because PRE
optimizes several `srem` instructions and other memory accesses
from the inner loops causing the latency bonus to be lower than
the current 40% threshold.

While looking at this, I did not really get why we compute the latency
bonus as a ratio of the latency of the "eliminated" instructions
and the code-size of the whole function. It did not make much sense
to me.

I tried computing the total latency as a sum of latencies
of the instructions that belong to non-dead code (including
the instructions that would be executed had they not been
"eliminated" due to the constant propagation). This total
latency should identify the total cost of executing the function
with the given argument being dynamically equal to the tried
constant value. Then the latency bonus would be computed
as the ratio between the latency of the "eliminated" instructions
and the total latency. Unfortunately, this did not given me a good
heuristics either. The bonus was close to 0% on some targets,
and as big as 3-5% on other targets. This does match very well
with the performance gain achieved by function specialization
for exchange2, so it seemd like another artificial heuristic
not better than the current one.

It seems that GCC uses a set of different heuristics for function
specialization, but I am not an expert here and I cannot say
if we can match them in LLVM.

With all that said, I decided to try to lower the threshold
to avoid the regression and be able to re-enable the generally
good change for `noalias` attribute.

With this patch, I was able to reduce the effect of `noalias`,
so that `-force-no-alias=true` is only ~10% slower than
`-force-no-alias=false` code on neoverse-v1 and neoverse-v2.
On neoverse-n1, `-force-no-alias=true` is >2x faster than
`-force-no-alias=false` regardless of this patch.

This threshold has been changed before also due to improved
alias information:
2fb51fba8c (diff-066363256b7b4164e66b28a3028b2cb9e405c9136241baa33db76ebd2edb87cd)

Please let me know what testing I should run to make sure this change
is safe. As I understand, it may affect the compilation time
performance,
and I will appreciate it if someone points out which benchmarks
need to be checked before merging this.
2025-06-17 16:13:42 -07:00
Matt Arsenault
3800a83160 AMDGPU: Reduce cost of f64 copysign (#141944)
The real implementation is 1 real instruction plus a constant
materialize. Call that a 1, it's not a real f64 operation.
2025-06-18 08:10:53 +09:00
Matt Arsenault
c9b2816388 AMDGPU: Fix cost model for 16-bit operations on gfx8 (#141943)
We should only divide the number of pieces to fit the packed instructions
if we actually have pk instructions. This increases the cost of copysign,
but is closer to the current codegen output. It could be much cheaper
than it is now.
2025-06-18 08:07:03 +09:00
John Harrison
cb63b75e32 Revert "[lldb-dap] Refactoring DebugCommunication to improve test consistency. (#143818)
This reverts commit 362b9d78b4.

Buildbots using python3.10 are running into errors from this change.
2025-06-17 16:01:40 -07:00
Finn Plummer
87b13ada10 [HLSL][RootSignature] Implement serialization of remaining Root Elements (#143198)
Implements serialization of the remaining `RootElement`s, namely
`RootDescriptor`s and `StaticSampler`s.

- Adds unit testing for the serialization methods

Resolves https://github.com/llvm/llvm-project/issues/138191
Resolves https://github.com/llvm/llvm-project/issues/138193
2025-06-17 15:59:38 -07:00
Matt Arsenault
1cd18bc894 AMDGPU: Add cost model tests for minimumnum/maximumnum (#141904)
The f16 cases in particular look broken since every vector size
has the same reported cost.
2025-06-18 07:59:05 +09:00
Daniel Thornburgh
fd7e46b864 Revert "[libc++] Remove trailing newline from _LIBCPP_ASSERTION_HANDLER calls" (#144615)
Reverts llvm/llvm-project#143573
2025-06-17 15:50:42 -07:00
Jianhui Li
f25f2f7de4 [MLIR][XeGPU] Extend unrolling support for scatter ops with chunk_size (#144447)
Add support for load/store with chunk_size, which requires special
consideration for the operand blocking since offests and masks are
 n-D and tensor are n+1-D. Support operations including create_tdesc,
update_tdesc, load, store, and prefetch.

---------

Co-authored-by: Adam Siemieniuk <adam.siemieniuk@intel.com>
2025-06-17 17:46:35 -05:00
Eli Friedman
3f33c8482f [clang] Add release note for int->enum conversion change. (#144407)
This seems to be having some practical impact, so we should let people
know.
2025-06-17 15:27:41 -07:00
John Harrison
362b9d78b4 [lldb-dap] Refactoring DebugCommunication to improve test consistency. (#143818)
In DebugCommunication, we currently are using 2 thread to drive
lldb-dap. At the moment, they make an attempt at only synchronizing the
`recv_packets` between the reader thread and the main test thread. Other
stateful properties of the debug session are not guarded by a
locks/mutex.

To mitigate this, I am moving any state updates to the main thread
inside the `_recv_packet` method to ensure that between calls to
`_recv_packet` the state does not change out from under us in a test.

This does mean the precise timing of events has changed slightly as a
result and I've updated the existing tests that fail for me locally with
this new behavior.

I think this should result in overall more predictable behavior, even if
the test is slow due to the host workload or architecture differences.

---------

Co-authored-by: Ebuka Ezike <yerimyah1@gmail.com>
2025-06-17 14:42:06 -07:00
Joseph Huber
6fb36db481 [LinkerWrapper] Fix 'save-temps' when targeting SPIR-V (#144605)
Summary:
The logic here is flawed, it was only intended to apply to the CPU case
where we use the linker passed in on the command line. This was falsely
applying to SPIR-V which caused issues.
2025-06-17 16:16:37 -05:00
sribee8
844e41c2ac [libc] Moved shared constexpr to the top (#144569)
Some conversions shared constexpr so moved to the top.

---------

Co-authored-by: Sriya Pratipati <sriyap@google.com>
2025-06-17 21:12:35 +00:00
Sam Clegg
a5a0d88073 [libc++] Remove trailing newline from _LIBCPP_ASSERTION_HANDLER calls (#143573)
This newline was originally added in https://reviews.llvm.org/D142184
but I think updating `__libcpp_verbose_abort` to add newline instead is
more consistent, and works for other callers of `_LIBCPP_VERBOSE_ABORT`.

The `_LIBCPP_ASSERTION_HANDLER` calls through to either
`_LIBCPP_VERBOSE_ABORT` macro or the `__builtin_verbose_trap`. From what
I can tell neither of these function expect a trailing newline (at least
none of the usage of `_LIBCPP_VERBOSE_ABORT` or `__builtin_verbose_trap`
that I can find include a trailing newline except `_LIBCPP_ASSERTION_HANDLER`).

I noticed this discrepancy when working on
https://github.com/emscripten-core/emscripten/pull/24543
2025-06-17 17:07:16 -04:00
Daniel Thornburgh
ecfb8fe5c1 Revert stack "[Driver] Add support for GCC installation detection in … (#144603)
…Baremetal toolchain (#121829)"

This reverts the following stack of commits, due to them breaking the
Fuchsia toolchain and corresponding LLVM buildbot.

Revert "[Driver] Fix Arm/AArch64 Link Argument tests (#144582)" This
reverts commit a79186c1ea.

Revert "[Driver] Add option to force undefined symbols during linking in
BareMetal toolchain object. (#132807)" This reverts commit
9cb7545096.

Revert "[Driver] Fix link order of BareMetal toolchain object (#132806)"
This reverts commit 31523de4b0.

Revert "[Driver] Add support for crtbegin.o, crtend.o and libgloss lib
to BareMetal toolchain object (#121830)" This reverts commit
ec230aa7a7.

Revert "[Driver] Add support for GCC installation detection in Baremetal
toolchain (#121829)" This reverts commit
eb31c422d0.
2025-06-17 14:07:07 -07:00
Piotr Idzik
3c7df98c7b [clang-tidy] Add missing colon in the docs of performance-enum-size (#144525)
There is a syntax error in the provided code example - this PR fixes it.

I did a quick search - I could not find similar _typos_.
2025-06-17 21:59:53 +01:00
Louis Dionne
8d1610afd0 [libc++] Mark two assertion tests as unsupported in C++03 mode
Our assertion checking facility requires at least C++11, so these
tests were failing when run in C++03 mode.
2025-06-17 16:51:35 -04:00
Arthur Eubanks
49bf8d38d8 [gn build] Manually port b4e39e4f 2025-06-17 20:50:34 +00:00
Andrew Rogers
908f74a25e [llvm] re-order LLVM_ABI and extern on NoKernelInfoEndLTO decl (#144601)
## Overview
Fix compilation error introduced by #143615. Build failure logs
available
[here](https://lab.llvm.org/buildbot/#/builders/195/builds/10573)

## Background
On `extern` variable declarations, `LLVM_ABI` must appear before
`extern` because `LLVM_ABI` currently resolves to
`[[gnu::visibility("default")]]` when building with gcc.
2025-06-17 13:49:18 -07:00
David Peixotto
c677a11c8d [lldb] Add support to list/enable/disable remaining plugin types. (#143970)
In #134418 we added support to list/enable/disable `SystemRuntime` and
`InstrumentationRuntime` plugins. We limited it to those two plugin
types to flesh out the idea with a smaller change.

This PR adds support for the remaining plugin types. We now support all
the plugins that can be registered directly with the plugin manager.
Plugins that are added by loading shared objects are still not
supported.
2025-06-17 13:47:20 -07:00