Commit Graph

541482 Commits

Author SHA1 Message Date
Jakub Kuderski
96bbe472ef Revert "[mlir][spirv] Fix int type declaration duplication when serializing" and follow up commits (#144773)
This reverts the following PRs:
* https://github.com/llvm/llvm-project/pull/143108
* https://github.com/llvm/llvm-project/pull/144538
* https://github.com/llvm/llvm-project/pull/144685

Reverting because this disabled tests when building without the llvm
spirv backend enabled.
2025-06-18 16:15:06 -04:00
Andrew Rogers
a88e655809 [llvm] build Blake3 source with LLVM_EXPORTS defined (#144753)
## Purpose
This patch ensures that the BLAKE3 implementation in the LLVM Support
library exports its public interface with `__declspec(dllexport)` when
building LLVM as a Windows DLL.

## Background
The effort to support building LLVM as a Windows DLL is tracked in
#109483. Additional context is provided in [this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307).

## Overview
Replicate [this
logic](https://github.com/llvm/llvm-project/blob/main/llvm/cmake/modules/AddLLVM.cmake#L662-L664)
from `llvm_add_library()` for the `LLVMSupportBlake3` target. Without
this change, the `llvm_blake_` functions will only be annotated with
`__declspec(dllimport)` when building LLVM as a Windows DLL which leads
to inconsistent DLL linkage warnings from MSVC and `clang-cl`.
2025-06-18 13:08:05 -07:00
Chelsea Cassanova
a630ca6f6c [lldb][breakpoint] Grey out disabled breakpoints (#91404)
This commit adds colour settings to the list of breakpoints in order to
grey out breakpoints that have been disabled.
2025-06-18 13:06:20 -07:00
Florian Hahn
23b8f11b27 [VPlan] Remove redundant VPWidenRecipe constructors (NFC)
Since the removal of VPWidenEVLRecipe, the constructors taking a
VPDefOpcode are not needed any more. Remove them.
2025-06-18 20:59:16 +01:00
Justin King
22a69a266d lsan: Support free_sized and free_aligned_sized from C23 (#144604)
Adds support to LSan for `free_sized` and `free_aligned_sized` from C23.

Other sanitizers will be handled with their own separate PRs.

For https://github.com/llvm/llvm-project/issues/144435

This is attempt number 2.

Signed-off-by: Justin King <jcking@google.com>
2025-06-18 12:57:49 -07:00
Tobias Stadler
d4b7c0d8b4 [Remarks] Auto-detect remark parser format (#144554)
Add remark format 'Auto', which performs automatic detection of the
remark format using the magic numbers at the beginning of the remarks
files.

The RemarkLinker already did something similar, so we streamlined this
and exposed this to llvm-remarkutil.
2025-06-18 20:49:55 +01:00
Amr Hesham
67c52aacae [CIR] Upstream support for IncompleteArrayType (#144138)
This change adds the basic support for IncompleteArray

Issue https://github.com/llvm/llvm-project/issues/130197
2025-06-18 21:47:50 +02:00
Jameson Nash
c04fc5596e [MemCpyOpt] allow some undef contents overread in processMemCpyMemCpyDependence (#143745)
Allows memcpy to memcpy forwarding in cases where the second memcpy is
larger, but the overread is known to be undef, by shrinking the memcpy
size.

Refs https://github.com/llvm/llvm-project/pull/140954 which laid some of
the groundwork for this.
2025-06-18 15:38:34 -04:00
Jameson Nash
fb0651959b [AArch64] fix trampoline implementation: actually use X15 (#143892)
A incorrect switch statement caused it to try to use X4 instead of X15
in #126743, which would have not worked.
2025-06-18 15:37:56 -04:00
Alan Phipps
88d250729e Revert "[llvm-cov] Export decision coverage to output json" (#144783)
Reverts llvm/llvm-project#144335

Need to resolve test failures
2025-06-18 14:33:59 -05:00
Ramkumar Ramachandra
156a64c585 [HashRecognize] Tighten pre-conditions for analysis (#144757)
Exit early if the TC is not a byte-multiple, as optimization works by
dividing TC by 8. Also delay the SCEV TC query.
2025-06-18 20:19:25 +01:00
Ramkumar Ramachandra
f13b9e3643 [HashRecognize] Don't const-qualify Values in result (#144752)
Const-qualifying Values in the analysis result makes them unusable with
IRBuilder. The issue was discovered when attempting to use the result of
the analysis for a transform.
2025-06-18 20:18:53 +01:00
Ramkumar Ramachandra
a94eb27a29 [HashRecognize] Fix big-endian CRC tables (#144754)
Big-endian CRC tables are incorrect due to the initial value of CRC in
genSarwateTable being hard-coded for CRC-8. 128 is the signed-min value
for CRC-8, but it should be generalized to APInt::getSignedMinValue. The
issue was found when writing CRC verification tests for llvm-test-suite.
2025-06-18 20:18:22 +01:00
Kazu Hirata
ca9a09dbe6 [libc++] Fix a typo in documentation (#144763) 2025-06-18 15:03:17 -04:00
uthmanna
ab6beeca9c [llvm-cov] Export decision coverage to output json (#144335)
This commit adds decision coverage counts derived from MC/DC test vector execution to the JSON output of llvm-cov, as
discussed here: [Missing Decision Coverage (DC) in output
json](https://discourse.llvm.org/t/missing-decision-coverage-dc-in-output-json/86783)
with @evodius96

---------

Co-authored-by: uthmanna <andre.uthmann@vector.com>
2025-06-18 14:00:10 -05:00
Walter J.T.V
8c3fbaf0ee [Clang][OpenMP][LoopTransformations] Fix incorrect number of generated loops for Tile and Reverse directives (#140532)
This patch is closely related to #139293 and addresses an existing issue
in the loop transformation codebase. Specifically, it corrects the
handling of the `NumGeneratedLoops` variable in
`OMPLoopTransformationDirective` AST nodes and its inheritors (such as
OMPUnrollDirective, OMPTileDirective, etc.).

Previously, this variable was inaccurately set for certain
transformations like reverse or tile. While this did not lead to
functional bugs, since the value was only checked to determine whether
it was greater than zero or equal to zero, the inconsistency could
introduce problems when supporting more complex directives in the
future.
2025-06-18 20:52:41 +02:00
Andre Kuhlenschmidt
17f5b8b52a [flang][driver] add ability to look up feature flags without setting them (#144559)
This just adds some convenience methods to feature control and rewrites
old code in terms of those methods. Also cleans up some names that I
just realize were overloads of another method.
2025-06-18 11:21:35 -07:00
zhijian lin
3f3526f36d [NFC][PowerPC] pre-commit running the update_llc_test_checks.py for all-atomics.ll,loop-comment.ll etc (#144411)
Run the update_llc_test_checks.py for all-atomics.ll,loop-comment.ll
,PR35812-neg-cmpxchg.ll (Pre-commit patch for the
https://github.com/llvm/llvm-project/pull/144089)
2025-06-18 14:15:30 -04:00
Florian Hahn
071a6feabd [TTI] Remove PPC hasActiveVectorLength impl, simplify interface (NFC). (#142310)
PPCTTIImpl defines hasActiveVectorLength and also getVPMemoryOpCost, but
they appear unused (i.e. no changes to tests).

Remove them, as they complicate the interface for hasActiveVectorLength.
This simplifies the only use in LV as now no placeholder values need to
be passed.

PR: https://github.com/llvm/llvm-project/pull/142310
2025-06-18 19:02:17 +01:00
Arthur Eubanks
dfe4d44d8d Revert "[VPlan] Remove unnecessary DomTreeUpdater flush (NFC)." (#144758)
This reverts commit 2e337349f4.

Causes breakages internally, will post reproducer later.
2025-06-18 11:00:13 -07:00
sribee8
6f4e4ea177 [libc] Internal getrandom implementation (#144427)
Implemented an internal getrandom to avoid calls to the public one in
table.h

---------

Co-authored-by: Sriya Pratipati <sriyap@google.com>
2025-06-18 17:56:57 +00:00
Tomer Shafir
835d3034fe [AArch64] improve zero-cycle regmov test (#143680)
- Add a `gpr32` suffix to test name to denote the specific register
class being checked
- Expand `-mtriple=arm64-apple-ios` to `-march=arm64` to broaden the
test context to the generic architecture, as the specific triple is not
required
- Port `bl` match to Linux too via the regex: `{{_?foo}}`
- Advance `-mcpu=cyclone` to the newer M series major `-mcpu=apple-m1`
- Use `-mcpu` so that `-mattr=-zcm` has a real effect
- Add a test that generic arm64 doesn't optimize for ZCM
- Distinguish 4 different assembly layouts: NOTCPU, CPU, NOTATTR, ATTR
- Fix broken test logic, for example: `; NOT: mov [[REG2:w[0-9]+]], w3`
matched `mov w1, w3` then `REG2` captured `w1` but then `; NOT: mov w1,
[[REG2]]` matched by prefix `mov, w1, w19` even though it should have
matched `mov w1, w1`. This change adds explicit matches for all of the
generated copies.
2025-06-18 18:56:33 +01:00
Lei Huang
82acd8c377 [PowerPC] Add code to spill and restore DMRp registers (#142443) 2025-06-18 13:50:57 -04:00
Justin King
d9f7979a63 sanitizer_common: add unsupported test for free_sized and free_aligned_sized from C23 (#144727)
Signed-off-by: Justin King <jcking@google.com>
2025-06-18 10:24:38 -07:00
Artem Belevich
298f1c276f Revert "Add missing intrinsics to cuda headers" (#144755)
Reverts llvm/llvm-project#143664
as it breaks CUDA compilation.
2025-06-18 10:08:27 -07:00
John Brawn
77bc254851 [AArch64] Fix build failure with -Werror (#144749)
PR#144387 caused buildbot failures with -Werror due to a comparison
between signed and unsigned types. Fix this with an explicit cast.
2025-06-18 18:05:02 +01:00
Alexis Engelke
2a8c65e983 [CodeGen][NFC] Fix quadratic c-t for large jump tables
Deleting a basic block removes all references from jump tables, which
is O(n). When freeing a MachineFunction, all basic blocks are deleted
before the jump tables, causing O(n^2) runtime. Fix this by deallocating
the jump table first.

Test case generator:

    import sys

    n = int(sys.argv[1])
    print("define void @f(i64 %c, ptr %p) {")
    print("  switch i64 %c, label %d [")
    for i in range(n):
        print(f"    i64 {i}, label %h{i}")
    print(f"  ]")
    for i in range(n):
        print(f'h{i}:')
        print(f'  store i64 {i*i}, ptr %p')
        print(f'  ret void')
    print('d:')
    print('  ret void')
    print('}')

Improvement at 5000 entries:

    Benchmark 1: ./llc.pre -filetype=obj -O0 <switch5k.bc
      Time (mean ± σ):      49.7 ms ±   1.0 ms
      Range (min … max):    48.0 ms …  52.1 ms    57 runs

    Benchmark 2: ./llc.post -filetype=obj -O0 <switch5k.bc
      Time (mean ± σ):      39.4 ms ±   0.8 ms
      Range (min … max):    37.1 ms …  41.1 ms    72 runs

    Summary
      ./llc.post -filetype=obj -O0 <switch5k.bc ran
        1.26 ± 0.04 times faster than ./llc.pre -filetype=obj -O0 <switch5k.bc

Improvement at 20000 entries:

    Benchmark 1: ./llc.pre -filetype=obj -O0 <switch20k.bc
      Time (mean ± σ):     281.7 ms ±   1.0 ms
      Range (min … max):   280.2 ms … 283.0 ms    10 runs

    Benchmark 2: ./llc.post -filetype=obj -O0 <switch20k.bc
      Time (mean ± σ):     123.9 ms ±   1.5 ms
      Range (min … max):   121.4 ms … 129.2 ms    23 runs

    Summary
      ./llc.post -filetype=obj -O0 <switch20k.bc ran
        2.27 ± 0.03 times faster than ./llc.pre -filetype=obj -O0 <switch20k.bc

Pull Request: https://github.com/llvm/llvm-project/pull/144108
2025-06-18 18:56:30 +02:00
Krzysztof Parzyszek
4084ffcf1e [flang] Show types in DumpEvExpr (#143743)
When dumping evaluate::Expr, show type names which contain a lot of
useful information.

For example show
```
expr <Fortran::evaluate::SomeType> {
  expr <Fortran::evaluate::SomeKind<Fortran::common::TypeCategory::Integer>> {
    expr <Fortran::evaluate::Type<Fortran::common::TypeCategory::Integer, 4>> {
      ...
```
instead of
```
expr T {
  expr T {
    expr T {
      ...
```
2025-06-18 11:31:03 -05:00
Yang Bai
fe3933da15 [mlir][vector] Support complete folding in single pass for vector.insert/vector.extract (#142124)
### Description

This patch improves the folding efficiency of `vector.insert` and
`vector.extract` operations by not returning early after successfully
converting dynamic indices to static indices.

This PR also renames the test pass `TestConstantFold` to
`TestSingleFold` and adds comprehensive documentation explaining the
single-pass folding behavior.

### Motivation

Since the `OpBuilder::createOrFold` function only calls `fold` **once**,
the current `fold` methods of `vector.insert` and `vector.extract` may
leave the op in a state that can be folded further. For example,
consider the following un-folded IR:
```
%v1 = vector.insert %e1, %v0 [0] : f32 into vector<128xf32>
%c0 = arith.constant 0 : index
%e2 = vector.extract %v1[%c0] : f32 from vector<128xf32>
```
If we use `createOrFold` to create the `vector.extract` op, then the
result will be:
```
%v1 = vector.insert %e1, %v0 [127] : f32 into vector<128xf32>
%e2 = vector.extract %v1[0] : f32 from vector<128xf32>
```
But this is not the optimal result. `createOrFold` should have returned
`%e1`.
The reason is that the execution of fold returns immediately after
`extractInsertFoldConstantOp`, causing subsequent folding logics to be
skipped.

---------

Co-authored-by: Yang Bai <yangb@nvidia.com>
2025-06-18 09:26:04 -07:00
woruyu
0018921148 [DAG] add (~a | x) & (a | y) -> (a & (x ^ y)) ^y for foldMaskedMerge (#144342)
### Summary
This PR resolves https://github.com/llvm/llvm-project/issues/143864

Add (~a | x) & (a | y) -> (a & (x ^ y)) ^y for foldMaskedMerge func
using SDPatternMatch

aftering adding this pattern, run ```ninja check-llvm-codegen```, all
other cases remain unchanged, so I add a
testcase(fold-masked-merge-demorgan.ll) for it

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-06-18 17:22:53 +01:00
Peng Liu
9827440f1e [libc++] Optimize ranges::{for_each, for_each_n} for segmented iterators (#132896)
Previously, the segmented iterator optimization was limited to `std::{for_each, for_each_n}`. This patch
extends the optimization to `std::ranges::for_each` and `std::ranges::for_each_n`, ensuring consistent
optimizations across these algorithms. This patch first generalizes the `std` algorithms by introducing
a `Projection` parameter, which is set to `__identity` for the `std` algorithms. Then we let the `ranges`
algorithms to directly call their `std` counterparts with a general `__proj` argument. Benchmarks
demonstrate performance improvements of up to 21.4x for ``std::deque::iterator`` and 22.3x for
``join_view`` of ``vector<vector<char>>``.

Addresses a subtask of #102817.
2025-06-18 12:22:47 -04:00
Peng Liu
dd40c460c4 [libc++] Clean up casts in std::forward_list (#130310)
The patch removes unnecessary casts to `void*` pointers, inline some
casts, and eliminates an identity cast.
2025-06-18 12:16:01 -04:00
Karlo Basioli
2a41350aab Fix bazel build issue caused by #142986 second attempt (#144721 didnt… (#144743)
… cover everything)
2025-06-18 17:15:12 +01:00
Ying Yi
6d785ca421 [Clang] Fix the clang/test/PCH/ignored-pch.c test. (#144737)
Change the test to check the exit status of the 'ls' command line
(instead of error message) since the error message is different when
running 'ls' command on the different Host machine.
2025-06-18 17:14:33 +01:00
Peng Liu
13510c0736 [libc++] Make list constexpr as part of P3372R3 (#129799)
This patch makes `std::list` constexpr as part of P3372R3.

Fixes #128659.
2025-06-18 12:13:50 -04:00
Christopher Ferris
a2cee05449 [scudo] Make report pointers const. (#144624)
Mark as many of the reportXX functions that take pointers const. This
avoid the need to use const_cast when calling these functions on an
already const pointer.

Fix reportHeaderCorruption calls where an argument was passed into an
append call that didn't use them.
2025-06-18 09:12:53 -07:00
Jon Roelofs
0fa373c77d [Matrix] Propagate shape information through PHI insts (#141681)
... and split them as we lower them, avoiding several shuffles in the
process.
2025-06-18 09:00:48 -07:00
Philip Reames
b5aaf9d988 [InstCombine] Implement vp.reverse reordering/elimination through binop/unop (#143963)
This simply copies the structure of the vector.reverse patterns from
just above, and reimplements them for the vp.reverse intrinsics when the
mask is all ones and the EVLs exactly match.

Its unfortunate that we have three different ways to represent a reverse
(shuffle, vector.reverse, and vp.reverse) but I don't see an obvious way
to remove any them because the semantics are slightly different.

This significantly improves vectorization in TSVC_2's s112 and s1112
loops when using EVL tail folding.
2025-06-18 08:53:45 -07:00
Krzysztof Parzyszek
5d502aeddf [flang][OpenMP] Clarify confusing error message (#144707)
The message "The atomic variable x should occur exactly once among the
arguments of the top-level [...] operator" was intended to convey that
(1) an atomic variable should be an argument, and (2) it should be
exactly one of the arguments. However, the wording turned out to be
sowing confusion instead.

Rework the corresponding check, and emit an individual error message for
each problematic situation:
- "atomic variable cannot be a proper subexpression of an argument",
- "atomic variable should appear as an argument",
- "atomic variable should be exactly one of the arguments".

Fixes https://github.com/llvm/llvm-project/issues/144599
2025-06-18 10:42:39 -05:00
Brox Chen
9da9d32670 [AMDGPU][True16][CodeGen] sext i16 inreg in true16 mode (#144024)
update sext pattern in true16, setting up proper vgpr16 reg use
2025-06-18 11:30:53 -04:00
Graham Hunter
8b8a3699db [AArch64] Use dupq (SVE2.1) for segmented lane splats (#144482)
Use the dupq instructions (when available) to represent a splat of the
same lane within each 128b segment of a wider fixed vector.
2025-06-18 16:27:29 +01:00
Nathan Gauër
3af4d4e810 [HLSL][SPIR-V] Fix LinkageAttribute emission for BuiltIn (#144701)
BuiltIn variables were missing the visibility attribute, which caused
the Linkage capability to be emitted by the backend.
2025-06-18 17:26:40 +02:00
John Brawn
b53c1e4ee8 [AArch64] Add ISel for postindex ld1/st1 in big-endian (#144387)
When big-endian we need to use ld1/st1 for vector loads and stores so
that we get the elements in the correct order, but this prevents
postindex addressing from being used. Fix this by adding the appropriate
ISel patterns, plus the relevant changes in ISelLowering and
ISelDAGToDAG to cause postindex addressing to be used.
2025-06-18 16:16:52 +01:00
amordo
e4c3b037bc [InstCombine] Fold tan(x) * cos(x) => sin(x) (#136319)
This patch enables folding `tan(x) * cos(x) -> sin(x)` under the `contract` flag.

Fixes https://github.com/llvm/llvm-project/issues/34950.
2025-06-18 23:12:31 +08:00
Karlo Basioli
8fc20bffab Fix bazel build issue caused by 142986 (#144721) 2025-06-18 16:07:56 +01:00
Orlando Cazalet-Hyams
36038a1048 [RemoveDIs][NFC] Remove dbg intrinsic handling code from SelectionDAG ISel (#144702) 2025-06-18 16:04:18 +01:00
Omair Javaid
6f4add3480 [compiler-rt] [Fuzzer] Fix ARMv7 test link failure by linking unwinder (#144495)
compiler-rt/lib/fuzzer/tests build was failing on armv7, with undefined
references to unwinder symbols, such as __aeabi_unwind_cpp_pr0.

This occurs because the test is built with `-nostdlib++` but `libunwind`
is not explicitly linked to the final test executable.

This patch resolves the issue by adding CMake logic to explicitly link
the required unwinder to the fuzzer tests, inspired by the same solution
used to fix Scudo build failures by https://reviews.llvm.org/D142888.
2025-06-18 19:23:54 +05:00
Andrei Golubev
ee070d0816 [mlir][bufferization] Support custom types (1/N) (#142986)
Following the addition of TensorLike and BufferLike type interfaces (see
00eaff3e9c), introduce minimal changes
required to bufferize a custom tensor operation into a custom buffer
operation.

To achieve this, new interface methods are added to TensorLike type
interface that abstract away the differences between existing (tensor ->
memref) and custom conversions.

The scope of the changes is intentionally limited (for example,
BufferizableOpInterface is untouched) in order to first understand the
basics and reach consensus design-wise.

---
Notable changes:
* mlir::bufferization::getBufferType() returns BufferLikeType (instead
of BaseMemRefType)
* ToTensorOp / ToBufferOp operate on TensorLikeType / BufferLikeType.
Operation argument "memref" renamed to "buffer"
* ToTensorOp's tensor type inferring builder is dropped (users now need
to provide the tensor type explicitly)
2025-06-18 16:18:12 +02:00
Akira Hatanaka
40d2f39210 [Sema][ObjC] Loosen restrictions on reinterpret_cast involving indirect ARC-managed pointers (#144458)
Allow using reinterpret_cast for conversions between indirect ARC
pointers and other pointer types.

rdar://152905399
2025-06-18 07:08:32 -07:00
Nikolas Klauser
9db7502d22 [libc++] Move __has_iterator_typedefs to the up-to-C++17 implementation of iterator_traits (#144265)
`__has_iterator_typedefs` is only used in the up-to-C++17 implementation
of `type_traits`. To make that clearer the struct is moved into that
code block.
2025-06-18 15:55:06 +02:00