541462 Commits

Author SHA1 Message Date
sribee8
6f4e4ea177 [libc] Internal getrandom implementation (#144427)
Implemented an internal getrandom to avoid calls to the public one in
table.h

---------

Co-authored-by: Sriya Pratipati <sriyap@google.com>
2025-06-18 17:56:57 +00:00
Tomer Shafir
835d3034fe [AArch64] improve zero-cycle regmov test (#143680)
- Add a `gpr32` suffix to test name to denote the specific register
class being checked
- Expand `-mtriple=arm64-apple-ios` to `-march=arm64` to broaden the
test context to the generic architecture, as the specific triple is not
required
- Port `bl` match to Linux too via the regex: `{{_?foo}}`
- Update `-mcpu=cyclone` to the newer M-series CPU `-mcpu=apple-m1`
- Use `-mcpu` so that `-mattr=-zcm` has a real effect
- Add a test that generic arm64 doesn't optimize for ZCM
- Distinguish 4 different assembly layouts: NOTCPU, CPU, NOTATTR, ATTR
- Fix broken test logic. For example, `; NOT: mov [[REG2:w[0-9]+]], w3`
matched `mov w1, w3`, so `REG2` captured `w1`; the following
`; NOT: mov w1, [[REG2]]` then matched `mov w1, w19` by prefix, even
though it was meant to match only `mov w1, w1`. This change adds explicit
matches for all of the generated copies.
2025-06-18 18:56:33 +01:00
Lei Huang
82acd8c377 [PowerPC] Add code to spill and restore DMRp registers (#142443) 2025-06-18 13:50:57 -04:00
Justin King
d9f7979a63 sanitizer_common: add unsupported test for free_sized and free_aligned_sized from C23 (#144727)
Signed-off-by: Justin King <jcking@google.com>
2025-06-18 10:24:38 -07:00
Artem Belevich
298f1c276f Revert "Add missing intrinsics to cuda headers" (#144755)
Reverts llvm/llvm-project#143664
as it breaks CUDA compilation.
2025-06-18 10:08:27 -07:00
John Brawn
77bc254851 [AArch64] Fix build failure with -Werror (#144749)
PR#144387 caused buildbot failures with -Werror due to a comparison
between signed and unsigned types. Fix this with an explicit cast.
2025-06-18 18:05:02 +01:00
Alexis Engelke
2a8c65e983 [CodeGen][NFC] Fix quadratic c-t for large jump tables
Deleting a basic block removes all references from jump tables, which
is O(n). When freeing a MachineFunction, all basic blocks are deleted
before the jump tables, causing O(n^2) runtime. Fix this by deallocating
the jump table first.
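
A toy Python model (not LLVM's actual data structures) makes the cost difference concrete. It assumes one jump-table entry per block and that deleting a block scans every remaining entry; `teardown_steps` is a hypothetical name, and "steps" counts entry visits rather than real work.

```python
def teardown_steps(n: int, free_jump_table_first: bool) -> int:
    """Count jump-table entry visits in a toy model of function teardown.

    Hypothetical model: n blocks, one jump-table entry per block, and
    deleting a block scans every remaining entry to drop references to it.
    """
    jump_table = list(range(n))
    steps = 0
    if free_jump_table_first:
        steps += len(jump_table)  # freeing the table touches each entry once
        jump_table = []
    for block in range(n):
        steps += len(jump_table)  # O(len(jump_table)) scan per deleted block
        jump_table = [e for e in jump_table if e != block]
    return steps


print(teardown_steps(1000, False), teardown_steps(1000, True))
```

In this model, doubling `n` roughly quadruples the step count when blocks are deleted first, but only doubles it when the table is freed first, matching the O(n^2) versus O(n) behaviour described above.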

Test case generator:

    import sys

    n = int(sys.argv[1])
    print("define void @f(i64 %c, ptr %p) {")
    print("  switch i64 %c, label %d [")
    for i in range(n):
        print(f"    i64 {i}, label %h{i}")
    print(f"  ]")
    for i in range(n):
        print(f'h{i}:')
        print(f'  store i64 {i*i}, ptr %p')
        print(f'  ret void')
    print('d:')
    print('  ret void')
    print('}')

Improvement at 5000 entries:

    Benchmark 1: ./llc.pre -filetype=obj -O0 <switch5k.bc
      Time (mean ± σ):      49.7 ms ±   1.0 ms
      Range (min … max):    48.0 ms …  52.1 ms    57 runs

    Benchmark 2: ./llc.post -filetype=obj -O0 <switch5k.bc
      Time (mean ± σ):      39.4 ms ±   0.8 ms
      Range (min … max):    37.1 ms …  41.1 ms    72 runs

    Summary
      ./llc.post -filetype=obj -O0 <switch5k.bc ran
        1.26 ± 0.04 times faster than ./llc.pre -filetype=obj -O0 <switch5k.bc

Improvement at 20000 entries:

    Benchmark 1: ./llc.pre -filetype=obj -O0 <switch20k.bc
      Time (mean ± σ):     281.7 ms ±   1.0 ms
      Range (min … max):   280.2 ms … 283.0 ms    10 runs

    Benchmark 2: ./llc.post -filetype=obj -O0 <switch20k.bc
      Time (mean ± σ):     123.9 ms ±   1.5 ms
      Range (min … max):   121.4 ms … 129.2 ms    23 runs

    Summary
      ./llc.post -filetype=obj -O0 <switch20k.bc ran
        2.27 ± 0.03 times faster than ./llc.pre -filetype=obj -O0 <switch20k.bc

Pull Request: https://github.com/llvm/llvm-project/pull/144108
2025-06-18 18:56:30 +02:00
Krzysztof Parzyszek
4084ffcf1e [flang] Show types in DumpEvExpr (#143743)
When dumping evaluate::Expr, show the type names, which carry a lot of
useful information.

For example, show
```
expr <Fortran::evaluate::SomeType> {
  expr <Fortran::evaluate::SomeKind<Fortran::common::TypeCategory::Integer>> {
    expr <Fortran::evaluate::Type<Fortran::common::TypeCategory::Integer, 4>> {
      ...
```
instead of
```
expr T {
  expr T {
    expr T {
      ...
```
2025-06-18 11:31:03 -05:00
Yang Bai
fe3933da15 [mlir][vector] Support complete folding in single pass for vector.insert/vector.extract (#142124)
### Description

This patch improves the folding efficiency of `vector.insert` and
`vector.extract` operations by not returning early after successfully
converting dynamic indices to static indices.

This PR also renames the test pass `TestConstantFold` to
`TestSingleFold` and adds comprehensive documentation explaining the
single-pass folding behavior.

### Motivation

Since the `OpBuilder::createOrFold` function only calls `fold` **once**,
the current `fold` methods of `vector.insert` and `vector.extract` may
leave the op in a state that can be folded further. For example,
consider the following un-folded IR:
```
%v1 = vector.insert %e1, %v0 [0] : f32 into vector<128xf32>
%c0 = arith.constant 0 : index
%e2 = vector.extract %v1[%c0] : f32 from vector<128xf32>
```
If we use `createOrFold` to create the `vector.extract` op, then the
result will be:
```
%v1 = vector.insert %e1, %v0 [0] : f32 into vector<128xf32>
%e2 = vector.extract %v1[0] : f32 from vector<128xf32>
```
This is not the optimal result: `createOrFold` should have returned
`%e1`. The cause is that `fold` returns immediately after
`extractInsertFoldConstantOp` succeeds, so the subsequent folding logic is
skipped.
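
A toy Python model of a single `fold` call illustrates the difference. It is not MLIR code: the `Insert`/`Extract` classes, the `CONSTANT_INDICES` table and the `early_return` switch are hypothetical stand-ins for the op state and for the old versus new control flow.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Insert:
    value: str  # SSA name of the inserted scalar, e.g. "%e1"
    pos: int    # static insertion position


@dataclass
class Extract:
    source: Insert
    index: Union[int, str]  # int: static index; str: SSA name of an index


# Hypothetical stand-in for constant index values visible to the folder.
CONSTANT_INDICES = {"%c0": 0}


def fold(op: Extract, early_return: bool) -> Optional[Union[Extract, str]]:
    """One fold call: returns a replacement value, the op itself if it was
    updated in place, or None if nothing changed."""
    updated = False
    if isinstance(op.index, str) and op.index in CONSTANT_INDICES:
        op.index = CONSTANT_INDICES[op.index]  # dynamic -> static index
        updated = True
        if early_return:
            return op  # later folding patterns are skipped
    if op.index == op.source.pos:
        return op.source.value  # extract(insert(v, p), p) folds to v
    return op if updated else None
```

Called once with `early_return=True`, the model only rewrites the index, so a caller that folds once (as `createOrFold` does) is left with a static-index extract that a second call would fold to `%e1`. With `early_return=False`, a single call folds all the way.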

---------

Co-authored-by: Yang Bai <yangb@nvidia.com>
2025-06-18 09:26:04 -07:00
woruyu
0018921148 [DAG] add (~a | x) & (a | y) -> (a & (x ^ y)) ^y for foldMaskedMerge (#144342)
### Summary
This PR resolves https://github.com/llvm/llvm-project/issues/143864

Add the fold `(~a | x) & (a | y) -> (a & (x ^ y)) ^ y` to
`foldMaskedMerge`, using `SDPatternMatch`.

After adding this pattern, running `ninja check-llvm-codegen` showed no
changes in any other test, so I added a new test case
(fold-masked-merge-demorgan.ll) for it.
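
The fold is a bitwise identity: in each bit position where `a` is 1, both sides produce the bit of `x`, and where `a` is 0, both produce the bit of `y`. An exhaustive Python check over 4-bit values confirms it (the width is an arbitrary choice, since the identity holds bit by bit; the helper names are hypothetical):

```python
def masked_merge_original(a: int, x: int, y: int, width: int) -> int:
    mask = (1 << width) - 1
    return ((~a & mask) | x) & (a | y)  # (~a | x) & (a | y), kept to `width` bits


def masked_merge_folded(a: int, x: int, y: int) -> int:
    return (a & (x ^ y)) ^ y  # three operations instead of four


def check_identity(width: int = 4) -> bool:
    values = range(1 << width)
    return all(
        masked_merge_original(a, x, y, width) == masked_merge_folded(a, x, y)
        for a in values for x in values for y in values
    )


print(check_identity())
```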

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-06-18 17:22:53 +01:00
Peng Liu
9827440f1e [libc++] Optimize ranges::{for_each, for_each_n} for segmented iterators (#132896)
Previously, the segmented iterator optimization was limited to `std::{for_each, for_each_n}`. This patch
extends the optimization to `std::ranges::for_each` and `std::ranges::for_each_n`, ensuring consistent
optimizations across these algorithms. This patch first generalizes the `std` algorithms by introducing
a `Projection` parameter, which is set to `__identity` for the `std` algorithms. The `ranges` algorithms
then call their `std` counterparts directly, passing a general `__proj` argument. Benchmarks
demonstrate performance improvements of up to 21.4x for `std::deque::iterator` and 22.3x for
`join_view` of `vector<vector<char>>`.
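
A language-neutral sketch of the shape of this change, in Python rather than libc++ code: one segment-aware loop takes a projection, the std-style entry point passes identity, and the ranges-style entry point forwards its own projection. All names are hypothetical, and the sketch mirrors only the structure; the speedup in libc++ comes from the inner loop running over contiguous storage.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def identity(value: T) -> T:
    return value


def for_each_segmented(segments: Iterable[List[T]], func: Callable[..., object],
                       proj: Callable[[T], object] = identity) -> None:
    """Shared implementation: walk each segment, then its elements."""
    for segment in segments:          # outer loop over segments
        for element in segment:       # plain inner loop over one segment
            func(proj(element))


def std_for_each(segments: Iterable[List[T]], func: Callable[..., object]) -> None:
    # std-style entry point: the projection is fixed to identity.
    for_each_segmented(segments, func, identity)


def ranges_for_each(segments: Iterable[List[T]], func: Callable[..., object],
                    proj: Callable[[T], object]) -> None:
    # ranges-style entry point: forwards its own projection.
    for_each_segmented(segments, func, proj)
```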

Addresses a subtask of #102817.
2025-06-18 12:22:47 -04:00
Peng Liu
dd40c460c4 [libc++] Clean up casts in std::forward_list (#130310)
The patch removes unnecessary casts to `void*` pointers, inlines some
casts, and eliminates an identity cast.
2025-06-18 12:16:01 -04:00
Karlo Basioli
2a41350aab Fix bazel build issue caused by #142986 second attempt (#144721 didn't cover everything) (#144743)
2025-06-18 17:15:12 +01:00
Ying Yi
6d785ca421 [Clang] Fix the clang/test/PCH/ignored-pch.c test. (#144737)
Change the test to check the exit status of the 'ls' command instead of
its error message, since the error message differs between host
machines.
2025-06-18 17:14:33 +01:00
Peng Liu
13510c0736 [libc++] Make list constexpr as part of P3372R3 (#129799)
This patch makes `std::list` constexpr as part of P3372R3.

Fixes #128659.
2025-06-18 12:13:50 -04:00
Christopher Ferris
a2cee05449 [scudo] Make report pointers const. (#144624)
Mark the pointer parameters of as many of the reportXX functions as
possible const. This avoids the need to use const_cast when calling these
functions on an already-const pointer.

Fix reportHeaderCorruption calls where an argument was passed into an
append call that didn't use it.
2025-06-18 09:12:53 -07:00
Jon Roelofs
0fa373c77d [Matrix] Propagate shape information through PHI insts (#141681)
... and split them as we lower them, avoiding several shuffles in the
process.
2025-06-18 09:00:48 -07:00
Philip Reames
b5aaf9d988 [InstCombine] Implement vp.reverse reordering/elimination through binop/unop (#143963)
This simply copies the structure of the vector.reverse patterns from
just above, and reimplements them for the vp.reverse intrinsics when the
mask is all ones and the EVLs exactly match.
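
The underlying identity is that an element-wise operation commutes with lane reversal, so `op(rev(a), rev(b)) == rev(op(a, b))` and a reverse applied to that result cancels out. The Python sketch below checks this on plain lists; it models only whole-vector reversal with no masking, not the mask or EVL operands of vp.reverse.

```python
import operator
from typing import Callable, List


def reverse(v: List[int]) -> List[int]:
    return v[::-1]


def elementwise(op: Callable[[int, int], int], a: List[int], b: List[int]) -> List[int]:
    return [op(x, y) for x, y in zip(a, b)]


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
for op in (operator.add, operator.mul, operator.sub):
    # binop(rev(a), rev(b)) == rev(binop(a, b)): the reverse can be moved past the binop
    assert elementwise(op, reverse(a), reverse(b)) == reverse(elementwise(op, a, b))
    # rev(binop(rev(a), rev(b))) == binop(a, b): the two reverses cancel
    assert reverse(elementwise(op, reverse(a), reverse(b))) == elementwise(op, a, b)
```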

It's unfortunate that we have three different ways to represent a reverse
(shuffle, vector.reverse, and vp.reverse), but I don't see an obvious way
to remove any of them because the semantics are slightly different.

This significantly improves vectorization in TSVC_2's s112 and s1112
loops when using EVL tail folding.
2025-06-18 08:53:45 -07:00
Krzysztof Parzyszek
5d502aeddf [flang][OpenMP] Clarify confusing error message (#144707)
The message "The atomic variable x should occur exactly once among the
arguments of the top-level [...] operator" was intended to convey that
(1) an atomic variable should be an argument, and (2) it should be
exactly one of the arguments. However, the wording turned out to sow
confusion instead.

Rework the corresponding check, and emit an individual error message for
each problematic situation:
- "atomic variable cannot be a proper subexpression of an argument",
- "atomic variable should appear as an argument",
- "atomic variable should be exactly one of the arguments".

Fixes https://github.com/llvm/llvm-project/issues/144599
2025-06-18 10:42:39 -05:00
Brox Chen
9da9d32670 [AMDGPU][True16][CodeGen] sext i16 inreg in true16 mode (#144024)
Update the sext pattern in true16 mode to set up proper vgpr16 register use.
2025-06-18 11:30:53 -04:00
Graham Hunter
8b8a3699db [AArch64] Use dupq (SVE2.1) for segmented lane splats (#144482)
Use the dupq instructions (when available) to represent a splat of the
same lane within each 128b segment of a wider fixed vector.
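
To make "a splat of the same lane within each 128b segment" concrete: with 32-bit elements, each 128-bit segment holds 4 lanes, and every result lane takes element `lane` of its own segment. A Python sketch of those value semantics (the function name is hypothetical, and it says nothing about instruction encoding or availability):

```python
from typing import List


def segmented_splat(vec: List[int], lane: int, elem_bits: int = 32) -> List[int]:
    lanes_per_segment = 128 // elem_bits
    if len(vec) % lanes_per_segment or not 0 <= lane < lanes_per_segment:
        raise ValueError("vector must be whole 128-bit segments and lane in range")
    # Result lane i copies lane `lane` of the 128-bit segment containing i.
    return [vec[(i // lanes_per_segment) * lanes_per_segment + lane]
            for i in range(len(vec))]


print(segmented_splat([0, 1, 2, 3, 4, 5, 6, 7], lane=1))
```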
2025-06-18 16:27:29 +01:00
Nathan Gauër
3af4d4e810 [HLSL][SPIR-V] Fix LinkageAttribute emission for BuiltIn (#144701)
BuiltIn variables were missing the visibility attribute, which caused
the Linkage capability to be emitted by the backend.
2025-06-18 17:26:40 +02:00
John Brawn
b53c1e4ee8 [AArch64] Add ISel for postindex ld1/st1 in big-endian (#144387)
In big-endian mode, we need to use ld1/st1 for vector loads and stores so
that we get the elements in the correct order, but this prevents
postindex addressing from being used. Fix this by adding the appropriate
ISel patterns, plus the relevant changes in ISelLowering and
ISelDAGToDAG to cause postindex addressing to be used.
2025-06-18 16:16:52 +01:00
amordo
e4c3b037bc [InstCombine] Fold tan(x) * cos(x) => sin(x) (#136319)
This patch enables folding `tan(x) * cos(x) -> sin(x)` under the `contract` flag.
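
Algebraically tan(x) = sin(x) / cos(x), so tan(x) * cos(x) = sin(x) wherever cos(x) is non-zero. In floating point the two sides go through different roundings, so they can disagree in the last bits. A quick numerical check in Python (the sample range, kept away from the poles of tan, is an arbitrary choice):

```python
import math


def max_abs_diff(samples: int = 1001, lo: float = -1.5, hi: float = 1.5) -> float:
    """Largest |tan(x) * cos(x) - sin(x)| over evenly spaced x in [lo, hi]."""
    worst = 0.0
    for k in range(samples):
        x = lo + (hi - lo) * k / (samples - 1)
        worst = max(worst, abs(math.tan(x) * math.cos(x) - math.sin(x)))
    return worst


print(max_abs_diff())
```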

Fixes https://github.com/llvm/llvm-project/issues/34950.
2025-06-18 23:12:31 +08:00
Karlo Basioli
8fc20bffab Fix bazel build issue caused by 142986 (#144721) 2025-06-18 16:07:56 +01:00
Orlando Cazalet-Hyams
36038a1048 [RemoveDIs][NFC] Remove dbg intrinsic handling code from SelectionDAG ISel (#144702) 2025-06-18 16:04:18 +01:00
Omair Javaid
6f4add3480 [compiler-rt] [Fuzzer] Fix ARMv7 test link failure by linking unwinder (#144495)
The compiler-rt/lib/fuzzer/tests build was failing on armv7 with undefined
references to unwinder symbols, such as __aeabi_unwind_cpp_pr0.

This occurs because the test is built with `-nostdlib++` but `libunwind`
is not explicitly linked to the final test executable.

This patch resolves the issue by adding CMake logic to explicitly link
the required unwinder to the fuzzer tests, inspired by the same solution
used to fix Scudo build failures by https://reviews.llvm.org/D142888.
2025-06-18 19:23:54 +05:00
Andrei Golubev
ee070d0816 [mlir][bufferization] Support custom types (1/N) (#142986)
Following the addition of TensorLike and BufferLike type interfaces (see
00eaff3e9c), introduce minimal changes
required to bufferize a custom tensor operation into a custom buffer
operation.

To achieve this, new interface methods are added to the TensorLike type
interface that abstract away the differences between the existing (tensor ->
memref) and custom conversions.

The scope of the changes is intentionally limited (for example,
BufferizableOpInterface is untouched) in order to first understand the
basics and reach consensus design-wise.

---
Notable changes:
* mlir::bufferization::getBufferType() returns BufferLikeType (instead
of BaseMemRefType)
* ToTensorOp / ToBufferOp operate on TensorLikeType / BufferLikeType.
Operation argument "memref" renamed to "buffer"
* ToTensorOp's tensor type inferring builder is dropped (users now need
to provide the tensor type explicitly)
2025-06-18 16:18:12 +02:00
Akira Hatanaka
40d2f39210 [Sema][ObjC] Loosen restrictions on reinterpret_cast involving indirect ARC-managed pointers (#144458)
Allow using reinterpret_cast for conversions between indirect ARC
pointers and other pointer types.

rdar://152905399
2025-06-18 07:08:32 -07:00
Nikolas Klauser
9db7502d22 [libc++] Move __has_iterator_typedefs to the up-to-C++17 implementation of iterator_traits (#144265)
`__has_iterator_typedefs` is only used in the up-to-C++17 implementation
of `iterator_traits`. To make that clearer, the struct is moved into that
code block.
2025-06-18 15:55:06 +02:00
Sergei Lebedev
1d6f1029f7 [mlir] [python] Fixed the return type of MemRefType.get_strides_and_offset (#144523)
Previously, the return type for `offset` was `list[int]`, which is clearly
wrong, since the offset is a single integer.
2025-06-18 09:53:20 -04:00
lorenzo chelini
c5613dc863 [MLIR] Mark LLVM::FMAOp as legal (#144671)
Mark LLVM::FMAOp as legal in configureGpuToNVVMConversionLegality, since
we can handle intrinsic lowering in the NVPTX backend and emit
fma.rn.f32.
2025-06-18 15:49:00 +02:00
Mircea Trofin
bdac9580f3 [nfc][jt] Drop std::optional pointers (#144548)
The `std::optional` didn't add any semantics that couldn't be modeled with the pointers being `nullptr`.
2025-06-18 06:40:06 -07:00
Eric Fiselier
fda6b751f1 Fix libc++ restarter job.
A while ago, the test workflow was updated with a new preemption regex;
however, it was only applied to the test job, not to the job that
actually restarts the failed libc++ test runs.

This fix should correct the issue and get the restarter working
again.
2025-06-18 09:36:36 -04:00
Jack Styles
671caef379 [Flang][OpenMP] Update relevant warnings to emit when OMP >= v5.2 (#144492)
A number of deprecation warnings have been added to Flang; however, these
features are only deprecated when the OpenMP version in use is 5.2 or
later. Previously, Flang did not take the version into account, so the
warnings were always emitted.

Flang now ensures warnings are emitted for the appropriate version of
OpenMP, and tests are updated to reflect this change.
2025-06-18 14:35:53 +01:00
Tobias Stadler
1f34d68c4f [Remarks] Remove yaml-strtab format (#144527)
Background: The yaml-strtab format looks just like the yaml format,
except that the values in the key/value pairs of the remarks are
deduplicated and replaced by indices into a string table (see removed
test cases for examples). The motivation behind this format was to
reduce the size of remarks files. However, it was quickly superseded by
the bitstream format.
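
The string-table idea itself is simple. Here is a Python sketch of only the deduplication step; it does not reproduce the yaml-strtab or bitstream on-disk layouts, and `intern_values` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def intern_values(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each string with an index into a deduplicated string table."""
    table: List[str] = []
    index_of: Dict[str, int] = {}
    refs: List[int] = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(table)  # first occurrence gets the next slot
            table.append(value)
        refs.append(index_of[value])
    return table, refs


table, refs = intern_values(["inline", "foo", "inline"])
print(table, refs)
```

A reader recovers the original values with `[table[i] for i in refs]`, which is the extra indirection that makes such files harder to read by eye or to parse without the remarks library.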

Therefore, remove the yaml-strtab format, as it no longer has a good
use case:
  - It isn't particularly efficient
  - It isn't human-readable
  - It isn't straightforward to parse in external tools that can't use the
    remarks library. We don't even support it in opt-viewer.

llvm-remarkutil is also missing options to parse/convert yaml-strtab, so
the chance that anyone is actually using this format is low.
2025-06-18 14:25:41 +01:00
Garvit Gupta
c4d99704e2 Revert "Reland [Driver] Add support for GCC installation detection in Baremetal toolchain (#144640)" (#144684)

This reverts commit 45ea46c446.
2025-06-18 18:53:45 +05:30
Kunwar Grover
6729da647a [mlir][amdgpu][nfc] Add PatternBenefit to populate methods (#144663) 2025-06-18 15:19:17 +02:00
Timm Bäder
68471d29ee Revert "Reapply "[clang][bytecode] Allocate IntegralAP and Floating types usi… (#144676)"
This reverts commit 7c15edb306.

This still breaks clang-armv8-quick:
https://lab.llvm.org/buildbot/#/builders/154/builds/17587
2025-06-18 15:17:53 +02:00
Frank Schlimbach
8584abb05a [mlir] mlir/test/lit.local.cfg -> mlir/test/Target/SPIRV/lit.local.cfg (#144685)
renamed: mlir/test/lit.local.cfg -> mlir/test/Target/SPIRV/lit.local.cfg
2025-06-18 15:04:55 +02:00
Tom Eccles
a83d3362f6 [flang][OpenMP] Don't allow DO CONCURRENT inside of a loop nest (#144506)
I don't think DO CONCURRENT fits the definition of a Canonical Loop Nest
(OpenMP 6.0 section 6.4.1).
It is however explicitly allowed for the LOOP construct (6.0 section
13.8).

There's some obscure language in OpenMP 6.0 for the LOOP construct:

> If the collapsed loop is a DO CONCURRENT loop, neither the
> data-sharing attribute clauses nor the collapse clause may be
> specified.

From the surrounding context, I think "collapsed loop" just means the
loop that the LOOP construct applies to. So I will interpret this to
mean that DO CONCURRENT can only be used with the LOOP construct if it
does not contain the COLLAPSE clause.

This also fixes a bug where the associated clause was never cleared
after it was set.

Fixes #144178
2025-06-18 14:02:11 +01:00
Krzysztof Parzyszek
4b2ab1494b [flang][OpenMP] Don't crash on iterator modifier in declare mapper (#144359)
Both the declare mapper directive argument, and the iterator modifier
can contain declaration-type-spec, so make sure that the processing of
one ends before processing of the other begins in semantic analysis.
2025-06-18 07:46:49 -05:00
Matthias Springer
66580f77b8 [mlir][Transforms][NFC] Dialect Conversion: Keep unresolvedMaterializations up to date (#144254)
`unresolvedMaterializations` is a mapping from
`UnrealizedConversionCastOp` to `UnresolvedMaterializationRewrite`. This
mapping is needed to find the correct type converter for an unresolved
materialization.

With this commit, `unresolvedMaterializations` is updated immediately
when an op is being erased. This also cleans up the code base a bit:
`SingleEraseRewriter` is now used only during the "cleanup" phase and no
longer needed as a field of `ConversionRewriterImpl`.

This commit is in preparation for the One-Shot Dialect Conversion
refactoring: `allowPatternRollback = false` will in the future trigger
immediate materialization of all IR changes.
2025-06-18 14:42:09 +02:00
Andrei Golubev
a1c2a71293 [mlir][bufferization] Use Type instead of Value in unknown conversion (#144658)
Generally, bufferization should be able to create a memref from a tensor
without needing to know more than just a mlir::Type. Thus, change
BufferizationOptions::UnknownTypeConverterFn to accept just a type
(mlir::TensorType for now) instead of mlir::Value. Additionally, apply
the same rationale to getMemRefType() helper function.

Both changes are prerequisites to enable custom types support in
one-shot bufferization.
2025-06-18 14:38:58 +02:00
Ties Stuij
6265ca686d [AArch64] Add Cortex-A320 scheduling model (#144385)
Instead of using the Cortex-A510 scheduling model, Cortex-A320 now uses
its own scheduling model, based on the Cortex-A320 Software
Optimization Guide:

https://developer.arm.com/documentation/110285/r0p1

---------

Co-authored-by: Nashe Mncube <Nashe.Mncube@arm.com>
2025-06-18 13:38:49 +01:00
Timm Baeder
7c15edb306 Reapply "[clang][bytecode] Allocate IntegralAP and Floating types using an allocator (#144246)" (#144676)

This reverts commit 57828fec76.
2025-06-18 14:37:29 +02:00
Simon Pilgrim
34a4894149 [X86] detectZextAbsDiff - use SDPatternMatch::m_Abs() matcher. NFC. 2025-06-18 13:21:09 +01:00
Benjamin Maxwell
d8e8ab7977 [AArch64][SME] Fix restoring callee-saves from FP with hazard padding (#143371)
Currently, when hazard-padding is enabled a (fixed-size) hazard slot is
placed in the CS area, just after the frame record. The size of this
slot is part of the "CalleeSaveBaseToFrameRecordOffset". The SVE
epilogue emission code assumed this offset was always zero and therefore
set the stack pointer incorrectly, resulting in all SVE registers
being reloaded from incorrect offsets.

```
| prev_lr                           |
| prev_fp                           |
| (a.k.a. "frame record")           |
|-----------------------------------| <- fp(=x29)
|   <hazard padding>                |
|-----------------------------------| <- callee-saved base
|                                   |
| callee-saved fp/simd/SVE regs     |
|                                   |
|-----------------------------------| <- SVE callee-save base
```

i.e. in the above diagram, the code assumed `fp == callee-saved base`.
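
A small arithmetic sketch of the mismatch, reading the diagram with the stack growing toward lower addresses, so the callee-saved base sits one hazard slot below `fp`. The concrete numbers (a 1024-byte hazard slot, `fp` at 0x10000, a register saved 16 bytes below the callee-saved base) and the helper names are made up for illustration and are not taken from the patch.

```python
def callee_saved_base(fp: int, hazard_size: int) -> int:
    # The hazard slot sits between the frame record and the callee-saves,
    # so the callee-saved base is below fp by the size of that slot.
    return fp - hazard_size


def slot_address(fp: int, hazard_size: int, offset_below_base: int,
                 assume_zero_offset: bool) -> int:
    # assume_zero_offset=True models the old assumption fp == callee-saved base.
    base = fp if assume_zero_offset else callee_saved_base(fp, hazard_size)
    return base - offset_below_base


fp = 0x10000
hazard = 1024
correct = slot_address(fp, hazard, 16, assume_zero_offset=False)
buggy = slot_address(fp, hazard, 16, assume_zero_offset=True)
print(hex(correct), hex(buggy), buggy - correct)
```

Under these assumptions every reload is off by exactly the hazard-slot size, which is why all of the SVE registers, not just one, came back from the wrong place.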
2025-06-18 12:58:17 +01:00
Oleksandr "Alex" Zinenko
8a469da8b2 [mlir] remove unnecessary atomic_rmw expansions (#144515)
The expansion of `memref.atomic_rmw` into a `memref.generic_atomic_rmw`
for floating-point min/max operations is no longer necessary as those
are now supported by the LLVM dialect and LLVM IR.

Furthermore, combining this expansion with direct lowering of
`generic_atomic_rmw` could lead to invalid LLVM dialect IR with
`cmpxchg` operating on floating-point values, which it does not support.
2025-06-18 13:32:46 +02:00
Garvit Gupta
66d6964a55 Fix tests failing on fuchsia clang x86_64 builders (#144655)
Fuchsia sets CLANG_DEFAULT_UNWINDLIB to libunwind. As a result, when
rtlib is set to libgcc and unwindlib is not explicitly specified, tests
using Fuchsia as the default platform fail. To address this, the
affected tests are now marked XFAIL.

This change fixes the following tests introduced in
45ea46c446:

clang/test/Driver/aarch64-toolchain-extra.c
clang/test/Driver/arm-toolchain-extra.c
clang/test/Driver/aarch64-toolchain.c
clang/test/Driver/arm-toolchain.c
2025-06-18 16:50:48 +05:30