Commit Graph

540158 Commits

Author SHA1 Message Date
Aiden Grossman
34e5d8ef16 [CI] Remove buildkite from metrics container (#143049)
Now that buildkite has been sunsetted, remove buildkite tracking from
the metrics container as it does not do anything.
2025-06-06 12:58:57 -07:00
Andy Kaylor
16dda4d3f4 [CIR] Add support for completing forward-declared types (#143176)
This adds the needed handling for completing record types which were
previously declared, leading us to create an incomplete record type.
2025-06-06 12:54:43 -07:00
Alex MacLean
107601ed06 [InstCombine] Allow min/max in constant BOp min/max folding (#142878)
Extend folding for `X Pred C2 ? X BOp C1 : C2 BOp C1` to `min/max(X, C2)
BOp C1` to allow min and max as `BOp`. This ensures a constant clamping
pattern is folded into a pair of min/max instructions. Here is a
simplified example of a case where this folding is not occurring
currently.

int clampToU8(int v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return v;
}

https://godbolt.org/z/78jhKPWbv

Generic proof: https://alive2.llvm.org/ce/z/cdpLYy
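
As a rough illustration of the fold described above (not the InstCombine implementation itself), the sketch below writes the clamp three ways: the branchy source form, the `X Pred C2 ? X BOp C1 : C2 BOp C1` select form with `BOp = min`, `C1 = 255`, `C2 = 0`, and the folded `min(max(X, C2), C1)` form. The helper names `selectForm` and `foldedForm` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Branchy form, as in the commit message.
constexpr int clampToU8(int v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return v;
}

// Hypothetical helper: select form `X > C2 ? min(X, C1) : min(C2, C1)`.
constexpr int selectForm(int x) {
    return x > 0 ? std::min(x, 255) : std::min(0, 255);
}

// Hypothetical helper: folded form `min(max(X, C2), C1)`.
constexpr int foldedForm(int x) {
    return std::min(std::max(x, 0), 255);
}

static_assert(selectForm(-3) == foldedForm(-3), "forms agree below the range");
static_assert(selectForm(300) == foldedForm(300), "forms agree above the range");
```

All three functions return the same value for every `int` input, which is the equivalence the fold relies on.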
2025-06-06 12:44:04 -07:00
David Blaikie
bc7f1eadbf Fix forward for new DWARF DW_OP enum to address warning in lldb 2025-06-06 19:43:49 +00:00
Peter Collingbourne
e3c72e1075 LowerTypeTests: Shrink check size by 1 instruction on x86.
We currently generate code like this on x86 for a jump table with 5 elements,
assuming the call target is in rbx:

lea global_addr(%rip), %rax # initialize temporary rax with base address
mov %rbx, %rcx              # initialize another temporary rcx for index (rbx will be used for the call, so it is still live)
sub %rax, %rcx              # compute `address - base`
ror $0x3, %rcx              # compute `(address - base) ror 3` i.e. index
cmp $0x4, %rcx              # check index <= 4
ja .Ltrap
[...]
.Ltrap:
ud1

A more efficient instruction sequence, that only needs one temporary
register and one fewer instruction, is possible by subtracting the
address we are testing from the fixed address instead of vice versa:

lea (global_addr + 4*8)(%rip), %rax # initialize temporary rax with address of last element
sub %rbx, %rax                      # compute `last element - address`
ror $0x3, %rax                      # compute `(last element - address) ror 3` i.e. 4 - index
cmp $0x4, %rax                      # check 4 - index <= 4 (same as above)
ja .Ltrap
[...]
.Ltrap:
ud1

Change LowerTypeTests to generate that sequence. As a consequence, the
order of bits in the bitsets is reversed. Because it doesn't matter how we
do the subtraction on other architectures (to the best of my knowledge),
do so unconditionally.
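
The equivalence of the two checks can be sketched in C++ for the 5-element, 8-byte-stride table used in the example above. This is only a model of the arithmetic, not the code LowerTypeTests emits; `rotr64`, `checkOld`, and `checkNew` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 64-bit value right by n bits (models the x86 `ror`).
constexpr std::uint64_t rotr64(std::uint64_t v, unsigned n) {
    n &= 63;
    return n == 0 ? v : (v >> n) | (v << (64 - n));
}

// Old sequence: ror(address - base, 3) <= 4.
constexpr bool checkOld(std::uint64_t base, std::uint64_t addr) {
    return rotr64(addr - base, 3) <= 4;
}

// New sequence: ror(last element - address, 3) <= 4,
// where last element = base + 4 * 8.
constexpr bool checkNew(std::uint64_t base, std::uint64_t addr) {
    return rotr64(base + 4 * 8 - addr, 3) <= 4;
}

static_assert(checkNew(0x1000, 0x1010), "aligned in-range address passes");
static_assert(!checkNew(0x1000, 0x1004), "misaligned address is rejected");
```

The rotate moves any misaligned low bits into the high bits, so a misaligned or out-of-range difference compares as a large unsigned value in both variants.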

Reviewers: fmayer, vitalybuka

Reviewed By: fmayer

Pull Request: https://github.com/llvm/llvm-project/pull/142887
2025-06-06 12:43:24 -07:00
Peter Collingbourne
faaae66a55 LowerTypeTests: Precommit test for generated x86 asm.
Reviewers: fmayer

Reviewed By: fmayer

Pull Request: https://github.com/llvm/llvm-project/pull/143189
2025-06-06 12:42:33 -07:00
Joseph Huber
5823e92749 [libc] Fix missing includes after transitive dependency changed 2025-06-06 14:34:27 -05:00
vporpo
47d9473e49 [SandboxVec][BottomUpVec] Fix ownership of Legality (#143018)
Fix the ownership of `Legality` member variable of BottomUpVec. It
should get created in runOnFunction() and get destroyed when the
function returns.
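
A minimal sketch of the intended lifetime, assuming a `std::unique_ptr` member is used; the actual patch may differ. `LegalitySketch` and `BottomUpVecSketch` are hypothetical stand-ins, and the `live` counter exists only to make the lifetime observable.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for Legality that counts live instances.
struct LegalitySketch {
    static inline int live = 0;
    LegalitySketch() { ++live; }
    ~LegalitySketch() { --live; }
    LegalitySketch(const LegalitySketch&) = delete;
    LegalitySketch& operator=(const LegalitySketch&) = delete;
};

// Hypothetical stand-in for the pass.
class BottomUpVecSketch {
    std::unique_ptr<LegalitySketch> legality;

public:
    bool runOnFunction() {
        // Created at the start of the per-function run...
        legality = std::make_unique<LegalitySketch>();
        bool sawExactlyOne = (LegalitySketch::live == 1);
        // ...and destroyed before the function returns.
        legality.reset();
        return sawExactlyOne;
    }
};
```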
2025-06-06 12:21:25 -07:00
Konrad Kleine
7db847df55 Filter out configuration file from compile commands (#131099)
The commands to run the compilation when printed with `-###` contain
various irrelevant lines for the perf-training. Most of them are
filtered out already but when configured with
`CLANG_CONFIG_FILE_SYSTEM_DIR` a new line like the following is
added and needs to be filtered out:

`Configuration file: /etc/clang/x86_64-redhat-linux-gnu-clang.cfg`
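
The filtering idea can be sketched as a prefix check over the `-###` output lines. This is an illustration in C++ only and not the perf-training script itself; `dropConfigurationLines` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: keep every line that does not start with
// "Configuration file:".
std::vector<std::string> dropConfigurationLines(
    const std::vector<std::string>& lines) {
    const std::string prefix = "Configuration file:";
    std::vector<std::string> kept;
    for (const std::string& line : lines) {
        if (line.compare(0, prefix.size(), prefix) != 0)
            kept.push_back(line);
    }
    return kept;
}
```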
2025-06-06 21:08:56 +02:00
erichkeane
39bb267445 [OpenACC][CIR][NFC] Add device_ptr async clause tests
Add a test to ensure that device_ptr properly respects the 'async'
functionality we added for copy/etc.
2025-06-06 11:58:58 -07:00
David Truby
8f7e57485e [llvm] Fix cmake string expansion in CrossCompile.cmake (#138901) 2025-06-06 19:27:54 +01:00
erichkeane
b84127bb13 [OpenACC][CIR] Lowering for 'deviceptr' for compute/combined constructs
This ends up being a simple clause that only adds 'acc.deviceptr' to the
dataOperands list on the compute construct operation.
2025-06-06 11:26:35 -07:00
Michael Jones
59f88a8e92 [libc] clean up string_utils memory functions (#143031)
The string_utils.h file previously included both memcpy and bzero. There
were no uses of bzero, and only one use of memcpy which was replaced
with __builtin_memcpy.

Also fix strsep which was broken by this change, fix a useless assert of
"sizeof(char) == sizeof(cpp::byte)", and update the bazel.
2025-06-06 11:16:54 -07:00
Michael Jones
d8bfb4719d [libc] clean up unused exp_utils (#143181)
This file's just left over from old code, but it doesn't compile
anymore. It's never used so this patch just removes it.
2025-06-06 11:02:55 -07:00
Erich Keane
c02403e37f [OpenACC][CIR] Implement 'host_data' lowering, plus all clauses (#143136)
'host_data' has its own Op kind, so this handles the lowering there. It
looks exactly like the other ones we've done so far, so nothing novel
here.

host_data takes 3 clauses, 1 of which is required.

'use_device' is required, and results in an acc.use_device operation,
  which then feeds into the dataOperands list on acc.host_data.

'if_present' is a simple attribute on the operand.

'if' is a condition on the operand, identical to our other handling of
'if'.

This patch handles all of these.
2025-06-06 10:58:39 -07:00
Thrrreeee
28e2256a1f [llvm][DebugInfo] Add support for DW_OP_GNU_implicit_pointer (#142913)
This patch introduces support for the DWARF operation
`DW_OP_GNU_implicit_pointer` (value 0xf2) within LLVM's DWARF parsing
and expression evaluation infrastructure. This GNU extension is used to
describe the location of a variable that is itself a pointer, where the
value of this pointer is stored at an address derived from another DWARF
location expression plus a constant offset.
Motivation:
Compilers like GCC use `DW_OP_GNU_implicit_pointer` to represent the
location of certain variables. Without support for this opcode, debuggers
like LLDB (and other tools relying on LLVM's DWARF libraries) cannot
correctly resolve the location of such variables, leading to an
inability to inspect their values or an incorrect debugging experience.
2025-06-06 10:56:30 -07:00
Florian Mayer
44a6a44573 [NFC] [DebugCounter] warn if --debug-counter is unused in NDEBUG (#143057)
Co-authored-by: Nikita Popov <npopov@redhat.com>
2025-06-06 10:54:07 -07:00
Sam Elliott
5dc2f4499b [RISCV] Mark QC Relocations as Relaxable (#142794)
Some of the QC relocations are relaxable, in particular there are
relaxations defined for:
- `R_RISCV_QC_E_JUMP_PLT`
- `R_RISCV_QC_E_32`
- `R_RISCV_QC_ABS20_U`

This change ensures that llvm-mc correctly emits R_RISCV_RELAX
relocations for the relevant fixups.
2025-06-06 10:50:33 -07:00
Slava Zakharin
ba8077c9dd [flang] Use optimal shape for assign expansion as a loop. (#143050)
During `hlfir.assign` inlining and `ElementalAssignBufferization`
we can deduce the optimal shape from `lhs` and `rhs` shapes.
It would probably be better done in a separate pass that propagates
constant shapes, but I have not seen any benchmarks that would
benefit from this yet. So consider this as a workaround for a bigger
TODO issue.

The `ElementalAssignBufferization` case is from 465.tonto,
but I do not have performance results yet (I do not expect much).
2025-06-06 10:45:38 -07:00
Slava Zakharin
e16f603351 [flang] Relax conflicts detection in ElementalAssignBufferization. (#143045)
If there is a read-effect operation inside `hlfir.elemental`,
there is no reason to block moving it to the assignment point
unless there are write-effect operations between the elemental
and the assignment. The previous code was disallowing the optimization
even if there were only read-effect operations in between.

This case is from 465.tonto, though this change does not improve
performance at all.
2025-06-06 10:45:26 -07:00
Kazu Hirata
ede9555b0f [clang] Fix a typo in documentation (#143169) 2025-06-06 10:44:06 -07:00
Chenguang Wang
45c3053ae0 [AArch64] Fix unused-variable warning for non-dbg builds. (#143175)
AArch64ISelLowering.cpp currently fails -Wunused-variable because SrcVT
is only used in assert(), so it is an unused variable if not using debug
builds. This behavior was introduced in 2c0a2261.
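
The warning pattern can be reproduced in a few lines. The sketch below shows one common remedy, `[[maybe_unused]]`; whether the actual patch used this or another approach is not stated here, and `widen` and `srcBits` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical function: srcBits is only read inside assert(), so with
// NDEBUG defined it would otherwise trigger -Wunused-variable.
int widen(int src) {
    [[maybe_unused]] int srcBits = 32;
    assert(srcBits == 32 && "unexpected source width");
    return src * 2;
}
```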
2025-06-06 10:35:16 -07:00
Joseph Huber
525726a520 [libc] Cleanup unimplemented math functions (#143173)
Summary:
This patch cleans up the leftover files that were either implemented or
are still unimplemented stubs.
2025-06-06 12:27:13 -05:00
Simon Pilgrim
e6d62c910f [X86] IsElementEquivalent - pull out vector element count mismatch code. NFC.
All cases rely on the ops having the same vector count as the masksize, and this is unlikely to change now that we handle bitcasts, so just early out.
2025-06-06 18:06:54 +01:00
Hui
155fd97a66 [libc++] flat_meow transparent comparator string literals (#133654)
See discussion in https://cplusplus.github.io/LWG/issue4239

    std::flat_map<std::string, int, std::less<>> m;
    m.try_emplace("abc", 5);  // hard error

The reason is that we specify in 23.6.8.7 [flat.map.modifiers]/p21
the effect to be as if `ranges::upper_bound` is called.

`ranges::upper_bound` requires indirect_strict_weak_order, which
requires the comparator to be invocable for all combinations. In this
case, it requires

    const char (&)[4] < const char (&)[4]

to be well-formed, which is no longer the case in C++26 after
https://wg21.link/P2865R6.

This patch uses `std::upper_bound` instead.
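
A small sketch of why `std::upper_bound` suffices: it only invokes the comparator as `comp(value, element)`, so a string-literal key is compared against `std::string` elements and never against another array. This models the lookup on a plain sorted vector, not libc++'s `flat_map` internals; `upperBoundIndex` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: index of the first key greater than `key`,
// using a transparent comparator.
template <class Key>
std::size_t upperBoundIndex(const std::vector<std::string>& sortedKeys,
                            const Key& key) {
    auto it = std::upper_bound(sortedKeys.begin(), sortedKeys.end(), key,
                               std::less<>{});
    return static_cast<std::size_t>(it - sortedKeys.begin());
}
```

Calling `upperBoundIndex(keys, "abc")` deduces `Key` as `const char[4]` and only needs `"abc" < std::string` to be well-formed.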
2025-06-06 13:05:36 -04:00
David Green
b0f53d95c1 [AArch64] Add SUBS(CSEL) fold from brcond. (#142103)
This folds away subs(csel(1, 0, cc)) from brcond, that can be produced
in certain places from compares that are not already subs (like adc/sbc
generated from i128 add_with_overflow intrinsics).
2025-06-06 18:00:00 +01:00
Darren Wihandi
c9c60172a1 [mlir][spirv] Implement lowering gpu.subgroup_reduce with cluster size for SPIRV (#141402)
Implement lowering of `gpu.subgroup_reduce` with a cluster size
attribute to SPIRV by using the `ClusteredReduce` group operation.
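
For intuition, a clustered reduction can be modeled on the host as reducing each contiguous group of `clusterSize` lanes and broadcasting the result to every lane in that group. The sketch assumes an add reduction and contiguous clusters, and it is not the lowering itself; `clusteredAdd` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical host-side model: each lane receives the sum of its cluster.
// Assumes clusterSize > 0 and that it divides lanes.size(); trailing lanes
// that do not fill a whole cluster are left as zero.
std::vector<int> clusteredAdd(const std::vector<int>& lanes,
                              std::size_t clusterSize) {
    std::vector<int> result(lanes.size(), 0);
    if (clusterSize == 0) return result;
    for (std::size_t start = 0; start + clusterSize <= lanes.size();
         start += clusterSize) {
        auto first = lanes.begin() + static_cast<std::ptrdiff_t>(start);
        auto last = first + static_cast<std::ptrdiff_t>(clusterSize);
        int sum = std::accumulate(first, last, 0);
        std::fill_n(result.begin() + static_cast<std::ptrdiff_t>(start),
                    clusterSize, sum);
    }
    return result;
}
```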
2025-06-06 12:50:18 -04:00
David Green
645c0d509c [AArch64][GlobalISel] Ensure we have an insert-subreg v4i32 GPR pattern (#142724)
This is the GISel equivalent of scalar_to_vector, making sure that when
we insert into undef we use an fmov that avoids the artificial dependency
on the previous register. This adds v2i32 and v2i64 patterns too for
similar reasons.
2025-06-06 17:44:33 +01:00
LLVM GN Syncbot
73a4c363bd [gn build] Port c9c687d8d0 2025-06-06 16:32:37 +00:00
Sterling-Augustine
c9c687d8d0 [NFC] Split portions of DWARFDataExtractor into new class (#140096)
Currently, DWARFDataExtractor can extract data without performing
relocations (e.g., by checking if the section pointer is null), but it is
coded such that it still depends on all the relocation machinery, like
DWARFSections and similar, at build time.

Extract most functionality into a new class, DWARFDataExtractorBase, and
have DWARFDataExtractor add the relocation-dependent pieces via CRTP.
Add a new class, DWARFDataExtractorSimple, which does no relocation at
all. This will allow moving DWARFDataExtractorSimple into a new lower-level,
lighter-weight library with fewer external build-time dependencies.

This is another in a series of refactoring changes to create a new
better-layered, low-level Dwarf library that can be called from
lower-level code without circular dependencies.
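
The layering can be sketched with CRTP as follows. This is a simplified model under assumed names and is not the actual LLVM class interface: `ExtractorBase`, `SimpleExtractor`, `RelocatingExtractor`, and `relocate` are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical base: shared extraction logic, with the relocation step
// delegated to the derived class at compile time.
template <class Derived>
class ExtractorBase {
public:
    std::uint64_t getRelocatedValue(std::uint64_t raw) const {
        return static_cast<const Derived*>(this)->relocate(raw);
    }
};

// No relocation at all, so no dependency on relocation machinery.
class SimpleExtractor : public ExtractorBase<SimpleExtractor> {
public:
    std::uint64_t relocate(std::uint64_t raw) const { return raw; }
};

// Adds the relocation-dependent piece (modeled here as a fixed addend).
class RelocatingExtractor : public ExtractorBase<RelocatingExtractor> {
    std::uint64_t addend;

public:
    explicit RelocatingExtractor(std::uint64_t a) : addend(a) {}
    std::uint64_t relocate(std::uint64_t raw) const { return raw + addend; }
};
```

Because the dispatch is static, the simple variant never references the relocating code, which is what allows it to live in a lighter-weight library.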
2025-06-06 09:26:51 -07:00
Matt Arsenault
cb3d77d107 AArch64: Partially move setting of libcall names out of TargetLowering (#142985)
Move the parts that aren't dependent on the subtarget into
RuntimeLibcallInfo, which should contain the superset of all possible
runtime calls and be accurate outside of codegen.
2025-06-07 01:17:42 +09:00
Michael Buch
30f5240905 [lldb][Modules] Make decls from submodules visible for name lookup (#143098)
This patch ensures we can find decls in submodules during expression
evaluation. Previously, submodules would have all their decls marked as
`Hidden`. When Clang asked LLDB for decls, it would see them in the
submodule but `clang::Sema` would reject them because they weren't
`Visible` (specifically, `getAcceptableDecl` would fail during
`CppNameLookup`). Here we just mark the submodule as visible to work
around this problem.
2025-06-06 17:17:00 +01:00
Kazu Hirata
1eb843b1a0 [mlir] Ensure newline at the end of files (NFC) (#143155) 2025-06-06 09:16:52 -07:00
Kazu Hirata
dd201e50ba [clang] Ensure newline at the end of files (NFC) (#143154) 2025-06-06 09:16:46 -07:00
Kazu Hirata
6ab6321d03 [clang] Use range-based for loops (NFC) (#143153)
Note that use of llvm::for_each is discouraged unless we have functors
readily available.
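
For illustration, the two styles look like this. The sketch uses `std::for_each` as a stand-in for `llvm::for_each`, and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Lambda-based form of the kind being replaced.
int sumWithForEach(const std::vector<int>& values) {
    int total = 0;
    std::for_each(values.begin(), values.end(),
                  [&total](int v) { total += v; });
    return total;
}

// Range-based for loop: same behavior, less ceremony.
int sumWithRangeFor(const std::vector<int>& values) {
    int total = 0;
    for (int v : values)
        total += v;
    return total;
}
```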
2025-06-06 09:16:41 -07:00
Guy David
2c0a2261b1 [AArch64] Spare N2I roundtrip when splatting float comparison (#141806)
Transform `select_cc t1, t2, -1, 0` for floats into a vector comparison
which generates a mask, which is later on combined with potential
vectorized DUPs.
2025-06-06 19:07:12 +03:00
David Green
56ebe64ce6 [AArch64] Enable aggressivelyPreferBuildVectorSources (#142729)
This helps to remove some inefficient buildvector lowering by converting
extract_vector_elt(buildvector) to the original source.
2025-06-06 17:03:10 +01:00
Eli Friedman
609023213d [clang] Check constexpr int->enum conversions consistently. (#143034)
In 8de51375f1 and related patches, we
added some code to avoid triggering -Wenum-constexpr-conversion in some
cases. This isn't necessary anymore because -Wenum-constexpr-conversion
doesn't exist anymore. And the checks are subtly wrong: they exclude
cases where we actually do need to check the conversion. This patch gets
rid of the unnecessary checks.
2025-06-06 08:57:11 -07:00
Nikita Popov
cef5a3155b [PhaseOrdering] Add test for #139050 (NFC) 2025-06-06 17:50:59 +02:00
Pranav Bhandarkar
8395912895 [Flang] - Handle BoxCharType in fir.box_offset op (#141713)
To map `fir.boxchar` types reliably onto an offload target, such as a
GPU, the `omp.map.info` operation is used to map the underlying data
pointer (`fir.ref<fir.char<k, ?>>`) wrapped by the `fir.boxchar` MLIR
value. The `omp.map.info` operation needs a pointer to the underlying
data pointer.
Given a reference to a descriptor (`fir.box`), the `fir.box_offset` is
used to obtain the address of the underlying data pointer. This PR
extends `fir.box_offset` to provide the same functionality for
`fir.boxchar` as well.
2025-06-06 10:48:07 -05:00
Simon Pilgrim
399865cbf0 [X86] combineConcatVectorOps - concat per-lane v2f64/v4f64 shuffles into vXf64 vshufpd (#143017)
We can always concatenate v2f64/v4f64 per-lane shuffles into a single vshufpd instruction, assuming we can profitably concatenate at least one of its operands (or it's a unary shuffle).

I was really hoping to get this into combineX86ShufflesRecursively but it still can't handle concatenation/length changing as well as combineConcatVectorOps.
2025-06-06 16:41:40 +01:00
Luke Lau
a029ece7b0 [RISCV] Fix coalescing vsetvlis when AVL and vl registers are the same (#141941)
With EVL tail folding we can end up with vsetvlis where the output vl
and the input AVL are the same register. When we tried to coalesce it, we
crashed because we tried to move the def's live interval before the
kill's live interval, e.g. in this example:

    (vn0 def)
dead $x0 = PseudoVSETIVLI 1, 192, implicit-def $vl, implicit-def $vtype
    renamable $v9 = COPY killed renamable $v8
(vn1 def) %23:gprnox0 = PseudoVSETVLI killed (vn0) %23:gprnox0, 197,
implicit-def $vl, implicit-def $vtype

We would try to move the vn1 def VNInfo up to the previous VSETVLI, in
the middle of vn0's segment.

However separately, we were also assuming that the vl would only have
one definition and thus were just taking the VNInfo from beginIndex(),
so we ended up with a backwards segment and got the error "Cannot create
empty or backwards segment".

This fixes these two issues, the first one by moving the AVL operand +
live interval up first, and the second by taking the VNInfo from
NextMI's slot index.

Fixes #141907
2025-06-06 17:34:27 +02:00
Callum Fare
835497a4dc [Offload] Make olMemcpy src parameter const (#143161) 2025-06-06 10:25:00 -05:00
lntue
891a0abfc2 [libc] Correct x86_64 architecture for string(s) tests. (#143150) 2025-06-06 11:18:55 -04:00
Rahul Joshi
306148b541 [NFC][Clang] Adopt simplified getTrailingObjects in ExprCXX (#143125) 2025-06-06 07:51:06 -07:00
Jay Foad
f57a1e973a [TableGen] Fix variable name in CodeGenRegBank::computeComposites 2025-06-06 15:42:37 +01:00
David Spickett
974ee967ad [lldb][test] Add more context for frame format test
This test is unsupported due to problems I assume with debug info,
but even if we solve that, the formatting elements aren't working
properly.

https://github.com/llvm/llvm-project/issues/143149
2025-06-06 14:42:12 +00:00
LLVM GN Syncbot
612d485bc3 [gn build] Port 0f38c54c6f 2025-06-06 14:32:21 +00:00
Haojian Wu
b6364ab955 [clang] Reduce TemplateDeclInstantiator size. (#142983)
This gives us another ~1.85% improvement (1617->1647 for the
`instantiation-depth-default.cpp`) on clang's template instantiation
depths.

No performance regressions have been observed:
https://llvm-compile-time-tracker.com/compare.php?from=702e228249906d43687952d9a2f3d2f90d8024c6&to=61be4bfea92d52cfc3e48a3cabb1bc80cbebb7fa&stat=instructions:u
2025-06-06 16:25:36 +02:00
Matt Arsenault
3846d84269 Hexagon: Move RuntimeLibcall setting out of TargetLowering (#142543)
RuntimeLibcalls needs to be correct in non-codegen contexts, so
should not be configured in TargetLowering. Hexagon has this exotic,
overly general sounding fast math flag which appears to be untested. I've
renamed and moved it but this should probably be deleted and moved to a
combine based on fast math flags.
2025-06-06 23:15:59 +09:00