Commit Graph

536811 Commits

Author SHA1 Message Date
Louis Dionne
45d493b680 [libc++] Add the __is_replaceable type trait (#132408)
That type trait represents whether move-assigning an object is
equivalent to destroying it and then move-constructing a new one from
the same argument. This will be useful in a few places where we may want
to destroy + construct instead of doing an assignment, in particular
when implementing some container operations in terms of relocation.

This is effectively adding a library emulation of P2786R12's
is_replaceable trait, similarly to what we do for trivial relocation.
Eventually, we can replace this library emulation by the real
compiler-backed trait.

This is building towards #129328.
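
The equivalence described above can be illustrated with a minimal sketch. The helper names `destroy_then_construct` and `strategies_agree` are hypothetical and are not part of libc++; the sketch only assumes that `std::string` behaves as a replaceable type.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <string>
#include <utility>

// Hypothetical helper for illustration only; not part of libc++.
// Ends the lifetime of `target`, then move-constructs a new object
// from `source` in the same storage.
template <class T>
void destroy_then_construct(T& target, T&& source) {
    assert(std::addressof(target) != std::addressof(source));
    target.~T();
    ::new (static_cast<void*>(std::addressof(target))) T(std::move(source));
}

// Returns true when move-assignment and destroy + move-construct leave
// the target holding the same value.
bool strategies_agree(const std::string& initial, const std::string& incoming) {
    std::string assigned = initial;
    std::string rebuilt = initial;
    std::string source_a = incoming;
    std::string source_b = incoming;

    assigned = std::move(source_a);                        // plain move-assignment
    destroy_then_construct(rebuilt, std::move(source_b));  // destroy + construct

    return assigned == rebuilt && assigned == incoming;
}
```

Note that the destroy + construct path leaves no live object behind if the move constructor throws, which is one reason a container would want a trait to opt in to it.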
2025-05-08 16:35:00 -04:00
Florian Hahn
c82e2f5c9e [VPlan] Move VPPhiAccessors definition. (NFC)
Move up definition to allow re-use by additional recipes.
2025-05-08 21:22:42 +01:00
Charitha Saumya
e7dcf1b7e5 [mlir][xegpu] Add SIMT distribution patterns for UpdateNdOffset and PrefetchNd ops. (#138033)
This PR adds support for SIMT distribution of UpdateNdOffset and
PrefetchNd ops.

For both these ops distribution will remove the layout attribute from
the tensor descriptor type. Everything else remains unchanged.

Example 1:

 ```
   #lo0 = #xegpu.layout<wi_layout = [1, 8], wi_data = [1, 1]>
   gpu.warp_execute_on_lane_0(%laneid) -> () {
     ...
     xegpu.prefetch_nd %arg0 : !xegpu.tensor_desc<4x8xf32, #lo0>
   }
 ```
 To
 ```
   %r:1 = gpu.warp_execute_on_lane_0(%laneid) -> (
   !xegpu.tensor_desc<4x8xf32, #lo0>) {
     gpu.yield %arg0: !xegpu.tensor_desc<4x8xf32, #lo0>
   }
   %1 = unrealized_conversion_cast %r#0: !xegpu.tensor_desc<4x8xf32,
     #lo0> -> !xegpu.tensor_desc<4x8xf32>
   xegpu.prefetch_nd %1 : !xegpu.tensor_desc<4x8xf32>
 ```
Example 2:
 ```
   #lo0 = #xegpu.layout<wi_layout = [1, 8], wi_data = [1, 1]>
   %r = gpu.warp_execute_on_lane_0(%laneid) ->
                   (!xegpu.tensor_desc<4x8xf32, #lo0>) {
     ...
     %update = xegpu.update_nd_offset %arg0, [%c32, %c16]:
       !xegpu.tensor_desc<4x8xf32, #lo0>
     gpu.yield %update
   }
   ...
 ```
 To
 ```
   %r:2 = gpu.warp_execute_on_lane_0(%laneid) -> (vector<4x1xf32>,
   !xegpu.tensor_desc<4x8xf32, #lo0>) {
     ...
     %dead = xegpu.update_nd_offset %arg0, [%c32, %c16]:
       !xegpu.tensor_desc<4x8xf32, #lo0>
     gpu.yield %dead, %arg0
   }
   %0 = unrealized_conversion_cast %r#1: !xegpu.tensor_desc<4x8xf32,
        #lo0> -> !xegpu.tensor_desc<4x8xf32>
   %1 = xegpu.update_nd_offset %0, [%c32, %c16]:
     !xegpu.tensor_desc<4x8xf32>
   ...
 ```
2025-05-08 13:17:38 -07:00
Felipe de Azevedo Piovezan
28156539a9 [lldb] Disable test using GetControlFlowKind on arm 2025-05-08 13:14:40 -07:00
Asher Mancinelli
02f61ab46b [flang] Use box for components with non-default lower bounds (#138994)
When designating an array component that has non-default lower bounds
the bridge was producing hlfir designates yielding reference types,
which did not preserve the bounds information. Then, when creating
components, unadjusted indices were used when initializing the
structure.

We could look at the declaration to get the shape parameter, but this
would not be preserved if the component were passed as a block argument.
These results must be boxed, but we also must not lose the contiguity
information either. To address contiguity, annotate these boxes with the
`contiguous` attribute during designation.

Note that other designated entities are handled inside the
HlfirDesignatorBuilder while component designators are built in
HlfirBuilder. I am not sure if this handling should be moved into the
designator builder or left in the general builder, so feedback is
welcome.

Also, I wouldn't mind finding a test that demonstrates a box-designated
component with the contiguous attribute really is determined to be
contiguous by any passes down the line checking for that. I don't have a
test like that yet.
2025-05-08 13:08:08 -07:00
Florian Hahn
d06d43a9e8 [VPlan] Add printPhiOperands to VPPhiAccessors, use for wide phis.
(NFC modulo debug output changes)

Add generic helper to print phi operands (incoming values) together with
their incoming blocks.

As more and more transforms are added, keeping the incoming blocks of
phis becomes more important. Print incoming blocks via VPPhiAccessors, to
make debugging easier.
2025-05-08 20:56:48 +01:00
Ralender
a861f50030 [WinEH] Fix asm in catchpad being turned into unreachable (#138392) 2025-05-08 21:46:51 +02:00
Kareem Ergawy
227e1ff73b [flang][fir] Add locality specifiers modeling to fir.do_concurrent.loop (#138506) 2025-05-08 21:42:52 +02:00
LLVM GN Syncbot
88e68872fd [gn build] Port 515b4a4fdd 2025-05-08 19:31:27 +00:00
Ian Anderson
515b4a4fdd [clang][Darwin] Remove legacy framework search path logic in the frontend (#138234)
Move the Darwin framework search path logic from
InitHeaderSearch::AddDefaultIncludePaths to
DarwinClang::AddClangSystemIncludeArgs. Add a new -internal-iframework
cc1 argument to support the tool chain adding these paths.
Now that the tool chain is adding search paths via cc1 flag, they're
only added if they exist, so the Preprocessor/cuda-macos-includes.cu
test is no longer relevant.
Change Driver/driverkit-path.c and Driver/darwin-subframeworks.c to do
-### style testing similar to the darwin-header-search and
darwin-embedded-search-paths tests. Rename darwin-subframeworks.c to
darwin-framework-search-paths.c and have it test all framework search
paths, not just SubFrameworks.
Add a unit test to validate that the myriad of search path flags result
in the expected search path list.

Fixes https://github.com/llvm/llvm-project/issues/75638
2025-05-08 12:30:51 -07:00
Aleksandar Zecevic
d7987f1ce9 [mlir][memref] Fix typo in BuiltinAttributeInterfaces description (#136774) 2025-05-08 13:05:01 -06:00
Teresa Johnson
8a7b5012c2 [MemProf] Fix summary bitcode record description (NFC) (#139127)
Commit 776476c282 (PR117404), which
introduced the radix tree representation of allocation context summary
records, incorrectly changed the description of the
FS_COMBINED_CALLSITE_INFO record instead of the intended
FS_COMBINED_ALLOC_INFO record.
2025-05-08 11:52:26 -07:00
Guy David
ae6e127623 [AArch64] Merge scaled and unscaled narrow zero stores (#136705) 2025-05-08 21:34:52 +03:00
Philip Reames
21130d3f06 [RISCV] One last migration to getInsertSubvector [nfc] 2025-05-08 11:26:42 -07:00
Kareem Ergawy
5fe69fd95c [flang][OpenMP] Update do concurrent mapping pass to use fir.do_concurrent op (#138489)
This PR updates the `do concurrent` to OpenMP mapping pass to use the
newly added `fir.do_concurrent` ops that were recently added upstream
instead of handling nests of `fir.do_loop ... unordered` ops.

Parent PR: https://github.com/llvm/llvm-project/pull/137928.
2025-05-08 20:22:29 +02:00
Bruno Cardoso Lopes
7f98e5a5ea [MLIR][LLVM] Fix llvm.mlir.global mismatching print and parser order (#138986)
`GlobalOp` was parsing `thread_local` after `unnamed_addr`, but printing in the reverse order.

While here, make `AliasOp` match the same behavior and share common parts of global and alias printing.
2025-05-08 11:17:18 -07:00
David Sankel
652ab98008 [lld][NFC] Fix minor typo in docs (#138898) 2025-05-08 19:12:58 +01:00
Philip Reames
54bb2295c3 [RISCV] Migrate getConstant indexed insert/extract subvector to new API (#139111)
Note that this change is possibly not NFC. The prior routines used
getConstant with XLenVT. The new wrappers will use getVectorIdxConstant
instead. Digging through the code, the type used for the index will be
the integer of pointer width from DL. For typical RV32 and RV64
configurations the pointer will be of equal width to XLEN, but you could
have a 32b pointer on an RV64 machine.
2025-05-08 11:11:55 -07:00
Matt Arsenault
8c61befff8 GlobalISel: Translate minimumnum and maximumnum (#139106) 2025-05-08 20:03:34 +02:00
Teresa Johnson
c526683c7f [MemProf] Simplify unittest save and restore of options (#139117)
Address post-commit review feedback for PR139092 (and fix another
instance of the same code). Save and restore option values via a saved
bool value, instead of invoking cl::ResetAllOptionOccurrences.
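
One way to express the save-and-restore idea is a small scoped guard. This is a sketch only: `HypotheticalFlag` is a plain `bool` standing in for a command-line option, and `ScopedFlagValue` is a made-up name, not something taken from the unittest.

```cpp
// Stand-in for a boolean command-line option; the name is hypothetical.
bool HypotheticalFlag = false;

// Remembers the current value, installs a temporary one, and restores the
// remembered value when the guard goes out of scope.
class ScopedFlagValue {
public:
    ScopedFlagValue(bool& flag, bool temporary) : flag_(flag), saved_(flag) {
        flag_ = temporary;
    }
    ~ScopedFlagValue() { flag_ = saved_; }

    ScopedFlagValue(const ScopedFlagValue&) = delete;
    ScopedFlagValue& operator=(const ScopedFlagValue&) = delete;

private:
    bool& flag_;
    bool saved_;
};

// The flag is true only while the guard is alive.
bool runWithFlagEnabled() {
    ScopedFlagValue guard(HypotheticalFlag, true);
    return HypotheticalFlag;
}
```

Restoring only the saved value touches one option, whereas a global reset would also affect every other option a test may have set.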
2025-05-08 10:57:20 -07:00
Maksim Panchenko
254c13d872 [BOLT][AArch64] Patch functions targeted by optional relocs (#138750)
On AArch64, we create optional/weak relocations that may not be
processed because the relocated value overflows. When the overflow
happens, we used to enforce patching for all functions in the binary via
the --force-patch option. This PR relaxes the requirement and enforces
patching only for functions that are targets of optional relocations.
Moreover, if the compact code model is used, the relocation overflow is
guaranteed not to happen and the patching will be skipped.
2025-05-08 10:53:47 -07:00
Lei Wang
b836f96b8f [Coverage] Support -fprofile-list for cold function coverage (#136333)
Add a new instrumentation section type `[sample-coldcov]` to
support `-fprofile-list` for sample-PGO-based cold function coverage.
Note that the current cold function coverage is based on the sampling PGO
pipeline, which is incompatible with the existing [llvm] option (see
[PGOOptions](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Support/PGOOptions.h#L27-L43)),
so we can't reuse the IR-PGO (-fprofile-instrument=llvm) flag.
2025-05-08 10:51:38 -07:00
Jacques Pienaar
a7b5c303dc Remove unused forward decl (#139108) 2025-05-08 10:43:39 -07:00
Ivan Kosarev
71f8f2b155 [AMDGPU][NFC] Get rid of OPW constants. (#139074)
We can infer the widths from register classes and represent them as
numbers.
2025-05-08 18:42:07 +01:00
Amr Hesham
7feba5febf [CIR] Upstream extract op for VectorType (#138413)
This change adds extract op for VectorType

Issue https://github.com/llvm/llvm-project/issues/136487
2025-05-08 19:39:28 +02:00
Charitha Saumya
7a66746226 [mlir][xegpu] Handle scalar uniform ops in SIMT distribution. (#138593)
This PR adds support for moving scalar uniform ops (gpu index ops,
constants, etc.) outside the `gpu.warp_execute_on_lane_0` op. These kinds
of ops do not require distribution and are safe to move out of the warp
op. This also avoids adding separate distribution patterns for these ops.

Example:
```
   %1 = gpu.warp_execute_on_lane_0(%laneid) -> (index) {
     ...
     %block_id_x = gpu.block_id x
     gpu.yield %block_id_x
   }
  // use %1
```
To:
```
   %block_id_x = gpu.block_id x
   %1 = gpu.warp_execute_on_lane_0(%laneid) -> (index) {
     ...
     gpu.yield %block_id_x
   }
  // use %1

```
2025-05-08 10:35:32 -07:00
Chinmay Deshpande
3a5af231fd [GlobalISel][AMDGPU] Fix handling of v2i128 type for AND, OR, XOR (#138574)
Current behavior crashes the compiler.

This bug was found using the AMDGPU Fuzzing project.

Fixes SWDEV-508816.
2025-05-08 19:31:28 +02:00
Brox Chen
9d907a2bb1 [AMDGPU][True16][CodeGen] FP_Round f64 to f16 in true16 (#128911)
Update the f64 to f16 lowering for targets which support f16 types.

In unsafe mode, lower to two FP_ROUND nodes (the patch
https://reviews.llvm.org/D154528 stops these two FP_ROUND nodes from
being combined back). In safe mode, select LowerF64ToF16
(round-to-nearest-even rounding mode).
2025-05-08 13:30:09 -04:00
cor3ntin
09c80e2944 Reland [Clang] Deprecate __is_trivially_relocatable (#139061)
The C++26 standard relocatable type trait has slightly different
semantics, so we introduced a new
``__builtin_is_cpp_trivially_relocatable``
when implementing trivial relocation in #127636.

However, having multiple relocatable traits would be confusing
in the long run, so we deprecate the old trait.

As discussed in #127636

`__builtin_is_cpp_trivially_relocatable` should be used instead.
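
A hedged sketch of how code might guard its use of the newer builtin follows. The wrapper name `is_trivially_relocatable_sketch` is hypothetical, and falling back to `std::is_trivially_copyable` is a conservative assumption made here, not something the commit prescribes.

```cpp
#include <type_traits>

#ifndef __has_builtin
#define __has_builtin(x) 0  // compilers without __has_builtin take the fallback
#endif

#if __has_builtin(__builtin_is_cpp_trivially_relocatable)
// Use the builtin named in the commit message when the compiler reports it.
template <class T>
struct is_trivially_relocatable_sketch
    : std::integral_constant<bool, __builtin_is_cpp_trivially_relocatable(T)> {};
#else
// Conservative fallback (an assumption of this sketch): trivially copyable
// types can be relocated with a plain byte copy.
template <class T>
struct is_trivially_relocatable_sketch : std::is_trivially_copyable<T> {};
#endif

struct PlainPoint {
    int x;
    int y;
};

static_assert(is_trivially_relocatable_sketch<int>::value,
              "int is expected to be trivially relocatable");
```

Routing every query through one wrapper means a later change of builtin name or semantics only has to be handled in one place.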
2025-05-08 19:25:50 +02:00
Ashley Coleman
0beb2f56f6 [HLSL][NFC] Stricter Overload Tests (clamp,max,min,pow) (#138993)
Partial implementation of #138016 to unblock other ongoing work. NFC
2025-05-08 11:20:17 -06:00
Zhuoran Yin
53e8ff13bd [MLIR] Fixing the memref linearization size computation for non-packed memref (#138922)
Credit to @krzysz00 who discovered this subtle bug in `MemRefUtils`. The
problem is in `getLinearizedMemRefOffsetAndSize()` utility. In
particular, how this subroutine computes the linearized size of a memref
is incorrect when given a non-packed memref.

### Background

As context, in a packed memref of `memref<8x8xf32>`, we'd compute the
size by multiplying the size of dimensions together. This is implemented
by composing an affine_map of `affine_map<()[s0, s1] -> (s0 * s1)>` and
then computing the result of size via `%size = affine.apply #map()[%c8,
%c8]`.

However, this is wrong for a non-packed memref of `memref<8x8xf32,
strided<[1024, 1]>>`. Since the previously computed multiplication map
only considers the dimension sizes, it would still conclude that
the size of the non-packed memref is 64.

### Solution

This PR fixes the linearized size computation so that it takes strides
into consideration. It computes the maximum of (dim size * dim stride)
over all dimensions. We compute the size via the affine_map
`affine_map<()[stride0, size0, size1] -> ((stride0 * size0), 1 *
size1)>` and then `%size = affine.max #map()[%stride0, %size0,
%size1]`. In particular, for the non-packed memref above, the size is
derived as max(1024\*8, 1\*8) = 8192 (rather than the wrong size 64
computed by the packed memref equation).
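
The arithmetic above can be checked with a small standalone sketch. The function names `packedSize` and `stridedSize` are hypothetical and operate on plain integers rather than on MLIR values; the sketch assumes a memref with at least one dimension.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packed formula: the product of all dimension sizes.
std::int64_t packedSize(const std::vector<std::int64_t>& sizes) {
    std::int64_t result = 1;
    for (std::int64_t size : sizes)
        result *= size;
    return result;
}

// Stride-aware formula: the maximum of (size * stride) over all dimensions.
std::int64_t stridedSize(const std::vector<std::int64_t>& sizes,
                         const std::vector<std::int64_t>& strides) {
    assert(!sizes.empty() && sizes.size() == strides.size());
    std::int64_t result = 0;
    for (std::size_t i = 0; i < sizes.size(); ++i)
        result = std::max(result, sizes[i] * strides[i]);
    return result;
}
```

For `memref<8x8xf32, strided<[1024, 1]>>`, `packedSize({8, 8})` gives 64 while `stridedSize({8, 8}, {1024, 1})` gives 8192; for the packed layout with strides `{8, 1}` both give 64.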
2025-05-08 13:14:32 -04:00
Jason Eckhardt
9692dff7b7 [TableGen][NFC] Use early exit to simplify large block in emitAction. (#138220)
Most of the processing in emitAction is in an unneeded else-block--
reduce indentation by exiting after the recursive call.

`XXXGenCallingConv.inc` are identical before and after this patch for
all targets.
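
The shape of this refactoring can be sketched on a toy example. The `Action` struct and both functions below are hypothetical and unrelated to the real TableGen code; they only show that returning right after the recursive call removes the need for the else-block.

```cpp
#include <string>

// Hypothetical action record; `delegate` mimics an action that simply
// forwards to another action.
struct Action {
    const Action* delegate;
    std::string name;
};

// Before: the main work sits inside an else-block.
std::string describeNested(const Action& action) {
    std::string text;
    if (action.delegate) {
        text = describeNested(*action.delegate);
    } else {
        text = "emit(";
        text += action.name;
        text += ")";
    }
    return text;
}

// After: exit early once the recursive call is done, so the main work
// is no longer indented.
std::string describeEarlyExit(const Action& action) {
    if (action.delegate)
        return describeEarlyExit(*action.delegate);

    std::string text = "emit(";
    text += action.name;
    text += ")";
    return text;
}
```

Both functions return the same string for every input, which mirrors the check that the generated files are identical before and after the patch.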
2025-05-08 12:12:15 -05:00
Florian Hahn
339dc9500b [VPlan] Retain exit conditions and edges in initial VPlan (NFC). (#137709)
Update initial VPlan construction to include exit conditions and edges.

The loop region is now first constructed without entry/exiting. Those
are set after inserting the region in the CFG, to preserve the original
predecessor/successor order of blocks.

For now, all early exits are disconnected before forming the regions,
but a follow-up will update uncountable exit handling to also happen
here. This is required to enable VPlan predication and remove the
dependence on any IR BBs
(https://github.com/llvm/llvm-project/pull/128420).

PR: https://github.com/llvm/llvm-project/pull/137709
2025-05-08 18:10:52 +01:00
Min-Yih Hsu
81786b9185 [RISCV][NFC] Remove unused variable
Remove unused variable in RISCVTargetLowering
2025-05-08 10:09:04 -07:00
Helena Kotas
3bc3b1c6c0 [HLSL][NFC] Rename isImplicit() to hasRegisterSlot() on HLSLResourceBindingAttr (#138964)
Renaming because the name `isImplicit` is ambiguous. It can mean
implicit attribute or implicit binding.
2025-05-08 10:03:21 -07:00
Philip Reames
a2b28a6812 [DAG/RISCV] Continue migrating to getInsertSubvector and getExtractSubvector
Follow up to 6e654caab and cf2f5585.  I'd apparently missed two cases.
2025-05-08 09:59:24 -07:00
Vitaly Buka
d1da41bf4d [ubsan_minimal] Add __ubsan_report_error_fatal (#138999)
An override may need to know whether the sanitizer is in recover mode.
2025-05-08 09:58:48 -07:00
Tom Tromey
b0bf48d44e Two DWARF variant part improvements (#138953)
This patch adds a couple of improvements to the LLVM emission of DWARF
variant parts. One of these is desirable for Ada, and the other is
required.

Currently, when emitting a discriminant, LLVM follows the precise letter
of the DWARF standard, which says:

    If the variant part has a discriminant, the discriminant is
    represented by a separate debugging information entry which is a
    child of the variant part entry.

However, for Ada this does not really make sense. In Ada, the
discriminant field exists outside of any variant part, and it makes more
sense to emit it separately rather than redundantly emit the field once
for each variant part.

This extension was arrived at when this was implemented in GCC, and was
accepted for DWARF 6, see:

    https://dwarfstd.org/issues/180123.1.html

Here the patch simply lifts this restriction: if the discriminant field
was already emitted, it isn't re-emitted. This approach allows the Ada
compiler to do what it needs without affecting the Rust output.

Second, this patch extends the discriminant to allow multiple values.
This is needed by Ada. Here, I chose to use a ConstantDataArray of pairs
of integers, with each pair representing a range, as Ada also allows
ranges here. This seemed like a reasonably convenient representation.
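
To make the pair-of-integers idea concrete, here is a sketch that models the ranges with standard containers instead of a ConstantDataArray. The names are hypothetical, and treating each pair as an inclusive range, with a single value stored as a pair whose two ends are equal, is an assumption of this sketch.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Each pair is read as an inclusive range [first, second]. A single
// discriminant value is stored as a pair with equal ends (assumption).
using DiscriminantRanges = std::vector<std::pair<std::int64_t, std::int64_t>>;

// Returns true when `discriminant` falls inside any of the ranges, i.e.
// when the variant guarded by these ranges would be selected.
bool selectsVariant(const DiscriminantRanges& ranges, std::int64_t discriminant) {
    for (const auto& range : ranges) {
        if (range.first <= discriminant && discriminant <= range.second)
            return true;
    }
    return false;
}
```

With ranges `{{1, 3}, {7, 7}}`, the values 2 and 7 select the variant and 5 does not.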
2025-05-08 09:41:15 -07:00
Philip Reames
cf2f558501 [DAG/RISCV] Continue migrating to getInsertSubvector and getExtractSubvector
Follow up to 6e654caab, use the new routines in more places.  Note that
I've excluded from this patch any case which uses a getConstant index
instead of a getVectorIdxConstant index just to minimize room for
error.  I'll get those in a separate follow up.
2025-05-08 09:40:45 -07:00
Brox Chen
7f633b583e [AMDGPU][True16][MC] add true16 mode on a few disasm tests (#139094)
This is an NFC patch.

Apply "+real-true16" to a few disasm tests and run the update script.
2025-05-08 12:34:10 -04:00
David Green
e9702ce18a [AArch64] Add some tests for icmp eq chains of loads. NFC 2025-05-08 17:31:39 +01:00
Min-Yih Hsu
808a5f15d7 [RISCV] Remove riscv.segN.load/store in favor of their mask variants (#137045)
RISCVVectorPeepholePass would replace instructions with all-ones mask
with their unmasked variants, so there isn't really a point in keeping
separate versions of intrinsics.

Note that `riscv.segN.load/store.mask` does not take pointer type (i.e.
address space) as part of its overloading type signature, because RISC-V
doesn't really use address spaces other than the default one.
2025-05-08 09:27:26 -07:00
Deric C.
7c366b041c [DirectX] Implement llvm.is.fpclass lowering for the fcNegZero FPClassTest and the IsNaN, IsInf, IsFinite, IsNormal DXIL ops (#138048)
Fixes #137209

This PR:
- Adds a case to `expandIntrinsic()` in `DXILIntrinsicExpansion.cpp` to
expand the `Intrinsic::is_fpclass` in the case of
`FPClassTest::fcNegZero`
- Defines the `IsNaN`, `IsFinite`, `IsNormal` DXIL ops in `DXIL.td`
- Adds a case to `lowerIntrinsics()` in `DXILOpLowering.cpp` to handle
the lowering of `Intrinsic::is_fpclass` to the DXIL ops `IsNaN`,
`IsInf`, `IsFinite`, `IsNormal` when the FPClassTest is `fcNan`,
`fcInf`, `fcFinite`, and `fcNormal` respectively
- Creates a test `llvm/test/CodeGen/DirectX/is_fpclass.ll` to exercise
the intrinsic expansion and DXIL op lowering of `Intrinsic::is_fpclass`

~~A separate PR will be made to remove the now-redundant `dx_isinf`
intrinsic to address #87777.~~

A proper implementation for the lowering of the `llvm.is.fpclass`
intrinsic to handle all possible combinations of FPClassTest can be
implemented in a separate PR. This PR's implementation focuses primarily
on addressing the current use-cases for DirectML and HLSL intrinsics.
2025-05-08 09:13:26 -07:00
Jonas Devlieghere
45cd708184 [lldb] Change the statusline format to print "no target" (#139021)
Change the default statusline format to print "no target" when lldb is
launched without a target. Currently, the statusline is empty, which
looks rather odd.
2025-05-08 09:09:46 -07:00
Prabhu Rajasekaran
5c6cbe2517 [clang] UEFI default ABI (#138364)
Set MS ABI as default ABI for UEFI.
2025-05-08 09:08:46 -07:00
Volodymyr Sapsai
64bb60a471 [Modules] Don't fail when an unused textual header is missing. (#138227)
According to the documentation
> A header declaration that does not contain `exclude` nor `textual`
specifies a header that contributes to the enclosing module.

Which means that `exclude` and `textual` headers don't contribute to the
enclosing module and their presence isn't required to build such a
module. The keywords tell clang how a header should be treated in the
context of the module but they don't add headers to the module.

When a textual header *is* used, clang still emits a "file not found"
error pointing to the location where the missing file is included.
2025-05-08 09:07:33 -07:00
Justin Fargnoli
5b7ccdc2a2 [LLVM][Maintainers] Step down as an NVPTX maintainer (#138936) 2025-05-08 09:05:52 -07:00
Lewis Crawford
9c88b6d689 [ConstantFolding] Fold maximumnum and minimumnum (#138700)
Add constant-folding support for the maximumnum and minimumnum
intrinsics, and extend the tests to show the qnan vs snan behavior
differences between maxnum/maximum/maximumnum.
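
The NaN handling that separates these operations can be sketched in plain C++. The two functions below are illustrative models of the IEEE 754-2019 `maximum` and `maximumNumber` operations, not LLVM's constant-folding code, and they deliberately do not model the qnan vs snan distinction or `maxnum`.

```cpp
#include <cmath>
#include <limits>

// Picks the larger of two non-NaN values, treating -0.0 as smaller than +0.0.
double largerOrdered(double a, double b) {
    if (a == b)
        return std::signbit(a) ? b : a;  // prefer +0.0 when the zeros differ
    return a > b ? a : b;
}

// Model of `maximum`: any NaN operand makes the result NaN.
double maximumSketch(double a, double b) {
    if (std::isnan(a) || std::isnan(b))
        return std::numeric_limits<double>::quiet_NaN();
    return largerOrdered(a, b);
}

// Model of `maximumNumber`: a NaN operand is ignored when the other
// operand is a number; the result is NaN only if both operands are NaN.
double maximumnumSketch(double a, double b) {
    if (std::isnan(a))
        return std::isnan(b) ? std::numeric_limits<double>::quiet_NaN() : b;
    if (std::isnan(b))
        return a;
    return largerOrdered(a, b);
}
```

For the operands 1.0 and NaN, `maximumSketch` yields NaN while `maximumnumSketch` yields 1.0. The sketch assumes the code is compiled without fast-math style options, which may remove NaN checks.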
2025-05-08 18:00:49 +02:00
Vivian Zhang
37fecfaa63 [mlir] Support rank-reduced extract_slice in ExtractSliceOfPadTensorSwapPattern (#138921)
This PR fixes `ExtractSliceOfPadTensorSwapPattern` to support
rank-reducing `tensor.extract_slice` ops, which were previously
unhandled and could cause crashes. To support this, an additional
`tensor.extract_slice` is inserted after `tensor.pad` to reduce the
result rank.
2025-05-08 08:51:48 -07:00
Marina Taylor
f2bc7b75dd [AArch64] Allow the clang.arc.attachedcall marker to be optional (#138694)
Now that the clang.arc.attachedcall bundle requires having an operand,
which we emit a call to in the RVMARKER sequence, we can achieve our
real goal: make the marker NOP optional.

The intention is that a new ObjC runtime call will be introduced, which
doesn't require the NOP to be present, but must be adjacent to the
possibly-autorelease-returning call (that the bundle is attached to).

This is achieved by having ISel embed whether the marker is necessary
with an additional boolean target immediate operand.

Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>
2025-05-08 16:49:31 +01:00