This type trait represents whether move-assigning an object is
equivalent to destroying it and then move-constructing a new one from
the same argument. This will be useful in a few places where we may want
to destroy + construct instead of doing an assignment, in particular
when implementing some container operations in terms of relocation.
This is effectively adding a library emulation of P2786R12's
is_replaceable trait, similarly to what we do for trivial relocation.
Eventually, we can replace this library emulation with the real
compiler-backed trait.
This is building towards #129328.
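As a rough sketch of the intended use (the trait name `is_replaceable_v` and the helper below are hypothetical, not the actual libc++ spelling):
```
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the emulated trait; trivially copyable types
// are safely replaceable, so this is a conservative approximation.
template <class T>
inline constexpr bool is_replaceable_v = std::is_trivially_copyable_v<T>;

// Assign by destroy + move-construct when that is known to be equivalent
// to move-assignment; fall back to plain move-assignment otherwise.
template <class T>
void assign_or_replace(T& dst, T&& src) {
  if constexpr (is_replaceable_v<T>) {
    std::destroy_at(std::addressof(dst));
    std::construct_at(std::addressof(dst), std::move(src));
  } else {
    dst = std::move(src);
  }
}
```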
When designating an array component that has non-default lower bounds,
the bridge was producing `hlfir.designate` ops yielding reference types,
which did not preserve the bounds information. Then, when creating
components, unadjusted indices were used when initializing the
structure.
We could look at the declaration to get the shape parameter, but this
would not be preserved if the component were passed as a block argument.
These results must be boxed, but we must not lose the contiguity
information either. To preserve contiguity, annotate these boxes with
the `contiguous` attribute during designation.
Note that other designated entities are handled inside the
HlfirDesignatorBuilder while component designators are built in
HlfirBuilder. I am not sure if this handling should be moved into the
designator builder or left in the general builder, so feedback is
welcome.
Also, I would like to find a test demonstrating that a box-designated
component carrying the `contiguous` attribute is actually treated as
contiguous by downstream passes that check for contiguity. I don't have
such a test yet.
(NFC modulo debug output changes)
Add a generic helper to print phi operands (incoming values) together
with their incoming blocks.
As more and more transforms are added, keeping track of the incoming
blocks of phis becomes more important. Print incoming blocks via
VPPhiAccessors to make debugging easier.
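A minimal self-contained sketch of the idea (simplified stand-in types, not the actual VPlan classes or the VPPhiAccessors interface):
```
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Simplified phi: pairs of (incoming value, incoming block).
struct Phi {
  std::vector<std::pair<std::string, std::string>> incoming;
};

// Print each incoming value together with the block it flows in from,
// so debug output shows where every phi operand originates.
void printPhiOperands(const Phi &phi, std::ostream &os) {
  for (const auto &[value, block] : phi.incoming)
    os << " [ " << value << ", " << block << " ]";
  os << '\n';
}

int main() {
  Phi p{{{"%iv.next", "loop.latch"}, {"%start", "preheader"}}};
  printPhiOperands(p, std::cout); // " [ %iv.next, loop.latch ] [ %start, preheader ]"
}
```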
Move the Darwin framework search path logic from
InitHeaderSearch::AddDefaultIncludePaths to
DarwinClang::AddClangSystemIncludeArgs. Add a new -internal-iframework
cc1 argument so that the toolchain can add these paths.
Now that the toolchain adds search paths via a cc1 flag, they're only
added if they exist, so the Preprocessor/cuda-macos-includes.cu test is
no longer relevant.
Change Driver/driverkit-path.c and Driver/darwin-subframeworks.c to do
-### style testing similar to the darwin-header-search and
darwin-embedded-search-paths tests. Rename darwin-subframeworks.c to
darwin-framework-search-paths.c and have it test all framework search
paths, not just SubFrameworks.
Add a unit test to validate that the myriad of search path flags result
in the expected search path list.
Fixes https://github.com/llvm/llvm-project/issues/75638
Commit 776476c282 (PR117404), which
introduced the radix tree representation of allocation context summary
records, incorrectly changed the description of the
FS_COMBINED_CALLSITE_INFO record instead of the intended
FS_COMBINED_ALLOC_INFO record.
This PR updates the `do concurrent` to OpenMP mapping pass to use the
`fir.do_concurrent` ops that were recently added upstream instead of
handling nests of `fir.do_loop ... unordered` ops.
Parent PR: https://github.com/llvm/llvm-project/pull/137928.
`GlobalOp` was parsing `thread_local` after `unnamed_addr` but printing them in the reverse order.
While here, make `AliasOp` follow the same behavior and share the common parts of global and alias printing.
Note that this change is possibly not NFC. The prior routines used
getConstant with XLenVT. The new wrappers will use getVectorIdxConstant
instead. Digging through the code, the type used for the index will be
the integer of pointer width from the DataLayout. For typical RV32 and
RV64 configurations the pointer will be of equal width to XLEN, but you
could have a 32-bit pointer on an RV64 machine.
Address post-commit review feedback for PR139092 (and fix another
instance of the same code). Save and restore option values via a saved
bool value, instead of invoking cl::ResetAllOptionOccurrences.
On AArch64, we create optional/weak relocations that may not be
processed due to the relocated value overflowing. When the overflow
happens, we used to enforce patching for all functions in the binary via
the --force-patch option. This PR relaxes the requirement and enforces
patching only for functions that are targets of optional relocations.
Moreover, if the compact code model is used, the relocation overflow is
guaranteed not to happen and the patching will be skipped.
Add a new instrumentation section type `[sample-coldcov]` to support
`-fprofile-list` for sample-PGO-based cold function coverage.
Note that the current cold function coverage is based on the sampling
PGO pipeline, which is incompatible with the existing [llvm] option (see
[PGOOptions](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Support/PGOOptions.h#L27-L43)),
so we can't reuse the IR-PGO (-fprofile-instrument=llvm) flag.
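For illustration, a `-fprofile-list` file using the new section could look like this (the function names are made up; entries follow the existing special-case-list syntax):
```
# Skip coverage instrumentation for a known-hot function,
# allow it everywhere else.
[sample-coldcov]
fun:expensive_hot_path=skip
fun:*=allow
```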
This PR adds support for moving scalar uniform ops (GPU index ops,
constants, etc.) outside the `gpu.warp_execute_on_lane_0` op. These
kinds of ops do not require distribution and are safe to move out of the
warp op. This also avoids adding separate distribution patterns for
these ops.
Example:
```
%1 = gpu.warp_execute_on_lane_0(%laneid) -> (index) {
...
%block_id_x = gpu.block_id x
gpu.yield %block_id_x
}
// use %1
```
To:
```
%block_id_x = gpu.block_id x
%1 = gpu.warp_execute_on_lane_0(%laneid) -> (index) {
...
gpu.yield %block_id_x
}
// use %1
```
Update the f64 to f16 lowering for targets that support f16 types. In
unsafe mode, lower to two FP_ROUND nodes (the patch
https://reviews.llvm.org/D154528 stops these two FP_ROUND nodes from
being combined back). In safe mode, select LowerF64ToF16
(round-to-nearest-even rounding mode).
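As an illustration of why the two-step lowering is the "unsafe" one, here is a host-side C++ sketch of the double rounding it can introduce (assumes a compiler and target with `_Float16` support; this is not the backend code):
```
#include <cstdio>

int main() {
  // Just above the halfway point between the f16 values 1.0 and 1.0 + 2^-10.
  double d = 1.0 + 0x1p-11 + 0x1p-40;
  _Float16 direct = (_Float16)d;          // rounds up to 1.0 + 2^-10
  _Float16 twostep = (_Float16)(float)d;  // f32 keeps exactly 1.0 + 2^-11,
                                          // then ties-to-even gives 1.0
  std::printf("direct:  %a\ntwostep: %a\n", (double)direct, (double)twostep);
}
```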
The C++26 standard trivially relocatable type trait has slightly
different semantics, so we introduced a new
`__builtin_is_cpp_trivially_relocatable`
when implementing trivial relocation in #127636.
However, having multiple relocatable traits would be confusing
in the long run, so we deprecate the old trait.
As discussed in #127636,
`__builtin_is_cpp_trivially_relocatable` should be used instead.
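A minimal usage example of the preferred spelling (assumes a Clang recent enough to provide the builtin):
```
struct S { int x; };

// Prefer the C++26-aligned builtin over the deprecated trait;
// a trivially copyable type like S is trivially relocatable.
static_assert(__builtin_is_cpp_trivially_relocatable(S));
```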
Credit to @krzysz00, who discovered this subtle bug in `MemRefUtils`.
The problem is in the `getLinearizedMemRefOffsetAndSize()` utility: the
way it computes the linearized size of a memref is incorrect for
non-packed memrefs.
### Background
As context, for a packed memref of `memref<8x8xf32>`, we'd compute the
size by multiplying the sizes of the dimensions together. This is
implemented by composing an affine_map of `affine_map<()[s0, s1] -> (s0
* s1)>` and then computing the size via `%size = affine.apply #map()[%c8,
%c8]`.
However, this is wrong for a non-packed memref of `memref<8x8xf32,
strided<[1024, 1]>>`. Since the multiplication map only considers the
dimension sizes, it would incorrectly conclude that the size of the
non-packed memref is 64.
### Solution
This PR fixes the linearized size computation to take strides into
consideration: it computes the maximum of (dim size * dim stride) over
all dimensions. With the second stride equal to 1, we'd compute the size
via the affine_map of `affine_map<()[stride0, size0, size1] -> (stride0
* size0, 1 * size1)>` and then take the maximum via `%size = affine.max
#map()[%stride0, %size0, %size1]`. In particular, for the non-packed
memref above, the size is derived as max(1024\*8, 1\*8) = 8192 (rather
than the wrong size 64 computed by the packed-memref equation).
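A small plain-C++ illustration of the fixed computation (names are illustrative; this is not the MLIR utility itself):
```
#include <algorithm>
#include <cstdint>
#include <vector>

// Linearized size of a (possibly non-packed) memref in elements:
// the max over dimensions of size[i] * stride[i], not the product of sizes.
int64_t linearizedSize(const std::vector<int64_t> &sizes,
                       const std::vector<int64_t> &strides) {
  int64_t result = 0;
  for (size_t i = 0; i < sizes.size(); ++i)
    result = std::max(result, sizes[i] * strides[i]);
  return result;
}

// For memref<8x8xf32, strided<[1024, 1]>>:
//   linearizedSize({8, 8}, {1024, 1}) == max(8 * 1024, 8 * 1) == 8192,
// not 8 * 8 == 64.
```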
Most of the processing in emitAction is in an unneeded else-block;
reduce indentation by exiting early after the recursive call.
`XXXGenCallingConv.inc` are identical before and after this patch for
all targets.
Update initial VPlan construction to include exit conditions and edges.
The loop region is now first constructed without entry/exiting. Those
are set after inserting the region in the CFG, to preserve the original
predecessor/successor order of blocks.
For now, all early exits are disconnected before forming the regions,
but a follow-up will update uncountable exit handling to also happen
here. This is required to enable VPlan predication and to remove the
dependence on any IR BBs
(https://github.com/llvm/llvm-project/pull/128420).
PR: https://github.com/llvm/llvm-project/pull/137709
This patch adds a couple of improvements to the LLVM emission of DWARF
variant parts. One of these is desirable for Ada, and the other is
required.
Currently, when emitting a discriminant, LLVM follows the precise letter
of the DWARF standard, which says:
> If the variant part has a discriminant, the discriminant is
> represented by a separate debugging information entry which is a
> child of the variant part entry.
However, for Ada this does not really make sense. In Ada, the
discriminant field exists outside of any variant part, and it makes more
sense to emit it separately rather than redundantly emit the field once
for each variant part.
This extension was arrived at when this was implemented in GCC, and was
accepted for DWARF 6, see:
https://dwarfstd.org/issues/180123.1.html
Here the patch simply lifts this restriction: if the discriminant field
was already emitted, it isn't re-emitted. This approach allows the Ada
compiler to do what it needs without affecting the Rust output.
Second, this patch extends the discriminant to allow multiple values.
This is needed by Ada. Here, I chose to use a ConstantDataArray of pairs
of integers, with each pair representing a range, as Ada also allows
ranges here. This seemed like a reasonably convenient representation.
Follow-up to 6e654caab: use the new routines in more places. Note that
I've excluded from this patch any case which uses a getConstant index
instead of a getVectorIdxConstant index, just to minimize room for
error. I'll get those in a separate follow-up.
RISCVVectorPeepholePass would replace instructions that have an all-ones
mask with their unmasked variants, so there isn't really a point in
keeping separate versions of these intrinsics.
Note that `riscv.segN.load/store.mask` does not take a pointer type
(i.e. address space) as part of its overloaded type signature, because
RISC-V doesn't really use address spaces other than the default one.
Fixes #137209
This PR:
- Adds a case to `expandIntrinsic()` in `DXILIntrinsicExpansion.cpp` to
expand `Intrinsic::is_fpclass` in the case of
`FPClassTest::fcNegZero`
- Defines the `IsNaN`, `IsFinite`, `IsNormal` DXIL ops in `DXIL.td`
- Adds a case to `lowerIntrinsics()` in `DXILOpLowering.cpp` to handle
the lowering of `Intrinsic::is_fpclass` to the DXIL ops `IsNaN`,
`IsInf`, `IsFinite`, `IsNormal` when the FPClassTest is `fcNan`,
`fcInf`, `fcFinite`, and `fcNormal` respectively
- Creates a test `llvm/test/CodeGen/DirectX/is_fpclass.ll` to exercise
the intrinsic expansion and DXIL op lowering of `Intrinsic::is_fpclass`
~~A separate PR will be made to remove the now-redundant `dx_isinf`
intrinsic to address #87777.~~
A proper implementation for the lowering of the `llvm.is.fpclass`
intrinsic to handle all possible combinations of FPClassTest can be
implemented in a separate PR. This PR's implementation focuses primarily
on addressing the current use-cases for DirectML and HLSL intrinsics.
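As a rough C++ analogue of what the `fcNegZero` expansion amounts to semantically (illustrative only, not the actual expansion code):
```
#include <bit>
#include <cstdint>

// is.fpclass(x, fcNegZero) reduces to a bit-pattern compare against -0.0f,
// since -0.0f == 0.0f under ordinary floating-point comparison.
bool isNegZero(float x) {
  return std::bit_cast<std::uint32_t>(x) == 0x80000000u;
}
```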
Change the default statusline format to print "no target" when lldb is
launched without a target. Currently, the statusline is empty, which
looks rather odd.
According to the documentation:
> A header declaration that does not contain `exclude` nor `textual`
> specifies a header that contributes to the enclosing module.
This means that `exclude` and `textual` headers don't contribute to the
enclosing module, and their presence isn't required to build such a
module. The keywords tell clang how a header should be treated in the
context of the module, but they don't add headers to the module.
When a textual header *is* used, clang still emits a "file not found"
error pointing to the location where the missing file is included.
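For illustration, given a module map along these lines (file names are hypothetical), neither `Excluded.h` nor `Textual.h` needs to exist in order to build module `M`:
```
module M {
  header "A.h"
  exclude header "Excluded.h"
  textual header "Textual.h"
}
```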
Add constant-folding support for the maximumnum and minimumnum
intrinsics, and extend the tests to show the qNaN vs. sNaN behavior
differences between maxnum/maximum/maximumnum.
This PR fixes `ExtractSliceOfPadTensorSwapPattern` to support
rank-reducing `tensor.extract_slice` ops, which were previously
unhandled and could cause crashes. To support this, an additional
`tensor.extract_slice` is inserted after `tensor.pad` to reduce the
result rank.
Now that the clang.arc.attachedcall bundle requires having an operand,
which we emit a call to in the RVMARKER sequence, we can achieve our
real goal: make the marker NOP optional.
The intention is that a new ObjC runtime call will be introduced, which
doesn't require the NOP to be present, but must be adjacent to the
possibly-autorelease-returning call (that the bundle is attached to).
This is achieved by having ISel encode whether the marker is necessary
in an additional boolean target immediate operand.
Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>