clang-p2996

Author	SHA1	Message	Date
sribee8	4c97a91dc0	[libc] Added closing quote (#145101 ) Error message was missing a closing quote, added it. Co-authored-by: Sriya Pratipati <sriyap@google.com>	2025-06-20 21:00:56 +00:00
Nishant Patel	9c1ce31f54	[mlir][vector] Add unroll patterns for vector.load and vector.store (#143420 ) This PR adds unroll patterns for vector.load and vector.store. This PR is follow up of #137558	2025-06-20 13:50:25 -07:00
David Green	b6445ac0c5	[GlobalISel] Create a common register_vector_matchinfo (#144306 ) Several combiners use MatchInfo that is just SmallVector<Register>. This creates a common register_vector_matchinfo that they can all use.	2025-06-20 21:37:02 +01:00
Med Ismail Bennani	58f48011b3	[lldb] Add support for x86_64h to scripted process (#145099 ) This patch adds support to the haswell sub-architecture (x86_64h) to scripted processes. rdar://147208252 Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>	2025-06-20 13:28:21 -07:00
Michael Spencer	6110dead89	[clang][scan-deps] Add option to disable caching stat failures (#144000 ) While the source code isn't supposed to change during a build, in some environments it does. This adds an option that disables caching of stat failures, meaning that source files can be added to the build during scanning. This adds a `-no-cache-negative-stats` option to clang-scan-deps to enable this behavior. There are no tests for clang-scan-deps as there's no reliable way to do so from it. A unit test has been added that modifies the filesystem between scans to test it.	2025-06-20 13:28:05 -07:00
Peter Collingbourne	491b82a5ec	ELF: Add branch-to-branch optimization. When code calls a function which then immediately tail calls another function there is no need to go via the intermediate function. By branching directly to the target function we reduce the program's working set for a slight increase in runtime performance. Normally it is relatively uncommon to have functions that just tail call another function, but with LLVM control flow integrity we have jump tables that replace the function itself as the canonical address. As a result, when a function address is taken and called directly, for example after a compiler optimization resolves the indirect call, or if code built without control flow integrity calls the function, the call will go via the jump table. The impact of this optimization was measured using a large internal Google benchmark. The results were as follows: CFI enabled: +0.1% ± 0.05% queries per second CFI disabled: +0.01% queries per second [not statistically significant] The optimization is enabled by default at -O2 but may also be enabled or disabled individually with --{,no-}branch-to-branch. This optimization is implemented for AArch64 and X86_64 only. lld's runtime performance (real execution time) after adding this optimization was measured using firefox-x64 from lld-speed-test [1] with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows: ``` N Min Max Median Avg Stddev x 512 1.2264546 1.3481076 1.2970261 1.2965788 0.018620888 + 512 1.2561196 1.3839965 1.3214632 1.3209327 0.019443971 Difference at 95.0% confidence 0.0243538 +/- 0.00233202 1.87831% +/- 0.179859% (Student's t, pooled s = 0.0190369) ``` [1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057 Pull Request: https://github.com/llvm/llvm-project/pull/138366	2025-06-20 13:16:24 -07:00
Rodolfo Wottrich	3b9795b3d3	[AArch64] Add CodeGen support for scalar FEAT_CPA (#105669 ) CPA stands for Checked Pointer Arithmetic and is part of the 2023 MTE architecture extensions for A-profile. The new CPA instructions perform regular pointer arithmetic (such as base register + offset) but check for overflow in the most significant bits of the result, enhancing security by detecting address tampering. In this patch we intend to capture the semantics of pointer arithmetic when it is not folded into loads/stores, then generate the appropriate scalar CPA instructions. In order to preserve pointer arithmetic semantics through the backend, we use the PTRADD SelectionDAG node type. Use backend option `-aarch64-use-featcpa-codegen=true` to enable CPA CodeGen (for a target with CPA enabled). The story of this PR is that initially it introduced the PTRADD SelectionDAG node and the respective visitPTRADD() function, adapted from the CHERI/Morello LLVM tree. The original authors are @davidchisnall, @jrtc27, @arichardson. After a while, @ritter-x2a took the part of the code that was target-independent and merged it separately in #140017. This PR thus remains as the AArch64-part only. More details about the CPA extension can be found at: - https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2023 - https://developer.arm.com/documentation/ddi0602/2023-09/ (e.g. ADDPT instruction) This PR follows #79569. It does not address vector FEAT_CPA instructions.	2025-06-20 21:14:52 +01:00
Florian Hahn	f8ffb4e7cd	[VPlan] Simplify ExtractLastElement(Broadcast(A)) -> A. Remove trivial ExtractLastElement VPInstructions.	2025-06-20 21:08:14 +01:00
sribee8	d078ce7c98	[libc] mbrtowc implementation (#144760 ) implemented the internal and public mbrtowc as well as tests for the public function. --------- Co-authored-by: Sriya Pratipati <sriyap@google.com>	2025-06-20 20:00:59 +00:00
Stanislav Mekhanoshin	3a66e20652	[AMDGPU] Add gfx1250 runlines to vop3 dpp tests. NFC. (#145089 ) dpp8 disasm test does not work yet.	2025-06-20 12:57:36 -07:00
nerix	d8924d4da7	[LLDB] Explicitly use python for version fixup (#144217 ) On Windows, the post build command would open the script in the default editor, since it doesn't know about shebangs. This effectively adds `python3` in front of the command. Amends https://github.com/llvm/llvm-project/pull/142871 / https://github.com/llvm/llvm-project/pull/141116	2025-06-20 14:54:06 -05:00
Amir Ayupov	4959e8a1da	[BOLT][NFCI] Use heuristic for matching split global functions (#90429 ) This change speeds up fragment matching for large BOLTed binaries where all fragments of global parent functions are put under `bolt-pseudo.o` file symbol: - before: iterating over symbols under `bolt-pseudo.o` only to fail to find a parent, - after: bail out immediately and use a global parent by name. Test Plan: NFC, updated register-fragments-bolt-symbols.s	2025-06-20 12:46:56 -07:00
Amir Ayupov	6d8c6ef90c	[BOLT][NFC] Simplify doTrace in BAT mode (#143233 ) `BoltAddressTranslation::getFallthroughsInTrace` iterates over address translation map entries and therefore has direct access to both original and translated offsets. Return the translated offsets in fall-throughs list to avoid duplicate address translation inside `doTrace`. Test Plan: NFC	2025-06-20 12:45:21 -07:00
Maksim Levental	227f759644	[mlir][python] expose operation.block (#145088 ) Expose `operation->getBlock()` in python.	2025-06-20 15:34:43 -04:00
Stanislav Mekhanoshin	affcc5e728	[AMDGPU] Add s_wait_xcnt gfx1250 instruction (#145086 )	2025-06-20 12:28:18 -07:00
Farzon Lotfi	2a4207e732	[DirectX] Don't limit visitGetElementPtrInst to global ptrs (#144959 ) fixes #144608 - there is a getPointerOperandIndex function so we don't need to iterate the operands trying to find the pointer. This resulted in a small cleanup to visitStoreInst and visitLoadInst. - The meat of this change was in visitGetElementPtrInst to account for allocas and not bail when we don't find a global.	2025-06-20 15:23:20 -04:00
Stanislav Mekhanoshin	958dc86026	[AMDGPU] Don't insert wait instructions that are not supported by gfx1250 (#145084 ) No tests yet, but it will allow further tests not to be polluted with these waits.	2025-06-20 12:21:45 -07:00
joaosaffran	b5d5708128	[HLSL] Add descriptor table metadata parsing (#142492 ) Implements descriptor table parsing from root signature metadata. This is required to support root signatures in hlsl. Closes: #[126640](https://github.com/llvm/llvm-project/issues/126640) --------- Co-authored-by: joaosaffran <joao.saffran@microsoft.com>	2025-06-20 12:12:02 -07:00
Stanislav Mekhanoshin	8d2eea96b3	[AMDGPU] gfx1250 SOPP MC tests. NFC. (#145082 )	2025-06-20 12:06:55 -07:00
Philip Reames	c103bbc836	[LV] Consider whether vscale is a known power of two for iteration check (#144963 ) Going mostly by the comment here - but it says "vscale is not necessarily a power-of-2". Both in tree targets have vscale as a power of two, and we have an existing TTI hook for that.	2025-06-20 11:37:27 -07:00
Fabian Mora	f159774352	[mlir][core\|ptr] Add `PtrLikeTypeInterface` and casting ops to the `ptr` dialect (#137469 ) This patch adds the `PtrLikeTypeInterface` type interface to identify pointer-like types. This interface is defined as: ``` A ptr-like type represents an object storing a memory address. This object is constituted by: - A memory address called the base pointer. This pointer is treated as a bag of bits without any assumed structure. The bit-width of the base pointer must be a compile-time constant. However, the bit-width may remain opaque or unavailable during transformations that do not depend on the base pointer. Finally, it is considered indivisible in the sense that as a `PtrLikeTypeInterface` value, it has no metadata. - Optional metadata about the pointer. For example, the size of the memory region associated with the pointer. Furthermore, all ptr-like types have two properties: - The memory space associated with the address held by the pointer. - An optional element type. If the element type is not specified, the pointer is considered opaque. ``` This patch adds this interface to `!ptr.ptr` and the `memref` type. Furthermore, this patch adds necessary ops and type to handle casting between `!ptr.ptr` and ptr-like types. First, it defines the `!ptr.ptr_metadata` type. An opaque type to represent the metadata of a ptr-like type. The rationale behind adding this type, is that at high-level the metadata of a type like `memref` cannot be specified, as its structure is tied to its lowering. The `ptr.get_metadata` operation was added to extract the opaque pointer metadata. The concrete structure of the metadata is only known when the op is lowered. Finally, this patch adds the `ptr.from_ptr` and `ptr.to_ptr` operations. Allowing to cast back and forth between `!ptr.ptr` and ptr-like types. 
```mlir func.func @func(%mr: memref<f32, #ptr.generic_space>) -> memref<f32, #ptr.generic_space> { %ptr = ptr.to_ptr %mr : memref<f32, #ptr.generic_space> -> !ptr.ptr<#ptr.generic_space> %mda = ptr.get_metadata %mr : memref<f32, #ptr.generic_space> %res = ptr.from_ptr %ptr metadata %mda : !ptr.ptr<#ptr.generic_space> -> memref<f32, #ptr.generic_space> return %res : memref<f32, #ptr.generic_space> } ``` It's future work to replace and remove the `bare-ptr-convention` through the use of these ops. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2025-06-20 14:23:39 -04:00
Krzysztof Parzyszek	925dbc7988	[flang][OpenMP] Fix namespace nesting after PR144960 Newly introduced Atomic.cpp fails to compile on its own, but somehow compiles fine in the build. Maybe it's because PCH, but it needs to be fixed nevertheless.	2025-06-20 13:22:58 -05:00
Deric C.	3f42c6bddd	[DirectX] Scalarize `extractelement` and `insertelement` with dynamic indices (#141676 ) Fixes #141136 - Implement `visitExtractElementInst` and `visitInsertElementInst` in `DXILDataScalarizerVisitor` to scalarize `extractelement` and `insertelement` instructions whose index operand is not a `ConstantInt` by converting the vector to an array and then loading from the array - Rename the `replaceVectorWithArray` helper function to `equivalentArrayTypeFromVector`, relocate the function toward the top of the file, and remove the unused `Ctx` parameter	2025-06-20 11:20:30 -07:00
Luke Lau	521adc9fa2	[VPlan] Use createScalarZExtOrTrunc when expanding expandVPWidenIntOrFpInduction Split off from #144666	2025-06-20 19:18:49 +01:00
Diego Caballero	ff6367b470	[mlir][Vector] Add simple folders for `vector.from_elements`/`vector.to_elements` (#144444 ) This PR adds simple folders to remove no-op sequences of `vector.from_elements` and `vector.to_elements`.	2025-06-20 11:16:46 -07:00
Yijia Gu	bae48ac3c0	[mlir][bazel] add missing deps for XeGPUTransforms	2025-06-20 11:14:14 -07:00
Florian Hahn	7f74a377d0	[LV] Regenerate uniform_across_vf* check lines. Re-generate check lines to reduce diff in upcoming changes. Also filters out the code after scalar.ph:, which is dead.	2025-06-20 19:10:26 +01:00
Sam Elliott	ab8b8c1e13	[TargetParser][cmake] Be Smarter about TableGen Deps (#144848 ) This tries to be a bit smarter for the OLD behaviour of CMP0116, to glob more relevant directories looking for possible dependencies. The changes are: - Remove some duplication of lines in the `tablegen` function. - Put CURRENT_SOURCE_DIR into `tblgen_includes` (at the front) - Glob all directories in `tblgen_includes` - Give up on `local_tds` which was wrong when using tablegen to compile a file in a different directory (as TargetParser does) - Use `EXTRA_INCLUDES` in TargetParser `tablegen` calls. This is still an under-approximation of what might be included, at least comparing the RISCVTargetParserDef.inc.d (after building `target_parser_gen`), and the list of deps in the ninja file when explicitly setting CMP0116 to OLD. Fixes #144639	2025-06-20 11:05:25 -07:00
Craig Topper	04e2e581ac	[RISCV] Treat bf16->f32 as separate ExtKind in combineOp_VLToVWOp_VL. (#144653 ) This allows us to better track the narrow type we need and to fix miscompiles if f16->f32 and bf16->f32 extends are mixed. Fixes #144651.	2025-06-20 10:44:51 -07:00
Charitha Saumya	adc6228ea0	[mlir][xegpu] Refine layout assignment in XeGPU SIMT distribution. (#142687 ) Changes: * Decouple layout propagation from subgroup distribution and move it to an independent pass. * Refine layout assignment to handle control-flow ops correctly (scf.for, scf.while). * Refine test cases.	2025-06-20 10:43:19 -07:00
Michal Rostecki	0d21c956a5	[BPF] Handle nested wrapper structs in BPF map definition traversal (#144097 ) In Aya/Rust, BPF map definitions are nested in two nested types: * A struct representing the map type (e.g., `HashMap`, `RingBuf`) that provides methods for interacting with the map type (e.g. `HashMap::get`, `RingBuf::reserve`). * An `UnsafeCell`, which informs the Rust compiler that the type is thread-safe and can be safely mutated even as a global variable. The kernel guarantees map operation safety. This leads to a type hierarchy like: ```rust pub struct HashMap<K, V, const M: usize, const F: usize = 0>( core::cell::UnsafeCell<HashMapDef<K, V, M, F>>, ); const BPF_MAP_TYPE_HASH: usize = 1; pub struct HashMapDef<K, V, const M: usize, const F: usize = 0> { r#type: const [i32; BPF_MAP_TYPE_HASH], key: const K, value: const V, max_entries: const [i32; M], map_flags: const [i32; F], } ``` Then used in the BPF program code as a global variable: ```rust #[link_section = ".maps"] static HASH_MAP: HashMap<u32, u32, 1337> = HashMap::new(); ``` Which is an equivalent of the following BPF map definition in C: ```c #define BPF_MAP_TYPE_HASH 1 struct { int (type)[BPF_MAP_TYPE_HASH]; typeof(int) key; typeof(int) value; int (*max_entries)[1337]; } map_1 __attribute__((section(".maps"))); ``` Accessing the actual map definition requires traversing: ``` HASH_MAP -> __0 -> value ``` Previously, the BPF backend only visited the pointee types of the outermost struct, and didn’t descend into inner wrappers. This caused issues when the key/value types were custom structs: ```rust // Define custom structs for key and values. 
pub struct MyKey(u32); pub struct MyValue(u32); #[link_section = ".maps"] #[export_name = "HASH_MAP"] pub static HASH_MAP: HashMap<MyKey, MyValue, 10> = HashMap::new(); ``` These types weren’t fully visited and appeared in BTF as forward declarations: ``` #30: <FWD> 'MyKey' kind:struct #31: <FWD> 'MyValue' kind:struct ``` The fix is to enhance `visitMapDefType` to recursively visit inner composite members. If a member is a composite type (likely a wrapper), it is now also visited using `visitMapDefType`, ensuring that the pointee types of the innermost struct members, like `MyKey` and `MyValue`, are fully resolved in BTF. With this fix, the correct BTF entries are emitted: ``` #6: <STRUCT> 'MyKey' sz:4 n:1 #00 '__0' off:0 --> [7] #7: <INT> 'u32' bits:32 off:0 #8: <PTR> --> [9] #9: <STRUCT> 'MyValue' sz:4 n:1 #00 '__0' off:0 --> [7] ``` Fixes: #143361	2025-06-20 10:17:36 -07:00
Thurston Dang	33a92af1b2	[msan] Add off-by-default flag to fix false negatives from partially undefined constant fixed-length vectors (#143837 ) This patch adds an off-by-default flag which, when enabled via `-mllvm -msan-poison-undef-vectors=true`, fixes a false negative in MSan (partially-undefined constant fixed-length vectors). It is currently off by default since, by fixing the false negative, code/tests that previously passed MSan may start failing. The default will be changed in a future patch. Prior to this patch, MSan computes that partially-undefined constant fixed-length vectors are fully initialized, which leads to false negatives; moreover, benign vector rewriting could theoretically flip MSan's shadow computation from initialized to uninitialized or vice-versa. `-msan-poison-undef-vectors=true` calculates the shadow precisely: for each element of the vector, the corresponding shadow is fully uninitialized if the element is undefined/poisoned, otherwise it is fully initialized. Updates the test from https://github.com/llvm/llvm-project/pull/143823 For example: ``` %x = insertelement <2 x i64> <i64 0, i64 poison>, i64 42, i64 0 %y = insertelement <2 x i64> <i64 poison, i64 poison>, i64 42, i64 0 ``` %x and %y are equivalent but, prior to this patch, MSan incorrectly computes the shadow of %x as <0, 0> rather than <0, -1>.	2025-06-20 10:11:12 -07:00
Simon Pilgrim	f8ee5774b6	[X86] combineConcatVectorOps - only concat AVX1 v4i64 shift-by-32 to a shuffle if the concat is free (#145043 )	2025-06-20 18:09:07 +01:00
Maryam Moghadas	65cb3bcf32	[Clang][PowerPC] Add __dmr1024 type and DMF integer calculation builtins (#142480 ) Define the __dmr1024 type used to manipulate the new DMR registers introduced by the Dense Math Facility (DMF) on PowerPC, and add six Clang builtins that correspond to the integer outer-product accumulate to ACC PowerPC instructions: * __builtin_mma_dmxvi8gerx4 * __builtin_mma_pmdmxvi8gerx4 * __builtin_mma_dmxvi8gerx4pp * __builtin_mma_pmdmxvi8gerx4pp * __builtin_mma_dmxvi8gerx4spp * __builtin_mma_pmdmxvi8gerx4spp.	2025-06-20 13:03:14 -04:00
Uzair Nawaz	8d6e29d0d3	[libc] Reworked CharacterConverter isComplete into isFull and isEmpty (#144799 ) isComplete previously meant different things for different conversion directions. Refactored bytes_processed to bytes_stored which now consistently increments on every push and decrements on pop making both directions more consistent with each other	2025-06-20 16:59:30 +00:00
Aiden Grossman	7157f33c6c	[libc++] Disable a std::unexpected test in modules build (#144466 ) This patch disables unexpected_disabled_cpp17.verify.cpp under clang modules builds because it changes diagnostics criteria post #143423, causing the test to fail. This patch follows a similar style to `853059a150`. This was found when working on trying to land #144033.	2025-06-20 12:58:59 -04:00
Jay Foad	6ddb3a69c1	[AMDGPU] Add another test showing unwanted VALU codegen (#145062 )	2025-06-20 17:54:44 +01:00
Hristo Hristov	945ce1aa3d	[libc++] Update the value of __cpp_lib_constrained_equality after P3379R0 (#144553 ) https://wg21.link/P3379R0 updated the value of __cpp_lib_constrained_equality, but we forgot to update it when we implemented the paper.	2025-06-20 12:36:46 -04:00
Shilei Tian	edbaf19c46	[AMDGPU] Fix a potential integer overflow in GCNRegPressure when true16 is enabled (#144968 ) Fixes SWDEV-537014.	2025-06-20 12:29:32 -04:00
Muzammil	379a609dad	[mlir][arith][transforms] Adds f4E2M1FN support to truncf and extf (#144157 ) See work detail: https://github.com/iree-org/iree/issues/20920 Add support for f4E2M1FN in `arith.truncf` and `arith.extf` ops through software emulation --------- Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>	2025-06-20 11:27:35 -05:00
Jameson Nash	940ff110d7	[InstCombine] fix hwasan mistake in "remove dead loads" (#145057 ) Detected by CI after #143958.	2025-06-20 12:22:59 -04:00
Michael Buch	877511920d	Revert "[lldb][DWARF] Remove object_pointer from ParsedDWARFAttributes" (#145065 ) Reverts llvm/llvm-project#144880 Caused `TestObjCIvarsInBlocks.py` to fail on macOS CI.	2025-06-20 17:20:58 +01:00
Justin King	bfef8732be	msan: Support free_sized and free_aligned_sized from C23 (#144529 ) Adds support to MSan for `free_sized` and `free_aligned_sized` from C23. Other sanitizers will be handled with their own separate PRs. For https://github.com/llvm/llvm-project/issues/144435 Signed-off-by: Justin King <jcking@google.com>	2025-06-20 09:16:40 -07:00
Krzysztof Parzyszek	6ba1955ba2	[flang][OpenMP] Fix ignore-target-data.f90 test Allow the function definition line to match with and without attribute set number. This fixes build break after PR144534: https://lab.llvm.org/buildbot/#/builders/157/builds/31331 Also move the test to the OpenMP subdirectory where it should have been from the beginning.	2025-06-20 11:09:07 -05:00
Chenguang Wang	72de0e4584	[TableGen][Docs] Fix empty list syntax in TableGen doc. (#145041 ) `[]<list<int>>` actually produces `list<list<int>>`.	2025-06-20 09:07:35 -07:00
Amir Ayupov	770b16cd49	[BOLT][test] Update X86/perf2bolt-spe.test (#145061 ) Address NFC mismatches caused by running perf2bolt from under the wrapper script: https://lab.llvm.org/buildbot/#/builders/92/builds/20938 > <stdin>:2:64: note: possible intended match here > /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/llvm-bolt.old: -spe is available only on AArch64. Test Plan: ninja check-bolt	2025-06-20 09:07:08 -07:00
Timm Baeder	32fc625a3f	Reapply "Reapply "[clang][bytecode] Allocate IntegralAP and Floating … (#145014 ) …types usi… (#144676)" This reverts commit `68471d29ee`. IntegralAP contains a union: union { uint64_t *Memory = nullptr; uint64_t Val; }; On 64bit systems, both Memory and Val have the same size. However, on 32 bit system, Val is 64bit and Memory only 32bit. Which means the default initializer for Memory will only zero half of Val. We fixed this by zero-initializing Val explicitly in the IntegralAP(unsigned BitWidth) constructor. See also the discussion in https://github.com/llvm/llvm-project/pull/144246	2025-06-20 18:06:01 +02:00
Simon Pilgrim	151ee0faad	[X86] SimplifyDemandedVectorEltsForTargetNode - ensure X86ISD::VPERMILPV node use v2f64/v4f32 types When reducing v4f64/v8f32 non-lane crossing X86ISD::VPERMV nodes, we use X86ISD::VPERMILPV nodes for 128-bits, but these are only available for fp types. Fixes #145046	2025-06-20 17:03:30 +01:00
Jonas Devlieghere	749e4a53d2	[lldb] Fix ASCII art in CommandObjectProtocolServer (NFC)	2025-06-20 10:54:00 -05:00
Jay Foad	6e86b7e34b	[AMDGPU] Do not replace SALU floating point multiply with VALU-only ldexp (#145048 )	2025-06-20 16:52:43 +01:00

541735 Commits