clang-p2996

Author	SHA1	Message	Date
Mingming Liu	dda73336ad	[ThinLTO]Record import type in GlobalValueSummary::GVFlags (#87597 ) The motivating use case is to support import the function declaration across modules to construct call graph edges for indirect calls [1] when importing the function definition costs too much compile time (e.g., the function is too large has no `noinline` attribute). 1. Currently, when the compiled IR module doesn't have a function definition but its postlink combined summary contains the function summary or a global alias summary with this function as aliasee, the function definition will be imported from source module by IRMover. The implementation is in FunctionImporter::importFunctions [2] 2. In order for FunctionImporter to import a declaration of a function, both function summary and alias summary need to carry the def / decl state. Specifically, all existing summary fields doesn't differ across import modules, but the def / decl state of is decided by `<ImportModule, Function>`. This change encodes the def/decl state in `GlobalValueSummary::GVFlags`. In the subsequent changes 1. The indexing step `computeImportForModule` [3] will compute the set of definitions and the set of declarations for each module, and passing on the information to bitcode writer. 2. Bitcode writer will look up the def/decl state and sets the state when it writes out the flag value. This is demonstrated in https://github.com/llvm/llvm-project/pull/87600 3. Function importer will read the def/decl state when reading the combined summary to figure out two sets of global values, and IRMover will be updated to import the declaration (aka linkGlobalValuePrototype [4]) into the destination module. - The next change is https://github.com/llvm/llvm-project/pull/87600 [1] mentioned in rfc https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5 [2] `3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L1608-L1764)` [3] `3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L856)` [4] `3b337242ee/llvm/lib/Linker/IRMover.cpp (L605)`	2024-04-10 19:46:01 -07:00
Craig Topper	999b9e6ddb	[RISCV] Use vector getConstant instead of getSplatVector+getConstant. NFC	2024-04-10 19:39:41 -07:00
Jianjian Guan	fd50151180	[RISCV] Only support SPLAT_VECTOR for Zvfhmin when also enable the scalar extension of half fp (#88275 )	2024-04-11 10:23:26 +08:00
Freddy Ye	f4509cf284	[X86][MC] Support enc/dec for SETZUCC and promoted SETCC. (#86473 ) apx-spec: https://cdrdv2.intel.com/v1/dl/getContent/784266 apx-syntax-recommendation: https://cdrdv2.intel.com/v1/dl/getContent/817241	2024-04-11 10:18:29 +08:00
Jordan Rupprecht	6b46166ef2	[llvm][NFC] Suppress `-Wunused-result` call to `write` Commit `87e6f87fe7` adds a call to `::write()`, which may be annotated w/ `warn_unused_result`, leading to `-Wunused-result` failures.	2024-04-11 02:14:07 +00:00
Owen Pan	51f1681424	[clang-format] Don't merge a short block for SBS_Never (#88238 ) Also fix unit tests. Fixes #87484.	2024-04-10 19:06:29 -07:00
Bill Wendling	fca51911d4	[NFC][Clang] Improve const correctness for IdentifierInfo (#79365 ) The IdentifierInfo isn't typically modified. Use 'const' wherever possible.	2024-04-11 00:33:40 +00:00
Arthur Eubanks	be10070f91	Revert "[Driver] Ensure ToolChain::LibraryPaths is not empty for non-Darwin" This reverts commit `ccdebbae4d`. Causes test failures in the presence of Android runtime libraries in resource-dir. See comments on https://github.com/llvm/llvm-project/pull/87866.	2024-04-10 23:41:51 +00:00
Arthur Eubanks	9786a3b4cf	[gn build] Port `0a1317564a`	2024-04-10 23:41:11 +00:00
Arthur Eubanks	233edab876	[gn build] Port `5d7d6ad663`	2024-04-10 23:41:11 +00:00
Arthur Eubanks	4027066683	[gn build] Port `59e66c515a`	2024-04-10 23:41:10 +00:00
Arthur Eubanks	19e516fbed	[gn build] Port `1fda1776e3`	2024-04-10 23:41:10 +00:00
Jie Fu	6ef4450705	[clang] Fix -Wunused-function in CGStmtOpenMP.cpp (NFC) llvm-project/clang/lib/CodeGen/CGStmtOpenMP.cpp:7959:13: error: unused function 'emitTargetTeamsLoopCodegenStatus' [-Werror,-Wunused-function] static void emitTargetTeamsLoopCodegenStatus(CodeGenFunction &CGF, ^ 1 error generated.	2024-04-11 07:37:12 +08:00
Vitaly Buka	d927d1867f	[UBSAN] Emit optimization remarks (#88304 )	2024-04-10 16:30:42 -07:00
Matthias Braun	acb7ddc5cf	[WebAssembly] Remove threadlocal.address when disabling TLS (#88209 ) Remove `llvm.threadlocal.address` intrinsic usage when disabling TLS. This fixes errors revealed by the stricter IR verification introduced in PR #87841.	2024-04-10 16:24:02 -07:00
Daniel Chen	8136ac1c42	[Flang] Define c_int_fast16_t and c_int_fast32_t for PowerPC. (#88292 ) On Linux, PowerPC defines `int_fast16_t` and `int_fast32_t` as `long`. Need to update the corresponding type, `c_int_fast16_t` and `c_int_fast32_t` in `iso_c_binding` module so they are interoparable.	2024-04-10 19:22:38 -04:00
Oskar Wirga	a9d4ddd98a	[MergeFuncs/CFI] Ensure all type metadata is propogated for CFI (#88218 ) I noticed that we weren't propagating ALL type metadata that was attached to CFI functions: # BEFORE ``` ; Function Attrs: minsize nounwind optsize ssp uwtable(sync) define internal void @foo(ptr nocapture noundef readonly %0) #0 !dbg !62311 !type !34028 !type !34029 !type !34030 ... fn merging ; Function Attrs: minsize nounwind optsize ssp uwtable(sync) define internal void @foo(ptr nocapture noundef readonly %0) #0 !type !34028 ``` # AFTER ``` ; Function Attrs: minsize nounwind optsize ssp uwtable(sync) define internal void @foo(ptr nocapture noundef readonly %0) #0 !dbg !62311 !type !34028 !type !34029 !type !34030 ... fn merging ; Function Attrs: minsize nounwind optsize ssp uwtable(sync) define internal void @foo(ptr nocapture noundef readonly %0) #0 !type !type !34028 !type !34029 !type !34030 ``` This patch makes sure that the entire vector of metadata is copied over.	2024-04-10 15:37:27 -07:00
Craig Topper	d8f1e5d289	[APInt] Remove accumulator initialization from tcMultiply and tcFullMultiply. NFCI (#88202 ) The tcMultiplyPart routine has a flag that says whether to add to the accumulator or overwrite it. By using the overwrite mode on the first iteration we don't need to initialize the accumulator to zero. Note, the initialization in tcFullMultiply was only initializing the first rhsParts of dst. tcMultiplyPart always overwrites the rhsParts+1 part that just contains the last carry. The first write to each part of dst past rhsParts is a carry write so that's how the upper part of dst is initialized.	2024-04-10 15:07:16 -07:00
Chelsea Cassanova	9f6d08f256	Revert "[lldb][sbdebugger] Move SBDebugger Broadcast bit enum into lldb-enumerations.h" (#88324 ) Reverts llvm/llvm-project#87409 due a missed update to the broadcast bit causing a build failure on the x86_64 Debian buildbot.	2024-04-10 14:54:30 -07:00
Stanislav Mekhanoshin	2fdfea088c	[AMDGPU] Add v2i32 to the VS_64 types. NFCI. (#88318 ) I am trying to use VOP3Inst with intrinsic taking v2i32 operand and it fails to create patterm without it.	2024-04-10 14:50:54 -07:00
Chelsea Cassanova	af7c196fb8	[lldb][sbdebugger] Move SBDebugger Broadcast bit enum into lldb-enumerations.h (#87409 ) When the `eBroadcastBitProgressCategory` bit was originally added to Debugger.h and SBDebugger.h, each corresponding bit was added in order of the other bits that were previously there. Since `Debugger.h` has an enum bit that `SBDebugger.h` does not, this meant that their offsets did not match. Instead of trying to keep the bit offsets in sync between the two, it's preferable to just move SBDebugger's enum into the main enumerations header and use the bits from there. This also requires that API tests using the bits from SBDebugger update their usage.	2024-04-10 14:45:49 -07:00
Jeff Niu	fb771fe315	[mlir] Slightly optimize bytecode op numbering (#88310 ) If the bytecode encoding supports properties, then the dictionary attribute is always the raw dictionary attribute of the operation, regardless of what it contains. Otherwise, get the dictionary attribute from the op: if the op does not have properties, then it returns the raw dictionary, otherwise it returns the combined inherent and discardable attributes.	2024-04-10 23:34:48 +02:00
Nick Desaulniers	8cfa72ade9	[libc] fix typo in hdr/CMakeLists Fixes #87896	2024-04-10 13:51:23 -07:00
Fangrui Song	c258f57398	[ELF] Move createSyntheticSections from Writer.cpp to SyntheticSections.cpp. NFC SyntheticSections.cpp is more appropriate. This change enables elimination of many explicit template instantiations. Due to `make<SymbolTableSection<ELFT>>(*strtab)` in Arch/ARM.cpp, we do not remove explicit template instantiations for SymbolTableSection.	2024-04-10 13:42:51 -07:00
Farzon Lotfi	05093e2438	[Spirv][HLSL] Add OpAll lowering and float vec support (#87952 ) The main point of this change was to add support for HLSL's all intrinsic. In the process of doing that I found a few issues around creating an `OpConstantComposite` via `buildZerosVal`. First the current code didn't support floats so the process of adding `buildZerosValF` meant I needed a float version of `getOrCreateIntConstVector`. After doing so I renamed both versions to `getOrCreateConstVector`. That meant I needed to create a float type version of `getOrCreateIntCompositeOrNull`. Luckily the type information was low for this function so was able to split it out into a helpwe and rename `getOrCreateIntCompositeOrNull` to `getOrCreateCompositeOrNull` With the exception of type handling differences of the code and Null vs 0 Constant Op codes these functions should be identical. To handle scalar floats I could not use `buildConstantFP` like this PR did: https://github.com/llvm/llvm-project/commit/0a2aaab5aba46#diff-733a189c5a8c3211f3a04fd6e719952a3fa231eadd8a7f11e6ecf1e584d57411R1603 because that would create too many superfluous registers (that causes problems in the validator), I had to create a float version of `getOrCreateConstInt` which I called `getOrCreateConstFP`. similar problems with doing it like this: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/SPIRV/SPIRVBuiltins.cpp#L1540. `buildZerosValF` also has a use of a function `getZeroFP`. This is because half, float, and double scalar values of 0 would collide in `SPIRVDuplicatesTracker<Constant> CT` if you use `APFloat(0.0f)`. `getORCreateConstFP` needed its own version of `getOrCreateConstIntReg` which I called `getOrCreateConstFloatReg` The one difference in this function is `getOrCreateConstFloatReg` returns a bit width so we don't have to call `getScalarOrVectorBitWidth` twice ie when it is used again in `getOrCreateConstFP` for `OpConstantF` `addNumImm`. `getOrCreateConstFloatReg` needed an `assignFloatTypeToVReg` helper which called a `getOrCreateSPIRVFloatType` helper. There was no equivalent IntegerType::get for floats so I handled this with a switch statement on bit widths to get the right LLVM float type. Finally, there is the use of `bool ZeroAsNull = STI.isOpenCLEnv();` This is partly a cosmetic change. When Zeros are treated as nulls, we don't create `OpConstantComposite` vectors which is something we do in the DXCs SPIRV backend. The DXC SPIRV backend also does not use `OpConstantNull`. Finally, I needed a means to test the behavior of the OpConstantNull and `OpConstantComposite` changes and this was one way I could do that via the same tests.	2024-04-10 16:27:44 -04:00
Christopher Di Bella	d347235bdd	[Flang] responds to Clang Tidy feedback (#87847 ) Line 267: performance-unnecessary-copy-initialization Line 592: readability-container-size-empty	2024-04-10 13:15:22 -07:00
David Pagan	a12836647e	[OpenMP][CodeGen] Improved codegen for combined loop directives (#87278 ) IR for 'target teams loop' is now dependent on suitability of associated loop-nest. If a loop-nest: - does not contain a function call, or - the -fopenmp-assume-no-nested-parallelism has been specified, - or the call is to an OpenMP API AND - does not contain nested loop bind(parallel) directives then it can be emitted as 'target teams distribute parallel for', which is the current default. Otherwise, it is emitted as 'target teams distribute'. Added debug output indicating how 'target teams loop' was emitted. Flag is -mllvm -debug-only=target-teams-loop-codegen Added LIT tests explicitly verifying 'target teams loop' emitted as a parallel loop and a distribute loop. Updated other 'loop' related tests as needed to reflect change in IR. - These updates account for most of the changed files and additions/deletions.	2024-04-10 13:09:17 -07:00
Xing Xue	b3792ae42a	[OpenMP][AIX] Fix test config for AIX (#88272 ) This patch fixes the test config so that it works for `tasking/omp50_taskdep_depobj.c` which uses different flags to test with compiler's `omp.h`. * set test environment variable `OBJECT_MODE` to `64` if it is set explicitly to `64` in the AIX environment. `OBJECT_MODE` is default to `32` and is recognized by AIX compilers and toolchain. In this way, we don't need to set `-m64` for all compiler flags for 64-bit mode * add option `-Wl,-bmaxdata` to 32-bit `test_openmp_flags` used by `tasking/omp50_taskdep_depobj.c`	2024-04-10 16:06:31 -04:00
erichkeane	a6d1366b73	[NFC] Remove a pair of incorrect comments from ParseOpenACC We attempt to continue parsing, but the comment says the opposite. Just remove the inaccurate comments in this patch.	2024-04-10 12:42:05 -07:00
martinboehme	7549b45825	Revert "[clang][dataflow] Propagate locations from result objects to initializers." (#88315 ) Reverts llvm/llvm-project#87320 This is causing buildbots to fail because `isOriginalRecordConstructor()` is now unused.	2024-04-10 21:27:10 +02:00
shamithoke	e3ef4612c1	Perform bitreverse using AVX512 GFNI for i32 and i64. (#81764 ) Currently, the lowering operation for bitreverse using Intel AVX512 GFNI only supports byte vectors Extend the operation to i32 and i64. --------- Co-authored-by: shami <shami_thoke@yahoo.com>	2024-04-10 20:22:44 +01:00
Fangrui Song	ca6b8469c1	[ELF] Avoid unneeded config->isLE and config->wordsize. NFC	2024-04-10 12:20:28 -07:00
Joseph Huber	fad14707b7	[libc] Add note to use `LIBC_GPU_BUILD=ON` as another form Summary: This is a shorthand to enable GPU support so it should be listed in the docs.	2024-04-10 14:07:57 -05:00
Joseph Huber	f81879c0f7	[Libomptarget] Add RPC-based printf implementation for OpenMP #85638 Summary: Relanding after reverting, only applies to AMDGPU for now. This patch adds an implementation of printf that's provided by the GPU C library runtime. This pritnf currently implemented using the same wrapper handling that OpenMP sets up. This will be removed once we have proper varargs support. This printf differs from the one CUDA offers in that it is synchronous and uses a finite size. Additionally we support pretty much every format specifier except the %n option. Depends on #85331	2024-04-10 13:36:25 -05:00
Mark de Wever	0a1317564a	[libc++] Adds a global private constructor tag. (#87920 ) This removes the similar tags used in the chrono tzdb implementation. Fixes: https://github.com/llvm/llvm-project/issues/85432	2024-04-10 20:34:58 +02:00
Alexey Bataev	2b00a73f62	[SLP]Buildvector for alternate instructions with non-profitable gather operands. If the operands of the potentially alternate node are going to produce buildvector sequences, which result in more instructions, than the original code, then suhinstructions should be vectorized as alternate node, better to end up with the buildvector node. Left column - experimental, Right - reference. Metric: size..text Program size..text results results0 diff test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 413680.00 416272.00 0.6% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12351788.00 12354844.00 0.0% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 664901.00 664949.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 664901.00 664949.00 0.0% test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 1171371.00 1171355.00 -0.0% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1036396.00 1036284.00 -0.0% test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 111280.00 111248.00 -0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1392113.00 1391361.00 -0.1% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1392113.00 1391361.00 -0.1% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 281676.00 281452.00 -0.1% test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test 3025.00 3019.00 -0.2% test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test 6351.00 6335.00 -0.3% Metric: SLP.NumVectorInstructions Program SLP.NumVectorInstructions results results0 diff test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test 15.00 16.00 6.7% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1703.00 1707.00 0.2% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1703.00 1707.00 0.2% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 26241.00 26239.00 -0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 11761.00 11754.00 -0.1% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 824.00 822.00 -0.2% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 5668.00 5654.00 -0.2% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 5668.00 5654.00 -0.2% test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 792.00 790.00 -0.3% test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 792.00 790.00 -0.3% test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test 1389.00 1384.00 -0.4% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 596.00 590.00 -1.0% test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test 6.00 5.00 -16.7% Metric: exec_time Program exec_time results results0 diff test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 99.14 100.00 0.9% Other changes are not significant (less than 0.1% percent with exectime less 5 secs). SingleSource/Benchmarks/Adobe-C++/loop_unroll - same small patterns remain scalar, smaller code. External/SPEC/CFP2017rate/526.blender_r/526.blender_r - many small changes, some extra stores gets vectorized. External/SPEC/CINT2017speed/625.x264_s/625.x264_s External/SPEC/CINT2017rate/525.x264_r/525.x264_r x264 has one change in a loop body, in function ssim_end4, some code remain scalar, resulting in less code size. External/SPEC/CFP2017rate/511.povray_r/511.povray_r - some extra code gets vectorized, looks like some other patterns were matched. MultiSource/Benchmarks/7zip/7zip-benchmark - extra stores were vectorized (looks like the graphs become profitable) MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg - small changes in vectorized code (some small part remain scalar). External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s Many changes cause by the fact that the code of one function becomes smaller (onvertLCHabToRGB) and this functions gets inlined after that. MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc - some small changes here and there, some extra code is vectorized, some remain scalar (2 x vectors) MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes - emits 2 scalars + 2 insertelems instead of insert, broadcast, alt code (3 instructions, total 5 insts) MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig - small graph becomes profitable and gets vectorized. External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s Some small graph becomes profitable and gets vectorized. MultiSource/Benchmarks/FreeBench/pifft/pifft - no changes in final code. Reviewers: RKSimon, dtcxzyw Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/84978	2024-04-10 14:33:56 -04:00
Noah Goldstein	81cdd35c0c	[ValueTracking] Add support for `xor`/`disjoint or` in `isKnownNonZero` Handles cases like `X ^ Y == X` / `X disjoint\| Y == X`. Both of these cases have identical logic to the existing `add` case, so just converting the `add` code to a more general helper. Proofs: https://alive2.llvm.org/ce/z/Htm7pe Closes #87706	2024-04-10 13:13:43 -05:00
Noah Goldstein	2646790155	[ValueTracking] Add tests for `xor`/`disjoint or` in `isKnownNonZero`; NFC	2024-04-10 13:13:43 -05:00
Noah Goldstein	0c57a2e4b4	[ValueTracking] Add support for `xor`/`disjoint or` in `getInvertibleOperands` This strengthens our `isKnownNonEqual` logic with some fairly trivial cases. Proofs: https://alive2.llvm.org/ce/z/4pxRTj Closes #87705	2024-04-10 13:13:43 -05:00
Noah Goldstein	195d278d50	[ValueTracking] Add tests for `xor`/`disjoint or` in `getInvertibleOperands`; NFC	2024-04-10 13:13:43 -05:00
Noah Goldstein	9c545a14c0	[ValueTracking] Add support for `insertelement` in `isKnownNonZero` Inserts don't modify the data, so if all elements that end up in the destination are non-zero the result is non-zero. Closes #87703	2024-04-10 13:13:43 -05:00
Noah Goldstein	8a28b9b8ec	[ValueTracking] Add tests for `insertelement` in `isKnownNonZero`; NFC	2024-04-10 13:13:43 -05:00
Noah Goldstein	87528bfefb	[ValueTracking] Add support for `shufflevector` in `isKnownNonZero` Shuffles don't modify the data, so if all elements that end up in the destination are non-zero the result is non-zero. Closes #87702	2024-04-10 13:13:42 -05:00
Noah Goldstein	c1d3f39ae9	[ValueTracking] Add tests for `shufflevector` in `isKnownNonZero`	2024-04-10 13:13:42 -05:00
Kevin P. Neal	b9a3551c90	[FPEnv][BitcodeReader] Correct strictfp test. Correct a strictfp test to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics This test needed the strictfp attribute added to a function definition. Test changes verified with D146845.	2024-04-10 14:05:24 -04:00
martinboehme	21009f466e	[clang][dataflow] Propagate locations from result objects to initializers. (#87320 ) Previously, we were propagating storage locations the other way around, i.e. from initializers to result objects, using `RecordValue::getLoc()`. This gave the wrong behavior in some cases -- see the newly added or fixed tests in this patch. In addition, this patch now unblocks removing the `RecordValue` class entirely, as we no longer need `RecordValue::getLoc()`. With this patch, the test `TransferTest.DifferentReferenceLocInJoin` started to fail because the framework now always uses the same storge location for a `MaterializeTemporaryExpr`, meaning that the code under test no longer set up the desired state where a variable of reference type is mapped to two different storage locations in environments being joined. Rather than trying to modify this test to set up the test condition again, I have chosen to replace the test with an equivalent test in DataflowEnvironmentTest.cpp that sets up the test condition directly; because this test is more direct, it will also be less brittle in the face of future changes.	2024-04-10 20:03:35 +02:00
Aaron Ballman	4d80dff819	int -> uintptr_t to silence diagnostics 'int' may not be sufficiently large to store a pointer representation anyway, so this is also a correctness fix.	2024-04-10 13:57:18 -04:00
Jun Wang	86842e1f72	[AMDGPU] New clang option for emitting a waitcnt instruction after each memory instruction (#79236 ) This patch introduces a new command-line option for clang, namely, amdgpu-precise-mem-op (or precise-memory in the backend). When this option is specified, a waitcnt instruction is generated after each memory load/store instruction. The counter values are always 0, but which counters are involved depends on the memory instruction. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2024-04-10 10:47:04 -07:00
Craig Topper	f27f369710	[RISCV] Remove interrupt handler special case from RISCVFrameLowering::determineCalleeSaves. (#88069 ) This code was trying to save temporary argument registers in interrupt handler functions that contain calls. With the exception that all FP registers are saved including the normally callee saved registers. If all of the callees use an FP ABI and the interrupt handler doesn't touch the normally callee saved FP registers, we don't need to save them. It doesn't appear that we need to special case functions with calls. The normal callee saved register handling will already check each of the calls and consider a register clobbered if the call doesn't explicitly say it is preserved. All of the test changes are from the removal of the FP callee saved registers. There are tests for interrupt handlers with F and D extension that use ilp32 or lp64 ABIs that are not affected by this change. They still save the FP callee saved registers as they should. gcc appears to have a bug where the D extension being enabled with the ilp32f or lp64f ABI does not save the FP callee saved regs. The callee would only save/restore the lower 32 bits and clobber the upper bits. LLVM saves the FP callee saved regs in this case and there is an unchanged test for it. The unnecessary save/restore was raised in this thread https://discourse.llvm.org/t/has-bugs-when-optimizing-save-restore-csrs-by-changing-csr-xlen-f32-interrupt/78200/1	2024-04-10 10:28:54 -07:00
higher-performance	c54afe5c33	Fix quadratic slowdown in AST matcher parent map generation (#87824 ) Avoids the need to linearly re-scan all seen parent nodes to check for duplicates, which previously caused a slowdown for ancestry checks in Clang AST matchers. Fixes: #86881	2024-04-10 13:24:19 -04:00

1 2 3 4 5 ...

495344 Commits