Commit Graph

495344 Commits

Author SHA1 Message Date
Mingming Liu
dda73336ad [ThinLTO]Record import type in GlobalValueSummary::GVFlags (#87597)
The motivating use case is to support import the function declaration
across modules to construct call graph edges for indirect calls [1]
when importing the function definition costs too much compile time
(e.g., the function is too large has no `noinline` attribute).
1. Currently, when the compiled IR module doesn't have a function
definition but its postlink combined summary contains the function
summary or a global alias summary with this function as aliasee, the
function definition will be imported from source module by IRMover. The
implementation is in FunctionImporter::importFunctions [2]
2. In order for FunctionImporter to import a declaration of a function,
both function summary and alias summary need to carry the def / decl
state. Specifically, all existing summary fields doesn't differ across
import modules, but the def / decl state of is decided by
`<ImportModule, Function>`.

This change encodes the def/decl state in `GlobalValueSummary::GVFlags`.

In the subsequent changes
1. The indexing step `computeImportForModule` [3]
will compute the set of definitions and the set of declarations for each
module, and passing on the information to bitcode writer.
2. Bitcode writer will look up the def/decl state and sets the state
when it writes out the flag value. This is demonstrated in
https://github.com/llvm/llvm-project/pull/87600
3. Function importer will read the def/decl state when reading the
combined summary to figure out two sets of global values, and IRMover
will be updated to import the declaration (aka linkGlobalValuePrototype [4])
into the destination module.

- The next change is https://github.com/llvm/llvm-project/pull/87600

[1] mentioned in rfc https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5
[2] 3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L1608-L1764)
[3] 3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L856)
[4] 3b337242ee/llvm/lib/Linker/IRMover.cpp (L605)
2024-04-10 19:46:01 -07:00
Craig Topper
999b9e6ddb [RISCV] Use vector getConstant instead of getSplatVector+getConstant. NFC 2024-04-10 19:39:41 -07:00
Jianjian Guan
fd50151180 [RISCV] Only support SPLAT_VECTOR for Zvfhmin when also enable the scalar extension of half fp (#88275) 2024-04-11 10:23:26 +08:00
Freddy Ye
f4509cf284 [X86][MC] Support enc/dec for SETZUCC and promoted SETCC. (#86473)
apx-spec: https://cdrdv2.intel.com/v1/dl/getContent/784266
apx-syntax-recommendation:
https://cdrdv2.intel.com/v1/dl/getContent/817241
2024-04-11 10:18:29 +08:00
Jordan Rupprecht
6b46166ef2 [llvm][NFC] Suppress -Wunused-result call to write
Commit 87e6f87fe7 adds a call to `::write()`, which may be annotated w/ `warn_unused_result`, leading to `-Wunused-result` failures.
2024-04-11 02:14:07 +00:00
Owen Pan
51f1681424 [clang-format] Don't merge a short block for SBS_Never (#88238)
Also fix unit tests.

Fixes #87484.
2024-04-10 19:06:29 -07:00
Bill Wendling
fca51911d4 [NFC][Clang] Improve const correctness for IdentifierInfo (#79365)
The IdentifierInfo isn't typically modified. Use 'const' wherever
possible.
2024-04-11 00:33:40 +00:00
Arthur Eubanks
be10070f91 Revert "[Driver] Ensure ToolChain::LibraryPaths is not empty for non-Darwin"
This reverts commit ccdebbae4d.

Causes test failures in the presence of Android runtime libraries in resource-dir.
See comments on https://github.com/llvm/llvm-project/pull/87866.
2024-04-10 23:41:51 +00:00
Arthur Eubanks
9786a3b4cf [gn build] Port 0a1317564a 2024-04-10 23:41:11 +00:00
Arthur Eubanks
233edab876 [gn build] Port 5d7d6ad663 2024-04-10 23:41:11 +00:00
Arthur Eubanks
4027066683 [gn build] Port 59e66c515a 2024-04-10 23:41:10 +00:00
Arthur Eubanks
19e516fbed [gn build] Port 1fda1776e3 2024-04-10 23:41:10 +00:00
Jie Fu
6ef4450705 [clang] Fix -Wunused-function in CGStmtOpenMP.cpp (NFC)
llvm-project/clang/lib/CodeGen/CGStmtOpenMP.cpp:7959:13:
error: unused function 'emitTargetTeamsLoopCodegenStatus' [-Werror,-Wunused-function]
static void emitTargetTeamsLoopCodegenStatus(CodeGenFunction &CGF,
            ^
1 error generated.
2024-04-11 07:37:12 +08:00
Vitaly Buka
d927d1867f [UBSAN] Emit optimization remarks (#88304) 2024-04-10 16:30:42 -07:00
Matthias Braun
acb7ddc5cf [WebAssembly] Remove threadlocal.address when disabling TLS (#88209)
Remove `llvm.threadlocal.address` intrinsic usage when disabling TLS.
This fixes errors revealed by the stricter IR verification introduced in
PR #87841.
2024-04-10 16:24:02 -07:00
Daniel Chen
8136ac1c42 [Flang] Define c_int_fast16_t and c_int_fast32_t for PowerPC. (#88292)
On Linux, PowerPC defines `int_fast16_t` and `int_fast32_t` as `long`.
Need to update the corresponding type, `c_int_fast16_t` and
`c_int_fast32_t` in `iso_c_binding` module so they are interoparable.
2024-04-10 19:22:38 -04:00
Oskar Wirga
a9d4ddd98a [MergeFuncs/CFI] Ensure all type metadata is propogated for CFI (#88218)
I noticed that we weren't propagating ALL type metadata that was
attached to CFI functions:

# BEFORE

```
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !dbg !62311 !type !34028 !type !34029 !type !34030
... fn merging
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !type !34028
```

# AFTER

```
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !dbg !62311 !type !34028 !type !34029 !type !34030
... fn merging
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !type !type !34028 !type !34029 !type !34030
```

This patch makes sure that the entire vector of metadata is copied over.
2024-04-10 15:37:27 -07:00
Craig Topper
d8f1e5d289 [APInt] Remove accumulator initialization from tcMultiply and tcFullMultiply. NFCI (#88202)
The tcMultiplyPart routine has a flag that says whether to add to the
accumulator or overwrite it. By using the overwrite mode on the first
iteration we don't need to initialize the accumulator to zero.

Note, the initialization in tcFullMultiply was only initializing the
first rhsParts of dst. tcMultiplyPart always overwrites the rhsParts+1
part that just contains the last carry. The first write to each part of
dst past rhsParts is a carry write so that's how the upper part of dst
is initialized.
2024-04-10 15:07:16 -07:00
Chelsea Cassanova
9f6d08f256 Revert "[lldb][sbdebugger] Move SBDebugger Broadcast bit enum into lldb-enumerations.h" (#88324)
Reverts llvm/llvm-project#87409 due a missed update to the broadcast bit
causing a build failure on the x86_64 Debian buildbot.
2024-04-10 14:54:30 -07:00
Stanislav Mekhanoshin
2fdfea088c [AMDGPU] Add v2i32 to the VS_64 types. NFCI. (#88318)
I am trying to use VOP3Inst with intrinsic taking v2i32 operand and it
fails to create patterm without it.
2024-04-10 14:50:54 -07:00
Chelsea Cassanova
af7c196fb8 [lldb][sbdebugger] Move SBDebugger Broadcast bit enum into lldb-enumerations.h (#87409)
When the `eBroadcastBitProgressCategory` bit was originally added to
Debugger.h and SBDebugger.h, each corresponding bit was added in order
of the other bits that were previously there. Since `Debugger.h` has an
enum bit that `SBDebugger.h` does not, this meant that their offsets did
not match.

Instead of trying to keep the bit offsets in sync between the two, it's
preferable to just move SBDebugger's enum into the main enumerations
header and use the bits from there. This also requires that API tests using the bits from SBDebugger update their usage.
2024-04-10 14:45:49 -07:00
Jeff Niu
fb771fe315 [mlir] Slightly optimize bytecode op numbering (#88310)
If the bytecode encoding supports properties, then the dictionary
attribute is always the raw dictionary attribute of the operation,
regardless of what it contains. Otherwise, get the dictionary attribute
from the op: if the op does not have properties, then it returns the raw
dictionary, otherwise it returns the combined inherent and discardable
attributes.
2024-04-10 23:34:48 +02:00
Nick Desaulniers
8cfa72ade9 [libc] fix typo in hdr/CMakeLists
Fixes #87896
2024-04-10 13:51:23 -07:00
Fangrui Song
c258f57398 [ELF] Move createSyntheticSections from Writer.cpp to SyntheticSections.cpp. NFC
SyntheticSections.cpp is more appropriate. This change enables
elimination of many explicit template instantiations.

Due to `make<SymbolTableSection<ELFT>>(*strtab)` in Arch/ARM.cpp,
we do not remove explicit template instantiations for SymbolTableSection.
2024-04-10 13:42:51 -07:00
Farzon Lotfi
05093e2438 [Spirv][HLSL] Add OpAll lowering and float vec support (#87952)
The main point of this change was to add support for HLSL's all
intrinsic.
In the process of doing that I found a few issues around creating an
`OpConstantComposite` via `buildZerosVal`.

First the current code didn't support floats so the process of adding
`buildZerosValF` meant I needed a
float version of `getOrCreateIntConstVector`. After doing so I renamed
both versions to `getOrCreateConstVector`. That meant I needed to create
a float type version of `getOrCreateIntCompositeOrNull`. Luckily the
type information was low for this function so was able to split it out
into a helpwe and rename `getOrCreateIntCompositeOrNull` to
`getOrCreateCompositeOrNull` With the exception of type handling
differences of the code and Null vs 0 Constant Op codes these functions
should be identical.

To handle scalar floats I could not use `buildConstantFP` like this PR
did:
https://github.com/llvm/llvm-project/commit/0a2aaab5aba46#diff-733a189c5a8c3211f3a04fd6e719952a3fa231eadd8a7f11e6ecf1e584d57411R1603
because that would create too many superfluous registers (that causes
problems in the validator), I had to create a float version of
`getOrCreateConstInt` which I called `getOrCreateConstFP`.
similar problems with doing it like this:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/SPIRV/SPIRVBuiltins.cpp#L1540.

`buildZerosValF` also has a use of a function `getZeroFP`. This is
because half, float, and double scalar values of 0 would collide in
`SPIRVDuplicatesTracker<Constant> CT` if you use `APFloat(0.0f)`.

`getORCreateConstFP` needed its own version of `getOrCreateConstIntReg`
which I called `getOrCreateConstFloatReg` The one difference in this
function is `getOrCreateConstFloatReg` returns a bit width so we don't
have to call `getScalarOrVectorBitWidth` twice ie when it is used again
in `getOrCreateConstFP` for `OpConstantF` `addNumImm`.

`getOrCreateConstFloatReg` needed an `assignFloatTypeToVReg` helper
which called a `getOrCreateSPIRVFloatType` helper. There was no
equivalent IntegerType::get for floats so I handled this with a switch
statement on bit widths to get the right LLVM float type.

Finally, there is the use of `bool ZeroAsNull = STI.isOpenCLEnv();` This
is partly a cosmetic change. When Zeros are treated as nulls, we don't
create `OpConstantComposite` vectors which is something we do in the
DXCs SPIRV backend. The DXC SPIRV backend also does not use
`OpConstantNull`. Finally, I needed a means to test the behavior of the
OpConstantNull and `OpConstantComposite` changes and this was one way I
could do that via the same tests.
2024-04-10 16:27:44 -04:00
Christopher Di Bella
d347235bdd [Flang] responds to Clang Tidy feedback (#87847)
Line 267: performance-unnecessary-copy-initialization
Line 592: readability-container-size-empty
2024-04-10 13:15:22 -07:00
David Pagan
a12836647e [OpenMP][CodeGen] Improved codegen for combined loop directives (#87278)
IR for 'target teams loop' is now dependent on suitability of associated
loop-nest.

If a loop-nest:

- does not contain a function call, or
- the -fopenmp-assume-no-nested-parallelism has been specified,
- or the call is to an OpenMP API AND
- does not contain nested loop bind(parallel) directives

then it can be emitted as 'target teams distribute parallel for', which
is the current default. Otherwise, it is emitted as 'target teams
distribute'.

Added debug output indicating how 'target teams loop' was emitted. Flag
is -mllvm -debug-only=target-teams-loop-codegen

Added LIT tests explicitly verifying 'target teams loop' emitted as a
parallel loop and a distribute loop.

Updated other 'loop' related tests as needed to reflect change in IR.
- These updates account for most of the changed files and
additions/deletions.
2024-04-10 13:09:17 -07:00
Xing Xue
b3792ae42a [OpenMP][AIX] Fix test config for AIX (#88272)
This patch fixes the test config so that it works for
`tasking/omp50_taskdep_depobj.c` which uses different flags to test with
compiler's `omp.h`.
* set test environment variable `OBJECT_MODE` to `64` if it is set
explicitly to `64` in the AIX environment. `OBJECT_MODE` is default to
`32` and is recognized by AIX compilers and toolchain. In this way, we
don't need to set `-m64` for all compiler flags for 64-bit mode
* add option `-Wl,-bmaxdata` to 32-bit `test_openmp_flags` used by
`tasking/omp50_taskdep_depobj.c`
2024-04-10 16:06:31 -04:00
erichkeane
a6d1366b73 [NFC] Remove a pair of incorrect comments from ParseOpenACC
We attempt to continue parsing, but the comment says the opposite.  Just
remove the inaccurate comments in this patch.
2024-04-10 12:42:05 -07:00
martinboehme
7549b45825 Revert "[clang][dataflow] Propagate locations from result objects to initializers." (#88315)
Reverts llvm/llvm-project#87320

This is causing buildbots to fail because
`isOriginalRecordConstructor()` is now unused.
2024-04-10 21:27:10 +02:00
shamithoke
e3ef4612c1 Perform bitreverse using AVX512 GFNI for i32 and i64. (#81764)
Currently, the lowering operation for bitreverse using Intel AVX512 GFNI only supports byte vectors

Extend the operation to i32 and i64.

---------

Co-authored-by: shami <shami_thoke@yahoo.com>
2024-04-10 20:22:44 +01:00
Fangrui Song
ca6b8469c1 [ELF] Avoid unneeded config->isLE and config->wordsize. NFC 2024-04-10 12:20:28 -07:00
Joseph Huber
fad14707b7 [libc] Add note to use LIBC_GPU_BUILD=ON as another form
Summary:
This is a shorthand to enable GPU support so it should be listed in the
docs.
2024-04-10 14:07:57 -05:00
Joseph Huber
f81879c0f7 [Libomptarget] Add RPC-based printf implementation for OpenMP #85638
Summary:
Relanding after reverting, only applies to AMDGPU for now.

This patch adds an implementation of printf that's provided by the GPU
C library runtime. This pritnf currently implemented using the same
wrapper handling that OpenMP sets up. This will be removed once we have
proper varargs support.

This printf differs from the one CUDA offers in that it is synchronous
and uses a finite size. Additionally we support pretty much every
format specifier except the %n option.

Depends on #85331
2024-04-10 13:36:25 -05:00
Mark de Wever
0a1317564a [libc++] Adds a global private constructor tag. (#87920)
This removes the similar tags used in the chrono tzdb implementation.

Fixes: https://github.com/llvm/llvm-project/issues/85432
2024-04-10 20:34:58 +02:00
Alexey Bataev
2b00a73f62 [SLP]Buildvector for alternate instructions with non-profitable gather operands.
If the operands of the potentially alternate node are going to produce
buildvector sequences, which result in more instructions, than the
original code, then suhinstructions should be vectorized as alternate
node, better to end up with the buildvector node.

Left column - experimental, Right - reference.

Metric: size..text

Program                                                                                                                                                size..text
                                                                                                                                                       results     results0    diff
                                                                                      test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   413680.00   416272.00  0.6%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12351788.00 12354844.00  0.0%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   664901.00   664949.00  0.0%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   664901.00   664949.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  1171371.00  1171355.00 -0.0%
                                                                                         test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1036396.00  1036284.00 -0.0%
                                                                         test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test   111280.00   111248.00 -0.0%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1392113.00  1391361.00 -0.1%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1392113.00  1391361.00 -0.1%
                                                                        test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   281676.00   281452.00 -0.1%
                                                                                    test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test     3025.00     3019.00 -0.2%
                                                                                test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test     6351.00     6335.00 -0.3%

Metric: SLP.NumVectorInstructions

Program                                                                                                                                                SLP.NumVectorInstructions
                                                                                                                                                       results                   results0 diff
                                                                                    test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test    15.00                     16.00   6.7%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  1703.00                   1707.00   0.2%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  1703.00                   1707.00   0.2%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 26241.00                  26239.00  -0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 11761.00                  11754.00  -0.1%
                                                                        test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   824.00                    822.00  -0.2%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  5668.00                   5654.00  -0.2%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  5668.00                   5654.00  -0.2%
                                                                                     test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   792.00                    790.00  -0.3%
                                                                                    test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   792.00                    790.00  -0.3%
                                                                                       test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test  1389.00                   1384.00  -0.4%
                                                                                         test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test   596.00                    590.00  -1.0%
                                                                                test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test     6.00                      5.00 -16.7%

Metric: exec_time

Program                                                                                                                                                exec_time
                                                                                                                                                       results   results0  diff
                                                                               test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test     99.14    100.00    0.9%

Other changes are not significant (less than 0.1% percent with exectime
less 5 secs).

SingleSource/Benchmarks/Adobe-C++/loop_unroll - same small patterns
remain scalar, smaller code.
External/SPEC/CFP2017rate/526.blender_r/526.blender_r - many small
changes, some extra stores gets vectorized.
External/SPEC/CINT2017speed/625.x264_s/625.x264_s
External/SPEC/CINT2017rate/525.x264_r/525.x264_r
x264 has one change in a loop body, in function ssim_end4, some code
remain scalar, resulting in less code size.
External/SPEC/CFP2017rate/511.povray_r/511.povray_r - some extra code
gets vectorized, looks like some other patterns were matched.
MultiSource/Benchmarks/7zip/7zip-benchmark - extra stores were
vectorized (looks like the graphs become profitable)
MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg - small
changes in vectorized code (some small part remain scalar).
External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r
External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s
Many changes cause by the fact that the code of one function becomes
smaller (onvertLCHabToRGB) and this functions gets inlined after that.
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc - some small
changes here and there, some extra code is vectorized, some remain
scalar (2 x vectors)
MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes - emits 2 scalars
+ 2 insertelems instead of insert, broadcast, alt code (3 instructions,
  total 5 insts)
MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig - small graph
becomes profitable and gets vectorized.
External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r
External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s
Some small graph becomes profitable and gets vectorized.
MultiSource/Benchmarks/FreeBench/pifft/pifft - no changes in final code.

Reviewers: RKSimon, dtcxzyw

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/84978
2024-04-10 14:33:56 -04:00
Noah Goldstein
81cdd35c0c [ValueTracking] Add support for xor/disjoint or in isKnownNonZero
Handles cases like `X ^ Y == X` / `X disjoint| Y == X`.

Both of these cases have identical logic to the existing `add` case,
so just converting the `add` code to a more general helper.

Proofs: https://alive2.llvm.org/ce/z/Htm7pe

Closes #87706
2024-04-10 13:13:43 -05:00
Noah Goldstein
2646790155 [ValueTracking] Add tests for xor/disjoint or in isKnownNonZero; NFC 2024-04-10 13:13:43 -05:00
Noah Goldstein
0c57a2e4b4 [ValueTracking] Add support for xor/disjoint or in getInvertibleOperands
This strengthens our `isKnownNonEqual` logic with some fairly
trivial cases.

Proofs: https://alive2.llvm.org/ce/z/4pxRTj

Closes #87705
2024-04-10 13:13:43 -05:00
Noah Goldstein
195d278d50 [ValueTracking] Add tests for xor/disjoint or in getInvertibleOperands; NFC 2024-04-10 13:13:43 -05:00
Noah Goldstein
9c545a14c0 [ValueTracking] Add support for insertelement in isKnownNonZero
Inserts don't modify the data, so if all elements that end up in the
destination are non-zero the result is non-zero.

Closes #87703
2024-04-10 13:13:43 -05:00
Noah Goldstein
8a28b9b8ec [ValueTracking] Add tests for insertelement in isKnownNonZero; NFC 2024-04-10 13:13:43 -05:00
Noah Goldstein
87528bfefb [ValueTracking] Add support for shufflevector in isKnownNonZero
Shuffles don't modify the data, so if all elements that end up in the
destination are non-zero the result is non-zero.

Closes #87702
2024-04-10 13:13:42 -05:00
Noah Goldstein
c1d3f39ae9 [ValueTracking] Add tests for shufflevector in isKnownNonZero 2024-04-10 13:13:42 -05:00
Kevin P. Neal
b9a3551c90 [FPEnv][BitcodeReader] Correct strictfp test.
Correct a strictfp test to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

This test needed the strictfp attribute added to a function definition.

Test changes verified with D146845.
2024-04-10 14:05:24 -04:00
martinboehme
21009f466e [clang][dataflow] Propagate locations from result objects to initializers. (#87320)
Previously, we were propagating storage locations the other way around,
i.e.
from initializers to result objects, using `RecordValue::getLoc()`. This
gave
the wrong behavior in some cases -- see the newly added or fixed tests
in this
patch.

In addition, this patch now unblocks removing the `RecordValue` class
entirely,
as we no longer need `RecordValue::getLoc()`.

With this patch, the test `TransferTest.DifferentReferenceLocInJoin`
started to
fail because the framework now always uses the same storge location for
a
`MaterializeTemporaryExpr`, meaning that the code under test no longer
set up
the desired state where a variable of reference type is mapped to two
different
storage locations in environments being joined. Rather than trying to
modify
this test to set up the test condition again, I have chosen to replace
the test
with an equivalent test in DataflowEnvironmentTest.cpp that sets up the
test
condition directly; because this test is more direct, it will also be
less
brittle in the face of future changes.
2024-04-10 20:03:35 +02:00
Aaron Ballman
4d80dff819 int -> uintptr_t to silence diagnostics
'int' may not be sufficiently large to store a pointer representation
anyway, so this is also a correctness fix.
2024-04-10 13:57:18 -04:00
Jun Wang
86842e1f72 [AMDGPU] New clang option for emitting a waitcnt instruction after each memory instruction (#79236)
This patch introduces a new command-line option for clang, namely,
amdgpu-precise-mem-op (or precise-memory in the backend). When this option is specified, a waitcnt
instruction is generated after each memory load/store instruction. The
counter values are always 0, but which counters are involved depends on
the memory instruction.

---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2024-04-10 10:47:04 -07:00
Craig Topper
f27f369710 [RISCV] Remove interrupt handler special case from RISCVFrameLowering::determineCalleeSaves. (#88069)
This code was trying to save temporary argument registers in interrupt
handler functions that contain calls. With the exception that all FP
registers are saved including the normally callee saved registers.

If all of the callees use an FP ABI and the interrupt handler doesn't
touch the normally callee saved FP registers, we don't need to save
them.

It doesn't appear that we need to special case functions with calls. The
normal callee saved register handling will already check each of the calls
and consider a register clobbered if the call doesn't explicitly say it is preserved.

All of the test changes are from the removal of the FP callee saved
registers. There are tests for interrupt handlers with F and D extension
that use ilp32 or lp64 ABIs that are not affected by this change. They
still save the FP callee saved registers as they should.

gcc appears to have a bug where the D extension being enabled with the
ilp32f or lp64f ABI does not save the FP callee saved regs. The callee
would only save/restore the lower 32 bits and clobber the upper bits.
LLVM saves the FP callee saved regs in this case and there is an
unchanged test for it.

The unnecessary save/restore was raised in this thread
https://discourse.llvm.org/t/has-bugs-when-optimizing-save-restore-csrs-by-changing-csr-xlen-f32-interrupt/78200/1
2024-04-10 10:28:54 -07:00
higher-performance
c54afe5c33 Fix quadratic slowdown in AST matcher parent map generation (#87824)
Avoids the need to linearly re-scan all seen parent nodes to check for
duplicates, which previously caused a slowdown for ancestry checks in
Clang AST matchers.

Fixes: #86881
2024-04-10 13:24:19 -04:00