Commit Graph

534586 Commits

Author SHA1 Message Date
Slava Zakharin
273aecdb20 [flang-rt] Use runtime::memchr instead of std::memchr. (#135298) 2025-04-18 08:45:52 -07:00
Devon Loehr
915de1a588 Generate empty .clang-format-ignore before running tests (#136154)
Followup to #136022, this ensures formatting tests are run with an empty
`.clang-format-ignore` in their root directory, to prevent failures if
the file also exists higher in the tree.
2025-04-18 08:43:00 -07:00
Arthur Eubanks
be9f72cf37 Revert "[ConstraintElim] Simplify cmp after uadd.sat/usub.sat (#135603)"
This reverts commit fe54d1afcc.

Causes miscompiles, see #135603.
2025-04-18 15:37:37 +00:00
Nick Sarnie
257b727584 [clang][Sema][SYCL] Fix MSVC STL usage on AMDGPU (#135979)
The MSVC STL includes specializations of `_Is_memfunptr` for every
function pointer type, including every calling convention.

The problem is the AMDGPU target doesn't support the x86 `vectorcall`
calling convention so clang sets it to the default CC. This ends up
clashing with the already-existing overload for the default CC, so we
get a duplicate definition error when including `type_traits` (which we
heavily use in the SYCL STL) and compiling for AMDGPU on Windows.

This doesn't happen for pure AMDGPU non-SYCL because it doesn't include
the C++ STL, and it doesn't happen for CUDA/HIP because a similar
workaround was done
[here](fa49c3a888).

I am not an expert in Sema, so I did a kind of hardcoded fix, please
let me know if there is a better way to fix this.

As far as I can tell we can't do exactly the same fix that was done for
CUDA because we can't differentiate between device and host code so
easily.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2025-04-18 15:28:46 +00:00
Jakub Kuderski
c016a65c18 [mlir][vector] Switch to llvm::interleaved in debug prints. NFC. (#136248)
Clean up printing code by switching to `llvm::interleaved` from
https://github.com/llvm/llvm-project/pull/135517.
2025-04-18 11:22:52 -04:00
Jakub Kuderski
4be84a142e [mlir][gpu] Clean up prints in GPU dialect. NFC. (#136250)
Clean up printing code by switching to `llvm::interleaved` from
https://github.com/llvm/llvm-project/pull/135517. Also make some minor
readability & performance fixes.
2025-04-18 11:10:17 -04:00
Jakub Kuderski
d0dd6974b8 [mlir][spirv] Switch to llvm::interleaved. NFC. (#136240)
Clean up printing code by switching to `llvm::interleaved` from
https://github.com/llvm/llvm-project/pull/135517.
2025-04-18 11:08:41 -04:00
Philip Reames
f2ecd86e34 [Analysis] Remove implicit LocationSize conversion from uint64_t (#133342)
This change removes the uint64_t constructor on LocationSize
preventing implicit conversion, and fixes up the using APIs to adapt to
the change. Note that I'm adding a couple of explicit conversion points
on routines where passing in a fixed offset as an integer seems likely
to have well understood semantics.

We had an unfortunate case which arose if you tried to pass a TypeSize
value to a parameter of LocationSize type. We'd find the implicit
conversion path through TypeSize -> uint64_t -> LocationSize which works
just fine for fixed values, but loses information and fails assertions
if the TypeSize was scalable. This change breaks the first link in that
implicit conversion chain since that seemed to be the easier one.
2025-04-18 07:46:31 -07:00
Ivan Butygin
dda4b968e7 [mlir] AMDGPUToROCDL: lower amdgpu.swizzle_bitmode (#136223)
Repack `amdgpu.swizzle_bitmode` arguments and lower it to
`rocdl.ds_swizzle`.

The repacking logic is as follows:
* `sizeof(arg) < sizeof(i32)`: bitcast to integer and zext to i32 and
then trunc and bitcast back.
* `sizeof(arg) == sizeof(i32)`: just bitcast to i32 and back if not i32
* `sizeof(arg) > sizeof(i32)`: bitcast to `vector<Nxi32>`, extract
individual elements and do a series of `rocdl.ds_swizzle` and then
compose vector and bitcast back.

Added repacking logic to LLVM utils so it can be used elsewhere. I'm
planning to use it for `gpu.shuffle` later.
2025-04-18 17:19:04 +03:00
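The repacking rules listed in dda4b968e7 can be illustrated outside MLIR with plain bytes. This is a minimal sketch, not the code from the patch: `applyPerWord` and its byte-vector representation are hypothetical, the value is assumed to be little-endian, and values wider than 32 bits are assumed to be a whole number of 32-bit words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper: apply an operation that only understands 32-bit
// words to a value of arbitrary byte width.
std::vector<uint8_t> applyPerWord(const std::vector<uint8_t> &bytes,
                                  const std::function<uint32_t(uint32_t)> &op) {
  const std::size_t wordSize = sizeof(uint32_t);
  // Assumption of this sketch: wide values split evenly into 32-bit words.
  assert(bytes.size() <= wordSize || bytes.size() % wordSize == 0);

  std::vector<uint8_t> result(bytes.size());
  for (std::size_t base = 0; base < bytes.size(); base += wordSize) {
    // Pack up to four bytes into one word; missing high bytes stay zero,
    // which plays the role of the zero-extension for narrow values.
    uint32_t word = 0;
    for (std::size_t i = 0; i < wordSize && base + i < bytes.size(); ++i)
      word |= static_cast<uint32_t>(bytes[base + i]) << (8 * i);

    word = op(word);

    // Write back only the bytes that existed, which truncates narrow values.
    for (std::size_t i = 0; i < wordSize && base + i < bytes.size(); ++i)
      result[base + i] = static_cast<uint8_t>(word >> (8 * i));
  }
  return result;
}
```

A 2-byte input takes the "narrower than i32" path, a 4-byte input is processed as a single word, and an 8-byte input is handled as two independent words that are recomposed in order.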
Yingwei Zheng
b1b065f2bf [ValueTracking] Refactor isKnownNonEqualFromContext (#127388)
This patch avoids adding RHS for comparisons with two variable operands
(https://github.com/llvm/llvm-project/pull/118493#discussion_r1949397482).
Instead, we iterate over related dominating conditions of both V1 and V2
in `isKnownNonEqualFromContext`, as suggested by goldsteinn
(https://github.com/llvm/llvm-project/pull/117442#discussion_r1944058002).

Compile-time improvement:
https://llvm-compile-time-tracker.com/compare.php?from=c6d95c441a29a45782ff72d6cb82839b86fd0e4a&to=88464baedd7b1731281eaa0ce4438122b4d218a7&stat=instructions:u
2025-04-18 22:14:06 +08:00
Oleksandr "Alex" Zinenko
20a104a7d6 [mlir] allow function type cloning to fail (#136300)
`FunctionOpInterface` assumed the fact that the function type (attribute
of the operation) can be cloned with arbitrary lists of function
arguments and results to support argument and result list mutation. This
is not always correct, in particular, LLVM dialect functions require
exactly one result making it impossible to erase the result.

Allow function type cloning to fail and propagate this failure through
various APIs that use it. The common assumption is that existing IR has
not been modified.

Fixes #131142.
2025-04-18 16:05:54 +02:00
amordo
35e6ca47c1 [docs] Add TOC for InstCombine contributor guide (#136293) 2025-04-18 15:35:02 +02:00
Matt Arsenault
730773602f llvm-reduce: Avoid using constantdata uselistorder in thinlto test (#136288)
This also demonstrates a bug that's a consequence of the two different
paths for the single and multithreaded cases. The parallel path goes
through bitcode serialization and does preserve the uselistorder. It
therefore survives and we can observe a reduced uselistorder with deleted
instructions. In the CloneModule case, nothing is reduced.
2025-04-18 15:34:11 +02:00
Oleksandr "Alex" Zinenko
63b8f1c948 [mlir] add a fluent API to GreedyRewriterConfig (#132253)
This is similar to other configuration objects used across MLIR.
2025-04-18 15:19:57 +02:00
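For readers unfamiliar with the term, a "fluent API" as mentioned in 63b8f1c948 usually means setters that return a reference to the object so calls can be chained. The sketch below shows the general pattern only; `SketchRewriteConfig`, its fields, and its default values are hypothetical and are not taken from MLIR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical configuration object illustrating chained setters.
class SketchRewriteConfig {
public:
  // Each setter returns *this so that calls can be chained.
  SketchRewriteConfig &setMaxIterations(int64_t n) {
    assert(n >= 0 && "this sketch only accepts non-negative limits");
    maxIterations = n;
    return *this;
  }
  SketchRewriteConfig &enableFolding(bool enable = true) {
    folding = enable;
    return *this;
  }

  int64_t getMaxIterations() const { return maxIterations; }
  bool isFoldingEnabled() const { return folding; }

private:
  int64_t maxIterations = 10; // arbitrary default for the sketch
  bool folding = true;
};

// Builds a configuration in a single expression.
inline SketchRewriteConfig makeStrictConfig() {
  return SketchRewriteConfig().setMaxIterations(3).enableFolding(false);
}
```

The benefit is mostly at call sites: a temporary configuration can be built and passed in one expression instead of declaring a variable and assigning fields one by one.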
Louis Dionne
860e88411d [libc++] Make __config_site modular (#134699)
This patch makes the __config_site header modular, which solves various
problems with non-modular headers. This requires going back to
generating the modulemap file, since we only know how to make
__config_site modular when we're not using the per-target runtime dir.

The patch also adds a test that we support
-Wnon-modular-include-in-module, which warns about non-modular includes
from modules.

---------

Co-authored-by: Konstantin Varlamov <varconst@apple.com>
2025-04-18 06:06:25 -07:00
Rahul Joshi
622765f976 [Clang][GPU] Make NVPTX check more permissive in unit test (#136301)
- Depending on whether the NVPTX backend is enabled, this call may or may
not have the range() attribute, so make this check more permissive.
2025-04-18 05:52:55 -07:00
Matthias Springer
fc1e311966 [mlir][memref] Fix rollback in test case during convert-to-llvm (#135958)
This commit is in preparation of the One-Shot Dialect Conversion
refactoring, which removes the rollback from the dialect conversion
framework.

`GenericAtomicRMWOpLowering` (`generic_atomic_rmw`) triggered a rollback
in two test cases. The lowering pattern adds additional basic blocks to
the enclosing operation, which used to be a `func.func` (now
`llvm.func`). Adding a basic block triggers legalization of the op that
owns the basic block. This fails when running
`--convert-to-llvm="filter-dialects=memref"` because no lowering
patterns for the `func` dialect were populated and only `llvm` ops are
considered "legal" by the `convert-to-llvm` pass, causing a rollback of
the entire `GenericAtomicRMWOpLowering` pattern.

Also add extra `CHECK-INTERFACE` to make sure that all test cases are
correctly lowered with `--convert-to-llvm="filter-dialects=memref"`.
2025-04-18 14:52:51 +02:00
Joseph Huber
db0f754c5a [OpenMP] Remove 'libomptarget.devicertl.a' fatbinary and use static library (#126143)
Summary:
Currently, we build a single `libomptarget.devicertl.a` which is a
fatbinary. It is a host object file that contains the embedded archive
files for both the NVIDIA and AMDGPU targets. This was done primarily as
a convenience due to naming conflicts. Now that the clang driver for the
GPU targets can appropriately link via the per-target runtime-dir, we can
just make two separate static libraries and remove the indirection.

This patch creates two new static libraries that get installed into
```
/lib/amdgcn-amd-amdhsa/libomp.a
/lib/nvptx64-nvidia-cuda/libomp.a
```
for AMDGPU and NVPTX respectively. The link job created by the linker
wrapper now simply needs to do `-lomp` and it will search those
directories and link those static libraries. This requires far less
special handling.

This patch is a precursor to changing the build system entirely to be a
runtimes based one. Soon this target will be a standard `add_library`
and done through the GPU runtime targets.

NOTE that this actually does remove an additional optimization step.
Previously we merged all of the files into a single bitcode object and
forcibly internalized some definitions. This, instead, just treats them
like a normal static library. This may possibly affect performance for
some files, but I think it's better overall to use static library
semantics because it allows us to have an 'include-what-you-use'
relationship with the library.

Performance testing will be required. If we really need the merged blob
then we can simply pack that into a new static library.
2025-04-18 07:43:31 -05:00
Rahul Joshi
3ed83630b2 [NFC][LLVM][TableGen] Use decodeULEB128 for OPC_SoftFail emission (#136220)
- Use `decodeULEB128` to decode +ve/-ve mask in OPC_SoftFail case.
- Use current `I`/`E` iterators as inputs to `decodeULEB128`.
2025-04-18 05:12:35 -07:00
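Commit 3ed83630b2 relies on `decodeULEB128` to read the masks. The encoding itself is standard unsigned LEB128: seven payload bits per byte, least significant group first, with the high bit set on every byte except the last. The standalone sketch below illustrates that decoding; `decodeUleb128Sketch` and its error convention (length 0 on malformed input) are hypothetical and are not LLVM's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decodes one unsigned LEB128 value from [begin, end).
// On success, *length is the number of bytes consumed.
// On truncated or overflowing input, *length is 0 and the result is 0.
uint64_t decodeUleb128Sketch(const uint8_t *begin, const uint8_t *end,
                             std::size_t *length) {
  assert(length != nullptr);
  uint64_t value = 0;
  unsigned shift = 0;
  for (const uint8_t *p = begin; p != end; ++p) {
    const uint64_t slice = *p & 0x7f; // low seven bits carry payload
    // Reject groups that would not fit into 64 bits.
    if (shift >= 64 || ((slice << shift) >> shift) != slice)
      break;
    value |= slice << shift;
    shift += 7;
    if ((*p & 0x80) == 0) { // high bit clear marks the final byte
      *length = static_cast<std::size_t>(p - begin) + 1;
      return value;
    }
  }
  *length = 0;
  return 0;
}
```

Returning the consumed length lets a caller advance its own cursor past the value, which is the role the `I`/`E` iterators play in the commit description.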
Rahul Joshi
e1b14d4e1c [Clang][GPU] Fix unit test for NVPTX tid.x intrinsic (#136297)
- llvm.nvvm.read.ptx.sreg.tid.x does not have the result range attribute
yet.
2025-04-18 05:01:01 -07:00
Vladislav Dzhidzhoev
6462fad3d0 [DebugInfo] getMergedLocation: match scopes based on their location (#132286)
getMergedLocation uses a common parent scope of the two input locations
for an output location.
It doesn't consider the case when the common parent scope is from a file
other than L1's and L2's files. In that case, it produces a merged location
with an erroneous scope (https://github.com/llvm/llvm-project/issues/122846).

In some cases, such as https://github.com/llvm/llvm-project/pull/125780#issuecomment-2651657856,
L1 and L2 having a common parent scope from another file indicates that
the code at L1 and L2 is included from the same source location.

With this commit, getMergedLocation detects that L1, L2, or their common parent
scope files are different. If so, it assumes that L1 and L2 were included
from some source location, and tries to attach the output location to a scope
with the nearest common source location with regard to L1 and L2.
If the nearest common location is also from another file, getMergedLocation returns it
as a merged location, assuming that L1 and L2 belong to files that were both included
in the nearest common location.

Fixes https://github.com/llvm/llvm-project/issues/122846.
2025-04-18 13:57:28 +02:00
Raul Tambre
c890b7376f [lldb][Telemetry] Fix unit test compile failure with LLVM_ENABLE_TELEMETRY=0 (#136115)
It needs to be `TEST_F` to access `received_entries`.
Disabling also works based on the test name, not the fixture name.

Build failure:
```
lldb/unittests/Core/TelemetryTest.cpp:110:17: error: use of undeclared identifier 'received_entries'
  110 |   ASSERT_EQ(1U, received_entries.size());
      |                 ^
lldb/unittests/Core/TelemetryTest.cpp:112:61: error: use of undeclared identifier 'received_entries'
  112 |             llvm::dyn_cast<lldb_private::FakeTelemetryInfo>(received_entries[0])
      |                                                             ^
```

Fixes: 159b872b37
2025-04-18 14:48:30 +03:00
Rahul Joshi
6c4caae449 [LLVM][TableGen] Move DecoderEmitter output to anonymous namespace (#136214)
- Move the code generated by DecoderEmitter to anonymous namespace.
- Move AMDGPU's usage of this code from header file to .cpp file.

Note, we get build errors like "call to function 'decodeInstruction'
that is neither visible in the template definition nor found by
argument-dependent lookup" if we do not change AMDGPU.
2025-04-18 04:35:05 -07:00
Andrew Savonichev
a8fe21f3f5 [clang] Handle instantiated members to determine visibility (#136128)
As reported in issue #103477, visibility of instantiated member
functions used to be ignored when calculating visibility of a
specialization.

This patch modifies `getLVForClassMember` to look up the source
template for an instantiated member, and changes `mergeTemplateLV` to
apply it.

A similar issue was reported in #31462, but it seems that `extern`
declaration with visibility prevents the function from being emitted as
hidden. This behavior seems correct, even though GCC emits it with
default visibility instead.

Both tests from #103477 and #31462 are added as LIT tests `test72` and
`test73` respectively.
2025-04-18 20:29:19 +09:00
Aaron Ballman
c609cd2df9 Give this diagnostic a diagnostic group (#136182)
I put this under -Wuninitialized because that's the same group it's under
in GCC.

Fixes #41104
2025-04-18 07:09:27 -04:00
Timm Baeder
c5d59723cb [clang][bytecode] Reject constexpr-unknown values in CheckStore (#136279) 2025-04-18 12:48:16 +02:00
Zichen Lu
1d190065d9 [mlir][target] RAII wrap moduleToObject timer to ensure call clear function (#136142)
As the title says, we need to call `Timer::clear` to avoid extra log output like this:
```
===-------------------------------------------------------------------------===
                           ...
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0000 seconds (0.0000 wall clock)

   ---Wall Time---  --- Name ---
        -----       ....
        -----       Total
```
2025-04-18 12:33:31 +02:00
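The RAII idea behind 1d190065d9 is that a destructor runs on every exit path, so the cleanup call cannot be skipped by an early return. The sketch below shows that pattern in isolation; `SketchTimer`, `TimerClearGuard`, and `compileStep` are hypothetical stand-ins and do not reproduce the MLIR or LLVM timer classes.

```cpp
#include <cassert>

// Hypothetical timer that records how often it was cleared.
struct SketchTimer {
  bool running = false;
  int clears = 0;
  void start() { running = true; }
  void stop() { running = false; }
  void clear() { ++clears; }
};

// RAII guard: starts the timer on construction, and stops and clears it
// on destruction, no matter how the enclosing scope is left.
class TimerClearGuard {
public:
  explicit TimerClearGuard(SketchTimer &t) : timer(t) { timer.start(); }
  ~TimerClearGuard() {
    timer.stop();
    timer.clear();
  }
  TimerClearGuard(const TimerClearGuard &) = delete;
  TimerClearGuard &operator=(const TimerClearGuard &) = delete;

private:
  SketchTimer &timer;
};

// Hypothetical step with an early-exit path.
bool compileStep(SketchTimer &timer, bool fail) {
  TimerClearGuard guard(timer);
  assert(timer.running);
  if (fail)
    return false; // the guard still clears the timer here
  return true;
}
```

Deleting the copy operations keeps the guard from being duplicated, which would otherwise clear the same timer twice.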
Chengjun
9b8bc53a0b [FlattenCFG] Fix an Imprecise Usage of AA (#128117)
In the current `FlattenCFG`, using `isNoAlias` for two instructions is
imprecise. For example, when passing a store instruction and a load
instruction directly into `AA->isNoAlias`, it will always return
`NoAlias`. This happens because when checking the types of the two
Values, the store instruction (which has a `void` type) causes the
analysis to return `NoAlias`.

For instructions, we should use `getModRefInfo` instead of `isNoAlias`,
as aliasing is a concept of memory locations.

In this patch, `AAResults::getModRefInfo` is extended to take two
instructions. It checks whether the two instructions may access the same
memory location. In `FlattenCFG`, we use this new helper
function to do the check instead of `isNoAlias`.

Unit tests and lit tests are also included in this patch.
2025-04-18 12:30:05 +02:00
Matt Arsenault
9bdd9dc895 AMDGPU: Mark workitem ID intrinsics with range attribute (#136196)
This avoids the need to have special handling at every use site.
Unfortunately this means we unnecessarily emit AssertZext in the DAG
(where we already directly understand the range of the intrinsic), and
we regress in undefined cases as we don't fold out asserts on undef.
2025-04-18 12:27:38 +02:00
Akshat Oke
31ddaef8d1 [CodeGen][NPM] Port UnreachableMachineBlockElim to NPM (#136127) 2025-04-18 15:06:30 +05:30
Christian Sigg
1db03cab70 [mlir][bazel] Port 697aa9995c 2025-04-18 10:31:33 +02:00
Younan Zhang
c7daab259c [Clang] Fix the trailing comma regression (#136273)
925e195 introduced a regression that caused us to accept invalid
trailing commas in many expression lists where they're not allowed by
the grammar. The issue came from the fact that an additional invalid
state - previously handled by ParseExpressionList - was overlooked in
that patch.

Fixes https://github.com/llvm/llvm-project/issues/136254

No release entry because I want to backport it.
2025-04-18 16:27:27 +08:00
Simon Pilgrim
64ffecfc43 [DAG] isKnownNeverNaN - add DemandedElts element mask to isKnownNeverNaN calls (#135952)
Matches what we've done for computeKnownBits etc. to improve vector handling
2025-04-18 09:24:02 +01:00
Kazu Hirata
5db95fd6ca [memprof] Avoid repeated hash lookups (NFC) (#136268)
Note that we don't have to worry about CallstackProfileData[Id]
default-constructing the value side of a new map entry.  If that
happens, AccessHistogramSize > 0 wouldn't be true, and the new map
entry gets deleted right away.
2025-04-18 01:07:05 -07:00
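The "avoid repeated hash lookups" cleanups (5db95fd6ca and similar commits in this list) generally replace several lookups of the same key with one lookup whose result is reused. The sketch below shows that general shape with `std::unordered_map`; the function names and the counting use case are hypothetical and unrelated to the memprof code.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using CountMap = std::unordered_map<std::string, int>;

// Before: the key is hashed and looked up up to three times.
int bumpWithRepeatedLookups(CountMap &counts, const std::string &key) {
  if (counts.find(key) == counts.end()) // lookup 1
    counts[key] = 0;                    // lookup 2
  return ++counts[key];                 // lookup 3
}

// After: one lookup. operator[] default-constructs the value (0 for int)
// when the key is new, and the returned reference is reused afterwards.
int bumpWithSingleLookup(CountMap &counts, const std::string &key) {
  int &slot = counts[key];
  assert(slot >= 0 && "counts never go negative in this sketch");
  return ++slot;
}
```

The single-lookup form depends on the default-constructed value being harmless, which is the kind of side effect the commit message addresses for its own map.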
Corentin Jabot
a99c978d1b [Clang] Avoid dereferencing an invalid iterator
Fix msan builds after 8c5a307bd8
https://lab.llvm.org/buildbot/#/builders/94/builds/6321
2025-04-18 10:03:06 +02:00
Sergei Barannikov
d1496313d7 [CodeGen] Add another method to CFIInstBuilder (#136270)
Mainly for use by downstream targets, but it can find applications in
upstream code as well. Use it in MSP430 so that it doesn't look dead.
2025-04-18 10:50:42 +03:00
Timm Baeder
802e7309c0 [lldb] Fix TestExprDiagnostics test (#136269)
Add missing source ranges to the diagnostic output.
2025-04-18 09:33:20 +02:00
Yanzuo Liu
a158352294 [Clang][GitHub][NFC] Auto-add clang:bytecode label for PR (#136148) 2025-04-18 09:27:23 +02:00
Kazu Hirata
f4c76bba59 [clang] Use llvm::append_range (NFC) (#136256)
This patch replaces:

  llvm::copy(Src, std::back_inserter(Dst));

with:

  llvm::append_range(Dst, Src);

for brevity.

One side benefit is that llvm::append_range eventually calls
llvm::SmallVector::reserve if Dst is of llvm::SmallVector.
2025-04-18 00:15:13 -07:00
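The replacement described in f4c76bba59 can be approximated with the standard library alone. In the sketch below, `appendRangeSketch` is a hypothetical helper written in the spirit of `llvm::append_range` and is not LLVM's code; it assumes a container with a range `insert`, such as `std::vector`.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Element-by-element form: every push_back may trigger a reallocation.
template <typename Container, typename Range>
void copyWithBackInserter(Container &dst, const Range &src) {
  std::copy(std::begin(src), std::end(src), std::back_inserter(dst));
}

// Range form: a single insert lets the container see the whole range,
// so it can grow once instead of repeatedly.
template <typename Container, typename Range>
void appendRangeSketch(Container &dst, const Range &src) {
  dst.insert(dst.end(), std::begin(src), std::end(src));
}

// Small usage helper: concatenates two vectors with the range form.
inline std::vector<int> concatSketch(const std::vector<int> &a,
                                     const std::vector<int> &b) {
  std::vector<int> out;
  appendRangeSketch(out, a);
  appendRangeSketch(out, b);
  assert(out.size() == a.size() + b.size());
  return out;
}
```

Both forms produce the same elements in the same order; the difference is only in how often the destination may need to grow.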
Nikolas Klauser
e0a6905287 [libc++] Simplify the generic implementation of is_{un}signed (#136095) 2025-04-18 09:06:21 +02:00
Timm Baeder
cc7fc9978f [clang] Add source range to 'use of undeclared identifier' diagnostics (#117671) 2025-04-18 08:27:15 +02:00
Kazu Hirata
d27175d26e [Scalar] Avoid repeated hash lookups (NFC) (#135751) 2025-04-17 23:03:39 -07:00
Kazu Hirata
a42ac55a79 [IPO] Avoid repeated hash lookups (NFC) (#135750) 2025-04-17 23:03:25 -07:00
Fangrui Song
f28408f3af [test] Remove CHECK lines for MCAsmStreamer's fixup output
The fixup output is a debug aid and should not be used to test
target-specific relocation generation implementation. The llvm-mc
-filetype=obj output is what truly matters.
2025-04-17 22:29:42 -07:00
Nico Weber
2b002d6804 [gn] port 1756fcb8b0 2025-04-17 21:56:41 -07:00
Iris
155fc76f20 Recommit "[RISCV] Strengthen register usage validation for XTheadMemPair loads (#136241)"
With test fix.

Closes #136087

https://github.com/XUANTIE-RV/thead-extension-spec/blob/master/xtheadmempair/lwd.adoc
2025-04-17 21:55:16 -07:00
Peter Collingbourne
b07ee6acff LowerTypeTests: Simplify pointer types. 2025-04-17 21:52:40 -07:00
Kazu Hirata
59288761c9 [llvm] Use llvm::binary_search (NFC) (#136228) 2025-04-17 21:46:24 -07:00
Fangrui Song
65d16a8101 [RISCV] Simplify fixup kinds that force relocations
For RELA targets, fixup kinds that force relocations (GOT, TLS, ALIGN,
RELAX, etc) can bypass `applyFixup` and be encoded as
`FirstRelocationKind+i`, as seen in LoongArch. This patch removes
redundant fixup kinds and adopts the `FirstRelocationKind+i` encoding.

The `llvm-mc -show-encoding` output no longer displays descriptive fixup
names, as this information is removed from
`RISCVAsmBackend::getFixupKindInfo`. While a backend hook could be added
to call `llvm::object::getELFRelocationTypeName`, it's unnecessary since
the relocation in `-filetype=obj` output is what truly matters.

Pull Request: https://github.com/llvm/llvm-project/pull/136088
2025-04-17 21:36:15 -07:00
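The `FirstRelocationKind+i` encoding mentioned in 65d16a8101 amounts to reserving a numeric range of fixup kinds that map one-to-one onto relocation types. The sketch below illustrates only that arithmetic; the constant's value and all function names are hypothetical and are not taken from the LLVM sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical boundary: kinds at or above this value stand for raw
// relocation types rather than target-specific fixups.
const uint32_t kFirstRelocationKindSketch = 1000;

// Encodes relocation type `relocType` as a fixup kind.
inline uint32_t fixupKindForReloc(uint32_t relocType) {
  return kFirstRelocationKindSketch + relocType;
}

// True if the kind falls in the reserved relocation range.
inline bool isRelocationKind(uint32_t kind) {
  return kind >= kFirstRelocationKindSketch;
}

// Recovers the relocation type from a kind in the reserved range.
inline uint32_t relocTypeForKind(uint32_t kind) {
  assert(isRelocationKind(kind));
  return kind - kFirstRelocationKindSketch;
}
```

Because the mapping is plain offset arithmetic, no per-kind table entry is needed, which is consistent with the commit's note that descriptive fixup names no longer appear in the `-show-encoding` output.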
Fangrui Song
d5f94c3915 Revert "[RISCV] Strengthen register usage validation for XTheadMemPair loads (#136241)"
This reverts commit a354564a64.

Broke tests
2025-04-17 21:28:28 -07:00