Commit Graph

543167 Commits

Author SHA1 Message Date
Zhaoxin Yang
2c1900860c [lld][LoongArch] Support TLSDESC GD/LD to IE/LE (#123715)
Support TLSDESC to initial-exec or local-exec optimizations. Introduce a
new hook RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC and use the existing
R_RELAX_TLS_GD_TO_IE_ABS to support TLSDESC => IE, while using the existing
R_RELAX_TLS_GD_TO_LE to support TLSDESC => LE.
    
In the normal or medium code model, there are two forms of code sequence:
* pcalau12i  $a0, %desc_pc_hi20(sym_desc)
* addi.d     $a0, $a0, %desc_pc_lo12(sym_desc)
* ld.d       $ra, $a0, %desc_ld(sym_desc)
* jirl       $ra, $ra, %desc_call(sym_desc)
------
* pcaddi     $a0, %desc_pcrel_20(sym_desc)
* ld.d       $ra, $a0, %desc_ld(sym_desc)
* jirl       $ra, $ra, %desc_call(sym_desc)
    
Convert to IE:
* pcalau12i $a0, %ie_pc_hi20(sym_ie)
* ld.[wd]   $a0, $a0, %ie_pc_lo12(sym_ie)

Convert to LE:
* lu12i.w $a0, %le_hi20(sym_le) # le_hi20 != 0, otherwise NOP
* ori $a0, src, %le_lo12(sym_le) # le_hi20 != 0, src = $a0, otherwise src = $zero

For simplicity, whether tlsdescToIe or tlsdescToLe, we always convert
the preceding instructions to NOPs, because both forms of code sequence
(corresponding to the relocation combinations
R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
R_LARCH_TLS_DESC_PCREL20_S2) are processed the same way.
    
TODO: When relaxation is enabled, redundant NOPs can be removed. This will be
implemented in a future patch.
    
Note: No form of TLSDESC code sequence should appear interleaved
in the normal, medium or extreme code model; compilers do not generate
such sequences and lld does not support them. This is ensured by the
guard in PostRASchedulerList.cpp in llvm:
```
Calls are not scheduling boundaries before register allocation,
but post-ra we don't gain anything by scheduling across calls
since we don't need to worry about register pressure.
```
2025-07-02 16:09:51 +08:00
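To make the "Convert to LE" rules in the TLSDESC commit above concrete, here is a small C++ model of the last two instruction slots, assuming a 32-bit signed thread-pointer offset. `LeSeq`, `tlsdescToLeSketch` and `runLeSeq` are hypothetical names, not lld code; the emulation follows the LoongArch semantics of `lu12i.w` (sign-extended `si20 << 12`) and `ori` (zero-extended 12-bit immediate).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not lld code) of the last two slots left by a
// TLSDESC -> LE rewrite; the preceding slots simply become NOPs.
struct LeSeq {
  bool hasLu12i;  // false: the slot is a NOP and ori reads $zero
  uint32_t hi20;  // %le_hi20(sym_le)
  uint32_t lo12;  // %le_lo12(sym_le)
};

LeSeq tlsdescToLeSketch(int32_t tpOffset) {
  uint32_t v = static_cast<uint32_t>(tpOffset);
  uint32_t hi20 = (v >> 12) & 0xfffff;
  uint32_t lo12 = v & 0xfff;
  // lu12i.w is only needed when the upper 20 bits are non-zero.
  return {hi20 != 0, hi20, lo12};
}

// Emulates the kept instructions on a 64-bit register:
//   lu12i.w $a0, hi20       => $a0 = sign_extend_32(hi20 << 12)
//   ori     $a0, src, lo12  => $a0 = src | zero_extend(lo12)
int64_t runLeSeq(const LeSeq &s) {
  assert(s.hi20 <= 0xfffff && s.lo12 <= 0xfff && "immediate out of range");
  int64_t src = 0; // $zero when lu12i.w was replaced by a NOP
  if (s.hasLu12i)
    src = static_cast<int32_t>(s.hi20 << 12);
  return src | static_cast<int64_t>(s.lo12);
}
```

For a small offset such as 100, the `lu12i.w` slot becomes a NOP and `ori` alone materializes the value from `$zero`; negative offsets rely on the sign extension done by `lu12i.w`.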
Antonio Frighetto
f1cc0b607b [IR] Introduce dead_on_return attribute
Add the `dead_on_return` attribute, which is meant to be taken advantage
of by the frontend, and states that the memory pointed to by the argument
is dead upon function return. As with `byval`, it is supposed to be
used for passing aggregates by value. The difference lies in the ABI:
`byval` implies that the pointer is explicitly passed as argument to
the callee (during codegen the copy is emitted as per byval contract),
whereas a `dead_on_return`-marked argument implies that the copy
already exists in the IR, is located at a specific stack offset within
the caller, and this memory will not be read further by the caller upon
callee return – or otherwise poison, if read before being written.

RFC: https://discourse.llvm.org/t/rfc-add-dead-on-return-attribute/86871.
2025-07-02 09:29:36 +02:00
Fangrui Song
d5608d6751 MC,test: Improve section group test
Also add a case for #146581
```
.section sec,"ax"
.section .foo,"axG",@progbits,sec
nop
```
2025-07-02 00:28:40 -07:00
Matthias Springer
647aa6616f [mlir][SPIRVToLLVM] Set valid insertion point after op erasure (#146551)
Erasing/replacing an op, which is also the current insertion point,
invalidates the insertion point. Explicitly set the insertion point, so
that `copy` does not crash after the One-Shot Dialect Conversion
refactoring. (`ConversionPatternRewriter` will start behaving more like
a "normal" rewriter.)
2025-07-02 09:25:24 +02:00
Nikita Popov
83272a4849 [InstCombine] Fold icmp of gep chain with base (#144065)
Fold icmp between a chain of geps and its base pointer. Previously only
a single gep was supported.
    
This will be extended to handle the case of two gep chains with a common
base in a followup.

This helps to avoid regressions after #137297.
2025-07-02 09:23:36 +02:00
Haojian Wu
0588e8188c [Serialization] Use the SourceLocation::UIntTy instead of the raw type
for the offset, NFC
2025-07-02 09:11:55 +02:00
Markus Böck
6c9be27b52 [mlir][tensor] Fold identity reshape of 0d-tensors (#146375)
Just like 1d-tensors, reshapes of 0d-tensors (aka scalars) are always
no-ops as they only have one possible layout. This PR adds logic to
the `fold` implementation to optimize these away as is currently
implemented for 1d tensors.
2025-07-02 09:09:03 +02:00
Fangrui Song
9262ac3ee4 Revert "ELFObjectWriter: Optimize isInSymtab"
This reverts commit 1108cf6419.

Caused a regression for a weird but interesting case (STT_SECTION symbol
as group signature). We no longer define `sec`
```
.section sec,"ax"
.section .foo,"axG",@progbits,sec
nop
```

Fix #146581
2025-07-02 00:08:42 -07:00
Fangrui Song
eac1a1d3a8 MCAssembler: Consistently place MCFragment parameter before MCFixup
... to be consistent with other places, e.g. `recordRelocation`.
While here, use references instead of non-null pointers.
2025-07-01 23:59:35 -07:00
zbenzion
b68e8f1de7 [mlir][linalg] Allow promotion to use the original subview size (#144334)
linalg promotion attempts to compute a constant upper bound for the
allocated buffer size. Only when it fails to compute an upper bound does
it fall back to the original subview size, which may be dynamic.

This adds a promotion option to use the original subview size by default,
thus minimizing the allocation size.
Fixes #144268.
2025-07-02 08:47:51 +02:00
Fangrui Song
3c6cade485 MCObjectStreamer: De-virtualize emitInstToFragment 2025-07-01 23:05:35 -07:00
Kazu Hirata
f4b938b7c0 [TableGen] Use range-based for loops (NFC) (#146626) 2025-07-01 22:50:11 -07:00
Kazu Hirata
b809d5e2ac [ProfileData] Use lambdas instead of std::bind (NFC) (#146625)
Lambdas are a lot shorter than std::bind here.
2025-07-01 22:50:04 -07:00
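As a generic illustration of the trade-off (not the code the ProfileData commit changed), the two factories below build the same callable, once with `std::bind` and once with a lambda. `joinWith`, `Joiner` and both factory names are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical callback target used by both factories.
std::string joinWith(const std::string &sep, const std::string &a,
                     const std::string &b) {
  return a + sep + b;
}

using Joiner =
    std::function<std::string(const std::string &, const std::string &)>;

// std::bind: placeholders _1/_2 route the call arguments into joinWith.
Joiner makeJoinerBind(std::string sep) {
  using namespace std::placeholders;
  return std::bind(joinWith, std::move(sep), _1, _2);
}

// Lambda: same behavior, with the argument flow written out explicitly.
Joiner makeJoinerLambda(std::string sep) {
  return [sep = std::move(sep)](const std::string &a, const std::string &b) {
    return joinWith(sep, a, b);
  };
}
```

The lambda states the parameter list and the forwarding in plain code, whereas the `bind` version asks the reader to decode placeholder positions.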
Kazu Hirata
838b91d7f6 [clangd] Drop const from a return type (NFC) (#146623)
We don't need const on a return type.
2025-07-01 22:49:56 -07:00
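One reason `const` on a by-value return type is worth dropping: it yields a const rvalue, which cannot bind to a move assignment operator, so assigning from it silently copies. A minimal sketch with a hypothetical `Tracked` type that counts assignments (C++17, for the inline static members):

```cpp
#include <cassert>

// Hypothetical type that counts which assignment operator gets used.
struct Tracked {
  static inline int copies = 0;
  static inline int moves = 0;
  Tracked() = default;
  Tracked(const Tracked &) = default;
  Tracked(Tracked &&) = default;
  Tracked &operator=(const Tracked &) { ++copies; return *this; }
  Tracked &operator=(Tracked &&) noexcept { ++moves; return *this; }
};

// const rvalue: cannot bind to Tracked&&, so copy assignment is chosen.
const Tracked makeConst() { return Tracked(); }

// plain rvalue: move assignment applies.
Tracked makePlain() { return Tracked(); }
```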
Kazu Hirata
7b4dbb4f37 [Sema] Remove an unnecessary cast (NFC) (#146622)
Since both alignment and Alignment are of the same type, this patch
renames alignment to Alignment while removing the cast statement.
2025-07-01 22:49:48 -07:00
Mateusz Mikuła
2723a6d992 [LLVM][Cygwin] Enable dynamic linking of libLLVM (#146440)
These changes allow linking everything against the shared LLVM library
with the MSYS2 "Cygwin" toolchain.
2025-07-01 22:30:12 -07:00
Timm Baeder
984c78f27d [clang][bytecode] Add back missing initialize call (#146589)
This was only accidentally dropped, so add it back.
2025-07-02 07:15:47 +02:00
Craig Topper
c9bfdae620 [RISCV] Use uint64_t for Insn in getInstruction32 and getInstruction16. NFC (#146619)
Insn is passed to decodeInstruction which is a template function based
on the type of Insn. By using uint64_t we ensure only one version of
decodeInstruction is created. This reduces the file size of
RISCVDisassembler.cpp.o by ~25% in my local build.
2025-07-01 21:45:02 -07:00
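A minimal sketch of the effect described in the RISCV commit, with a hypothetical `decodeSketch` template standing in for the generated `decodeInstruction`: every distinct argument type creates another copy of the function body, so widening both 16- and 32-bit encodings to `uint64_t` at the call sites leaves a single instantiation. The run-time set below only records which instantiations actually get called.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <typeinfo>

// Records which instantiations of the hypothetical decoder are called.
std::set<std::string> &instantiations() {
  static std::set<std::string> names;
  return names;
}

// Stand-in for a generated decodeInstruction<InsnType>: each InsnType a
// caller uses produces a separate copy of the whole function body.
template <typename InsnType> unsigned decodeSketch(InsnType insn) {
  instantiations().insert(typeid(InsnType).name());
  return static_cast<unsigned>(insn & 0x7f); // e.g. a 7-bit opcode field
}

// Both callers widen to uint64_t, so only decodeSketch<uint64_t> exists.
unsigned decode16(uint16_t insn) { return decodeSketch<uint64_t>(insn); }
unsigned decode32(uint32_t insn) { return decodeSketch<uint64_t>(insn); }
```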
Shilei Tian
f1a4bb6245 [RFC][NFC][AMDGPU] Remove explicit value assignments from AMDGPU::GPUKind (#146567)
We don't seem to rely on the specific values of these enums, so removing the
explicit assignments simplifies the process of adding new targets.
2025-07-01 23:39:01 -04:00
Alex Crichton
a8a9a7f95a [WebAssembly] Fix inline assembly with vector types (#146574)
This commit fixes using inline assembly with v128 results. Previously
this failed with an internal assertion about a failure to legalize a
`CopyFromReg` where the source register was typed `v8f16`. It looks like
the type used for the destination register was whatever was listed first
in the `def V128 : WebAssemblyRegClass` listing, so the types were
shuffled around to have a default-supported type.

A small test was also added; it previously failed to generate code and
should now pass. This test also passed on LLVM 18 and regressed by
accident in #93228, which was first included in LLVM 19.
2025-07-01 20:26:30 -07:00
Peter Collingbourne
2a702cdc38 Driver: Avoid llvm::sys::path::append if resource directory absolute.
After #145996 CLANG_RESOURCE_DIR can be an absolute path so we need to
handle it correctly in the driver.

llvm::sys::path::append does not append absolute paths in the way
that I expected (or consistent with other similar APIs such as C++17
std::filesystem::path::append or Python os.path.join); instead, it
effectively discards the leading / and appends the resulting relative path
(e.g. append(P, "/bar") with P = "/foo" sets P to "/foo/bar").

Many tests start failing if I try to align llvm::sys::path::append with
the other APIs because of callers that expect the existing behavior,
so for now let's add a special case here for absolute resource paths,
and document the behavior in Path.h.

Reviewers: MaskRay

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/146449
2025-07-01 20:21:51 -07:00
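The C++17 behavior the Driver commit contrasts against can be checked directly with `std::filesystem`, and the driver-side special case can be sketched as a helper. `resolveResourceDir` is hypothetical (not the clang driver's code), and the example assumes POSIX paths, where a leading `/` makes a path absolute.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// In C++17, path::operator/ replaces the left-hand side when the
// right-hand side is absolute, e.g. "/foo" / "/bar" yields "/bar".
//
// Hypothetical helper mirroring the fix: an absolute resource directory
// is used as-is; only a relative one is joined onto the base directory.
fs::path resolveResourceDir(const fs::path &base, const fs::path &dir) {
  if (dir.is_absolute())
    return dir;
  return base / dir;
}
```

Checking `is_absolute()` up front keeps the result independent of how any particular append API treats absolute arguments.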
XiangZhang
aa1d9a4c31 [MLIR][Affine] Enhance simplifyAdd for AffineExpr mod (#146492)
Currently, AffineExpr addition can optimize `s1 + (s1 // c * -c)`
to `s1 % c`,
but cannot optimize `(s0 + s1) + (s1 // c * -c)`.
This patch adds that simplification, so that the latter can
be simplified to `s0 + s1 % c`.
2025-07-02 11:08:58 +08:00
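The rewrite is an algebraic identity once `//` is read as affine floordiv (rounding toward negative infinity) with a positive constant `c`: `s1 + (s1 floordiv c) * -c` is exactly `s1 mod c`, so adding `s0` on both sides gives the new case. The helpers below are a hypothetical brute-force check of that identity, not MLIR's `simplifyAdd`.

```cpp
#include <cassert>
#include <cstdint>

// Affine floordiv for a positive divisor: rounds toward negative infinity.
int64_t floorDiv(int64_t a, int64_t c) {
  assert(c > 0 && "affine floordiv/mod require a positive divisor");
  int64_t q = a / c;          // C++ division truncates toward zero
  if (a % c != 0 && a < 0)
    --q;                      // adjust to round down for negative a
  return q;
}

// Affine mod: always in [0, c) for positive c.
int64_t floorMod(int64_t a, int64_t c) { return a - floorDiv(a, c) * c; }

// (s0 + s1) + (s1 floordiv c) * -c  ==>  s0 + s1 mod c
int64_t beforeSimplify(int64_t s0, int64_t s1, int64_t c) {
  return (s0 + s1) + floorDiv(s1, c) * -c;
}
int64_t afterSimplify(int64_t s0, int64_t s1, int64_t c) {
  return s0 + floorMod(s1, c);
}
```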
Kazu Hirata
eb07f0d4a9 [Analysis] Use range-based for loops (NFC) (#146466) 2025-07-01 19:38:28 -07:00
Ashwin Banwari
2599a9aeb5 [clang] [modules] Implement P3618R0: Allow attaching main to the global module (#146461)
Remove the prior warning for attaching extern "C++" to main.
2025-07-02 09:52:10 +08:00
Ami-zhang
3deed4211a [docs] Add clang release notes for LoongArch (#146481) 2025-07-02 09:21:33 +08:00
Jonas Devlieghere
a87b27fd51 [lldb] Fix the hardware breakpoint decorator (#146609)
A decorator to skip or XFAIL a test takes effect when the function
that's passed in returns a reason string. The wrappers around
hw_breakpoints_supported were doing that incorrectly by inverting the
result (calling `not` on it), turning it into a boolean, which means
the test is always skipped.
2025-07-01 18:01:19 -07:00
Matt Arsenault
7502af89fc clang: Forward exception_model flag for bitcode inputs (#146342)
This will enable removal of a hack from the wasm backend
in a future change.

This feels unnecessarily clunky. I would assume something was
automatically parsing this and propagating it in the C++ case,
but I can't seem to find it. In particular it feels wrong that
I need to parse out the individual values, given they are listed
in the options.td file. We should also be parsing and forwarding
every flag that corresponds to something else in TargetOptions,
which requires auditing.
2025-07-02 09:39:46 +09:00
Wenju He
b0e6faae08 [libclc] Add missing clc_lgamma_r with generic address space pointer arg (#146495)
There is no change to amdgcn--amdhsa.bc and nvptx64--nvidiacl.bc because
__opencl_c_generic_address_space is not defined for them.
2025-07-02 08:28:01 +08:00
Wenju He
93fe52f19e [libclc] Add __clc_nan implementation with signed nancode argument (#146485)
In the OpenCL Extended Instruction Set Specification, nancode can be a signed
integer or a vector of signed integer values.
This PR has no change to amdgcn--amdhsa.bc and nvptx64--nvidiacl.bc
because the newly added clc functions are not used in OpenCL library.
2025-07-02 08:27:46 +08:00
Kewen12
2b16af8df2 [Offload][cmake] Add GPU test job limit for AMDGPU buildbot cmake cache (#146611)
Added GPU test job limit to make it consistent with current config
https://github.com/llvm/llvm-zorg/blob/main/buildbot/osuosl/master/config/builders.py#L2027C31-L2027C77
2025-07-01 19:18:28 -05:00
Aiden Grossman
6b7e1b97f4 [CI] Use Github Native Groups in monolithic-* scripts
This patch updates monolithic-linux.sh and monolithic-windows.sh to emit
expandable groups in the Github logs. The syntax this replaces
originally worked to produce the same functionality on Buildkite, but
Github uses a different syntax.

https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/workflow-commands-for-github-actions#grouping-log-lines

Reviewers: cmtice, DavidSpickett, tstellar, lnihlen, Endilll

Reviewed By: Endilll, DavidSpickett

Pull Request: https://github.com/llvm/llvm-project/pull/143481
2025-07-01 16:15:27 -07:00
Jonas Devlieghere
e89458d398 [lldb] Fix PipeTest name collision in unit tests
We had two classes named `PipeTest`: one in `PipeTestUtilities.h` and
one in `PipeTest.cpp`. The latter was unintentionally using the wrong
class (from the header) which didn't initialize the HostInfo subsystem.

This resulted in a crash due to a nullptr dereference (`g_fields`) when
`PipePosix::CreateWithUniqueName` called `HostInfoBase::GetProcessTempDir`.
2025-07-01 16:01:38 -07:00
jjasmine
e9c9f8f374 [WebAssembly] Fold any/alltrue (setcc x, 0, eq/ne) to [not] any/alltrue x (#144741)
Fixes https://github.com/llvm/llvm-project/issues/50142, a missed
optimization where we could only produce zext (xor (any_true),
-1).

In the simd-setcc-reductions test case, it is now converted to all_true.

Also fixes https://github.com/llvm/llvm-project/issues/145177, which covers:

all_true (setcc x, 0, eq) -> not any_true
any_true (setcc x, 0, ne) -> any_true
all_true (setcc x, 0, ne) -> all_true

---------

Co-authored-by: badumbatish <--show-origin>
2025-07-01 15:27:37 -07:00
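The three folds listed in the WebAssembly commit above can be sanity-checked on a few sample vectors with a scalar model of the `any_true`/`all_true` reductions over an i8x16 vector. The helper names are hypothetical; `setccZero` models a lane-wise `setcc x, 0, eq/ne` that produces all-ones or zero lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>

using V16 = std::array<uint8_t, 16>; // one v128 viewed as i8x16

// any_true: some lane is non-zero.
bool anyTrue(const V16 &v) {
  for (uint8_t lane : v)
    if (lane != 0)
      return true;
  return false;
}

// all_true: every lane is non-zero.
bool allTrue(const V16 &v) {
  for (uint8_t lane : v)
    if (lane == 0)
      return false;
  return true;
}

// setcc x, 0, eq (eq = true) or ne (eq = false): all-ones lane where the
// comparison holds, zero lane otherwise.
V16 setccZero(const V16 &x, bool eq) {
  V16 r{};
  for (std::size_t i = 0; i < x.size(); ++i)
    r[i] = ((x[i] == 0) == eq) ? 0xff : 0x00;
  return r;
}
```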
jjasmine
4a8c1f7d12 [WebAssembly] [Backend] Wasm optimize illegal bitmask (#145627)
[WebAssembly] [Backend] Wasm optimize illegal bitmask for #131980.

Currently, for an illegal bitmask (v32i8 or v64i8), two (four) v128
vectors are concatenated at the SelectionDAG level, and the result is
then SETCC'd by the same illegal pseudo instruction, which requires
expansion later on.

I opt to SETCC them separately, bitcast and zext them, and then add
them together at the end.

---------

Co-authored-by: badumbatish <--show-origin>
2025-07-01 15:13:08 -07:00
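A scalar sketch of the split strategy for v32i8: take the 16-bit mask of each v128 half separately, zero-extend, and combine. Shifting the high half left by 16 before the add is an assumption of this sketch about how the halves are placed; since the bit ranges do not overlap, the add behaves like an OR. All function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mask of one v128 half (16 x i8): bit i is set when lane i is non-zero.
uint32_t bitmask16(const uint8_t *lanes) {
  uint32_t m = 0;
  for (std::size_t i = 0; i < 16; ++i)
    if (lanes[i] != 0)
      m |= 1u << i;
  return m; // fits in 16 bits, already zero-extended to 32
}

// v32i8 bitmask built from two separate half-width masks.
uint32_t bitmask32(const std::array<uint8_t, 32> &v) {
  uint32_t lo = bitmask16(v.data());
  uint32_t hi = bitmask16(v.data() + 16);
  return lo + (hi << 16); // disjoint bits: '+' gives the same value as '|'
}

// Reference: a single pass over all 32 lanes.
uint32_t bitmask32Reference(const std::array<uint8_t, 32> &v) {
  uint32_t m = 0;
  for (std::size_t i = 0; i < 32; ++i)
    if (v[i] != 0)
      m |= 1u << i;
  return m;
}
```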
James Y Knight
ae2104897c [SelectionDAG] Fix NaN regression in fma dag-combine. (#146592)
After 901e1390c9 (#127770), the DAG
combine would transform `fma(x, 0.0, 1.0)` into `1.0` if
`-fp-contract=fast` was enabled, in addition to when 'x' is marked
nnan/ninf.

It's only valid in the latter case, not the former, so delete the extra
condition.
2025-07-01 18:10:30 -04:00
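Why `-fp-contract=fast` alone does not justify the fold: `fma(x, 0.0, 1.0)` equals `1.0` only when `x` is finite and not NaN, because `inf * 0.0` is NaN and NaN propagates through the fused operation. Contraction only permits fusing a multiply and an add; it makes no promise about these special values. A small check with the standard `std::fma` (the helper name is hypothetical):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// True when folding fma(x, 0.0, 1.0) to 1.0 would be exact for this x.
bool foldToOneIsExact(double x) {
  return std::fma(x, 0.0, 1.0) == 1.0;
}
```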
Alex MacLean
475cd8dfaf [NVPTX] Further cleanup call isel (#146411)
This change continues rewriting and cleanup around DAG ISel for
formal-arguments, return values, and function calls. This causes some
incidental changes, mostly to instruction ordering and register naming
but also a couple of improvements caused by using scalar types earlier in
the lowering.
2025-07-01 14:55:04 -07:00
Skrai Pardus
5ed852f7f7 [mlir][arith] Add arith::ConstantIntOp constructor (#144638)
This PR adds a `build()` constructor for `ConstantIntOp` that takes in
an `APInt`.

Creating an `arith` constant value with an `APInt` currently requires a
structure like the following:
```cpp
b.create<arith::ConstantOp>(IntegerAttr::get(intType, apintValue));
```
In comparison, `ConstantFloatOp` already has an `APFloat` constructor
which allows for the following:
```cpp
b.create<arith::ConstantFloatOp>(floatType, apfloatValue);
```
Thus, intuitively, it makes sense to add a similar `ConstantIntOp`
constructor for `APInt`s, like so:
```cpp
b.create<arith::ConstantIntOp>(intType, apintValue);
```

Depends on https://github.com/llvm/llvm-project/pull/144636
2025-07-01 23:50:39 +02:00
Florian Hahn
863e17a5be [VPlan] Make Phi operand for VPReductionPHIRecipe optional (NFC).
VPReductionPHIRecipe doesn't rely on the underlying phi any longer,
allow empty underlying values when cloning. NFC at the moment but will
enable follow-up patches.
2025-07-01 22:49:27 +01:00
zGoldthorpe
f393211454 [Reland][IPO] Added attributor for identifying invariant loads (#146584)
Patched and tested the `AAInvariantLoadPointer` attributor from #141800,
which identifies pointers whose loads are eligible to be marked as
`!invariant.load`.

The bug in the attributor was due to `AAMemoryBehavior` always
identifying pointers obtained from `alloca`s as having no writes. I'm
not entirely sure why `AAMemoryBehavior` behaves this way, but it seems
to be because it identifies the scope of an `alloca` to be limited to
only that instruction (and, certainly, no memory writes occur within the
`alloca` instruction). This patch just adds a check to prevent all loads
from `alloca` pointers from being marked `!invariant.load` (since any
well-defined program will have to write to stack pointers at some
point).
2025-07-01 17:46:19 -04:00
Changpeng Fang
d99b14623f AMDGPU: Implement tensor_save and tensor_stop for gfx1250 (#146590)
MC layer only.
2025-07-01 14:28:38 -07:00
Florian Hahn
bcbc440712 [VPlan] Add missing VPWidenSelect to VPRecipeWithIRFlags::classof (NFC).
Add missing entry to VPRecipeWithIRFlags. NFC, as it is currently never
called on VPWidenSelectRecipes.
2025-07-01 22:26:57 +01:00
Han-Chung Wang
42578e8586 [mlir][linalg] Use hasPureTensorSemantics in TransposeMatmul methods. (#146438)
The issue is triggered by
ee070d0816
which checks `TensorLikeType` when downstream projects use the pattern
without registering bufferization::BufferizationDialect. The
registration is needed because the interface implementation for builtin
types is located in `BufferizationDialect::initialize()`. However, we do not
need to fix it by the registration. The proper fix is using the linalg
method, i.e., hasPureTensorSemantics.

No additional tests are added because the functionality is well tested
in
[transpose-matmul.mlir](https://github.com/llvm/llvm-project/blob/main/mlir/test/Dialect/Linalg/transpose-matmul.mlir).
To reproduce the issue, it requires a different setup, e.g., writing a
new C++ pass, which seems not worth it.

Signed-off-by: hanhanW <hanhan0912@gmail.com>
2025-07-01 14:15:27 -07:00
Chao Chen
5d849d3a90 [mlir][xegpu] Fix seg-fault caused by setting a null attribute (#146002) 2025-07-01 15:42:52 -05:00
Florian Hahn
829f2f2448 [VectorCombine] Mark function as changed if shuffle is created.
777d6b5de9 exposed a code path where a function is modified but not
marked accordingly. Make sure we return true from foldShuffleFromReductions
if only a shuffle has been inserted/replaced.

Should fix https://lab.llvm.org/buildbot/#/builders/187/builds/7578.
2025-07-01 21:38:29 +01:00
Haojian Wu
ac76e4d8a9 [Serialization] Use SourceLocation::UIntTy for the offset type, NFC 2025-07-01 22:33:23 +02:00
Richard Smith
c56c349d39 [clang-tidy] Switch misc-confusable-identifiers check to a faster algorithm. (#130369)
Optimizations:

- Only build the skeleton for each identifier once, rather than once for
each declaration of that identifier.
- Only compute the contexts in which identifiers are declared for
identifiers that have the same skeleton as another identifier in the
translation unit.
- Only compare pairs of declarations that are declared in related
contexts, rather than comparing all pairs of declarations with the same
skeleton.

Also simplify by removing the caching of enclosing `DeclContext` sets,
because with the above changes we don't even compute the enclosing
`DeclContext` sets in common cases. Instead, we terminate the traversal
to enclosing `DeclContext`s immediately if we've already found another
declaration in that context with the same identifier. (This optimization
is not currently applied to the `forallBases` traversal, but could be
applied there too if needed.)

This also fixes two bugs that together caused the check to fail to find
some of the issues it was looking for:

- The old check skipped comparisons of declarations from different
contexts unless both declarations were type template parameters. This
caused the checker to not warn on some instances of the CVE it is
intended to detect.
- The old check skipped comparisons of declarations in all base classes
other than the first one found by the traversal. This appears to be an
oversight, incorrectly returning `false` rather than `true` from the
`forallBases` callback, which terminates traversal.

This also fixes an issue where the check would have false positives for
template parameters and function parameters in some cases, because those
parameters sometimes have a parent `DeclContext` that is the parent of
the parameterized entity, or sometimes is the translation unit. In
either case, this would cause warnings about declarations that are never
visible together in any scope.

This decreases the runtime of this check, especially in the common case
where there are few or no skeletons with two or more different
identifiers. Running this check over LLVM, clang, and clang-tidy, the
wall time for the check as reported by clang-tidy's internal profiler is
reduced from 5202.86s to 3900.90s.
2025-07-01 13:31:46 -07:00
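The skeleton-bucketing idea from the clang-tidy commit can be sketched in a few lines. This is not the clang-tidy implementation: the hypothetical `skeleton` here folds only a handful of ASCII look-alikes (the real check uses Unicode confusables data), and declaration contexts are ignored entirely, so the sketch only shows the "compute each skeleton once, compare only within a bucket" part.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified skeleton over a few ASCII look-alikes.
std::string skeleton(const std::string &id) {
  std::string s;
  for (char ch : id) {
    switch (ch) {
    case '0': s += 'O'; break;
    case '1': case 'I': s += 'l'; break;
    default: s += ch;
    }
  }
  return s;
}

// Skeleton computed once per distinct identifier (not per declaration);
// identifiers are bucketed by skeleton and only compared within a bucket.
std::vector<std::pair<std::string, std::string>>
confusablePairs(const std::vector<std::string> &declaredNames) {
  std::set<std::string> unique(declaredNames.begin(), declaredNames.end());
  std::map<std::string, std::vector<std::string>> buckets;
  for (const std::string &id : unique)
    buckets[skeleton(id)].push_back(id);

  std::vector<std::pair<std::string, std::string>> result;
  for (const auto &entry : buckets) {
    const std::vector<std::string> &ids = entry.second;
    for (std::size_t i = 0; i < ids.size(); ++i)
      for (std::size_t j = i + 1; j < ids.size(); ++j)
        result.emplace_back(ids[i], ids[j]);
  }
  return result;
}
```

Buckets holding a single identifier produce no comparisons, which is the common case the commit message says becomes cheap.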
Kazu Hirata
a061171426 [AsmParser] Remove unnecessary casts (NFC) (#146549)
Linkage is already of GlobalValue::LinkageTypes.
2025-07-01 13:11:02 -07:00
Haojian Wu
650d0151c6 [clang] Improve getFileIDLocal binary search. (#146510)
Avoid reading the `LocalSLocEntryTable` twice per loop iteration. NFC.

https://llvm-compile-time-tracker.com/compare.php?from=0b6ddb02efdcbdac9426e8d857499ea0580303cd&to=1aa335ccfb07ba96177b89b1933aa6b980fa14f6&stat=instructions:u
2025-07-01 21:59:09 +02:00
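The getFileIDLocal change is about loading each probed table entry only once per iteration. A generic sketch of that shape of binary search (find the last entry whose start offset is not after the target), using a hypothetical `Entry` table rather than clang's `SLocEntry` types:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for a table of entries sorted by starting offset.
struct Entry {
  uint32_t offset;
};

// Returns the index of the last entry whose offset is <= target.
std::size_t findEntry(const std::vector<Entry> &table, uint32_t target) {
  assert(!table.empty() && table.front().offset <= target);
  std::size_t lo = 0, hi = table.size(); // invariant: answer in [lo, hi)
  while (hi - lo > 1) {
    std::size_t mid = lo + (hi - lo) / 2;
    uint32_t midOffset = table[mid].offset; // single read per iteration
    if (midOffset <= target)
      lo = mid;
    else
      hi = mid;
  }
  return lo;
}
```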
Florian Hahn
777d6b5de9 [VectorCombine] Use InstSimplifyFolder to simplify instrs on creation. (#146350)
Update VectorCombine to use InstSimplifyFolder to simplify redundant
instructions on creation.

PR: https://github.com/llvm/llvm-project/pull/146350
2025-07-01 20:55:51 +01:00
Florian Hahn
6b3d2b629c [VPlan] Add VPExpressionRecipe, replacing extended reduction recipes. (#144281)
This patch adds a new recipe to combine multiple recipes into an
'expression' recipe, which should be considered a single entity for
cost-modeling and transforms. The recipe needs to be 'decomposed', i.e.
replaced by its individual recipes, before execution.

This subsumes VPExtendedReductionRecipe and
VPMulAccumulateReductionRecipe and should make it easier to extend to
include more types of bundled patterns, like e.g. extends folded into
loads or various arithmetic instructions, if supported by the target.

It allows avoiding re-creating the original recipes when converting to
concrete recipes, together with removing the need to record various
information. The current version of the patch still retains the original
printing matching VPExtendedReductionRecipe and
VPMulAccumulateReductionRecipe, but this specialized print could be
replaced with printing the bundled recipes directly.

PR: https://github.com/llvm/llvm-project/pull/144281
2025-07-01 20:44:50 +01:00