Commit Graph

9431 Commits

Author SHA1 Message Date
Phoebe Wang
c72a751dab [X86][AMX] Support AMX-TRANSPOSE (#113532)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-11-01 16:45:03 +08:00
Freddy Ye
8cb6b65542 [X86][CFE] Correct parameter type of _cmpccxadd_epi64 (#114367)
This fixes correctness of https://gcc.godbolt.org/z/vexf5fW5r
2024-11-01 16:03:28 +08:00
Lei Wang
bef3b54ea1 [InstrPGO] Avoid using global variable to fix potential data race (#114364)
In https://github.com/llvm/llvm-project/pull/109837, it sets a global
variable(`PGOInstrumentColdFunctionOnly`) in PassBuilderPipelines.cpp
which introduced a data race detected by TSan. To fix this, I decouple
the flag setting, the flags are now set
separately(`instrument-cold-function-only-path` is required to be used
with `--pgo-instrument-cold-function-only`).
2024-10-31 21:28:13 -07:00
Paul Kirth
913cd11f94 [llvm][fatlto] Drop any CFI related instrumentation after emitting bitcode (#112788)
We want to support CFI instrumentation for the bitcode section, without
miscompiling the object code portion of a FatLTO object. We can reuse
the existing mechanisms in the LowerTypeTestsPass to do that, by just
adding the pass to the FatLTO pipeline after the EmbedBitcodePass with
the correct options set.

Fixes #112053
2024-10-31 12:40:21 -07:00
Tex Riddell
6582785d01 Add CHECK-LABEL to avoid source tree path sensitivity in test (#112461)
The test `clang/test/CodeGen/2004-02-20-Builtins.c` will erroneously
fail if "builtin" is in the path to your source tree.

This change adds a `CHECK-LABEL !llvm.ident` after the `CHECK-NOT` to
avoid searching into the metadata containing the path.
2024-10-31 09:04:48 -07:00
Dmitry Chernenkov
d924a9ba03 Revert "[InstrPGO] Support cold function coverage instrumentation (#109837)"
This reverts commit e517cfc531.
2024-10-31 10:55:17 +00:00
Dmitry Chernenkov
06e28ed84f Revert "specify clang --target to fix breakage on AIX (#114127)"
This reverts commit cc60c46e39.
2024-10-31 10:55:17 +00:00
Feng Zou
8127162427 [X86][AMX] Support AMX-FP8 (#113850)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-10-31 10:14:25 +08:00
Alexandros Lamprineas
5dac2db5a8 [FMV][AArch64] Remove features which can be expressed as a combination of others. (#113580)
Removes sve-bf16, sve-ebf16, and sve-i8mm since they are obsolete. One
could write target_version("sve+bf16") instead of sve-bf16 for instance.

Approved in ACLE as https://github.com/ARM-software/acle/pull/353
2024-10-30 11:53:50 +00:00
Lei Wang
cc60c46e39 specify clang --target to fix breakage on AIX (#114127)
`-fprofile-sample-use` is not supported on AIX, which caused a CI
failure.
2024-10-29 21:06:43 -07:00
Simon Pilgrim
bf6c483e47 [clang][x86] Add constexpr support for SSE2 _mm_set*_epi* intrinsics 2024-10-29 15:39:15 +00:00
Simon Pilgrim
e281d96a81 [clang][x86] Add constexpr support for _mm_add_epi32/64 and _mm_sub_epi32/64 2024-10-29 14:34:19 +00:00
Simon Pilgrim
f257e9bdbb [clang][x86] Update AVX/AVX512 setzero constexpr tests to use the TEST_CONSTEXPR macro 2024-10-29 14:34:19 +00:00
Simon Pilgrim
f537792f3f [X86] Refactor the SSE intrinsics constexpr tests to simplify future expansion (#112578)
I'm hoping to make a large proportion of the SSE/AVX intrinsics usable in constant expressions - eventually anything that doesn't touch memory or system settings - making it much easier to utilize SSE/AVX intrinsics in various math libraries etc.

My initial implementation placed the tests at the end of the test file, similar to how smaller files already handle their tests.

However, what I'm finding is that this approach doesn't scale when trying to track coverage of so many intrinsics - many keep getting missed, and it gets messy; so what I'm proposing is to instead keep each intrinsic's generic IR test and its constexpr tests together to make them easier to track together, wrapping the static_assert inside a macro to disable on C and pre-C++11 tests.

I'm open to alternative suggestions before I invest too much time getting this work done :)
2024-10-29 11:00:35 +00:00
Jesse Huang
335e68d8bc [Clang][RISCV] Support -fcf-protection=return for RISC-V (#112477)
Enables the support of `-fcf-protection=return` on RISC-V, which
requires Zicfiss. It also adds a string attribute "hw-shadow-stack"
to every function if the option is set on RISC-V
2024-10-29 15:47:49 +08:00
Matthias Braun
36c1194906 Remove optimization flags from clang codegen tests (#113714)
- Remove an -O3 flag from a couple of clang x86 codegen tests so the
  tests do not need to be updated when optimizations in LLVM change.
- Change the tests to use utils/update_cc_test_checks.sh
- Change from apple/darwin triples to generic x86_64-- and
  i386-- because it was not relevant to the test but
  `update_cc_test_checks` seems to be unable to handle platforms that
  prepend `_` to function names.
2024-10-28 15:34:56 -07:00
Lei Wang
e517cfc531 [InstrPGO] Support cold function coverage instrumentation (#109837)
This patch adds support for cold function coverage instrumentation based
on sampling PGO counts. The major motivation is to detect dead functions
for the services that are optimized with sampling PGO. If a function is
covered by sampling profile count (e.g., those with an entry count > 0),
we choose to skip instrumenting those functions, which significantly
reduces the instrumentation overhead.

More details about the implementation and flags:
- Added a flag `--pgo-instrument-cold-function-only` in
`PGOInstrumentation.cpp` as the main switch to control skipping the
instrumentation.
- Built the extra instrumentation passes(a bundle of passes in
`addPGOInstrPasses`) under sampling PGO pipeline. This is controlled by
`--instrument-cold-function-only-path` flag.
- Added a driver flag `-fprofile-generate-cold-function-coverage`: 
- 1) Config the flags in one place, i,e. adding
`--instrument-cold-function-only-path=<...>` and
`--pgo-function-entry-coverage`. Note that the instrumentation file path
is passed through `--instrument-sample-cold-function-path`, because we
cannot use the `PGOOptions.ProfileFile` as it's already used by
`-fprofile-sample-use=<...>`.
- 2) makes linker to link `compiler_rt.profile` lib(see
[ToolChain.cpp#L1125-L1131](https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChain.cpp#L1125-L1131)
).
- Added a flag(`--pgo-cold-instrument-entry-threshold`) to config entry
count to determine cold function.

Overall, the full command is like:

```
clang++ -O2 -fprofile-generate-cold-function-coverage=<...> -fprofile-sample-use=<...>  code.cc -o code
```
2024-10-28 10:13:45 -07:00
Aaron Ballman
af7c58b7ea Remove support for RenderScript (#112916)
See
https://discourse.llvm.org/t/rfc-deprecate-and-eventually-remove-renderscript-support/81284
for the RFC
2024-10-28 12:48:42 -04:00
Momchil Velikov
53f7f8ecca [Clang][AArch64] Fix Pure Scalables Types argument passing and return (#112747)
Pure Scalable Types are defined in AAPCS64 here:

https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#pure-scalable-types-psts

And should be passed according to Rule C.7 here:

https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#682parameter-passing-rules

This part of the ABI is completely unimplemented in Clang, instead it
treats PSTs sometimes as HFAs/HVAs, sometime as general composite types.

This patch implements the rules for passing PSTs by employing the
`CoerceAndExpand` method and extending it to:
* allow array types in the `coerceToType`; Now only `[N x i8]` are
considered padding.
* allow mismatch between the elements of the `coerceToType` and the
elements of the `unpaddedCoerceToType`; AArch64 uses this to map
fixed-length vector types to SVE vector types.

Corectly passing a PST argument needs a decision in Clang about whether
to pass it in memory or registers or, equivalently, whether to use the
`Indirect` or `Expand/CoerceAndExpand` method. It was considered
relatively harder (or not practically possible) to make that decision in
the AArch64 backend.
Hence this patch implements the register counting from AAPCS64 (cf.
`NSRN`, `NPRN`) to guide the Clang's decision.
2024-10-28 15:43:14 +00:00
Momchil Velikov
1df5c94343 [AArch64] Implement FP8 floating-point mode helper intrinsics (#100608)
Implement FP8 mode helper intrinsics (as inline functions) as
specified in ACLE 2024Q3 "14.2 Helper intrinsics"

https://github.com/ARM-software/acle/releases/download/r2024Q3/acle-2024Q3.pdf
2024-10-28 11:22:38 +00:00
Freddy Ye
5aa1275d03 [X86] Support SM4 EVEX version intrinsics/instructions. (#113402)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-10-28 10:46:16 +08:00
Alex MacLean
fb33af08e4 [NVPTX] Remove nvvm.ldg.global.* intrinsics (#112834)
Remove these intrinsics which can be better represented by load
instructions with `!invariant.load` metadata:

- llvm.nvvm.ldg.global.i
- llvm.nvvm.ldg.global.f
- llvm.nvvm.ldg.global.p
2024-10-27 16:14:13 -07:00
davidtrevelyan
4102625380 [rtsan][llvm][NFC] Rename sanitize_realtime_unsafe attr to sanitize_realtime_blocking (#113155)
# What

This PR renames the newly-introduced llvm attribute
`sanitize_realtime_unsafe` to `sanitize_realtime_blocking`. Likewise,
sibling variables such as `SanitizeRealtimeUnsafe` are renamed to
`SanitizeRealtimeBlocking` respectively. There are no other functional
changes.


# Why?

- There are a number of problems that can cause a function to be
real-time "unsafe",
- we wish to communicate what problems rtsan detects and *why* they're
unsafe, and
- a generic "unsafe" attribute is, in our opinion, too broad a net -
which may lead to future implementations that need extra contextual
information passed through them in order to communicate meaningful
reasons to users.
- We want to avoid this situation and make the runtime library boundary
API/ABI as simple as possible, and
- we believe that restricting the scope of attributes to names like
`sanitize_realtime_blocking` is an effective means of doing so.

We also feel that the symmetry between `[[clang::blocking]]` and
`sanitize_realtime_blocking` is easier to follow as a developer.

# Concerns

- I'm aware that the LLVM attribute `sanitize_realtime_unsafe` has been
part of the tree for a few weeks now (introduced here:
https://github.com/llvm/llvm-project/pull/106754). Given that it hasn't
been released in version 20 yet, am I correct in considering this to not
be a breaking change?
2024-10-26 13:06:11 +01:00
Gang Chen
4ac0e7e400 [AMDGPU] Add a type for the named barrier (#113614) 2024-10-25 11:24:47 -07:00
Caroline Concatto
b3703fa504 [AArch64]Update test aarch64-debug-types.c
This patch fix the failing tests by adding
REQUIRES: aarch64-registered-target

This tests was failing in non aarch64 cpu.
The test was introduced by:
 [CLANG][AArch64] Add the  modal 8 bit floating-point scalar type (#97277)
2024-10-25 13:30:16 +00:00
CarolineConcatto
49940514e2 [CLANG][AArch64] Add the modal 8 bit floating-point scalar type (#97277)
ARM ACLE PR#323[1] adds new modal types for 8-bit floating point
intrinsic.

From the PR#323:
```
ACLE defines the `__mfp8` type, which can be used for the E5M2 and E4M3
8-bit floating-point formats. It is a storage and interchange only type
with no arithmetic operations other than intrinsic calls.
````

The type should be an opaque type and its format in undefined in Clang.
Only defined in the backend by a status/format register, for AArch64 the
FPMR.

This patch is an attempt to the add the mfloat8_t scalar type. It has a
parser and codegen for the new scalar type.

The patch it is lowering to and 8bit unsigned as it has no format. But
maybe we should add another opaque type.

[1]  https://github.com/ARM-software/acle/pull/323
2024-10-25 13:59:46 +01:00
Phoebe Wang
c2d2b3b808 [test] Avoid writing to a potentially write-protected dir (#113674) 2024-10-25 18:43:40 +08:00
Kiran
a96c14eeb8 [Clang] Always forward sret parameters to musttail calls
If a call using the musttail attribute returns it's value through an
sret argument pointer, we must forward an incoming sret pointer to it,
instead of creating a new alloca. This is always possible because the
musttail attribute requires the caller and callee to have the same
argument and return types.
2024-10-25 09:34:08 +01:00
Freddy Ye
c4248fa3ed [X86] Support MOVRS and AVX10.2 instructions. (#113274)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-10-25 09:00:19 +08:00
Alexandros Lamprineas
a91ebcdd91 [FMV][AArch64] Unify aes with pmull and sve2-aes with sve2-pmull128. (#111673)
According to the Arm Architecture Reference Manual for A-profile
architecture you can't have one feature without having the other:

ID_AA64ZFR0_EL1.AES, bits [7:4]

> FEAT_SVE_AES implements the functionality identified by the value
0b0001.
> FEAT_SVE_PMULL128 implements the functionality identified by the value
0b0010.
> The permitted values are 0b0000 and 0b0010.

(The following was removed from the latest release of the specification,
but it appears to be a mistake that was not intended to relax the
architecture constraints. The discrepancy has been reported)

ID_AA64ISAR0_EL1.AES, bits [7:4]

> FEAT_AES implements the functionality identified by the value 0b0001.
> FEAT_PMULL implements the functionality identified by the value
0b0010.
> From Armv8, the permitted values are 0b0000 and 0b0010.

Approved in ACLE as https://github.com/ARM-software/acle/pull/352
2024-10-23 16:28:55 +01:00
CarolineConcatto
6dad29aebc [CLANG][AArch64]Add Neon vectors for mfloat8_t (#99865)
This patch adds these new vector sizes for neon:
   mfloat8x16_t and mfloat8x8_t

    According to the ARM ACLE PR#323[1].

    [1] ARM-software/acle#323
2024-10-23 13:23:18 +01:00
Florian Hahn
4334f317e7 [TBAA] Extend pointer TBAA to pointers of non-builtin types. (#110569)
Extend the logic added in 123c036bd3
(https://github.com/llvm/llvm-project/pull/76612) to support pointers to
non-builtin types by using the mangled name of the canonical type.

PR: https://github.com/llvm/llvm-project/pull/110569
2024-10-22 16:23:34 -07:00
Paul Walker
5bb34803a4 [NFC] Migrate tests to use autoupdate for CHECK lines. 2024-10-22 12:55:15 +00:00
Congcong Cai
c0c36aa018 [clang codegen] fix crash emitting __array_rank (#113186)
Fixed: #113044
the type of `ArrayTypeTraitExpr` can be changed, use i32 directly is
incorrect.

---------

Co-authored-by: Eli Friedman <efriedma@quicinc.com>
2024-10-22 17:03:51 +08:00
Alexandros Lamprineas
b6e9ba017f [FMV][AArch64] Unify features memtag and memtag2. (#112511)
If we split these features in the compiler (see relevant pull request
https://github.com/llvm/llvm-project/pull/109299), we would only be able
to hand-write a 'memtag2' version using inline assembly since the
compiler cannot generate the instructions that become available with
FEAT_MTE2. However these instructions only work at Exception Level 1, so
they would be unusable since FMV is a user space facility. I am
therefore unifying them.

Approved in ACLE as https://github.com/ARM-software/acle/pull/351
2024-10-21 21:40:57 +01:00
Alex Rønne Petersen
d906ac52ab [clang][AVR] Fix basic type size/alignment values to match avr-gcc. (#111290)
Closes #102172
2024-10-21 12:30:03 +02:00
Piyou Chen
c77e836123 [RISCV][FMV] Remove support for negative priority (#112161)
Ensure that target_version and target_clones do not accept negative
numbers for the priority feature.

Base on discussion on
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85.
2024-10-21 16:10:22 +08:00
Sam Elliott
228f88fdc8 [RISCV] Inline Assembly: RVC constraint and N modifier (#112561)
This change implements support for the `cr` and `cf` register
constraints (which allocate a RVC GPR or RVC FPR respectively), and the
`N` modifier (which prints the raw encoding of a register rather than
the name).

The intention behind these additions is to make it easier to use inline
assembly when assembling raw instructions that are not supported by the
compiler, for instance when experimenting with new instructions or when
supporting proprietary extensions outside the toolchain.

These implement part of my proposal in riscv-non-isa/riscv-c-api-doc#92

As part of the implementation, I felt there was not enough coverage of
inline assembly and the "in X" floating-point extensions, so I have
added more regression tests around these configurations.
2024-10-18 10:40:38 +01:00
c8ef
761fa5844e [TLI] Add support for the ilogb libcall. (#112725)
This patch adds the `ilogb` libcall. Constant folding will be handled in
subsequent patches.
2024-10-18 14:20:34 +08:00
Keith Packard
44b020a381 [PowerPC][ISelLowering] Support -mstack-protector-guard=tls (#110928)
Add support for using a thread-local variable with a specified offset
for holding the stack guard canary value. This supports both 32- and 64-
bit PowerPC targets.

This mirrors changes from #108942 but targeting PowerPC instead of
RISCV. Because both of these PRs modify the same driver functions, this
series is stack on top of the RISC-V one.

---------

Signed-off-by: Keith Packard <keithp@keithp.com>
2024-10-17 19:06:47 -07:00
goldsteinn
69a798a996 Reapply "[Inliner] Propagate more attributes to params when inlining (#91101)" (2nd Attempt) (#112749)
Root cause of the bug was code hanging onto `range` attr after
changing BitWidth. This was fixed in PR #112633.
2024-10-17 20:28:47 -05:00
Bill Wendling
8c62bf54df [Clang] Disable use of the counted_by attribute for whole struct pointers (#112636)
The whole struct is specificed in the __bdos. The calculation of the
whole size of the structure can be done in two ways:

    1) sizeof(struct S) + count * sizeof(typeof(fam))
    2) offsetof(struct S, fam) + count * sizeof(typeof(fam))

The first will add any remaining whitespace that might exist after
allocation while the second method is more precise, but not quite
expected from programmers. See [1] for a discussion of the topic.

GCC isn't (currently) able to calculate __bdos on a pointer to the whole
structure. Therefore, because of the above issue, we'll choose to match
what GCC does for consistency's sake.

[1] https://lore.kernel.org/lkml/ZvV6X5FPBBW7CO1f@archlinux/

Co-authored-by: Eli Friedman <efriedma@quicinc.com>
2024-10-17 21:52:40 +00:00
Qiongsi Wu
f9d0789064 [PGO] Initialize GCOV Writeout and Reset Functions in the Runtime on AIX (#108570)
This PR registers the writeout and reset functions for `gcov` for all
modules in the PGO runtime, instead of registering them
using global constructors in each module. The change is made for AIX
only, but the same mechanism works on Linux on Power.

When registering such functions using global constructors in each module
without `-ffunction-sections`, the AIX linker cannot garbage collect
unused undefined symbols, because such symbols are grouped in the same
section as the `__sinit` symbol. Keeping such undefined symbols causes
link errors (see test case
https://github.com/llvm/llvm-project/pull/108570/files#diff-500a7e1ba871e1b6b61b523700d5e30987900002add306e1b5e4972cf6d5a4f1R1
for this scenario). This PR implements the initialization in the
runtime, hence avoiding introducing `__sinit` into each module.

The implementation adds a new global variable `__llvm_covinit_functions`
to each module. This new global variable contains the function pointers
to the `Writeout` and `Reset` functions. `__llvm_covinit_functions`'s
section is the named section `__llvm_covinit`. The linker will aggregate
all the `__llvm_covinit` sections from each module
to form one single named section in the final binary. The pair of
functions
```
const __llvm_gcov_init_func_struct *__llvm_profile_begin_covinit();
const __llvm_gcov_init_func_struct *__llvm_profile_end_covinit();
```
are implemented to return the start and end address of this named
section in the final binary, and they are used in function
```
__llvm_profile_gcov_initialize()
```
(which is a constructor function in the runtime) so the runtime knows
the addresses of all the `Writeout` and `Reset` functions from all the
modules.

One noticeable implementation detail relevant to AIX is that to preserve
the `__llvm_covinit` from the linker's garbage collection, a `.ref`
pseudo instruction is inserted into them, referring to the section that
contains the `__llvm_gcov_ctr` variables, which are used in the
instrumented code. The `__llvm_gcov_ctr` variables did not belong to
named sections before, but this PR added them to the
`__llvm_gcov_ctr_section` named section, so we can add a `.ref` pseudo
instruction that refers to them in the `__llvm_covinit` section.
2024-10-17 09:32:10 -04:00
CarolineConcatto
cb43021e57 [CLANG]Add Scalable vectors for mfloat8_t (#101644)
This patch adds these new vector sizes for sve:
    svmfloat8_t

According to the ARM ACLE PR#323[1].

[1] ARM-software/acle#323
2024-10-17 09:22:55 +01:00
thetruestblue
927af63fdd [SanitizerCoverage] Add an option to gate the invocation of the tracing callbacks (#108328)
Implement -sanitizer-coverage-gated-trace-callbacks to gate the
invocation of the tracing callbacks based on the value of a global
variable, which is stored in a specific section.
When this option is enabled, the instrumentation will not call into the
runtime-provided callbacks for tracing, thus only incurring in a trivial
branch without going through a function call. It is up to the runtime to
toggle the value of the global variable in order to enable tracing.

This option is only supported for trace-pc-guard. 

Note: will add additional support for trace-cmp in a follow up PR.

Patch by Filippo Bigarella

rdar://101626834
2024-10-16 21:52:38 -07:00
Arthur Eubanks
9e6d24f61f Revert "[Inliner] Propagate more attributes to params when inlining (#91101)"
This reverts commit ae778ae7ce.

Creates broken IR, see comments in #91101.
2024-10-16 21:21:34 +00:00
goldsteinn
ae778ae7ce [Inliner] Propagate more attributes to params when inlining (#91101)
- **[Inliner] Add tests for propagating more parameter attributes; NFC**
- **[Inliner] Propagate more attributes to params when inlining**

Add support for propagating:
        - `derefereancable`
        - `derefereancable_or_null`
        - `align`
        - `nonnull`
        - `range`
    
These are only propagated if the parameter to the to-be-inlined callsite
match the exact parameter used in the to-be-inlined function.
2024-10-16 11:53:21 -05:00
Daniel Paoliello
c9f27275c1 [clang][aarch64] Add support for the MSVC qualifiers __ptr32, __ptr64, __sptr, __uptr for AArch64 (#111879)
MSVC has a set of qualifiers to allow using 32-bit signed/unsigned
pointers when building 64-bit targets. This is useful for WoW code
(i.e., the part of Windows that handles running 32-bit application on a
64-bit OS). Currently this is supported on x64 using the 270, 271 and
272 address spaces, but does not work for AArch64 at all.

This change adds the same 270, 271 and 272 address spaces to AArch64 and
adjusts the data layout string accordingly. Clang will generate the
correct address space casts, but these will currently be ignored until
the AArch64 backend is updated to handle them.

Partially fixes #62536

This is a resurrected version of <https://reviews.llvm.org/D158857>
(originally created by @a_vorobev) - I've cleaned it up a little, fixed
the rest of the tests and added to auto-upgrade for the data layout.
2024-10-15 10:37:36 -07:00
Brandon Wu
46f953d1d9 [clang][RISCV] Correct the SEW operand of indexed/fault only first segment intrinsics (#111476)
Indexed segment load/store intrinsics don't have SEW information encoded
in the name, so we need to get the information from its pointer type
argument at runtime.
2024-10-15 07:40:37 -07:00
yabinc
627746581b Reapply "[clang][CodeGen] Zero init unspecified fields in initializers in C" (#109898) (#110051)
This reverts commit d50eaac12f. Also fixes
a bug calculating offsets for bit fields in the original patch.
2024-10-14 16:32:24 -07:00