clang-p2996

Author	SHA1	Message	Date
Phoebe Wang	c72a751dab	[X86][AMX] Support AMX-TRANSPOSE (#113532 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-11-01 16:45:03 +08:00
Freddy Ye	8cb6b65542	[X86][CFE] Correct parameter type of _cmpccxadd_epi64 (#114367 ) This fixes correctness of https://gcc.godbolt.org/z/vexf5fW5r	2024-11-01 16:03:28 +08:00
Lei Wang	bef3b54ea1	[InstrPGO] Avoid using global variable to fix potential data race (#114364 ) https://github.com/llvm/llvm-project/pull/109837 set a global variable (`PGOInstrumentColdFunctionOnly`) in PassBuilderPipelines.cpp, which introduced a data race detected by TSan. To fix this, the flag setting is decoupled: the flags are now set separately (`instrument-cold-function-only-path` must be used together with `--pgo-instrument-cold-function-only`).	2024-10-31 21:28:13 -07:00
Paul Kirth	913cd11f94	[llvm][fatlto] Drop any CFI related instrumentation after emitting bitcode (#112788 ) We want to support CFI instrumentation for the bitcode section, without miscompiling the object code portion of a FatLTO object. We can reuse the existing mechanisms in the LowerTypeTestsPass to do that, by just adding the pass to the FatLTO pipeline after the EmbedBitcodePass with the correct options set. Fixes #112053	2024-10-31 12:40:21 -07:00
Tex Riddell	6582785d01	Add CHECK-LABEL to avoid source tree path sensitivity in test (#112461 ) The test `clang/test/CodeGen/2004-02-20-Builtins.c` will erroneously fail if "builtin" is in the path to your source tree. This change adds a `CHECK-LABEL !llvm.ident` after the `CHECK-NOT` to avoid searching into the metadata containing the path.	2024-10-31 09:04:48 -07:00
Dmitry Chernenkov	d924a9ba03	Revert "[InstrPGO] Support cold function coverage instrumentation (#109837 )" This reverts commit `e517cfc531`.	2024-10-31 10:55:17 +00:00
Dmitry Chernenkov	06e28ed84f	Revert "specify clang --target to fix breakage on AIX (#114127 )" This reverts commit `cc60c46e39`.	2024-10-31 10:55:17 +00:00
Feng Zou	8127162427	[X86][AMX] Support AMX-FP8 (#113850 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-10-31 10:14:25 +08:00
Alexandros Lamprineas	5dac2db5a8	[FMV][AArch64] Remove features which can be expressed as a combination of others. (#113580 ) Removes sve-bf16, sve-ebf16, and sve-i8mm since they are obsolete. One could write target_version("sve+bf16") instead of sve-bf16 for instance. Approved in ACLE as https://github.com/ARM-software/acle/pull/353	2024-10-30 11:53:50 +00:00
Lei Wang	cc60c46e39	specify clang --target to fix breakage on AIX (#114127 ) `-fprofile-sample-use` is not supported on AIX, which caused a CI failure.	2024-10-29 21:06:43 -07:00
Simon Pilgrim	bf6c483e47	[clang][x86] Add constexpr support for SSE2 _mm_set_epi intrinsics	2024-10-29 15:39:15 +00:00
Simon Pilgrim	e281d96a81	[clang][x86] Add constexpr support for _mm_add_epi32/64 and _mm_sub_epi32/64	2024-10-29 14:34:19 +00:00
Simon Pilgrim	f257e9bdbb	[clang][x86] Update AVX/AVX512 setzero constexpr tests to use the TEST_CONSTEXPR macro	2024-10-29 14:34:19 +00:00
Simon Pilgrim	f537792f3f	[X86] Refactor the SSE intrinsics constexpr tests to simplify future expansion (#112578 ) I'm hoping to make a large proportion of the SSE/AVX intrinsics usable in constant expressions - eventually anything that doesn't touch memory or system settings - making it much easier to utilize SSE/AVX intrinsics in various math libraries etc. My initial implementation placed the tests at the end of the test file, similar to how smaller files already handle their tests. However, what I'm finding is that this approach doesn't scale when trying to track coverage of so many intrinsics - many keep getting missed, and it gets messy; so what I'm proposing is to instead keep each intrinsic's generic IR test and its constexpr tests together to make them easier to track together, wrapping the static_assert inside a macro to disable on C and pre-C++11 tests. I'm open to alternative suggestions before I invest too much time getting this work done :)	2024-10-29 11:00:35 +00:00
Jesse Huang	335e68d8bc	[Clang][RISCV] Support -fcf-protection=return for RISC-V (#112477 ) Enables the support of `-fcf-protection=return` on RISC-V, which requires Zicfiss. It also adds a string attribute "hw-shadow-stack" to every function if the option is set on RISC-V	2024-10-29 15:47:49 +08:00
Matthias Braun	36c1194906	Remove optimization flags from clang codegen tests (#113714 ) - Remove an -O3 flag from a couple of clang x86 codegen tests so the tests do not need to be updated when optimizations in LLVM change. - Change the tests to use utils/update_cc_test_checks.sh - Change from apple/darwin triples to generic x86_64-- and i386-- because it was not relevant to the test but `update_cc_test_checks` seems to be unable to handle platforms that prepend `_` to function names.	2024-10-28 15:34:56 -07:00
Lei Wang	e517cfc531	[InstrPGO] Support cold function coverage instrumentation (#109837 ) This patch adds support for cold function coverage instrumentation based on sampling PGO counts. The major motivation is to detect dead functions for the services that are optimized with sampling PGO. If a function is covered by a sampling profile count (e.g., it has an entry count > 0), we skip instrumenting it, which significantly reduces the instrumentation overhead. More details about the implementation and flags: - Added a flag `--pgo-instrument-cold-function-only` in `PGOInstrumentation.cpp` as the main switch to control skipping the instrumentation. - Built the extra instrumentation passes (a bundle of passes in `addPGOInstrPasses`) under the sampling PGO pipeline. This is controlled by the `--instrument-cold-function-only-path` flag. - Added a driver flag `-fprofile-generate-cold-function-coverage`: - 1) Configures the flags in one place, i.e., adds `--instrument-cold-function-only-path=<...>` and `--pgo-function-entry-coverage`. Note that the instrumentation file path is passed through `--instrument-sample-cold-function-path`, because we cannot use the `PGOOptions.ProfileFile` as it's already used by `-fprofile-sample-use=<...>`. - 2) Makes the linker link the `compiler_rt.profile` lib (see [ToolChain.cpp#L1125-L1131](https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChain.cpp#L1125-L1131) ). - Added a flag (`--pgo-cold-instrument-entry-threshold`) to configure the entry count used to determine cold functions. Overall, the full command looks like: ``` clang++ -O2 -fprofile-generate-cold-function-coverage=<...> -fprofile-sample-use=<...> code.cc -o code ```	2024-10-28 10:13:45 -07:00
Aaron Ballman	af7c58b7ea	Remove support for RenderScript (#112916 ) See https://discourse.llvm.org/t/rfc-deprecate-and-eventually-remove-renderscript-support/81284 for the RFC	2024-10-28 12:48:42 -04:00
Momchil Velikov	53f7f8ecca	[Clang][AArch64] Fix Pure Scalables Types argument passing and return (#112747 ) Pure Scalable Types are defined in AAPCS64 here: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#pure-scalable-types-psts And should be passed according to Rule C.7 here: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#682parameter-passing-rules This part of the ABI is completely unimplemented in Clang; instead, it treats PSTs sometimes as HFAs/HVAs, sometimes as general composite types. This patch implements the rules for passing PSTs by employing the `CoerceAndExpand` method and extending it to: * allow array types in the `coerceToType`; Now only `[N x i8]` are considered padding. * allow mismatch between the elements of the `coerceToType` and the elements of the `unpaddedCoerceToType`; AArch64 uses this to map fixed-length vector types to SVE vector types. Correctly passing a PST argument needs a decision in Clang about whether to pass it in memory or registers or, equivalently, whether to use the `Indirect` or `Expand/CoerceAndExpand` method. It was considered relatively harder (or not practically possible) to make that decision in the AArch64 backend. Hence this patch implements the register counting from AAPCS64 (cf. `NSRN`, `NPRN`) to guide Clang's decision.	2024-10-28 15:43:14 +00:00
Momchil Velikov	1df5c94343	[AArch64] Implement FP8 floating-point mode helper intrinsics (#100608 ) Implement FP8 mode helper intrinsics (as inline functions) as specified in ACLE 2024Q3 "14.2 Helper intrinsics" https://github.com/ARM-software/acle/releases/download/r2024Q3/acle-2024Q3.pdf	2024-10-28 11:22:38 +00:00
Freddy Ye	5aa1275d03	[X86] Support SM4 EVEX version intrinsics/instructions. (#113402 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-10-28 10:46:16 +08:00
Alex MacLean	fb33af08e4	[NVPTX] Remove nvvm.ldg.global.* intrinsics (#112834 ) Remove these intrinsics which can be better represented by load instructions with `!invariant.load` metadata: - llvm.nvvm.ldg.global.i - llvm.nvvm.ldg.global.f - llvm.nvvm.ldg.global.p	2024-10-27 16:14:13 -07:00
davidtrevelyan	4102625380	[rtsan][llvm][NFC] Rename sanitize_realtime_unsafe attr to sanitize_realtime_blocking (#113155 ) # What This PR renames the newly-introduced llvm attribute `sanitize_realtime_unsafe` to `sanitize_realtime_blocking`. Likewise, sibling variables such as `SanitizeRealtimeUnsafe` are renamed to `SanitizeRealtimeBlocking` respectively. There are no other functional changes. # Why? - There are a number of problems that can cause a function to be real-time "unsafe", - we wish to communicate what problems rtsan detects and why they're unsafe, and - a generic "unsafe" attribute is, in our opinion, too broad a net - which may lead to future implementations that need extra contextual information passed through them in order to communicate meaningful reasons to users. - We want to avoid this situation and make the runtime library boundary API/ABI as simple as possible, and - we believe that restricting the scope of attributes to names like `sanitize_realtime_blocking` is an effective means of doing so. We also feel that the symmetry between `[[clang::blocking]]` and `sanitize_realtime_blocking` is easier to follow as a developer. # Concerns - I'm aware that the LLVM attribute `sanitize_realtime_unsafe` has been part of the tree for a few weeks now (introduced here: https://github.com/llvm/llvm-project/pull/106754). Given that it hasn't been released in version 20 yet, am I correct in considering this to not be a breaking change?	2024-10-26 13:06:11 +01:00
Gang Chen	4ac0e7e400	[AMDGPU] Add a type for the named barrier (#113614 )	2024-10-25 11:24:47 -07:00
Caroline Concatto	b3703fa504	[AArch64]Update test aarch64-debug-types.c This patch fixes the failing test by adding REQUIRES: aarch64-registered-target. The test was failing on non-AArch64 CPUs. The test was introduced by: [CLANG][AArch64] Add the modal 8 bit floating-point scalar type (#97277)	2024-10-25 13:30:16 +00:00
CarolineConcatto	49940514e2	[CLANG][AArch64] Add the modal 8 bit floating-point scalar type (#97277 ) ARM ACLE PR#323[1] adds new modal types for 8-bit floating-point intrinsics. From PR#323: ``` ACLE defines the `__mfp8` type, which can be used for the E5M2 and E4M3 8-bit floating-point formats. It is a storage and interchange only type with no arithmetic operations other than intrinsic calls. ``` The type should be an opaque type, and its format is undefined in Clang. It is only defined in the backend by a status/format register, for AArch64 the FPMR. This patch is an attempt to add the mfloat8_t scalar type. It has a parser and codegen for the new scalar type. The patch lowers the type to an 8-bit unsigned type, as it has no format. But maybe we should add another opaque type. [1] https://github.com/ARM-software/acle/pull/323	2024-10-25 13:59:46 +01:00
Phoebe Wang	c2d2b3b808	[test] Avoid writing to a potentially write-protected dir (#113674 )	2024-10-25 18:43:40 +08:00
Kiran	a96c14eeb8	[Clang] Always forward sret parameters to musttail calls If a call using the musttail attribute returns its value through an sret argument pointer, we must forward an incoming sret pointer to it, instead of creating a new alloca. This is always possible because the musttail attribute requires the caller and callee to have the same argument and return types.	2024-10-25 09:34:08 +01:00
Freddy Ye	c4248fa3ed	[X86] Support MOVRS and AVX10.2 instructions. (#113274 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-10-25 09:00:19 +08:00
Alexandros Lamprineas	a91ebcdd91	[FMV][AArch64] Unify aes with pmull and sve2-aes with sve2-pmull128. (#111673 ) According to the Arm Architecture Reference Manual for A-profile architecture you can't have one feature without having the other: ID_AA64ZFR0_EL1.AES, bits [7:4] > FEAT_SVE_AES implements the functionality identified by the value 0b0001. > FEAT_SVE_PMULL128 implements the functionality identified by the value 0b0010. > The permitted values are 0b0000 and 0b0010. (The following was removed from the latest release of the specification, but it appears to be a mistake that was not intended to relax the architecture constraints. The discrepancy has been reported) ID_AA64ISAR0_EL1.AES, bits [7:4] > FEAT_AES implements the functionality identified by the value 0b0001. > FEAT_PMULL implements the functionality identified by the value 0b0010. > From Armv8, the permitted values are 0b0000 and 0b0010. Approved in ACLE as https://github.com/ARM-software/acle/pull/352	2024-10-23 16:28:55 +01:00
CarolineConcatto	6dad29aebc	[CLANG][AArch64]Add Neon vectors for mfloat8_t (#99865 ) This patch adds these new vector sizes for neon: mfloat8x16_t and mfloat8x8_t According to the ARM ACLE PR#323[1]. [1] ARM-software/acle#323	2024-10-23 13:23:18 +01:00
Florian Hahn	4334f317e7	[TBAA] Extend pointer TBAA to pointers of non-builtin types. (#110569 ) Extend the logic added in `123c036bd3` (https://github.com/llvm/llvm-project/pull/76612) to support pointers to non-builtin types by using the mangled name of the canonical type. PR: https://github.com/llvm/llvm-project/pull/110569	2024-10-22 16:23:34 -07:00
Paul Walker	5bb34803a4	[NFC] Migrate tests to use autoupdate for CHECK lines.	2024-10-22 12:55:15 +00:00
Congcong Cai	c0c36aa018	[clang codegen] fix crash emitting __array_rank (#113186 ) Fixes #113044. The type of `ArrayTypeTraitExpr` can change, so using i32 directly is incorrect. --------- Co-authored-by: Eli Friedman <efriedma@quicinc.com>	2024-10-22 17:03:51 +08:00
Alexandros Lamprineas	b6e9ba017f	[FMV][AArch64] Unify features memtag and memtag2. (#112511 ) If we split these features in the compiler (see relevant pull request https://github.com/llvm/llvm-project/pull/109299), we would only be able to hand-write a 'memtag2' version using inline assembly since the compiler cannot generate the instructions that become available with FEAT_MTE2. However these instructions only work at Exception Level 1, so they would be unusable since FMV is a user space facility. I am therefore unifying them. Approved in ACLE as https://github.com/ARM-software/acle/pull/351	2024-10-21 21:40:57 +01:00
Alex Rønne Petersen	d906ac52ab	[clang][AVR] Fix basic type size/alignment values to match avr-gcc. (#111290 ) Closes #102172	2024-10-21 12:30:03 +02:00
Piyou Chen	c77e836123	[RISCV][FMV] Remove support for negative priority (#112161 ) Ensure that target_version and target_clones do not accept negative numbers for the priority feature. Based on the discussion in https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85.	2024-10-21 16:10:22 +08:00
Sam Elliott	228f88fdc8	[RISCV] Inline Assembly: RVC constraint and N modifier (#112561 ) This change implements support for the `cr` and `cf` register constraints (which allocate a RVC GPR or RVC FPR respectively), and the `N` modifier (which prints the raw encoding of a register rather than the name). The intention behind these additions is to make it easier to use inline assembly when assembling raw instructions that are not supported by the compiler, for instance when experimenting with new instructions or when supporting proprietary extensions outside the toolchain. These implement part of my proposal in riscv-non-isa/riscv-c-api-doc#92 As part of the implementation, I felt there was not enough coverage of inline assembly and the "in X" floating-point extensions, so I have added more regression tests around these configurations.	2024-10-18 10:40:38 +01:00
c8ef	761fa5844e	[TLI] Add support for the `ilogb` libcall. (#112725 ) This patch adds the `ilogb` libcall. Constant folding will be handled in subsequent patches.	2024-10-18 14:20:34 +08:00
Keith Packard	44b020a381	[PowerPC][ISelLowering] Support -mstack-protector-guard=tls (#110928 ) Add support for using a thread-local variable with a specified offset for holding the stack guard canary value. This supports both 32- and 64-bit PowerPC targets. This mirrors changes from #108942 but targets PowerPC instead of RISC-V. Because both of these PRs modify the same driver functions, this series is stacked on top of the RISC-V one. --------- Signed-off-by: Keith Packard <keithp@keithp.com>	2024-10-17 19:06:47 -07:00
goldsteinn	69a798a996	Reapply "[Inliner] Propagate more attributes to params when inlining (#91101 )" (2nd Attempt) (#112749 ) Root cause of the bug was code hanging onto `range` attr after changing BitWidth. This was fixed in PR #112633.	2024-10-17 20:28:47 -05:00
Bill Wendling	8c62bf54df	[Clang] Disable use of the counted_by attribute for whole struct pointers (#112636 ) The whole struct is specified in the __bdos. The calculation of the whole size of the structure can be done in two ways: 1) sizeof(struct S) + count * sizeof(typeof(fam)) 2) offsetof(struct S, fam) + count * sizeof(typeof(fam)) The first will add any remaining whitespace that might exist after allocation while the second method is more precise, but not quite expected from programmers. See [1] for a discussion of the topic. GCC isn't (currently) able to calculate __bdos on a pointer to the whole structure. Therefore, because of the above issue, we'll choose to match what GCC does for consistency's sake. [1] https://lore.kernel.org/lkml/ZvV6X5FPBBW7CO1f@archlinux/ Co-authored-by: Eli Friedman <efriedma@quicinc.com>	2024-10-17 21:52:40 +00:00
Qiongsi Wu	f9d0789064	[PGO] Initialize GCOV Writeout and Reset Functions in the Runtime on AIX (#108570 ) This PR registers the writeout and reset functions for `gcov` for all modules in the PGO runtime, instead of registering them using global constructors in each module. The change is made for AIX only, but the same mechanism works on Linux on Power. When registering such functions using global constructors in each module without `-ffunction-sections`, the AIX linker cannot garbage collect unused undefined symbols, because such symbols are grouped in the same section as the `__sinit` symbol. Keeping such undefined symbols causes link errors (see test case https://github.com/llvm/llvm-project/pull/108570/files#diff-500a7e1ba871e1b6b61b523700d5e30987900002add306e1b5e4972cf6d5a4f1R1 for this scenario). This PR implements the initialization in the runtime, hence avoiding introducing `__sinit` into each module. The implementation adds a new global variable `__llvm_covinit_functions` to each module. This new global variable contains the function pointers to the `Writeout` and `Reset` functions. `__llvm_covinit_functions`'s section is the named section `__llvm_covinit`. The linker will aggregate all the `__llvm_covinit` sections from each module to form one single named section in the final binary. The pair of functions ``` const __llvm_gcov_init_func_struct __llvm_profile_begin_covinit(); const __llvm_gcov_init_func_struct __llvm_profile_end_covinit(); ``` are implemented to return the start and end address of this named section in the final binary, and they are used in function ``` __llvm_profile_gcov_initialize() ``` (which is a constructor function in the runtime) so the runtime knows the addresses of all the `Writeout` and `Reset` functions from all the modules. 
One noticeable implementation detail relevant to AIX is that to preserve the `__llvm_covinit` from the linker's garbage collection, a `.ref` pseudo instruction is inserted into them, referring to the section that contains the `__llvm_gcov_ctr` variables, which are used in the instrumented code. The `__llvm_gcov_ctr` variables did not belong to named sections before, but this PR added them to the `__llvm_gcov_ctr_section` named section, so we can add a `.ref` pseudo instruction that refers to them in the `__llvm_covinit` section.	2024-10-17 09:32:10 -04:00
CarolineConcatto	cb43021e57	[CLANG]Add Scalable vectors for mfloat8_t (#101644 ) This patch adds these new vector sizes for sve: svmfloat8_t According to the ARM ACLE PR#323[1]. [1] ARM-software/acle#323	2024-10-17 09:22:55 +01:00
thetruestblue	927af63fdd	[SanitizerCoverage] Add an option to gate the invocation of the tracing callbacks (#108328 ) Implement -sanitizer-coverage-gated-trace-callbacks to gate the invocation of the tracing callbacks based on the value of a global variable, which is stored in a specific section. When this option is enabled, the instrumentation will not call into the runtime-provided callbacks for tracing, thus incurring only a trivial branch without going through a function call. It is up to the runtime to toggle the value of the global variable in order to enable tracing. This option is only supported for trace-pc-guard. Note: will add additional support for trace-cmp in a follow-up PR. Patch by Filippo Bigarella rdar://101626834	2024-10-16 21:52:38 -07:00
Arthur Eubanks	9e6d24f61f	Revert "[Inliner] Propagate more attributes to params when inlining (#91101 )" This reverts commit `ae778ae7ce`. Creates broken IR, see comments in #91101.	2024-10-16 21:21:34 +00:00
goldsteinn	ae778ae7ce	[Inliner] Propagate more attributes to params when inlining (#91101 ) - [Inliner] Add tests for propagating more parameter attributes; NFC - [Inliner] Propagate more attributes to params when inlining Add support for propagating: - `dereferenceable` - `dereferenceable_or_null` - `align` - `nonnull` - `range` These are only propagated if the parameter to the to-be-inlined callsite matches the exact parameter used in the to-be-inlined function.	2024-10-16 11:53:21 -05:00
Daniel Paoliello	c9f27275c1	[clang][aarch64] Add support for the MSVC qualifiers __ptr32, __ptr64, __sptr, __uptr for AArch64 (#111879 ) MSVC has a set of qualifiers to allow using 32-bit signed/unsigned pointers when building 64-bit targets. This is useful for WoW code (i.e., the part of Windows that handles running 32-bit applications on a 64-bit OS). Currently this is supported on x64 using the 270, 271 and 272 address spaces, but does not work for AArch64 at all. This change adds the same 270, 271 and 272 address spaces to AArch64 and adjusts the data layout string accordingly. Clang will generate the correct address space casts, but these will currently be ignored until the AArch64 backend is updated to handle them. Partially fixes #62536 This is a resurrected version of <https://reviews.llvm.org/D158857> (originally created by @a_vorobev) - I've cleaned it up a little, fixed the rest of the tests and added to auto-upgrade for the data layout.	2024-10-15 10:37:36 -07:00
Brandon Wu	46f953d1d9	[clang][RISCV] Correct the SEW operand of indexed/fault only first segment intrinsics (#111476 ) Indexed segment load/store intrinsics don't have SEW information encoded in the name, so we need to get the information from its pointer type argument at runtime.	2024-10-15 07:40:37 -07:00
yabinc	627746581b	Reapply "[clang][CodeGen] Zero init unspecified fields in initializers in C" (#109898 ) (#110051 ) This reverts commit `d50eaac12f`. Also fixes a bug calculating offsets for bit fields in the original patch.	2024-10-14 16:32:24 -07:00
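
Commit 8c62bf54df in the log above contrasts two ways of computing the size of a struct that ends in a flexible array member. The following is a minimal sketch of that arithmetic only, not code from the commit: `struct S`, its fields, and the helper names `size_via_sizeof` and `size_via_offsetof` are hypothetical, and the element type is assumed to be `char`.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */

/* Hypothetical struct for illustration; not taken from the commit. */
struct S {
    int  count;   /* number of elements stored in fam */
    char tag;
    char fam[];   /* flexible array member */
};

/* The flexible array member always starts at or before the end of the struct. */
static_assert(offsetof(struct S, fam) <= sizeof(struct S),
              "fam must start within struct S");

/* Method 1: sizeof(struct S) + count * sizeof(element).
 * sizeof(struct S) includes any trailing padding after the last fixed field. */
size_t size_via_sizeof(size_t count) {
    return sizeof(struct S) + count * sizeof(char);
}

/* Method 2: offsetof(struct S, fam) + count * sizeof(element).
 * Counts from where fam actually begins, so trailing padding is excluded. */
size_t size_via_offsetof(size_t count) {
    return offsetof(struct S, fam) + count * sizeof(char);
}
```

For any `count`, the two results differ by `sizeof(struct S) - offsetof(struct S, fam)`, which is the trailing padding that the commit message calls "remaining whitespace". On an ABI with a 4-byte, 4-byte-aligned `int`, that difference is typically 3 bytes for this struct.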

9431 Commits