Commit Graph

17671 Commits

Author SHA1 Message Date
S. Bharadwaj Yadavalli
f2650c54c9 [DirectX] Set Shader Flag DisableOptimizations (#126813)
- Set the shader flag `DisableOptimizations` based on the `optnone`
attribute of shader entry functions.

- Add the DXIL Metadata Analysis pass as a prerequisite for the Shader Flags
pass to obtain the entry function information collected therein.

- The named module metadata `dx.disable_optimizations` was intended to
indicate that optimizations were disabled (`-O0`) via a command-line flag.
However, its intent is now fulfilled by the `optnone` attribute on shader
entry functions, as implemented in a recent change, so it is no longer
needed. Delete generation of the named metadata and the corresponding
test file `disable_opt.ll`.

- Add tests to verify correctness of setting shader flag.

Closes #112263
2025-02-12 16:45:01 -05:00
Peter Rong
53c618c071 [clang] run clang-format on some CGObjC files (#126644)
These files are relatively old and don't conform to our formatting rules.
It's hard to change them without triggering massive clang-format diffs.

---------

Signed-off-by: Peter Rong <PeterRong@meta.com>
2025-02-12 11:52:49 -08:00
Harald van Dijk
23209eb1d9 Revert "[DebugInfo] Update DIBuilder insertion to take InsertPosition (#126059)"
This reverts commit 3ec9f7494b.
2025-02-12 17:50:39 +00:00
Harald van Dijk
3ec9f7494b [DebugInfo] Update DIBuilder insertion to take InsertPosition (#126059)
After #124287 updated several functions to return iterators rather than
Instruction *, it was no longer straightforward to pass their result to
DIBuilder. This commit updates DIBuilder methods to accept an
InsertPosition instead, so that they can be called with an iterator
(preferred), or with a deprecation warning an Instruction *, or a
BasicBlock *. This commit also updates the existing calls to the
DIBuilder methods to pass in iterators.
2025-02-12 17:38:59 +00:00
Nick Sarnie
cb3498c670 [OpenMP][OpenMPIRBuilder] Support SPIR-V device variant matches (#126801)
We should be able to use `spirv64` as a device variant match and it
should be considered a GPU.

Also add the triple to an RTTI check.
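
For context, a hedged sketch of the kind of variant dispatch this enables
(`foo`/`foo_spirv` are illustrative names, not from the patch):

```c
int foo_spirv(void) { return 1; } /* specialization for SPIR-V devices */

#pragma omp declare variant(foo_spirv) match(device = {arch(spirv64)})
int foo(void) { return 0; }       /* base function */

int call(void) { return foo(); }  /* resolves to foo_spirv on spirv64 */
```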

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2025-02-12 16:40:05 +00:00
Alex MacLean
a282b6c486 [NVPTX] Convert scalar function nvvm.annotations to attributes (#125908)
Replace some more nvvm.annotations with function attributes,
auto-upgrading the annotations as needed. These new attributes will be
more idiomatic and compile-time efficient than the annotations.

- !"maxclusterrank" / !"cluster_max_blocks" -> "nvvm.maxclusterrank"
- !"minctasm" -> "nvvm.minctasm"
- !"maxnreg" -> "nvvm.maxnreg"
2025-02-12 07:33:22 -08:00
Matt
a1826b4d26 [OpenMP][SIMD][FIX] Use conservative "omp simd ordered" lowering (#126172)
A proposed fix for issue #95611, "[OpenMP][SIMD] ordered has no
effect in a loop SIMD region as of LLVM 18.1.0".

Changes:

- Implement new lowering behavior: Conservatively serialize "omp simd"
loops that have `omp simd ordered` directive to prevent incorrect
vectorization (which results in incorrect execution behavior of the
miscompiled program).

Implementation outline:

- We start with the optimistic default initial value of
`LoopStack.setParallel(/*Enable=*/true);` in
`CodeGenFunction::EmitOMPSimdInit(const OMPLoopDirective &D)`.
- We only disable the loop parallel memory access assumption with `if
(HasOrderedDirective) LoopStack.setParallel(/*Enable=*/false);` using the
`HasOrderedDirective` (which tests for the presence of an
`OMPOrderedDirective`).
- This results in no longer incorrectly vectorizing the loop when the
`omp simd ordered` directive is present.

Motivation: We'd like to prevent incorrect vectorization of the loops
marked with the `#pragma omp ordered simd` directive which has
previously resulted in miscompiled code.

At the same time, we'd like the usage outside of the `#pragma omp
ordered simd` context to remain unaffected: Note that in the test
"clang/test/OpenMP/ordered_codegen.cpp" we only "lose" the
`!llvm.access.group` metadata in `foo_simd` alone.

This is conservative, in that some of the loops might be legal to
vectorize, but we prefer that to miscompiling the loops that are
currently illegal to vectorize.

A concrete example follows:

```cpp
// "test.c"
#include <float.h>
#include <math.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int compare_float(float x1, float x2, float scalar) {
    const float diff = fabsf(x1 - x2);
    x1 = fabsf(x1);
    x2 = fabsf(x2);
    const float l = (x2 > x1) ? x2 : x1;
    if (diff <= l * scalar * FLT_EPSILON)
        return 1;
    else
        return 0;
}

#define ARRAY_SIZE 256

__attribute__((noinline)) void initialization_loop(
    float X[ARRAY_SIZE][ARRAY_SIZE], float Y[ARRAY_SIZE][ARRAY_SIZE]) {
    const float max = 1000.0;
    srand(time(NULL));
    for (int r = 0; r < ARRAY_SIZE; r++) {
        for (int c = 0; c < ARRAY_SIZE; c++) {
            X[r][c] = ((float)rand() / (float)(RAND_MAX)) * max;
            Y[r][c] = X[r][c];
        }
    }
}

__attribute__((noinline)) void omp_simd_loop(float X[ARRAY_SIZE][ARRAY_SIZE]) {
    for (int r = 1; r < ARRAY_SIZE; ++r) {
        for (int c = 1; c < ARRAY_SIZE; ++c) {
#pragma omp simd
            for (int k = 2; k < ARRAY_SIZE; ++k) {
#pragma omp ordered simd
                X[r][k] = X[r][k - 2] + sinf((float)(r / c));
            }
        }
    }
}

__attribute__((noinline)) int comparison_loop(float X[ARRAY_SIZE][ARRAY_SIZE],
                                              float Y[ARRAY_SIZE][ARRAY_SIZE]) {
    int totalErrors_simd = 0;
    const float scalar = 1.0;
    for (int r = 1; r < ARRAY_SIZE; ++r) {
        for (int c = 1; c < ARRAY_SIZE; ++c) {
            for (int k = 2; k < ARRAY_SIZE; ++k) {
                Y[r][k] = Y[r][k - 2] + sinf((float)(r / c));
            }
        }
        // check row for simd update
        for (int k = 0; k < ARRAY_SIZE; ++k) {
            if (!compare_float(X[r][k], Y[r][k], scalar)) {
                ++totalErrors_simd;
            }
        }
    }
    return totalErrors_simd;
}

int main(void) {
    float X[ARRAY_SIZE][ARRAY_SIZE];
    float Y[ARRAY_SIZE][ARRAY_SIZE];

    initialization_loop(X, Y);
    omp_simd_loop(X);
    const int totalErrors_simd = comparison_loop(X, Y);

    if (totalErrors_simd) {
        fprintf(stdout, "totalErrors_simd: %d \n", totalErrors_simd);
        fprintf(stdout, "%s : %d - FAIL: error in ordered simd computation.\n",
                __FILE__, __LINE__);
    } else {
        fprintf(stdout, "Success!\n");
    }

    return totalErrors_simd;
}
```

Before:

```
$ clang -fopenmp-simd -O3 -ffast-math -lm test.c -o test && ./test
totalErrors_simd: 15408
test.c : 76 - FAIL: error in ordered simd computation.
```

clang 19.1.0: https://godbolt.org/z/6EvhxqEhe

After:

```
$ clang -fopenmp-simd -O3 -ffast-math test.c -o test && ./test
Success!
```

Co-authored-by: Matt P. Dziubinski <matt-p.dziubinski@hpe.com>
2025-02-12 08:53:47 -05:00
Kazu Hirata
67e1e98811 Revert "[Clang] [OpenMP] Add support for '#pragma omp stripe'. (#119891)"
This reverts commit 070f84ebc8.

Buildbot failure:
https://lab.llvm.org/buildbot/#/builders/51/builds/10694
2025-02-11 12:39:01 -08:00
Zahira Ammarguellat
070f84ebc8 [Clang] [OpenMP] Add support for '#pragma omp stripe'. (#119891)
Implement basic parsing and semantic support for the `#pragma omp stripe`
construct introduced in the OpenMP 6.0 specification
(https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-6-0.pdf),
section 11.7.
2025-02-11 13:58:21 -05:00
S. Bharadwaj Yadavalli
b92bab3c01 [HLSL] Appropriately set function attribute optnone (#125937)
When optimization is disabled, set the `optnone` attribute on all module
entry functions.

Updated test in accordance with the change.

Closes #124796
2025-02-11 12:29:05 -05:00
Kazu Hirata
8e4e144931 [CodeGen] Avoid repeated hash lookups (NFC) (#126672) 2025-02-11 09:06:40 -08:00
Nick Sarnie
f3cd223838 [OpenMP][OpenMPIRBuilder] Add initial changes for SPIR-V target frontend support (#125920)
As Intel is working to add support for SPIR-V OpenMP device offloading
in upstream clang/liboffload, we need to modify the OpenMP frontend to
allow SPIR-V as well as generate valid IR for SPIR-V. For example, we
need the frontend to generate code to define and interact with device
globals used in the DeviceRTL.

This is the beginning of what I expect will be (many) other changes, but
let's get started with something simple.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2025-02-10 16:16:40 +00:00
Wael Yehia
8e61aae4a8 [profile] Add a clang option -fprofile-continuous that enables continuous instrumentation profiling mode (#124353)
In Continuous instrumentation profiling mode, profile or coverage data
collected via compiler instrumentation is continuously synced to the
profile file. This feature has existed for a while, and is documented
here:

https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program
This PR creates a user-facing option to enable the feature.
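
A hedged usage sketch (assuming the flag composes with `-fprofile-generate`
as the documentation above describes):

```
$ clang -fprofile-generate -fprofile-continuous code.c -o code
$ ./code   # profile data is synced to the profile file continuously
```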

---------

Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
2025-02-08 17:25:07 -05:00
Mats Jun Larsen
a07928c3ce [CodeGen][Hexagon] Replace PointerType::getUnqual(Type) with opaque version (NFC) (#126274)
Follow-up to https://github.com/llvm/llvm-project/issues/123569

The obsolete bitcasts on the LoadInsts are also removed.
2025-02-08 15:13:23 +00:00
Mats Jun Larsen
e0fee55a55 [CodeGen] Replace PointerType::get(Type) with opaque version (NFC) (#124771)
Follow-up to https://github.com/llvm/llvm-project/issues/123569
2025-02-08 15:13:02 +00:00
Mats Jun Larsen
df2e8ee7ae [CodeGen][AArch64] Replace PointerType::getUnqual(Type) with opaque version (NFC) (#126278)
Follow-up to #123569
2025-02-08 13:23:08 +00:00
Mats Jun Larsen
54e0c2bbe2 [CodeGen][SystemZ] Replace PointerType::getUnqual(Type) with opaque version (NFC) (#126280)
Follow-up to #126278
2025-02-08 13:22:53 +00:00
Mats Jun Larsen
4e29148cca [CodeGen][XCore] Replace PointerType::getUnqual(Type) with opaque version (NFC) (#126279)
Follow-up to #123569
2025-02-08 13:22:42 +00:00
Sarah Spall
3f8e280206 [HLSL] Implement HLSL Elementwise casting (excluding splat cases); Re-land #118842 (#126258)
Implement HLSLElementwiseCast, excluding support for splat cases.
Do not support casting types that contain bitfields.
Partly closes https://github.com/llvm/llvm-project/issues/100609 and
partly closes https://github.com/llvm/llvm-project/issues/100619.
Re-land #118842 after fixing a warning-as-error found by a buildbot.
2025-02-07 09:12:55 -08:00
Sjoerd Meijer
612df14c00 [Clang][Driver] Add an option to control loop-interchange (#125830)
This introduces options `-floop-interchange` and `-fno-loop-interchange`
to enable/disable the loop-interchange pass. This is part of the work
that tries to get that pass enabled by default (#124911), where it was
remarked that a user-facing option to control this would be convenient
to have. The option name is the same as GCC's.
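
A hedged usage sketch (assuming an optimization level at which the loop
pass pipeline runs):

```
$ clang -O2 -floop-interchange test.c -o test      # force the pass on
$ clang -O2 -fno-loop-interchange test.c -o test   # force it off
```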
2025-02-07 10:31:24 +00:00
Michael Buch
e00fc80c19 [clang][DebugInfo] Set EnumKind based on enum_extensibility attribute (#126045)
This is the 2nd part of
https://github.com/llvm/llvm-project/pull/124752. Here we make sure to
set the `DICompositeType` `EnumKind` if the enum was declared with
`__attribute__((enum_extensibility(...)))`. In DWARF this will be
rendered as `DW_AT_APPLE_enum_kind` and will be used by LLDB when
creating `clang::EnumDecl`s from debug-info.
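
For reference, a hedged example of the source-level attribute this keys off:

```c
enum __attribute__((enum_extensibility(open)))   OpenKind   { O1, O2 };
enum __attribute__((enum_extensibility(closed))) ClosedKind { C1, C2 };
```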
 
Depends on https://github.com/llvm/llvm-project/pull/126044
2025-02-07 09:28:10 +00:00
Sarah Spall
14716f2e4b Revert "[HLSL] Implement HLSL Flat casting (excluding splat cases)" (#126149)
Reverts llvm/llvm-project#118842
2025-02-06 15:25:20 -08:00
Sarah Spall
01072e546f [HLSL] Implement HLSL Flat casting (excluding splat cases) (#118842)
Implement HLSLElementwiseCast, excluding support for splat cases.
Do not support casting types that contain bitfields.
Partly closes #100609 and partly closes #100619.
2025-02-06 14:38:01 -08:00
Alexey Bataev
3041dd5c20 Revert "[OpenMP][SIMD][FIX] Use conservative "omp simd ordered" lowering" (#126079)
Reverts llvm/llvm-project#123867 to fix the test failures
https://lab.llvm.org/buildbot/#/builders/144/builds/17521
2025-02-06 10:04:11 -05:00
Matt
60d8e6f528 [OpenMP][SIMD][FIX] Use conservative "omp simd ordered" lowering (#123867)
A proposed fix for issue #95611, "[OpenMP][SIMD] ordered has no effect in
a loop SIMD region as of LLVM 18.1.0".

Changes:

- Implement new lowering behavior: Conservatively serialize "omp simd"
loops that have `omp simd ordered` directive to prevent incorrect
vectorization (which results in incorrect execution behavior of the
miscompiled program).

Implementation outline:

- We start with the optimistic default initial value of
`LoopStack.setParallel(/*Enable=*/true);` in
`CodeGenFunction::EmitOMPSimdInit(const OMPLoopDirective &D)`.
- We only disable the loop parallel memory access assumption with `if
(HasOrderedDirective) LoopStack.setParallel(/*Enable=*/false);` using the
`HasOrderedDirective` (which tests for the presence of an
`OMPOrderedDirective`).
- This results in no longer incorrectly vectorizing the loop when the
`omp simd ordered` directive is present.

Motivation: We'd like to prevent incorrect vectorization of the loops
marked with the `#pragma omp ordered simd` directive which has
previously resulted in miscompiled code.

At the same time, we'd like the usage outside of the `#pragma omp
ordered simd` context to remain unaffected: Note that in the test
"clang/test/OpenMP/ordered_codegen.cpp" we only "lose" the
`!llvm.access.group` metadata in `foo_simd` alone.

This is conservative, in that some of the loops might be legal to
vectorize, but we prefer that to miscompiling the loops that are
currently illegal to vectorize.

A concrete example follows:

```cpp
// "test.c"
#include <float.h>
#include <math.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int compare_float(float x1, float x2, float scalar) {
    const float diff = fabsf(x1 - x2);
    x1 = fabsf(x1);
    x2 = fabsf(x2);
    const float l = (x2 > x1) ? x2 : x1;
    if (diff <= l * scalar * FLT_EPSILON)
        return 1;
    else
        return 0;
}

#define ARRAY_SIZE 256

__attribute__((noinline)) void initialization_loop(
    float X[ARRAY_SIZE][ARRAY_SIZE], float Y[ARRAY_SIZE][ARRAY_SIZE]) {
    const float max = 1000.0;
    srand(time(NULL));
    for (int r = 0; r < ARRAY_SIZE; r++) {
        for (int c = 0; c < ARRAY_SIZE; c++) {
            X[r][c] = ((float)rand() / (float)(RAND_MAX)) * max;
            Y[r][c] = X[r][c];
        }
    }
}

__attribute__((noinline)) void omp_simd_loop(float X[ARRAY_SIZE][ARRAY_SIZE]) {
    for (int r = 1; r < ARRAY_SIZE; ++r) {
        for (int c = 1; c < ARRAY_SIZE; ++c) {
#pragma omp simd
            for (int k = 2; k < ARRAY_SIZE; ++k) {
#pragma omp ordered simd
                X[r][k] = X[r][k - 2] + sinf((float)(r / c));
            }
        }
    }
}

__attribute__((noinline)) int comparison_loop(float X[ARRAY_SIZE][ARRAY_SIZE],
                                              float Y[ARRAY_SIZE][ARRAY_SIZE]) {
    int totalErrors_simd = 0;
    const float scalar = 1.0;
    for (int r = 1; r < ARRAY_SIZE; ++r) {
        for (int c = 1; c < ARRAY_SIZE; ++c) {
            for (int k = 2; k < ARRAY_SIZE; ++k) {
                Y[r][k] = Y[r][k - 2] + sinf((float)(r / c));
            }
        }
        // check row for simd update
        for (int k = 0; k < ARRAY_SIZE; ++k) {
            if (!compare_float(X[r][k], Y[r][k], scalar)) {
                ++totalErrors_simd;
            }
        }
    }
    return totalErrors_simd;
}

int main(void) {
    float X[ARRAY_SIZE][ARRAY_SIZE];
    float Y[ARRAY_SIZE][ARRAY_SIZE];

    initialization_loop(X, Y);
    omp_simd_loop(X);
    const int totalErrors_simd = comparison_loop(X, Y);

    if (totalErrors_simd) {
        fprintf(stdout, "totalErrors_simd: %d \n", totalErrors_simd);
        fprintf(stdout, "%s : %d - FAIL: error in ordered simd computation.\n",
                __FILE__, __LINE__);
    } else {
        fprintf(stdout, "Success!\n");
    }

    return totalErrors_simd;
}
```

Before:

```
$ clang -fopenmp-simd -O3 -ffast-math -lm test.c -o test && ./test
totalErrors_simd: 15408
test.c : 76 - FAIL: error in ordered simd computation.
```

clang 19.1.0: https://godbolt.org/z/6EvhxqEhe

After:

```
$ clang -fopenmp-simd -O3 -ffast-math test.c -o test && ./test
Success!
```

Co-authored-by: Matt P. Dziubinski <matt-p.dziubinski@hpe.com>
2025-02-06 09:44:11 -05:00
Joseph Huber
f1e917d07b [Offload] Unify offloading entries into a single section (#125731)
Summary:
This patch unifies the existing offloading entries into a single section
called `llvm_offload_entries`. This lets us use a more unified
offloading infrastructure so that all targets share the same handling.
The effect is that the runtimes now need to check whether the kind
is what they expect, but the expectation is that multiple potential
providers can be combined into one compile job. This doesn't fully work
yet because of other runtime issues, but some day. Mostly this helps the
future of liboffload, where we want to handle languages other than
OpenMP.
2025-02-06 08:24:01 -06:00
Scott Constable
e223485c9b [X86] Extend kCFI with a 3-bit arity indicator (#121070)
Kernel Control Flow Integrity (kCFI) is a feature that hardens indirect
calls by comparing a 32-bit hash of the function pointer's type against
a hash of the target function's type. If the hashes do not match, the
kernel may panic (or log the hash check failure, depending on the
kernel's configuration). These hashes are computed at compile time by
applying the xxHash64 algorithm to each mangled canonical function (or
function pointer) type, then truncating the result to 32 bits. This hash
is written into each indirect-callable function header by encoding it as
the 32-bit immediate operand to a `MOVri` instruction, e.g.:
```
__cfi_foo:
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	movl	$199571451, %eax                # hash of foo's type = 0xBE537FB
foo:
        ...
```

This PR extends x86-based kCFI with a 3-bit arity indicator encoded in
the `MOVri` instruction's register (reg) field as follows:

| Arity Indicator | Description | Encoding in reg field |
| --------------- | --------------- | --------------- |
| 0 | 0 parameters | EAX |
| 1 | 1 parameter in RDI | ECX |
| 2 | 2 parameters in RDI and RSI | EDX |
| 3 | 3 parameters in RDI, RSI, and RDX | EBX |
| 4 | 4 parameters in RDI, RSI, RDX, and RCX | ESP |
| 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 | EBP |
| 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 | ESI |
| 7 | At least one parameter may be passed on the stack | EDI |

For example, if `foo` takes 3 register arguments and no stack arguments
then the `MOVri` instruction in its kCFI header would instead be written
as:
```
	movl	$199571451, %ebx                # hash of foo's type = 0xBE537FB
```

This PR will benefit other CFI approaches that build on kCFI, such as
FineIBT. For example, this proposed enhancement to FineIBT must be able
to infer (at kernel init time) which registers are live at an indirect
call target: https://lkml.org/lkml/2024/9/27/982. If the arity bits are
available in the kCFI function header, then this information is trivial
to infer.

Note that another existing PR proposes encoding the 3-bit arity
within the existing 32-bit immediate field, which introduces
different security properties:
https://github.com/llvm/llvm-project/pull/117121.
2025-02-06 10:54:22 +08:00
Saleem Abdulrasool
1901f4ac8e CodeGen: support static linking for libclosure (#125384)
When building on Windows, dealing with the BlocksRuntime is slightly
more complicated. As we are not guaranteed a forward declaration for
the blocks runtime ABI symbols, we may generate the declarations for
them. In order to properly link against the well-known types, we always
annotated them as `__declspec(dllimport)`. This would require the
dynamic linking of the blocks runtime under all conditions. However,
this is not the only possible way to use the library. We may be
building a fully sealed (static) executable. In such a case, the
well-known symbols should not be marked as `dllimport`, as they are
assumed to be statically available when statically linking against the
BlocksRuntime.

Introduce a new driver/cc1 option `-static-libclosure` which mirrors the
myriad of similar options (`-static-libgcc`, `-static-libstdc++`,
`-static-libsan`, etc).
2025-02-05 15:15:36 -08:00
Daniil Kovalev
84b0c128a7 [PAC] Do not support some values of branch-protection with ptrauth-returns (#125280)
This patch does two things.

1. Previously, when checking driver arguments, we emitted an error for
unsupported values of `-mbranch-protection` when using the pauthtest ABI.
The reason for that was ptrauth-returns being enabled as part of
pauthtest. This patch changes the check against pauthtest to a check
against ptrauth-returns.

2. Similarly, check the values of the function attribute
`__attribute__((target("branch-protection=XXX")))`, some of which are
unsupported with ptrauth-returns. Note that the existing
`validateBranchProtection` function is used, and the current behavior is
to ignore the unsupported attribute value, so no error is emitted.
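
For reference, a hedged sketch of the attribute form being validated
(whether a particular value such as `pac-ret` is among the unsupported
ones depends on the ptrauth-returns configuration):

```c
/* With ptrauth-returns enabled, an unsupported branch-protection value
   here is silently ignored rather than diagnosed. */
__attribute__((target("branch-protection=pac-ret"))) void f(void) {}
```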
2025-02-05 11:39:27 +03:00
Sameer Sahasrabuddhe
b85e71b9f2 [llvm] Create() functions for ConvergenceControlInst (#125627) 2025-02-05 11:41:26 +05:30
Bill Wendling
2eb44aa0a9 [Clang][counted-by] Bail out of visitor for LValueToRValue cast (#125571)
An LValueToRValue cast shouldn't be ignored, so bail out of the visitor
if we encounter one.
2025-02-04 11:00:44 -08:00
Chandler Carruth
cd269fee05 [StrTable] Switch Clang builtins to use string tables
This both reapplies #118734, the initial attempt at this, and updates it
significantly.

First, it uses the newly added `StringTable` abstraction for string
tables, and simplifies the construction to build the string table and
info arrays separately. This should reduce any `constexpr` compile time
memory or CPU cost of the original PR while significantly improving the
APIs throughout.

It also restructures the builtins to support sharding across several
independent tables. This accomplishes three improvements over the
original PR:

1) It improves the APIs used significantly.

2) When builtins are defined from different sources (like SVE vs MVE in
   AArch64), this allows each of them to build their own string table
   independently rather than having to merge the string tables and info
   structures.

3) It allows each shard to factor out a common prefix, often cutting the
   size of the strings needed for the builtins by a factor two.

The second point is important both to allow different mechanisms of
construction (for example a `.def` file and a tablegen'ed `.inc` file,
or different tablegen'ed `.inc` files) and because it simply reduces the
sizes of these tables, which is valuable given how large they are in some
cases. The third builds on that size reduction.

Initially, we use this new sharding rather than merging tables in
AArch64, LoongArch, RISCV, and X86. Mostly this helps ensure the system
works, as without further changes these still push scaling limits.
Subsequent commits will more deeply leverage the new structure,
including using the prefix capabilities, which cannot be easily factored
out here and require deep changes to the targets.
2025-02-04 18:04:57 +00:00
Pranav Kant
e8a486ea97 [clang] Return larger CXX records in memory (#120670)
We incorrectly return CXX records in AVX registers when they should be
returned in memory. This is a violation of the x86-64 psABI.

Detailed discussion is here:
https://groups.google.com/g/x86-64-abi/c/BjOOyihHuqg/m/KurXdUcWAgAJ
2025-02-04 09:42:12 -08:00
Hans Wennborg
83ff9d4a34 Revert "[Win/X86] Make _m_prefetch[w] builtins to avoid winnt.h conflicts (#115099)"
This broke the build, see buildbot comments on the PR.

This reverts commit ee92122b53 and
follow-up 5dccfd9283.
2025-02-04 11:19:20 +01:00
Reid Kleckner
ee92122b53 [Win/X86] Make _m_prefetch[w] builtins to avoid winnt.h conflicts (#115099)
This is similar in spirit to previous changes that made _mm_mfence a
builtin to avoid conflicts with winnt.h and other MSVC-ecosystem
headers that pre-declare compiler intrinsics as extern "C" symbols.

Also update the feature flag for _mm_prefetch to sse, which is more accurate than mmx.
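
A hedged usage sketch (`warm` is an illustrative wrapper; `_m_prefetchw`
may additionally require a target with the prfchw feature, e.g. `-mprfchw`):

```c
#include <x86intrin.h>

/* With _m_prefetch[w] as builtins, extern "C" declarations of them in
   winnt.h-style headers no longer conflict. */
void warm(void *p) {
  _m_prefetch(p);
  _m_prefetchw(p);
}
```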

This should fix issue #87515.
2025-02-03 14:05:58 -08:00
Kazu Hirata
cd4e36027f [CodeGen] Migrate away from PointerUnion::dyn_cast (NFC) (#125456)
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect E to be nonnull.
2025-02-03 12:27:21 -08:00
erichkeane
99a9133a68 [OpenACC] Implement Sema/AST for 'atomic' construct
The atomic construct is a particularly complicated one. The directive
itself is pretty simple: it has 5 options for the 'atomic-clause'.
However, the associated statement is fairly complicated.

'read' accepts:
  v = x;
'write' accepts:
  x = expr;
'update' (or no clause) accepts:
  x++;
  x--;
  ++x;
  --x;
  x binop= expr;
  x = x binop expr;
  x = expr binop x;

'capture' accepts either a compound statement, or:
  v = x++;
  v = x--;
  v = ++x;
  v = --x;
  v = x binop= expr;
  v = x = x binop expr;
  v = x = expr binop x;

IF 'capture' has a compound statement, it accepts:
  {v = x; x binop= expr; }
  {x binop= expr; v = x; }
  {v = x; x = x binop expr; }
  {v = x; x = expr binop x; }
  {x = x binop expr ;v = x; }
  {x = expr binop x; v = x; }
  {v = x; x = expr; }
  {v = x; x++; }
  {v = x; ++x; }
  {x++; v = x; }
  {++x; v = x; }
  {v = x; x--; }
  {v = x; --x; }
  {x--; v = x; }
  {--x; v = x; }

While these are all quite complicated, there is a significant amount
of similarity between the 'capture' and 'update' lists, so this patch
reuses a lot of the same functions.
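
For reference, a hedged source example of one accepted 'capture' form:

```c
int x, v;

void capture_example(void) {
#pragma acc atomic capture
  v = x++; /* matches the 'v = x++;' form in the list above */
}
```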

This patch implements the entirety of 'atomic', creating a new Sema file
for the sema for it, as it is fairly sizable.
2025-02-03 07:22:22 -08:00
Owen Anderson
8f025f2a93 [clang] Do not emit template parameter objects as COMDATs when they have internal linkage. (#125448)
Per the ELF spec, section groups may only contain local symbols if those
symbols are only
referenced from within the section group. [1] In the case of template
parameter objects,
they can be referenced from outside the group when the type of the
object was declared
in an anonymous namespace. In that case, we can't place the object in a
COMDAT. This matches
GCC's linkage behavior on the test input.
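
A hedged C++20 sketch of the case in question (names illustrative):

```cpp
namespace {
struct S { int v; }; // internal linkage: declared in an anonymous namespace
}

template <S s> int get() { return s.v; } // class-type non-type parameter

// use() sits outside any group holding the template parameter object for
// S{42}, yet references it, so the object must not be placed in a COMDAT.
int use() { return get<S{42}>(); }
```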

[1]:
https://www.sco.com/developers/gabi/latest/ch4.sheader.html#section_groups
2025-02-03 23:26:22 +13:00
Kazu Hirata
e11e65f08b [CodeGen] Migrate away from PointerUnion::dyn_cast (NFC) (#125336)
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect E to be nonnull.
2025-02-01 08:13:41 -08:00
Balazs Benics
65708bad57 [clang][CodeGenOpenCL][NFC] Remove redundant map lookups (#125285) 2025-02-01 08:21:15 +01:00
Florian Hahn
77d3f8a925 [TBAA] Don't emit pointer-tbaa for void pointers. (#122116)
While there are no special rules in the standards regarding void
pointers and strict aliasing, emitting distinct tags for void pointers
breaks some common idioms, and there is no good way to rewrite the
code without strict-aliasing violations. An example is counting the
entries in an array of pointers:

    int count_elements(void * values) {
      void **seq = values;
      int count;
      for (count = 0; seq && seq[count]; count++);
      return count;
    }

https://clang.godbolt.org/z/8dTv51v8W

An example in the wild is from
https://github.com/llvm/llvm-project/issues/119099

This patch avoids emitting distinct tags for void pointers, to avoid
those idioms causing mis-compiles for now.

Fixes https://github.com/llvm/llvm-project/issues/119099.
Fixes https://github.com/llvm/llvm-project/issues/122537.

PR: https://github.com/llvm/llvm-project/pull/122116
2025-01-31 11:38:14 +00:00
Oliver Stannard
97b066f4e9 [ARM] Empty structs are 1-byte for C++ ABI (#124762)
For C++ (but not C), empty structs should be passed to functions as if
they were a 1-byte object with 1-byte alignment.

This is defined in Arm's CPPABI32:
  https://github.com/ARM-software/abi-aa/blob/main/cppabi32/cppabi32.rst
  For the purposes of parameter passing in AAPCS32, a parameter whose
  type is an empty class shall be treated as if its type were an
  aggregate with a single member of type unsigned byte.

The AArch64 equivalent of this has an exception for structs containing
an array of size zero, I've kept that logic for ARM. I've not found a
reason for this exception, but I've checked that GCC does have the same
behaviour for ARM as it does for AArch64.
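
A hedged source-level illustration (names illustrative, assuming AAPCS32
integer-register argument assignment):

```cpp
struct Empty {};

// C++ (CPPABI32): Empty is a 1-byte aggregate, so it occupies r0 and x
// arrives in r1. In C (GNU empty structs), Empty is ignored and x is in r0.
int take(Empty e, int x) { return x; }
```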

The AArch64 version has an Apple ABI with different rules, which ignores
empty structs in both C and C++. This is documented at
https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms.
The ARM equivalent of that appears to be AAPCS16_VFP, used for WatchOS,
but I can't find any documentation for that ABI, so I'm not sure what
rules it should follow. For now I've left it following the AArch64 Apple
rules.
2025-01-31 09:03:01 +00:00
David Green
9f1c825fb6 [AArch64] Enable vscale_range with +sme (#124466)
If we have +sme but not +sve, we would not set vscale_range on
functions. It should be valid to apply it with the same range with just
+sme, which can help mitigate some performance regressions in cases such
as scalable vector bitcasts (https://godbolt.org/z/exhe4jd8d).
2025-01-31 07:57:43 +00:00
Bill Wendling
cff0a460ae [Clang][counted_by] Refactor __builtin_dynamic_object_size on FAMs (#122198)
Refactoring of how __builtin_dynamic_object_size() is calculated for
flexible array members (in preparation for adding support for the
'counted_by' attribute on pointers in structs).

The only functionality change is that we use the already emitted Expr
code to build our calculations off of rather than re-emitting the Expr.
That allows the 'StructFieldAccess' visitor to sift through all casts
and ArraySubscriptExprs to find the first MemberExpr. We build our GEPs
and calculate offsets based off of relative distances from that
MemberExpr.

The testcase passes execution tests.

Calculate the flexible array member's object size using these formulae
(note: if the calculation is negative, we return 0):

     struct p;
     struct s {
         /* ... */
         int count;
         struct p *array[] __attribute__((counted_by(count)));
     };   

1) 'ptr->array':

   count = ptr->count;

   flexible_array_member_base_size = sizeof (*ptr->array);
   flexible_array_member_size =
           count * flexible_array_member_base_size;

   if (flexible_array_member_size < 0) 
       return 0;
   return flexible_array_member_size;

2) '&ptr->array[idx]':

   count = ptr->count;
   index = idx; 

   flexible_array_member_base_size = sizeof (*ptr->array);
   flexible_array_member_size =
           count * flexible_array_member_base_size;

   index_size = index * flexible_array_member_base_size;

   if (flexible_array_member_size < 0 || index < 0) 
       return 0;
   return flexible_array_member_size - index_size;

3) '&ptr->field':

   count = ptr->count;
   sizeof_struct = sizeof (struct s);

   flexible_array_member_base_size = sizeof (*ptr->array);
   flexible_array_member_size =
           count * flexible_array_member_base_size;

   field_offset = offsetof (struct s, field);
   offset_diff = sizeof_struct - field_offset;

   if (flexible_array_member_size < 0) 
       return 0;
   return offset_diff + flexible_array_member_size;

4) '&ptr->field_array[idx]':

   count = ptr->count;
   index = idx; 
   sizeof_struct = sizeof (struct s);

   flexible_array_member_base_size = sizeof (*ptr->array);
   flexible_array_member_size =
           count * flexible_array_member_base_size;

   field_base_size = sizeof (*ptr->field_array);
   field_offset = offsetof (struct s, field)
   field_offset += index * field_base_size;

   offset_diff = sizeof_struct - field_offset;

   if (flexible_array_member_size < 0 || index < 0) 
       return 0;
   return offset_diff + flexible_array_member_size;
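
For reference, a hedged usage sketch of case 1, reusing the struct from
above:

```c
#include <stddef.h>

struct p;
struct s {
    int count;
    struct p *array[] __attribute__((counted_by(count)));
};

size_t fam_size(struct s *ptr) {
    /* Evaluates to count * sizeof (*ptr->array), or 0 if negative. */
    return __builtin_dynamic_object_size(ptr->array, 0);
}
```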

---------

Signed-off-by: Bill Wendling <morbo@google.com>
2025-01-30 15:36:13 -08:00
Daniel Paoliello
845cc968e9 [clang][llvm][aarch64][win] Add a clang flag and module attribute for import call optimization, and remove LLVM flag (#122831)
Switches import call optimization from being enabled by an LLVM flag to
instead using a module attribute, and creates a new Clang flag that will
set that attribute. This addresses the concern raised in the original
PR:
<https://github.com/llvm/llvm-project/pull/121516#discussion_r1911763991>

This change also only creates the Called Global info if the module
attribute is present, addressing this concern:
<https://github.com/llvm/llvm-project/pull/122762#pullrequestreview-2547595934>
2025-01-30 09:51:43 -08:00
Thurston Dang
9c0606a08b Reapply "[ubsan] Connect -fsanitize-skip-hot-cutoff to LowerAllowCheckPass<cutoffs>" (#125032) (#125037)
This reverts commit 928cad49be, i.e., relands dccd271127,
with a fix to avoid a use-after-scope by changing the lambda to capture
by value.
2025-01-30 09:37:16 -08:00
Thurston Dang
928cad49be Revert "[ubsan] Connect -fsanitize-skip-hot-cutoff to LowerAllowCheckPass<cutoffs>" (#125032)
Reverts llvm/llvm-project#124857 due to buildbot breakage
(https://lab.llvm.org/buildbot/#/builders/46/builds/11310)
2025-01-29 22:03:05 -08:00
Thurston Dang
dccd271127 [ubsan] Connect -fsanitize-skip-hot-cutoff to LowerAllowCheckPass<cutoffs> (#124857)
This adds the plumbing between -fsanitize-skip-hot-cutoff (introduced in
https://github.com/llvm/llvm-project/pull/121619) and
LowerAllowCheckPass<cutoffs> (introduced in
https://github.com/llvm/llvm-project/pull/124211).
    
The net effect is that -fsanitize-skip-hot-cutoff now combines the
functionality of -ubsan-guard-checks and
-lower-allow-check-percentile-cutoff (though this patch does not remove
those yet), and generalizes the latter to allow per-sanitizer cutoffs.
    
Note: this patch replaces Intrinsic::allow_ubsan_check's
SanitizerHandler parameter with SanitizerOrdinal; this is necessary
because the hot cutoffs are specified in terms of SanitizerOrdinal
(e.g., null, alignment), not SanitizerHandler (e.g., TypeMismatch).
Likewise, CodeGenFunction::EmitCheck is changed to emit
allow_ubsan_check() for each individual check.
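
A hedged usage sketch (assuming the `sanitizer=cutoff` list syntax from
#121619 and profile data to establish hotness):

```
$ clang -O2 -fprofile-use=code.profdata -fsanitize=null \
    -fsanitize-skip-hot-cutoff=null=0.99 test.c -o test
```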

---------

Co-authored-by: Vitaly Buka <vitalybuka@gmail.com>
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
2025-01-29 21:03:26 -08:00
Jason Rice
abc8812df0 [Clang][P1061] Add structured binding packs (#121417)
This is an implementation of P1061 (Structured Bindings can introduce a
Pack), without the ability to use packs outside of templates. There are a
couple of ways the AST could have been sliced, so let me know what you
think. The only part of this change that I am unsure of is the
serialization/deserialization stuff. I followed the implementation of
other Exprs, but I do not really know how it is tested. Thank you for
your time considering this.
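
A hedged sketch of the feature as constrained by this patch (packs only
inside templates; names illustrative):

```cpp
#include <tuple>

template <typename... Ts>
auto sum(std::tuple<Ts...> t) {
  auto [...elems] = t;      // P1061: the structured binding introduces a pack
  return (elems + ... + 0); // fold over the introduced pack
}

int three() { return sum(std::tuple{1, 2}); }
```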

---------

Co-authored-by: Yanzuo Liu <zwuis@outlook.com>
2025-01-29 21:43:52 +01:00
Nikita Popov
29441e4f5f [IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00