Reported by Static Analyzer Tool:
In EmitAssemblyHelper::RunOptimizationPipeline(), using the `auto`
keyword without an `&` causes a copy of an object of type `std::function`:
```
/// List of pass builder callbacks ("CodeGenOptions.h").
std::vector<std::function<void(llvm::PassBuilder &)>>
    PassBuilderCallbacks;
```
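A minimal sketch of the fix, assuming the loop in RunOptimizationPipeline iterates over these callbacks with a `PassBuilder &PB` in scope (variable names are illustrative):
```
// Before: for (auto PassCallback : CodeGenOpts.PassBuilderCallbacks)
//   -- copies the std::function on every iteration.
// After: bind by const reference, so no copy is made.
for (const auto &PassCallback : CodeGenOpts.PassBuilderCallbacks)
  PassCallback(PB);
```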
- Adds a new +pc option to -mbranch-protection that will enable
the use of PC as a diversifier in PAC branch protection code.
- When +pauth-lr is enabled (-march=armv9.5a+pauth-lr) in combination
with -mbranch-protection=pac-ret+pc, the new Armv9.5-A instructions
(pacibsppc, retaasppc, etc.) are used.
Documentation for the relevant instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/
Co-authored-by: Lucas Prates <lucas.prates@arm.com>
In 9a38a72f1d `ProductId` was assigned from the stringified value of
`CLANG_VENDOR`, if that macro was defined. However, `CLANG_VENDOR` is
supposed to be a string, as it is defined (optionally) as such in the
top-level clang `CMakeLists.txt`.
Furthermore, `CLANG_VENDOR` is only passed as a build-time define when
compiling `Version.cpp`, so add a `getClangVendor()` function to
`Version.h`, and use it in `CodeGenModule.cpp`, instead of relying on
the macro.
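A minimal sketch of what such an accessor could look like (the exact signature in `Version.h` may differ):
```
// In clang/lib/Basic/Version.cpp, the only TU compiled with -DCLANG_VENDOR.
std::string getClangVendor() {
#ifdef CLANG_VENDOR
  return CLANG_VENDOR; // already a string literal per the top-level CMakeLists.txt
#else
  return "";
#endif
}
```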
Fixes: 9a38a72f1d
In 9a38a72f1d `ProductId` was assigned from the stringified value of
`CLANG_VENDOR`, if that macro was defined. However, `CLANG_VENDOR` is
supposed to be a string, as it is defined (optionally) as such in the
top-level clang `CMakeLists.txt`.
Move the addition of `-DCLANG_VENDOR` to the compiler flags from
`clang/lib/Basic/CMakeLists.txt` to the top-level `CMakeLists.txt`, so
it is consistent across the whole clang codebase. Then remove the
stringification from `CodeGenModule.cpp`, to make it work correctly.
Fixes: 9a38a72f1d
In LLVMContext::diagnose, set `HasErrors` for `DS_Error` so that all
derived `DiagnosticHandler` have correct `HasErrors` information.
An alternative is to set `HasErrors` in
`DiagnosticHandler::handleDiagnostics`, but all derived
`handleDiagnostics` would have to call the base function.
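A sketch of the chosen approach (member names as described above; the actual internals may differ):
```
void LLVMContext::diagnose(const DiagnosticInfo &DI) {
  // Record the error before dispatching, so every derived
  // DiagnosticHandler observes a consistent HasErrors flag.
  if (DI.getSeverity() == DS_Error)
    pImpl->DiagHandler->HasErrors = true;
  // ... dispatch to the installed handler as before ...
}
```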
When constructing vectors from elements, use poison instead of
undef as the base value. These literals always initialize all
elements (padding the remainder with zero), so that the choice
of base value does not affect semantics.
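For illustration, a sketch of the construction pattern with IRBuilder (surrounding names such as `VecTy`, `Elts`, and `Builder` are assumed):
```
// Start from poison; every lane is written below, so the base value is
// never observable.
llvm::Value *Vec = llvm::PoisonValue::get(VecTy);
for (unsigned I = 0; I != NumElts; ++I)
  Vec = Builder.CreateInsertElement(Vec, Elts[I], I);
```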
Many issues have popped up with the counted_by feature. Patch #73730 has
grown too large, and waiting on its approval is blocking Linux testing.
Includes reverts of:
commit 769bc11f68 ("[Clang] Implement the 'counted_by' attribute
(#68750)")
commit bc09ec6962 ("[CodeGen] Revamp counted_by calculations
(#70606)")
commit 1a09cfb2f3 ("[Clang] counted_by attr can apply only to C99
flexible array members (#72347)")
commit a76adfb992 ("[NFC][Clang] Refactor code to calculate flexible
array member size (#72790)")
commit d8447c78ab ("[Clang] Correct handling of negative and
out-of-bounds indices (#71877)")
Partial commit b31cd07de5 ("[Clang] Regenerate test checks (NFC)")
Closes #73168. Closes #75173.
Since FatLTO now uses the UnifiedLTO pipeline, we should not set the
ThinLTO module flag to true, since it may cause an assertion failure.
See https://github.com/llvm/llvm-project/issues/70703 for context.
Rather than shepherding a type name all the way to the backend as a
string and attempting to parse it, get the element type out of the AST
and store that in the resource annotation metadata directly.
Pull Request: https://github.com/llvm/llvm-project/pull/75674
Non-LTO compiles set the buffer name to "<inline asm>"
(`AsmPrinter::addInlineAsmDiagBuffer`) and pass diagnostics to
`ClangDiagnosticHandler` (through the `MCContext` handler in
`MachineModuleInfoWrapperPass::doInitialization`) to ensure that
the exit code is 1 in the presence of errors. In contrast, LTO compiles
spuriously succeed even if error messages are printed.
```
% cat a.c
void _start() {}
asm("unknown instruction");
% clang -c a.c
<inline asm>:1:1: error: invalid instruction mnemonic 'unknown'
1 | unknown instruction
| ^
1 error generated.
% clang -c -flto a.c; echo $? # -flto=thin is the same
error: invalid instruction mnemonic 'unknown'
unknown instruction
^~~~~~~
error: invalid instruction mnemonic 'unknown'
unknown instruction
^~~~~~~
0
```
`CollectAsmSymbols` parses inline assembly and is transitively called by
both `ModuleSummaryIndexAnalysis::run` and `WriteBitcodeToFile`, leading
to duplicate diagnostics.
This patch updates `CollectAsmSymbols` to be similar to non-LTO
compiles.
```
% clang -c -flto=thin a.c; echo $?
<inline asm>:1:1: error: invalid instruction mnemonic 'unknown'
1 | unknown instruction
| ^
1 error generated.
1
```
The `HasErrors` check does not prevent duplicate warnings, but assembler
warnings are very uncommon.
Fix mapping of structs to the device.
The following example fails:
```
#include <stdio.h>
#include <stdlib.h>
struct Descriptor {
  int *datum;
  long int x;
  int xi;
  long int arr[1][30];
};

int main() {
  Descriptor dat = Descriptor();
  dat.datum = (int *)malloc(sizeof(int) * 10);
  dat.xi = 3;
  dat.arr[0][0] = 1;
#pragma omp target enter data map(to: dat.datum[:10]) map(to: dat)
#pragma omp target
  {
    dat.xi = 4;
    dat.datum[dat.arr[0][0]] = dat.xi;
  }
#pragma omp target exit data map(from: dat)
  return 0;
}
```
This is a rework of the previous attempt:
https://github.com/llvm/llvm-project/pull/72410
The specialisation will not be valid when ConstantInt gains native
support for vector types.
This is largely a mechanical change but with extra attention paid to constant
folding, InstCombineVectorOps.cpp, LoopFlatten.cpp and Verifier.cpp to
remove the need to call `getIntegerType()`.
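For illustration, the pattern the mechanical part of the change applies (a hedged sketch based on the accessor named above):
```
// getType() now returns a plain Type*; callers that genuinely need an
// IntegerType ask for it explicitly instead of relying on a cast.
llvm::IntegerType *ITy = CI->getIntegerType();
// previously: cast<llvm::IntegerType>(CI->getType())
```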
Co-authored-by: Nikita Popov <github@npopov.com>
## Motivation
Since we don't need the metadata sections at runtime, we can somehow
offload them from memory at runtime. Initially, I explored [debug info
correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113),
which is used for PGO with value profiling disabled. However, it
currently only works with DWARF, and it would be hard to add such
artificial debug info for every function into CodeView, which is used on
Windows. So, offloading profile metadata sections at runtime seems to be
a platform-independent option.
## Design
The idea is to use new section names for profile name and data sections
and mark them as metadata sections. Under this mode, the new sections
are non-SHF_ALLOC in ELF, so they are not loaded into memory at runtime
and can be stripped away as a post-linking step. After the process
exits, the generated raw profiles will contain only headers + counters.
llvm-profdata can be used to correlate raw profiles with the unstripped
binary and generate an indexed profile.
## Data
For chromium base_unittests with code coverage on Linux, the binary size
overhead due to instrumentation is reduced from 64M to 38.8M (39.4%), and
the raw profile file size is reduced from 128M to 68M (46.9%).
```
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +14.6Mi [NEW] +14.6Mi __llvm_prf_data
[NEW] +10.6Mi [NEW] +10.6Mi __llvm_prf_names
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
[ = ] 0 +65% +1.23Ki .relro_padding
+62% +1.20Ki [ = ] 0 [Unmapped]
+13% +448 +19% +448 .init_array
+8.8% +192 [ = ] 0 [ELF Section Headers]
+0.0% +136 +0.0% +80 [7 Others]
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.5% +80 +1.2% +64 .plt
[ = ] 0 -99.2% -3.68Ki [LOAD #5 [RW]]
+195% +64.0Mi +194% +64.0Mi TOTAL
$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
+13% +448 +19% +448 .init_array
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.2% +64 +1.2% +64 .plt
+2.9% +64 [ = ] 0 [ELF Section Headers]
+0.0% +40 +0.0% +40 .data
+1.2% +32 +1.2% +32 .got.plt
+0.0% +24 +0.0% +8 [5 Others]
[ = ] 0 -22.9% -872 [LOAD #5 [RW]]
-74.5% -1.44Ki [ = ] 0 [Unmapped]
[ = ] 0 -76.5% -1.45Ki .relro_padding
+118% +38.8Mi +117% +38.8Mi TOTAL
```
A few things to note:
1. llvm-profdata doesn't support filtering raw profiles by binary id yet,
so when a raw profile doesn't belong to the binary being digested by
llvm-profdata, merging will fail. Once this is implemented,
llvm-profdata should be able to merge only the raw profiles with the
same binary id as the binary and discard the rest (those with a
mismatched/missing binary id). The workflow I have in mind is to have
scripts invoke llvm-profdata to get all binary ids for all raw profiles,
and selectively pass the raw profiles with a matching binary id,
together with the binary, to llvm-profdata for merging.
2. In COFF, the new sections are currently still loaded into memory but
not used. I didn't handle that in this patch because I noticed that
`.lcovmap` and `.lcovfunc` are loaded into memory as well. A separate
patch will address it.
3. This should work with PGO when value profiling is disabled, as debug
info correlation currently does, though I haven't tested this yet.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
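The mechanical rename, for illustration (the function name is made up):
```
#include "llvm/ADT/StringRef.h"

bool isIntrinsicName(llvm::StringRef Name) {
  // Previously: Name.startswith("llvm.")
  return Name.starts_with("llvm.");
}
```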
This patch implements the builtins in Clang
and the LLVM-IR intrinsic for the following:
```
// Variants are also available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64,
// _f16, _f32, _f64
uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t sveorqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t svorqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t svminqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for _f32, _f64
float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn);
float16x8_t svminnmqv[_f16](svbool_t pg, svfloat16_t zn);
```
According to PR #257 [1], the reduction instructions use scalable
vectors as input and fixed vectors as output, therefore we changed
SVEEmitter to emit fixed vector types in case the NEON header
(arm_neon.h) is not present.
[1] https://github.com/ARM-software/acle/pull/257
Co-authored-by: Dinar Temirbulatov <dinar.temirbulatov@arm.com>
When the main file is preprocessed and we change `MainFileName` to the
original source file name (e.g. `a.i => a.c`), the source manager does
not contain `a.c`, but we incorrectly associate the DIFile(a.c) with
md5(a.i). This causes CGDebugInfo::emitFunctionStart to create a
duplicate DIFile and leads to a spurious "inconsistent use of MD5
checksums" warning.
```
% cat a.c
void f() {}
% clang -c -g a.c # no warning
% clang -E a.c -o a.i && clang -g -S a.i && clang -g -c a.s
a.s:9:2: warning: inconsistent use of MD5 checksums
.file 1 "a.c"
^
% grep DIFile a.ll
!1 = !DIFile(filename: "a.c", directory: "/tmp/c", checksumkind: CSK_MD5, checksum: "c5b2e246df7d5f53e176b097a0641c3d")
!11 = !DIFile(filename: "a.c", directory: "/tmp/c")
% grep 'file.*a.c' a.s
.file "a.c"
.file 0 "/tmp/c" "a.c" md5 0x2d14ea70fee15102033eb8d899914cce
.file 1 "a.c"
```
Fix #56378 by disassociating md5(a.i) from a.c.
Akin to other passes, refactored the name to `InstrProfilingLoweringPass` to better communicate what it does, and split the pass part from the transformation part to avoid needing to initialize object state during `::run`.
A subsequent PR will move `InstrLowering` to the .cpp file and rename it to `InstrLowerer`.
Add builtin: 'svreinterpret_b' to cast from svcount_t to svbool_t.
Add builtin: 'svreinterpret_c' to cast from svbool_t to svcount_t.
Patch by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>
When emitting TBAA information for enums in C code we currently just
treat the data as an 'omnipotent char'. However, with C strict aliasing
this means we fail to optimise certain cases. For example, in the
SPEC2017 xz benchmark there are structs that contain arrays of enums,
and clang pessimistically assumes that accesses to those enums could
alias with other struct members that have a different type.
According to
https://en.cppreference.com/w/c/language/enum
enums should be treated as 'int' types unless explicitly specified (C23)
or if 'int' would not be large enough to hold all the enumerated values.
In the latter case the compiler is free to choose a suitable integer
that would hold all such values.
When compiling C code this patch generates TBAA information for the enum
by using an equivalent integer of the size clang has already chosen for
the enum. I have ignored C++ for now because the rules are more complex.
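A minimal sketch of the idea in CodeGenTBAA (the guard and member names are from memory and may not match the patch exactly):
```
// In C, look through the enum to the integer type the frontend chose for it.
if (const auto *ETy = llvm::dyn_cast<clang::EnumType>(Ty)) {
  if (!Features.CPlusPlus)
    return getTypeInfo(ETy->getDecl()->getIntegerType());
}
```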
New test added here:
clang/test/CodeGen/tbaa.c
Since VD is assigned from a `cast<VarDecl>`, it cannot be null, or the
cast would have asserted. Remove the subsequent null checks to clear up
any misunderstanding.
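For illustration (a sketch; the variable names are illustrative):
```
// cast<> asserts on a type mismatch and never returns null; dyn_cast<>
// is the variant that can return nullptr.
const VarDecl *VD = cast<VarDecl>(D);
// if (VD) ...   <- always true, so the check is dead code
```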
Summary:
This patch adds support for registering texture / surface variables from
CUDA / HIP. Additionally, we now properly track the `extern` and `const`
flags that are also used in these runtime functions.
This does not implement the `managed` variables yet as those seem to
require some extra handling I'm not familiar with. The issue is that the
current offload entry isn't large enough to carry size and alignment
information along with an extra global.
Summary:
The standard GNU atomic operations are a very common way to target
hardware atomics on the device. With more heterogenous devices being
introduced, the concept of memory scopes has been in the LLVM language
for a while via the `syncscope` modifier. For targets such as the GPU,
this can change code generation depending on whether we need to be
consistent with the memory ordering of the entire system, only the
single GPU device, or something narrower.
Previously these scopes were only exported via the `opencl` and `hip`
variants of these functions. However, this made it difficult to use
outside of those languages and the semantics were different from the
standard GNU versions. This patch introduces a `__scoped_atomic` variant
for the common functions. There was some discussion over whether or not
these should be overloads of the existing ones, or simply new variants.
I leant towards new variants to be less disruptive.
The scope here can be one of the following:
```
__MEMORY_SCOPE_SYSTEM // All devices and systems
__MEMORY_SCOPE_DEVICE // Just this device
__MEMORY_SCOPE_WRKGRP // A 'work-group' AKA CUDA block
__MEMORY_SCOPE_WVFRNT // A 'wavefront' AKA CUDA warp
__MEMORY_SCOPE_SINGLE // A single thread.
```
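A hedged usage sketch (builtin name and argument order as described above; treat as illustrative):
```
// Relaxed add that only needs to be coherent within a single GPU device.
int add_relaxed_device(int *Ptr, int Val) {
  return __scoped_atomic_fetch_add(Ptr, Val, __ATOMIC_RELAXED,
                                   __MEMORY_SCOPE_DEVICE);
}
```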
Naming consistency was attempted, but it is difficult to capture the
full spectrum with so many names. Suggestions appreciated.
Currently we fail to set up the boundary address for a
FinalArraySection as the highest element in partial struct data.
Currently, for:
```
#pragma omp target map(D.a) map(D.b[:2])
```
the size is computed as:
```
%a = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 0
%b = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
%arrayidx = getelementptr inbounds [2 x float], ptr %b, i64 0, i64 0
%2 = getelementptr float, ptr %arrayidx, i32 1
%3 = ptrtoint ptr %2 to i64
%4 = ptrtoint ptr %a to i64
%5 = sub i64 %3, %4
%6 = sdiv exact i64 %5, ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64)
```
Here %2 is wrong: for (D.b[:2]) it is a pointer to the first element of
the array section, but it should point to the last element of the array
section. The fix is to emit the pointer to the last element of the array
section and use that pointer as the highest element in partial struct
data.
IR after the change:
```
%a = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 0
%b = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
%arrayidx = getelementptr inbounds [2 x float], ptr %b, i64 0, i64 0
%b1 = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
%arrayidx2 = getelementptr inbounds [2 x float], ptr %b1, i64 0, i64 1
%1 = getelementptr float, ptr %arrayidx2, i32 1
%2 = ptrtoint ptr %1 to i64
%3 = ptrtoint ptr %a to i64
%4 = sub i64 %2, %3
%5 = sdiv exact i64 %4, ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64)
```
This commit reverts the changes in
https://github.com/llvm/llvm-project/pull/71780 and all of its follow-up
patches.
We got reports of the `.debug_names/.debug_gnu_pubnames/gdb_index/etc.`
sections growing by a non-trivial amount for some large projects. While
GCC emits definitions for static data member constants into the Names
index, they do so *only* for explicitly `constexpr` members. We were
indexing *all* constant-initialized const-static members, which is
likely where the significant size difference comes from. However, only
emitting explicitly `constexpr` variables into the index doesn't seem
like a good way forward, since from clang's perspective `const`-static
integrals are `constexpr` too, and that shouldn't be any different in
the debug-info component. Also, as new code moves to `constexpr` instead
of `const` static for constants, such a solution would just delay the
growth of the Names index.
To prevent the size regression we revert to not emitting definitions for
static data-members that have no location.
To support access to such constants from LLDB we'll most likely have to
make LLDB find the constants by looking at the containing class first.
Currently target_clones attribute results in a linker error when there
are no multi-versioned function declarations in the calling TU.
In the calling TU, the call is generated with the ‘normal’ assembly
name. This does not match any of the versions or the ifunc, since the
version mangling appends the version string and the ifunc symbol carries
an .ifunc suffix. The linker error is not seen with GCC, since the
mangling of the ifunc symbol in GCC is the ‘normal’ assembly name of the
function, i.e. no ifunc suffix.
This PR removes the .ifunc suffix to match GCC. It also adds an alias
with the .ifunc suffix so as to ensure backward compatibility.
The changes exclude the aarch64 target, because the mangling of default
versions on aarch64 does not include a .default suffix and is the
'normal' assembly name, unlike other targets. It is not clear to me what
the correct behavior for this target is.
Old Phabricator review - https://reviews.llvm.org/D158666
---------
Co-authored-by: Tom Honermann <tom@honermann.net>
This commit adds a new BPF-specific structure attribute
`__attribute__((preserve_static_offset))` and a pass to deal with it.
This attribute may be attached to a struct or union declaration, where
it notifies the compiler that this structure is a "context" structure.
The following limitations apply to context structures:
- runtime environment might patch access to the fields of this type by
updating the field offset;
BPF verifier limits access patterns allowed for certain data
types. E.g. `struct __sk_buff` and `struct bpf_sock_ops`. For these
types only `LD/ST <reg> <static-offset>` memory loads and stores are
allowed.
This is so because offsets of the fields of these structures do not
match real offsets in the running kernel. During BPF program
load/verification loads and stores to the fields of these types are
rewritten so that offsets match real offsets. For this rewrite to
happen static offsets have to be encoded in the instructions.
See `kernel/bpf/verifier.c:convert_ctx_access` function in the Linux
kernel source tree for details.
- The runtime environment might disallow access to a field of the type
through modified pointers.
During BPF program verification, a tag `PTR_TO_CTX` is tracked for
register values. If a register with such a tag is modified, BPF programs
are not allowed to read or write memory using that register. See the
`kernel/bpf/verifier.c:check_mem_access` function in the Linux kernel
source tree for details.
Access to the structure fields is translated to IR as a sequence:
- `(load (getelementptr %ptr %offset))` or
- `(store (getelementptr %ptr %offset))`
During the instruction selection phase, such sequences are translated
to a single load instruction with an embedded offset, e.g. `LDW %ptr,
%offset`, which matches the access pattern necessary for the restricted
set of types described above (when `%offset` is static).
Multiple optimizer passes might separate these instructions, including:
- SimplifyCFGPass (sinking)
- InstCombine (sinking)
- GVN (hoisting)
The `preserve_static_offset` attribute marks structures for which the
following transformations happen:
- at the early IR processing stage:
- `(load (getelementptr ...))` replaced by call to intrinsic
`llvm.bpf.getelementptr.and.load`;
- `(store (getelementptr ...))` replaced by call to intrinsic
`llvm.bpf.getelementptr.and.store`;
- at the late IR processing stage this modification is undone.
Such handling prevents various optimizer passes from generating
sequences of instructions that would be rejected by the BPF verifier.
`__attribute__((preserve_static_offset))` takes priority over
`__attribute__((preserve_access_index))`: when both attributes are
present, the preserve_access_index transformations are not applied.
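For illustration, a hedged sketch of attaching the attribute (the type and field names are made up):
```
// Loads and stores through fields of this struct keep statically
// encoded offsets through optimization.
struct __attribute__((preserve_static_offset)) ctx_like {
  int field;
};
```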
This addresses the issue reported by the following thread:
https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6
This is a second attempt to commit this change, previous reverted
commit is: cb13e9286b.
The following items had been fixed:
- test case bpf-preserve-static-offset-bitfield.c now uses
`-triple bpfel` to avoid different codegen for little/big endian
targets.
- BPFPreserveStaticOffset.cpp:removePAICalls() modified to avoid
use after free for `WorkList` elements `V`.
Differential Revision: https://reviews.llvm.org/D133361
Fix #69214.
In `emitOMPSimdRegion`, `EmitOMPPrivateLoopCounters` should be called
after `EmitOMPPrivateClause`.
If not, the private variables will be registered too early, which is not
allowed by `EmitOMPPrivateClause`.
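A sketch of the required ordering (signatures elided; these CodeGenFunction member names are assumed and the exact call sites may differ):
```
// Privatize the clause variables first, then register the privatized
// loop counters.
(void)CGF.EmitOMPPrivateClause(S, LoopScope);
CGF.EmitOMPPrivateLoopCounters(S, LoopScope);
```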
Update all callers to pass through the Address.
For the older builtins such as `__sync_*` and MSVC `_Interlocked*`,
natural alignment of the atomic access is _assumed_. This change
preserves that behavior. It will pass through greater-than-required
alignments, however.
Clang currently implements a set of vector rotate builtins
(__builtin_s390_verll*) in terms of platform-specific LLVM
intrinsics. To simplify the IR (and allow for common code
optimizations if applicable), this patch removes those LLVM
intrinsics and implements the builtins in terms of the
platform-independent funnel shift intrinsics instead.
Also, fix the prototype of the __builtin_s390_verll*
builtins for full compatibility with GCC.
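A minimal sketch of the lowering idea (the helper name is illustrative): a rotate left is the generic funnel-shift intrinsic with both value operands equal.
```
llvm::Value *emitRotateLeft(llvm::IRBuilderBase &B, llvm::Value *V,
                            llvm::Value *Amt) {
  // llvm.fshl(V, V, Amt) == rotate V left by Amt.
  return B.CreateIntrinsic(llvm::Intrinsic::fshl, {V->getType()},
                           {V, V, Amt});
}
```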
As nullptr is a legitimate value, change the BaseTypeMetadataCache
lookup/insertion to use find and insert rather than the subscript
operator. Also adjust getBaseTypeInfoHelper to not insert into the
cache; let getBaseTypeInfo do that instead.
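A sketch of the lookup pattern (container and helper names as above; surrounding code assumed), since operator[] would default-construct an entry and cannot distinguish a missing entry from a cached nullptr:
```
auto It = BaseTypeMetadataCache.find(Ty);
if (It != BaseTypeMetadataCache.end())
  return It->second; // may legitimately be nullptr
llvm::MDNode *N = getBaseTypeInfoHelper(Ty);
BaseTypeMetadataCache.insert({Ty, N});
return N;
```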
This reverts the previous implementation to avoid adding inline
functions in the OpenCL headers.
That approach was breaking the clspv flow (google/clspv#1231), while
https://reviews.llvm.org/D156743 mentioned that just decorating the call
node with `!fpmath` was enough.
This PR implements that idea.
The test has been updated with this implementation.
https://github.com/llvm/llvm-project/issues/70703 pointed out that
cloning LLVM modules could lead to miscompiles when using FatLTO.
This is due to an existing issue when cloning modules with labels (see
#55991 and #47769). Since this can lead to miscompilation, we can avoid
cloning the LLVM modules, which was desirable anyway.
This patch modifies the EmbedBitcodePass to no longer clone the module
or run an input pipeline over it. Further, it makes FatLTO always
perform UnifiedLTO, so we can still defer the Thin/Full LTO decision to
link time. Lastly, it removes dead/obsolete code related to now-defunct
options that no longer work with the EmbedBitcodePass implementation.