clang-p2996

Author	SHA1	Message	Date
Kazu Hirata	f3dcc2351c	[clang] Use StringRef::{starts,ends}_with (NFC) (#75149 ) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-13 08:54:13 -08:00
Artem Belevich	631c6e834c	[CUDA] Add support for CUDA-12.3 and sm_90a (#74895 )	2023-12-11 12:18:28 -08:00
Jay Foad	cf1e0c0b07	[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133 ) Define target names and ELF numbers for new GFX12 targets gfx1200 and gfx1201. For now they behave identically to GFX11.	2023-11-23 16:44:05 +00:00
Jay Foad	92542f2a40	[AMDGPU] Add targets gfx1150 and gfx1151 This is the target definition only. Currently they are treated the same as GFX 11.0.x. Differential Revision: https://reviews.llvm.org/D155429	2023-07-17 13:06:12 +01:00
Sergio Afonso	63ca93c7d1	[OpenMP][OMPIRBuilder] Rename IsEmbedded and IsTargetCodegen flags This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes `IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to `-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed to `omp.is_target_device`. Getters and setters of all these renamed properties are also updated accordingly. Many unit tests have been updated to use the new names, but an alias for the `-fopenmp-is-device` option is created so that external programs do not stop working after the name change. `IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the `-fopenmp-is-target-device` compiler frontend option, which is only added to the OpenMP device invocation for offloading-enabled programs. Differential Revision: https://reviews.llvm.org/D154591	2023-07-10 14:14:16 +01:00
Konstantin Zhuravlyov	9d05727972	AMDGPU: Add basic gfx942 target Differential Revision: https://reviews.llvm.org/D149983	2023-05-10 11:51:06 -04:00
Konstantin Zhuravlyov	1fc70210a6	AMDGPU: Add basic gfx941 target Differential Revision: https://reviews.llvm.org/D149982	2023-05-10 11:51:06 -04:00
Stoorx	830b359d3a	[clang] Return std::unique_ptr<TargetInfo> from AllocateTarget In file 'clang/lib/Basic/Targets.cpp' the function 'AllocateTarget' had a raw pointer as a return type, which have been wrapped in the 'std::unique_ptr' in all usages. This commit changes the signature of the function to return an instance of 'std::unique_ptr' directly. Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D148574	2023-04-18 10:07:26 +00:00
Joseph Huber	bed7005eb4	[NVPTX] Add __CUDA_ARCH__ macro to standalone NVPTX compilations We can now target the NVPTX architecture directly via `--target=nvptx64-nvidia-cuda`. This currently does not define the `__CUDA_ARCH__` macro with is used to allow code to target different codes based on support. This patch simply adds this support. Reviewed By: tra, jdoerfert Differential Revision: https://reviews.llvm.org/D146975	2023-03-27 18:08:15 -05:00
Joseph Huber	af54d1e852	[NVPTX] Set the atomic inling threshold when targeting NVPTX directly Since Clang 16.0.0 users can target the `NVPTX` architecture directly via `--target=nvptx64-nvidia-cuda`. However, this does not set the atomic inlining size correctly. This leads to spurious warnings and emission of runtime atomics that are never implemented. This patch ensures that we set this to the appropriate pointer width. This will always be 64 in the future as `nvptx64` will only be supported moving forward. Fixes: https://github.com/llvm/llvm-project/issues/61410 Reviewed By: tra Differential Revision: https://reviews.llvm.org/D146750	2023-03-23 16:30:07 -05:00
serge-sans-paille	5a7f47cc02	[clang] Optimize clang::Builtin::Info density Reorganize clang::Builtin::Info to have them naturally align on 4 bytes boundaries. Instead of storing builtin headers as a straight char pointer, enumerate them and store the enum. It allows to use a small enum instead of a pointer to reference them. On a 64 bit machine, this brings sizeof(clang::Builtin::Info) from 56 down to 48 bytes. On a release build on my Linux 64 bit machine, it shrinks the size of libclang-cpp.so by 193kB. The impact on performance is negligible in terms of instruction count, but the wall time seems better, see https://llvm-compile-time-tracker.com/compare.php?from=b3d8639f3536a4876b511aca9fb7948ff9266cee&to=a89b56423f98b550260a58c41e64aff9e56b76be&stat=task-clock Differential Revision: https://reviews.llvm.org/D142024	2023-01-23 14:27:44 +01:00
serge-sans-paille	a3c248db87	Move from llvm::makeArrayRef to ArrayRef deduction guides - clang/ part This is a follow-up to https://reviews.llvm.org/D140896, split into several parts as it touches a lot of files. Differential Revision: https://reviews.llvm.org/D141139	2023-01-09 12:15:24 +01:00
serge-sans-paille	d9ab3e82f3	[clang] Use a StringRef instead of a raw char pointer to store builtin and call information This avoids recomputing string length that is already known at compile time. It has a slight impact on preprocessing / compile time, see https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u This a recommit of `e953ae5bbc` and the subsequent fixes `caa713559b` and `06b90e2e9c`. The above patchset caused some version of GCC to take eons to compile clang/lib/Basic/Targets/AArch64.cpp, as spotted in `aa171833ab`. The fix is to make BuiltinInfo tables a compilation unit static variable, instead of a private static variable. Differential Revision: https://reviews.llvm.org/D139881	2022-12-27 09:55:19 +01:00
Alex Richardson	a602f76a24	[clang][TargetInfo] Use LangAS for getPointer{Width,Align}() Mixing LLVM and Clang address spaces can result in subtle bugs, and there is no need for this hook to use the LLVM IR level address spaces. Most of this change is just replacing zero with LangAS::Default, but it also allows us to remove a few calls to getTargetAddressSpace(). This also removes a stale comment+workaround in CGDebugInfo::CreatePointerLikeType(): ASTContext::getTypeSize() does return the expected size for ReferenceType (and handles address spaces). Differential Revision: https://reviews.llvm.org/D138295	2022-11-30 20:24:01 +00:00
Artem Belevich	0e8a414ab3	[CUDA, NVPTX] Added basic __bf16 support for NVPTX. Recent Clang changes expose _bf16 types for SSE2-enabled host compilations and that makes those types visible furing GPU-side compilation, where it currently fails with Sema complaining that __bf16 is not supported. Considering that __bf16 is a storage-only type, enabling it for NVPTX if it's enabled on the host should pose no issues, correctness-wise. Recent NVIDIA GPUs have introduced bf16 support, so we'll likely grow better support for __bf16 on NVPTX going forward. Differential Revision: https://reviews.llvm.org/D136311	2022-10-25 11:08:06 -07:00
Artem Belevich	9a01cca660	Add support for CUDA-11.8 and sm_{87,89,90} GPUs. Differential Revision: https://reviews.llvm.org/D135306	2022-10-07 13:59:28 -07:00
Artem Belevich	f3a2cbcf97	Refactored CUDA version housekeeping to use less boilerplate. Differential Revision: https://reviews.llvm.org/D135328	2022-10-07 13:59:23 -07:00
Joseph Huber	002a63f937	[OpenMP] Add `__CUDA_ARCH__` definition when offloading with OpenMP Currently we define the `__CUDA_ARCH__` macro only in CUDA mode. This patch allows us to use this macro in OpenMP-offloading mode when targeting NVPTX. Reviewed By: tra, tianshilei1992 Differential Revision: https://reviews.llvm.org/D125256	2022-05-13 14:38:35 -04:00
Joe Nash	8bdfc73f63	[AMDGPU][clang] Definition of gfx11 subtarget Contributors: Jay Foad <jay.foad@amd.com> Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> Patch 2/N for upstreaming of AMDGPU gfx11 architecture Depends on D124536 Reviewed By: foad, kzhuravl, #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D124537	2022-04-29 13:55:56 -04:00
Aakanksha	840695814a	[AMDGPU] Add gfx1036 target Differential Revision: https://reviews.llvm.org/D120846	2022-03-02 23:26:38 +00:00
Stanislav Mekhanoshin	2e2e64df4a	[AMDGPU] Add gfx940 target This is target definition only. Differential Revision: https://reviews.llvm.org/D120688	2022-03-02 13:54:48 -08:00
Yaxun (Sam) Liu	a6786cdd57	[HIPSPV][3/4] Enable SPIR-V emission for HIP This patch enables SPIR-V binary emission for HIP device code via the HIPSPV tool chain. ‘--offload’ option, which is envisioned in [1], is added for specifying offload targets. This option is used to override default device target (amdgcn-amd-amdhsa) for HIP compilation for emitting device code as SPIR-V binary. The option is handled in getHIPOffloadTargetTriple(). getOffloadingDeviceToolChain() function (based on the design in the SYCL repository) is added to select HIPSPVToolChain when HIP offload target is ‘spirv64’. The HIPActionBuilder is modified to produce LLVM IR at the backend phase. HIPSPV tool chain expects to receive HIP device code as LLVM IR so it can run external LLVM passes over them. HIPSPV TC is also responsible for emitting the SPIR-V binary. A Cuda GPU architecture ‘generic’ is added. The name is picked from the LLVM SPIR-V Backend. In the HIPSPV code path the architecture name is inserted to the bundle entry ID as target ID. Target ID is expected to be always present so a component in the target triple is not mistaken as target ID. Tests are added for checking the HIPSPV tool chain. [1]: https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html Patch by: Henry Linjamäki Reviewed by: Yaxun Liu, Artem Belevich, Alexey Bader Differential Revision: https://reviews.llvm.org/D110622	2021-12-20 10:45:09 -05:00
Carlos Galvez	7ecec3f0f5	[CUDA] Bump supported CUDA version to 11.5 Differential Revision: https://reviews.llvm.org/D113249	2021-11-09 08:20:53 +00:00
Artem Belevich	49d982d8cb	[CUDA] Add support for CUDA-11.4 Differential Revision: https://reviews.llvm.org/D108239	2021-08-23 13:24:46 -07:00
Jon Chesterfield	c2574e63ff	[openmp][nfc] Refactor GridValues Remove redundant fields and replace pointer with virtual function Of fourteen fields, three are dead and four can be computed from the remainder. This leaves a couple of currently dead fields in place as they are expected to be used from the deviceRTL shortly. Two of the fields that can be computed are only used from codegen and require a log2() implementation so are inlined into codegen instead. This change leaves the new methods in the same location in the struct as the previous fields for convenience at review. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108380	2021-08-23 16:19:11 +01:00
Jon Chesterfield	b1efeface7	Revert "[openmp][nfc] Refactor GridValues" Failed a nvptx codegen test This reverts commit `2a47a84b40`.	2021-08-20 18:17:27 +01:00
Jon Chesterfield	2a47a84b40	[openmp][nfc] Refactor GridValues Remove redundant fields and replace pointer with virtual function Of fourteen fields, three are dead and four can be computed from the remainder. This leaves a couple of currently dead fields in place as they are expected to be used from the deviceRTL shortly. Two of the fields that can be computed are only used from codegen and require a log2() implementation so are inlined into codegen instead. This change leaves the new methods in the same location in the struct as the previous fields for convenience at review. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108380	2021-08-20 16:41:26 +01:00
Jon Chesterfield	77579b99e9	[openmp][nfc] Replace OMPGridValues array with struct [nfc] Replaces enum indices into an array with a struct. Named the fields to match the enum, leaves memory layout and initialization unchanged. Motivation is to later safely remove dead fields and replace redundant ones with (compile time) computation. It should also be possible to factor some common fields into a base and introduce a gfx10 amdgpu instance with less duplication than the arrays of integers require. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D108339	2021-08-19 13:25:42 +01:00
Aakanksha Patil	3453f3dd46	[AMDGPU] Add gfx1035 target Differential Revision: https://reviews.llvm.org/D104804	2021-06-24 14:32:41 -04:00
Brendon Cahoon	294efbbd3e	Reland "[AMDGPU] Add gfx1013 target" This reverts commit `211e584fa2`. Fixed a use-after-free error that caused the sanitizers to fail.	2021-06-08 21:15:35 -04:00
Brendon Cahoon	211e584fa2	Revert "[AMDGPU] Add gfx1013 target" This reverts commit `ea10a86984`. A sanitizer buildbot reports an error.	2021-06-08 16:29:41 -04:00
Brendon Cahoon	ea10a86984	[AMDGPU] Add gfx1013 target Differential Revision: https://reviews.llvm.org/D103663	2021-06-08 12:49:49 -04:00
Aakanksha Patil	464e4dc50f	[AMDGPU] Add gfx1034 target Differential Revision: https://reviews.llvm.org/D102306	2021-05-13 14:25:18 -04:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Artem Belevich	2aa01ccec3	[CUDA, NVPTX] Allow targeting sm_86 GPUs. The patch only plumbs through the option necessary for targeting sm_86 GPUs w/o adding any new functionality. Differential Revision: https://reviews.llvm.org/D95974	2021-02-09 11:01:10 -08:00
Tim Renouf	89d41f3a2b	[AMDGPU] Add gfx1033 target Differential Revision: https://reviews.llvm.org/D90447 Change-Id: If2650fc7f31bbdd49c76e74a9ca8e3734d769761	2020-11-03 16:27:48 +00:00
Tim Renouf	ee3e642627	[AMDGPU] Add gfx90c target This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were previously included in gfx909. Differential Revision: https://reviews.llvm.org/D90419 Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d	2020-11-03 16:27:43 +00:00
Stanislav Mekhanoshin	d1beb95d12	[AMDGPU] gfx1032 target Differential Revision: https://reviews.llvm.org/D89487	2020-10-15 12:41:18 -07:00
Tim Renouf	666ef0db20	[AMDGPU] Add gfx602, gfx705, gfx805 targets At AMD, in an internal audit of our code, we found some corner cases where we were not quite differentiating targets enough for some old hardware. This commit is part of fixing that by adding three new targets: * The "Oland" and "Hainan" variants of gfx601 are now split out into gfx602. LLPC (in the GPUOpen driver) and other front-ends could use that to avoid using the shaderZExport workaround on gfx602. * One variant of gfx703 is now split out into gfx705. LLPC and other front-ends could use that to avoid using the shaderSpiCsRegAllocFragmentation workaround on gfx705. * The "TongaPro" variant of gfx802 is now split out into gfx805. TongaPro has a faster 64-bit shift than its former friends in gfx802, and a subtarget feature could be set up for that to take advantage of it. This commit does not make that change; it just adds the target. V2: Add clang changes. Put TargetParser list in order. V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order, so fix the GPUKind order. Differential Revision: https://reviews.llvm.org/D88916 Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d	2020-10-10 17:22:22 +01:00
Yaxun (Sam) Liu	cbd420c5ed	[CUDA][HIP] Fix bound arch for offload action for fat binary Currently CUDA/HIP toolchain uses "unknown" as bound arch for offload action for fat binary. This causes -mcpu or -march with "unknown" added in HIPToolChain::TranslateArgs or CUDAToolChain::TranslateArgs. This causes issue for https://reviews.llvm.org/D88377 since HIP toolchain needs to check -mcpu in HIPToolChain::TranslateArgs. The bound arch of offload action for fat binary is not really used, therefore set it to CudaArch::UNUSED. Differential Revision: https://reviews.llvm.org/D88524	2020-10-02 19:05:51 -04:00
Stanislav Mekhanoshin	ea7d0e2996	[AMDGPU] gfx1031 target Differential Revision: https://reviews.llvm.org/D85337	2020-08-05 12:36:26 -07:00
Stanislav Mekhanoshin	9ee272f13d	[AMDGPU] Add gfx1030 target Differential Revision: https://reviews.llvm.org/D81886	2020-06-15 16:18:05 -07:00
Saiyedul Islam	4022bc2a6c	[OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 2 Summary: New file include to support platform dependent grid constants. It will be used by clang, libomptarget plugins, and deviceRTLs to access constant values consistently and with fast access in the deviceRTLs. Originally authored by Greg Rodgers (@gregrodgers). Reviewers: arsenm, sameerds, jdoerfert, yaxunl, b-sumner, scchan, JonChesterfield Reviewed By: arsenm Subscribers: llvm-commits, pdhaliwal, jholewinski, jvesely, wdng, nhaehnle, guansong, kerbowa, sstefan1, cfe-commits, ronlieb, gregrodgers Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D80917	2020-06-10 18:09:59 +00:00
Artem Belevich	a9627b7ea7	[CUDA] Add partial support for recent CUDA versions. Generate PTX using newer versions of PTX and allow using sm_80 with CUDA-11. None of the new features of CUDA-10.2+ have been implemented yet, so using these versions will still produce a warning. Differential Revision: https://reviews.llvm.org/D77670	2020-04-08 11:19:44 -07:00
Yaxun Liu	6add24adaf	[HIP] Add GPU arch gfx1010, gfx1011, and gfx1012 Differential Revision: https://reviews.llvm.org/D64364 llvm-svn: 365799	2019-07-11 17:50:09 +00:00
Stanislav Mekhanoshin	0cfd75a07d	[AMDGPU] gfx908 clang target Differential Revision: https://reviews.llvm.org/D64430 llvm-svn: 365528	2019-07-09 18:19:00 +00:00
Tom Tan	b7c6d95af5	[COFF, ARM64] Align global symbol by size for ARM64 MSVC ABI According to alignment section in below ARM64 ABI document, MSVC could increase alignment of global data based on its total size. Clang doesn't do this. Compile the same symbol into different alignments by Clang and MSVC could cause link error because some instruction encodings, like 64-bit LDR/STR with immediate, require the target to be 8 bytes aligned, and linker could choose code stream with such LDR/STR instruction from MSVC and 4 bytes aligned data from Clang into final image, which actually cannot be linked together (see https://bugs.llvm.org/show_bug.cgi?id=41506 for more details). https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#alignment Differential Revision: https://reviews.llvm.org/D61225 llvm-svn: 359744	2019-05-02 00:38:14 +00:00
Artem Belevich	5fe85a003f	[CUDA] Implemented _[bi]mma* builtins. These builtins provide access to the new integer and sub-integer variants of MMA (matrix multiply-accumulate) instructions provided by CUDA-10.x on sm_75 (AKA Turing) GPUs. Also added a feature for PTX 6.4. While Clang/LLVM does not generate any PTX instructions that need it, we still need to pass it through to ptxas in order to be able to compile code that uses the new 'mma' instruction as inline assembly (e.g used by NVIDIA's CUTLASS library https://github.com/NVIDIA/cutlass/blob/master/cutlass/arch/mma.h#L101) Differential Revision: https://reviews.llvm.org/D60279 llvm-svn: 359248	2019-04-25 22:28:09 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Tim Renouf	632f35d495	Add gfx909 to GPU Arch Subscribers: jholewinski, cfe-commits Differential Revision: https://reviews.llvm.org/D53558 llvm-svn: 345198	2018-10-24 21:19:02 +00:00

1 2

62 Commits