Commit Graph

62 Commits

Author SHA1 Message Date
Kazu Hirata
f3dcc2351c [clang] Use StringRef::{starts,ends}_with (NFC) (#75149)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-13 08:54:13 -08:00
Artem Belevich
631c6e834c [CUDA] Add support for CUDA-12.3 and sm_90a (#74895) 2023-12-11 12:18:28 -08:00
Jay Foad
cf1e0c0b07 [AMDGPU] Define new targets gfx1200 and gfx1201 (#73133)
Define target names and ELF numbers for new GFX12 targets gfx1200 and
gfx1201. For now they behave identically to GFX11.
2023-11-23 16:44:05 +00:00
Jay Foad
92542f2a40 [AMDGPU] Add targets gfx1150 and gfx1151
This is the target definition only. Currently they are treated the same
as GFX 11.0.x.

Differential Revision: https://reviews.llvm.org/D155429
2023-07-17 13:06:12 +01:00
Sergio Afonso
63ca93c7d1 [OpenMP][OMPIRBuilder] Rename IsEmbedded and IsTargetCodegen flags
This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over
their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes
`IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to
`-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed
to `omp.is_target_device`. Getters and setters of all these renamed properties
are also updated accordingly. Many unit tests have been updated to use the new
names, but an alias for the `-fopenmp-is-device` option is created so that
external programs do not stop working after the name change.

`IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only
valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the
`-fopenmp-is-target-device` compiler frontend option, which is only added to
the OpenMP device invocation for offloading-enabled programs.

Differential Revision: https://reviews.llvm.org/D154591
2023-07-10 14:14:16 +01:00
Konstantin Zhuravlyov
9d05727972 AMDGPU: Add basic gfx942 target
Differential Revision: https://reviews.llvm.org/D149983
2023-05-10 11:51:06 -04:00
Konstantin Zhuravlyov
1fc70210a6 AMDGPU: Add basic gfx941 target
Differential Revision: https://reviews.llvm.org/D149982
2023-05-10 11:51:06 -04:00
Stoorx
830b359d3a [clang] Return std::unique_ptr<TargetInfo> from AllocateTarget
In file 'clang/lib/Basic/Targets.cpp' the function 'AllocateTarget' had a raw pointer as a return type, which have been wrapped in the 'std::unique_ptr' in all usages.
This commit changes the signature of the function to return an instance of 'std::unique_ptr' directly.

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D148574
2023-04-18 10:07:26 +00:00
Joseph Huber
bed7005eb4 [NVPTX] Add __CUDA_ARCH__ macro to standalone NVPTX compilations
We can now target the NVPTX architecture directly via
`--target=nvptx64-nvidia-cuda`. This currently does not define the
`__CUDA_ARCH__` macro with is used to allow code to target different
codes based on support. This patch simply adds this support.

Reviewed By: tra, jdoerfert

Differential Revision: https://reviews.llvm.org/D146975
2023-03-27 18:08:15 -05:00
Joseph Huber
af54d1e852 [NVPTX] Set the atomic inling threshold when targeting NVPTX directly
Since Clang 16.0.0 users can target the `NVPTX` architecture directly
via `--target=nvptx64-nvidia-cuda`. However, this does not set the
atomic inlining size correctly. This leads to spurious warnings and
emission of runtime atomics that are never implemented. This patch
ensures that we set this to the appropriate pointer width. This will
always be 64 in the future as `nvptx64` will only be supported moving
forward.

Fixes: https://github.com/llvm/llvm-project/issues/61410

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D146750
2023-03-23 16:30:07 -05:00
serge-sans-paille
5a7f47cc02 [clang] Optimize clang::Builtin::Info density
Reorganize clang::Builtin::Info to have them naturally align on 4 bytes
boundaries.

Instead of storing builtin headers as a straight char pointer, enumerate
them and store the enum. It allows to use a small enum instead of a
pointer to reference them.

On a 64 bit machine, this brings sizeof(clang::Builtin::Info) from 56
down to 48 bytes.

On a release build on my Linux 64 bit machine, it shrinks the size of
libclang-cpp.so by 193kB.

The impact on performance is negligible in terms of instruction count,
but the wall time seems better, see
https://llvm-compile-time-tracker.com/compare.php?from=b3d8639f3536a4876b511aca9fb7948ff9266cee&to=a89b56423f98b550260a58c41e64aff9e56b76be&stat=task-clock

Differential Revision: https://reviews.llvm.org/D142024
2023-01-23 14:27:44 +01:00
serge-sans-paille
a3c248db87 Move from llvm::makeArrayRef to ArrayRef deduction guides - clang/ part
This is a follow-up to https://reviews.llvm.org/D140896, split into
several parts as it touches a lot of files.

Differential Revision: https://reviews.llvm.org/D141139
2023-01-09 12:15:24 +01:00
serge-sans-paille
d9ab3e82f3 [clang] Use a StringRef instead of a raw char pointer to store builtin and call information
This avoids recomputing string length that is already known at compile time.

It has a slight impact on preprocessing / compile time, see

https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u

This a recommit of e953ae5bbc and the subsequent fixes caa713559b and 06b90e2e9c.

The above patchset caused some version of GCC to take eons to compile clang/lib/Basic/Targets/AArch64.cpp, as spotted in aa171833ab.
The fix is to make BuiltinInfo tables a compilation unit static variable, instead of a private static variable.

Differential Revision: https://reviews.llvm.org/D139881
2022-12-27 09:55:19 +01:00
Alex Richardson
a602f76a24 [clang][TargetInfo] Use LangAS for getPointer{Width,Align}()
Mixing LLVM and Clang address spaces can result in subtle bugs, and there
is no need for this hook to use the LLVM IR level address spaces.
Most of this change is just replacing zero with LangAS::Default,
but it also allows us to remove a few calls to getTargetAddressSpace().

This also removes a stale comment+workaround in
CGDebugInfo::CreatePointerLikeType(): ASTContext::getTypeSize() does
return the expected size for ReferenceType (and handles address spaces).

Differential Revision: https://reviews.llvm.org/D138295
2022-11-30 20:24:01 +00:00
Artem Belevich
0e8a414ab3 [CUDA, NVPTX] Added basic __bf16 support for NVPTX.
Recent Clang changes expose _bf16 types for SSE2-enabled host compilations and
that makes those types visible furing GPU-side compilation, where it currently
fails with Sema complaining that __bf16 is not supported.

Considering that __bf16 is a storage-only type, enabling it for NVPTX if it's
enabled on the host should pose no issues, correctness-wise.

Recent NVIDIA GPUs have introduced bf16 support, so we'll likely grow better
support for __bf16 on NVPTX going forward.

Differential Revision: https://reviews.llvm.org/D136311
2022-10-25 11:08:06 -07:00
Artem Belevich
9a01cca660 Add support for CUDA-11.8 and sm_{87,89,90} GPUs.
Differential Revision: https://reviews.llvm.org/D135306
2022-10-07 13:59:28 -07:00
Artem Belevich
f3a2cbcf97 Refactored CUDA version housekeeping to use less boilerplate.
Differential Revision: https://reviews.llvm.org/D135328
2022-10-07 13:59:23 -07:00
Joseph Huber
002a63f937 [OpenMP] Add __CUDA_ARCH__ definition when offloading with OpenMP
Currently we define the `__CUDA_ARCH__` macro only in CUDA mode. This
patch allows us to use this macro in OpenMP-offloading mode when
targeting NVPTX.

Reviewed By: tra, tianshilei1992

Differential Revision: https://reviews.llvm.org/D125256
2022-05-13 14:38:35 -04:00
Joe Nash
8bdfc73f63 [AMDGPU][clang] Definition of gfx11 subtarget
Contributors:
Jay Foad <jay.foad@amd.com>
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

Patch 2/N for upstreaming of AMDGPU gfx11 architecture

Depends on D124536

Reviewed By: foad, kzhuravl, #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D124537
2022-04-29 13:55:56 -04:00
Aakanksha
840695814a [AMDGPU] Add gfx1036 target
Differential Revision: https://reviews.llvm.org/D120846
2022-03-02 23:26:38 +00:00
Stanislav Mekhanoshin
2e2e64df4a [AMDGPU] Add gfx940 target
This is target definition only.

Differential Revision: https://reviews.llvm.org/D120688
2022-03-02 13:54:48 -08:00
Yaxun (Sam) Liu
a6786cdd57 [HIPSPV][3/4] Enable SPIR-V emission for HIP
This patch enables SPIR-V binary emission for HIP device code via the
HIPSPV tool chain.

‘--offload’ option, which is envisioned in [1], is added for specifying
offload targets. This option is used to override default device target
(amdgcn-amd-amdhsa) for HIP compilation for emitting device code as
SPIR-V binary. The option is handled in getHIPOffloadTargetTriple().

getOffloadingDeviceToolChain() function (based on the design in the
SYCL repository) is added to select HIPSPVToolChain when HIP offload
target is ‘spirv64’.

The HIPActionBuilder is modified to produce LLVM IR at the backend
phase. HIPSPV tool chain expects to receive HIP device code as LLVM
IR so it can run external LLVM passes over them. HIPSPV TC is also
responsible for emitting the SPIR-V binary.

A Cuda GPU architecture ‘generic’ is added. The name is picked from
the LLVM SPIR-V Backend. In the HIPSPV code path the architecture
name is inserted to the bundle entry ID as target ID. Target ID is
expected to be always present so a component in the target triple
is not mistaken as target ID.

Tests are added for checking the HIPSPV tool chain.

[1]: https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html

Patch by: Henry Linjamäki

Reviewed by: Yaxun Liu, Artem Belevich, Alexey Bader

Differential Revision: https://reviews.llvm.org/D110622
2021-12-20 10:45:09 -05:00
Carlos Galvez
7ecec3f0f5 [CUDA] Bump supported CUDA version to 11.5
Differential Revision: https://reviews.llvm.org/D113249
2021-11-09 08:20:53 +00:00
Artem Belevich
49d982d8cb [CUDA] Add support for CUDA-11.4
Differential Revision: https://reviews.llvm.org/D108239
2021-08-23 13:24:46 -07:00
Jon Chesterfield
c2574e63ff [openmp][nfc] Refactor GridValues
Remove redundant fields and replace pointer with virtual function

Of fourteen fields, three are dead and four can be computed from the
remainder. This leaves a couple of currently dead fields in place as
they are expected to be used from the deviceRTL shortly. Two of the
fields that can be computed are only used from codegen and require a
log2() implementation so are inlined into codegen instead.

This change leaves the new methods in the same location in the struct
as the previous fields for convenience at review.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108380
2021-08-23 16:19:11 +01:00
Jon Chesterfield
b1efeface7 Revert "[openmp][nfc] Refactor GridValues"
Failed a nvptx codegen test
This reverts commit 2a47a84b40.
2021-08-20 18:17:27 +01:00
Jon Chesterfield
2a47a84b40 [openmp][nfc] Refactor GridValues
Remove redundant fields and replace pointer with virtual function

Of fourteen fields, three are dead and four can be computed from the
remainder. This leaves a couple of currently dead fields in place as
they are expected to be used from the deviceRTL shortly. Two of the
fields that can be computed are only used from codegen and require a
log2() implementation so are inlined into codegen instead.

This change leaves the new methods in the same location in the struct
as the previous fields for convenience at review.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108380
2021-08-20 16:41:26 +01:00
Jon Chesterfield
77579b99e9 [openmp][nfc] Replace OMPGridValues array with struct
[nfc] Replaces enum indices into an array with a struct. Named the
fields to match the enum, leaves memory layout and initialization unchanged.

Motivation is to later safely remove dead fields and replace redundant ones
with (compile time) computation. It should also be possible to factor some
common fields into a base and introduce a gfx10 amdgpu instance with less
duplication than the arrays of integers require.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D108339
2021-08-19 13:25:42 +01:00
Aakanksha Patil
3453f3dd46 [AMDGPU] Add gfx1035 target
Differential Revision: https://reviews.llvm.org/D104804
2021-06-24 14:32:41 -04:00
Brendon Cahoon
294efbbd3e Reland "[AMDGPU] Add gfx1013 target"
This reverts commit 211e584fa2.

Fixed a use-after-free error that caused the sanitizers to fail.
2021-06-08 21:15:35 -04:00
Brendon Cahoon
211e584fa2 Revert "[AMDGPU] Add gfx1013 target"
This reverts commit ea10a86984.

A sanitizer buildbot reports an error.
2021-06-08 16:29:41 -04:00
Brendon Cahoon
ea10a86984 [AMDGPU] Add gfx1013 target
Differential Revision: https://reviews.llvm.org/D103663
2021-06-08 12:49:49 -04:00
Aakanksha Patil
464e4dc50f [AMDGPU] Add gfx1034 target
Differential Revision: https://reviews.llvm.org/D102306
2021-05-13 14:25:18 -04:00
Stanislav Mekhanoshin
a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Artem Belevich
2aa01ccec3 [CUDA, NVPTX] Allow targeting sm_86 GPUs.
The patch only plumbs through the option necessary for targeting sm_86 GPUs w/o
adding any new functionality.

Differential Revision: https://reviews.llvm.org/D95974
2021-02-09 11:01:10 -08:00
Tim Renouf
89d41f3a2b [AMDGPU] Add gfx1033 target
Differential Revision: https://reviews.llvm.org/D90447

Change-Id: If2650fc7f31bbdd49c76e74a9ca8e3734d769761
2020-11-03 16:27:48 +00:00
Tim Renouf
ee3e642627 [AMDGPU] Add gfx90c target
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.

Differential Revision: https://reviews.llvm.org/D90419

Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
2020-11-03 16:27:43 +00:00
Stanislav Mekhanoshin
d1beb95d12 [AMDGPU] gfx1032 target
Differential Revision: https://reviews.llvm.org/D89487
2020-10-15 12:41:18 -07:00
Tim Renouf
666ef0db20 [AMDGPU] Add gfx602, gfx705, gfx805 targets
At AMD, in an internal audit of our code, we found some corner cases
where we were not quite differentiating targets enough for some old
hardware. This commit is part of fixing that by adding three new
targets:

* The "Oland" and "Hainan" variants of gfx601 are now split out into
  gfx602. LLPC (in the GPUOpen driver) and other front-ends could use
  that to avoid using the shaderZExport workaround on gfx602.

* One variant of gfx703 is now split out into gfx705. LLPC and other
  front-ends could use that to avoid using the
  shaderSpiCsRegAllocFragmentation workaround on gfx705.

* The "TongaPro" variant of gfx802 is now split out into gfx805.
  TongaPro has a faster 64-bit shift than its former friends in gfx802,
  and a subtarget feature could be set up for that to take advantage of
  it. This commit does not make that change; it just adds the target.

V2: Add clang changes. Put TargetParser list in order.
V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order,
    so fix the GPUKind order.

Differential Revision: https://reviews.llvm.org/D88916

Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
2020-10-10 17:22:22 +01:00
Yaxun (Sam) Liu
cbd420c5ed [CUDA][HIP] Fix bound arch for offload action for fat binary
Currently CUDA/HIP toolchain uses "unknown" as bound arch
for offload action for fat binary. This causes -mcpu or -march
with "unknown" added in HIPToolChain::TranslateArgs or
CUDAToolChain::TranslateArgs.

This causes issue for https://reviews.llvm.org/D88377 since
HIP toolchain needs to check -mcpu in HIPToolChain::TranslateArgs.

The bound arch of offload action for fat binary is not really
used, therefore set it to CudaArch::UNUSED.

Differential Revision: https://reviews.llvm.org/D88524
2020-10-02 19:05:51 -04:00
Stanislav Mekhanoshin
ea7d0e2996 [AMDGPU] gfx1031 target
Differential Revision: https://reviews.llvm.org/D85337
2020-08-05 12:36:26 -07:00
Stanislav Mekhanoshin
9ee272f13d [AMDGPU] Add gfx1030 target
Differential Revision: https://reviews.llvm.org/D81886
2020-06-15 16:18:05 -07:00
Saiyedul Islam
4022bc2a6c [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 2
Summary:
New file include to support platform dependent grid constants. It will be
used by clang, libomptarget plugins, and deviceRTLs to access constant
values consistently and with fast access in the deviceRTLs.

Originally authored by Greg Rodgers (@gregrodgers).

Reviewers: arsenm, sameerds, jdoerfert, yaxunl, b-sumner, scchan, JonChesterfield

Reviewed By: arsenm

Subscribers: llvm-commits, pdhaliwal, jholewinski, jvesely, wdng, nhaehnle, guansong, kerbowa, sstefan1, cfe-commits, ronlieb, gregrodgers

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D80917
2020-06-10 18:09:59 +00:00
Artem Belevich
a9627b7ea7 [CUDA] Add partial support for recent CUDA versions.
Generate PTX using newer versions of PTX and allow using sm_80 with CUDA-11.
None of the new features of CUDA-10.2+ have been implemented yet, so using these
versions will still produce a warning.

Differential Revision: https://reviews.llvm.org/D77670
2020-04-08 11:19:44 -07:00
Yaxun Liu
6add24adaf [HIP] Add GPU arch gfx1010, gfx1011, and gfx1012
Differential Revision: https://reviews.llvm.org/D64364

llvm-svn: 365799
2019-07-11 17:50:09 +00:00
Stanislav Mekhanoshin
0cfd75a07d [AMDGPU] gfx908 clang target
Differential Revision: https://reviews.llvm.org/D64430

llvm-svn: 365528
2019-07-09 18:19:00 +00:00
Tom Tan
b7c6d95af5 [COFF, ARM64] Align global symbol by size for ARM64 MSVC ABI
According to alignment section in below ARM64 ABI document, MSVC could increase
alignment of global data based on its total size. Clang doesn't do this. Compile
the same symbol into different alignments by Clang and MSVC could cause link
error because some instruction encodings, like 64-bit LDR/STR with immediate,
require the target to be 8 bytes aligned, and linker could choose code stream
with such LDR/STR instruction from MSVC and 4 bytes aligned data from Clang into
final image, which actually cannot be linked together
(see https://bugs.llvm.org/show_bug.cgi?id=41506 for more details).

https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#alignment

Differential Revision: https://reviews.llvm.org/D61225

llvm-svn: 359744
2019-05-02 00:38:14 +00:00
Artem Belevich
5fe85a003f [CUDA] Implemented _[bi]mma* builtins.
These builtins provide access to the new integer and
sub-integer variants of MMA (matrix multiply-accumulate) instructions
provided by CUDA-10.x on sm_75 (AKA Turing) GPUs.

Also added a feature for PTX 6.4. While Clang/LLVM does not generate
any PTX instructions that need it, we still need to pass it through to
ptxas in order to be able to compile code that uses the new 'mma'
instruction as inline assembly (e.g used by NVIDIA's CUTLASS library
https://github.com/NVIDIA/cutlass/blob/master/cutlass/arch/mma.h#L101)

Differential Revision: https://reviews.llvm.org/D60279

llvm-svn: 359248
2019-04-25 22:28:09 +00:00
Chandler Carruth
2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Tim Renouf
632f35d495 Add gfx909 to GPU Arch
Subscribers: jholewinski, cfe-commits

Differential Revision: https://reviews.llvm.org/D53558

llvm-svn: 345198
2018-10-24 21:19:02 +00:00