Commit Graph

208 Commits

Author SHA1 Message Date
Yaxun (Sam) Liu
b008ea304d [CUDA][HIP] Fix device variable linkage
For -fgpu-rdc, shadow variables should not be internalized, otherwise
they cannot be accessed by other TUs. This is necessary because
the shadow variable of external device variables are always
emitted as undefined symbols, which need to resolve to a global
symbols.

Managed variables need to be emitted as undefined symbols
in device compilations.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D95901
2021-02-05 15:11:12 -05:00
Michael Liao
01bf529db2 Recommit of a2fdf9d4d7.
- The failures are all cc1-based tests due to the missing `-aux-triple` options,
which is always prepared by the driver in CUDA/HIP compilation.
- Add extra check on the missing aux-targetinfo to prevent crashing.

[hip][cuda] Enable extended lambda support on Windows.

- On Windows, extended lambda has extra issues due to the numbering
schemes are different between the host compilation (Microsoft C++ ABI)
and the device compilation (Itanium C++ ABI. Additional device side
lambda number is required per lambda for the host compilation to
correctly mangle the device-side lambda name.
- A hybrid numbering context `MSHIPNumberingContext` is introduced to
number a lambda for both host- and device-compilations.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D69322

This reverts commit 4874ff0241.
2021-02-05 11:27:30 -05:00
Nico Weber
4874ff0241 Revert "[hip][cuda] Enable extended lambda support on Windows."
This reverts commit a2fdf9d4d7.
Slightly speculative, seeing several cuda tests fail on this
Windows bot: http://45.33.8.238/win/32620/step_7.txt
2021-02-04 07:10:46 -05:00
Michael Liao
a2fdf9d4d7 [hip][cuda] Enable extended lambda support on Windows.
- On Windows, extended lambda has extra issues due to the numbering
  schemes are different between the host compilation (Microsoft C++ ABI)
  and the device compilation (Itanium C++ ABI. Additional device side
  lambda number is required per lambda for the host compilation to
  correctly mangle the device-side lambda name.
- A hybrid numbering context `MSHIPNumberingContext` is introduced to
  number a lambda for both host- and device-compilations.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D69322
2021-02-04 01:38:29 -05:00
Yaxun (Sam) Liu
622eaa4a4c [HIP] Support __managed__ attribute
This patch implements codegen for __managed__ variable attribute for HIP.

Diagnostics will be added later.

Differential Revision: https://reviews.llvm.org/D94814
2021-01-22 11:43:58 -05:00
Artem Belevich
127091bfd5 [CUDA] Normalize handling of defauled dtor.
Defaulted destructor was treated inconsistently, compared to other
compiler-generated functions.

When Sema::IdentifyCUDATarget() got called on just-created dtor which didn't
have implicit __host__ __device__ attributes applied yet, it would treat it as a
host function.  That happened to (sometimes) hide the error when dtor referred
to a host-only functions.

Even when we had identified defaulted dtor as a HD function, we still treated it
inconsistently during selection of usual deallocators, where we did not allow
referring to wrong-side functions, while it is allowed for other HD functions.

This change brings handling of defaulted dtors in line with other HD functions.

Differential Revision: https://reviews.llvm.org/D94732
2021-01-21 10:48:07 -08:00
Jan Svoboda
7ab803095a [clang][cli] Remove -f[no-]trapping-math from -cc1 command line
This patch removes the -f[no-]trapping-math flags from the -cc1 command line. These flags are ignored in the command line parser and their semantics is fully handled by -ffp-exception-mode.

This patch does not remove -f[no-]trapping-math from the driver command line. The driver flags are being used and do affect compilation.

Reviewed By: dexonsmith, SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D93395
2021-01-12 10:00:23 +01:00
Fangrui Song
219d00e0d9 [test] Make ELF tests immune to dso_local/dso_preemptable/(none) differences
ELF -cc1 -mrelocation-model pic will default to no semantic interposition plus
setting dso_local on default visibility external linkage definitions, so that
COFF, Mach-O and ELF output will be similar.

This patch makes tests immune to the differences.
2020-12-31 13:59:44 -08:00
Fangrui Song
fd739804e0 [test] Add {{.*}} to make ELF tests immune to dso_local/dso_preemptable/(none) differences
For a default visibility external linkage definition, dso_local is set for ELF
-fno-pic/-fpie and COFF and Mach-O. Since default clang -cc1 for ELF is similar
to -fpic ("PIC Level" is not set), this nuance causes unneeded binary format differences.

To make emitted IR similar, ELF -cc1 -fpic will default to -fno-semantic-interposition,
which sets dso_local for default visibility external linkage definitions.

To make this flip smooth and enable future (dso_local as definition default),
this patch replaces (function) `define ` with `define{{.*}} `,
(variable/constant/alias) `= ` with `={{.*}} `, or inserts appropriate `{{.*}} `.
2020-12-31 00:27:11 -08:00
Yaxun (Sam) Liu
3a781b912f Fix assertion in tryEmitAsConstant
due to cd95338ee3

Need to check if result is LValue before getLValueBase.
2020-12-02 19:10:01 -05:00
Yaxun (Sam) Liu
5c8911d0ba [CUDA][HIP] Diagnose reference of host variable
This patch diagnoses invalid references of global host variables in device,
global, or host device functions.

Differential Revision: https://reviews.llvm.org/D91281
2020-12-02 10:15:56 -05:00
Yaxun (Sam) Liu
cd95338ee3 [CUDA][HIP] Fix capturing reference to host variable
In C++ when a reference variable is captured by copy, the lambda
is supposed to make a copy of the referenced variable in the captures
and refer to the copy in the lambda. Therefore, it is valid to capture
a reference to a host global variable in a device lambda since the
device lambda will refer to the copy of the host global variable instead
of access the host global variable directly.

However, clang tries to avoid capturing of reference to a host global variable
if it determines the use of the reference variable in the lambda function is
not odr-use. Clang also tries to emit load of the reference to a global variable
as load of the global variable if it determines that the reference variable is
a compile-time constant.

For a device lambda to capture a reference variable to host global variable
and use the captured value, clang needs to be taught that in such cases the use of the reference
variable is odr-use and the reference variable is not compile-time constant.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D91088
2020-12-02 10:14:46 -05:00
Yaxun (Sam) Liu
cb08558caa [HIP] Fix regressions due to fp contract change
Recently HIP toolchain made a change to use clang instead of opt/llc to do compilation
(https://reviews.llvm.org/D81861). The intention is to make HIP toolchain canonical like
other toolchains.

However, this change introduced an unintentional change regarding backend fp fuse
option, which caused regressions in some HIP applications.

Basically before the change, HIP toolchain used clang to generate bitcode, then use
opt/llc to optimize bitcode and generate ISA. As such, the amdgpu backend takes
the default fp fuse mode which is 'Standard'. This mode respect contract flag of
fmul/fadd instructions and do not fuse fmul/fadd instructions without contract flag.

However, after the change, HIP toolchain now use clang to generate IR, do optimization,
and generate ISA as one process. Now amdgpu backend fp fuse option is determined
by -ffp-contract option, which is 'fast' by default. And this -ffp-contract=fast language option
is translated to 'Fast' fp fuse option in backend. Suddenly backend starts to fuse fmul/fadd
instructions without contract flag.

This causes wrong result for some device library functions, e.g. tan(-1e20), which should
return 0.8446, now returns -0.933. What is worse is that since backend with 'Fast' fp fuse
option does not respect contract flag, there is no way to use #pragma clang fp contract
directive to enforce fp contract requirements.

This patch fixes the regression by introducing a new value 'fast-honor-pragmas' for -ffp-contract
and use it for HIP by default. 'fast-honor-pragmas' is equivalent to 'fast' in frontend but
let the backend to use 'Standard' fp fuse option. 'fast-honor-pragmas' is useful since 'Fast'
fp fuse option in backend does not honor contract flag, it is of little use to HIP
applications since all code with #pragma STDC FP_CONTRACT or any IR from a
source compiled with -ffp-contract=on is broken.

Differential Revision: https://reviews.llvm.org/D90174
2020-11-24 08:10:06 -05:00
Yaxun (Sam) Liu
3f4b5893ef [AMDGPU] Add option -munsafe-fp-atomics
Add an option -munsafe-fp-atomics for AMDGPU target.

When enabled, clang adds function attribute "amdgpu-unsafe-fp-atomics"
to any functions for amdgpu target. This allows amdgpu backend to use
unsafe fp atomic instructions in these functions.

Differential Revision: https://reviews.llvm.org/D91546
2020-11-16 21:52:12 -05:00
CJ Johnson
69cd776e1e [CodeGen] Apply 'nonnull' and 'dereferenceable(N)' to 'this' pointer
arguments.

* Adds 'nonnull' and 'dereferenceable(N)' to 'this' pointer arguments
* Gates 'nonnull' on -f(no-)delete-null-pointer-checks
* Introduces this-nonnull.cpp and microsoft-abi-this-nullable.cpp tests to
  explicitly test the behavior of this change
* Refactors hundreds of over-constrained clang tests to permit these
  attributes, where needed
* Updates Clang12 patch notes mentioning this change

Reviewed-by: rsmith, jdoerfert

Differential Revision: https://reviews.llvm.org/D17993
2020-11-16 17:39:17 -08:00
Michael Liao
f375885ab8 [InferAddrSpace] Teach to handle assumed address space.
- In certain cases, a generic pointer could be assumed as a pointer to
  the global memory space or other spaces. With a dedicated target hook
  to query that address space from a given value, infer-address-space
  pass could infer and propagate that to all its users.

Differential Revision: https://reviews.llvm.org/D91121
2020-11-16 17:06:33 -05:00
Michael Liao
8920ef06a1 [hip] Remove the coercion on aggregate kernel arguments.
- If an aggregate argument is indirectly accessed within kernels, direct
  passing results in unpromotable `alloca`, which degrade performance
  significantly. InferAddrSpace pass is enhanced in
  [D91121](https://reviews.llvm.org/D91121) to take the assumption that
  generic pointers loaded from the constant memory could be regarded
  global ones. The need for the coercion on aggregate arguments is
  mitigated.

Differential Revision: https://reviews.llvm.org/D89980
2020-11-12 21:19:30 -05:00
Simon Pilgrim
fc80931b87 [CodeGenCUDA] Fix check prefix typo on device-stub.cu tests
Noticed while fixing unused prefix warnings
2020-11-11 15:44:57 +00:00
Michael Liao
23c6d1501d [amdgpu] Add llvm.amdgcn.endpgm support.
- `llvm.amdgcn.endpgm` is added to enable "abort" support.

Differential Revision: https://reviews.llvm.org/D90809
2020-11-05 19:06:50 -05:00
Artem Belevich
be86b6773b [CUDA] Allow local static variables with target attributes.
While CUDA documentation claims that such variables are not allowed[1], NVCC has
been accepting them since CUDA-10.0[2] and some headers in CUDA-11 rely on this
working.

1. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#static-variables-function
2. https://godbolt.org/z/zsodzc

Differential Revision: https://reviews.llvm.org/D88345
2020-11-03 10:30:38 -08:00
Yaxun (Sam) Liu
abd8cd9199 [CUDA][HIP] Fix linkage for -fgpu-rdc
Currently for explicit template function instantiation in CUDA/HIP device
compilation clang emits instantiated kernel with external linkage
and instantiated device function with internal linkage.

This is fine for -fno-gpu-rdc since there is only one TU.

However this causes duplicate symbols for kernels for -fgpu-rdc if
the same instantiation happen in multiple TU. Or missing symbols
if a device function calls an explicitly instantiated template function
in a different TU.

To make explicit template function instantiation work for
-fgpu-rdc we need to follow the C++ linkage paradigm, i.e.
use weak_odr linkage.

Differential Revision: https://reviews.llvm.org/D90311
2020-11-03 08:07:19 -05:00
Artem Belevich
0a3ebb4d8d Revert "[CUDA] Allow local static variables with target attributes."
This reverts commit f38a9e5117
Which triggered assertions.
2020-11-02 15:09:07 -08:00
Artem Belevich
f38a9e5117 [CUDA] Allow local static variables with target attributes.
While CUDA documentation claims that such variables are not allowed[1], NVCC has
been accepting them since CUDA-10.0[2] and some headers in CUDA-11 rely on this
working.

1. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#static-variables-function
2. https://godbolt.org/z/zsodzc

Differential Revision: https://reviews.llvm.org/D88345
2020-11-02 14:37:13 -08:00
Yaxun (Sam) Liu
e384e94fbe Revert "[HIP] Change default --gpu-max-threads-per-block value to 1024"
This reverts commit 187658b8a6 due to
AMDGPU backend issues.
2020-10-15 17:25:55 -04:00
Fangrui Song
a2cc883368 [CUDA] Don't call __cudaRegisterVariable on C++17 inline variables
D17779: host-side shadow variables of external declarations of device-side
global variables have internal linkage and are referenced by
`__cuda_register_globals`.

nvcc from CUDA 11 does not allow `__device__ inline` or `__device__ constexpr`
(C++17 inline variables) but clang has incorrectly supported them for a while:

```
error: A __device__ variable cannot be marked constexpr
error: An inline __device__/__constant__/__managed__ variable must have internal linkage when the program is compiled in whole program mode (-rdc=false)
```

If such a variable (which has a comdat group) is discarded (a copy from another
translation unit is prevailing and selected), accessing the variable from
outside the section group (`__cuda_register_globals`) is a violation of the ELF
specification and will be rejected by linkers:

> A symbol table entry with STB_LOCAL binding that is defined relative to one of a group's sections, and that is contained in a symbol table section that is not part of the group, must be discarded if the group members are discarded. References to this symbol table entry from outside the group are not allowed.

As a workaround, don't register such inline variables for now.
(If we register the variables in all TUs, we will keep multiple instances of the shadow and break the C++ semantics for inline variables).
We should reject such variables in Sema but our internal users need some time to migrate.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D88786
2020-10-05 12:53:59 -07:00
Yaxun (Sam) Liu
dc6a0b0ec7 [HIP] Align device binary
To facilitate faster loading of device binaries and share them among processes,
HIP runtime favors their alignment being 4096 bytes. HIP runtime can load
unaligned device binaries, however, aligning them at 4096 bytes results in
faster loading and less shared memory usage.

This patch adds an option -bundle-align to clang-offload-bundler which allows
bundles to be aligned at specified alignment. By default it is 1, which is NFC
compared to existing format.

This patch then aligns embedded fat binary and device binary inside fat binary
at 4096 bytes.

It has been verified this change does not cause significant overall file size increase
for typical HIP applications (less than 1%).

Differential Revision: https://reviews.llvm.org/D88734
2020-10-02 18:10:44 -04:00
Yaxun (Sam) Liu
187658b8a6 Recommit "[HIP] Change default --gpu-max-threads-per-block value to 1024"
Recommit 04abbb3a78
2020-09-28 22:43:17 -04:00
Stanislav Mekhanoshin
59691dc874 [AMDGPU] Make ds fp atomics overloadable
Differential Revision: https://reviews.llvm.org/D87947
2020-09-23 11:39:50 -07:00
Yaxun (Sam) Liu
301e23305d [CUDA][HIP] Fix static device var used by host code only
A static device variable may be accessed in host code through
cudaMemCpyFromSymbol etc. Currently clang does not
emit the static device variable if it is only referenced by
host code, which causes host code to fail at run time.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D88115
2020-09-23 08:18:19 -04:00
Michael Liao
4d4f092283 [clang][codegen] Skip adding default function attributes on intrinsics.
- After loading builtin bitcode for linking, skip adding default
  function attributes on LLVM intrinsics as their attributes are
  well-defined and retrieved directly from internal definitions. Adding
  extra attributes on intrinsics results in inconsistent result when
  `-save-temps` is present. Also, that makes few optimizations
  conservative.

Differential Revision: https://reviews.llvm.org/D87761
2020-09-16 14:10:05 -04:00
Yaxun (Sam) Liu
62dbb7e54c Revert "[HIP] Change default --gpu-max-threads-per-block value to 1024"
Temporarily revert commit 04abbb3a78
due to regressions in some HIP apps due backend issues revealed by
this change.

Will re-commit it when backend issues are fixed.
2020-09-02 16:12:28 -04:00
Yaxun (Sam) Liu
fb04d7b4a6 [CUDA][HIP] Do not externalize implicit constant static variable
Differential Revision: https://reviews.llvm.org/D85686
2020-08-10 19:02:49 -04:00
Michael Liao
c7b683c126 [PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions.
- Skip generating profile data on `__global__` function in the host
  compilation. It's a host-side stub function only and don't have
  profile instrumentation generated on the real function body. The extra
  profile data results in the malformed instrumentation profile data.
- Skip generating region mapping on functions in the wrong-side, i.e.,
  + For the device compilation, skip host-only functions; and,
  + For the host compilation, skip device-only functions (including
    `__global__` functions.)
- As the device-side profiling is not ready yet, only host-side profile
  code generation is checked.

Differential Revision: https://reviews.llvm.org/D85276
2020-08-10 11:01:46 -04:00
Matt Arsenault
30eeb742f1 clang: Use byref for aggregate kernel arguments
Add address space to indirect abi info and use it for kernels.

Previously, indirect arguments assumed assumed a stack passed object
in the alloca address space using byval. A stack pointer is unsuitable
for kernel arguments, which are passed in a separate, constant buffer
with a different address space.

Start using the new byref for aggregate kernel arguments. Previously
these were emitted as raw struct arguments, and turned into loads in
the backend. These will lower identically, although with byref you now
have the option of applying an explicit alignment. In the future, a
reasonable implementation would use byref for all kernel arguments
(this would be a practical problem at the moment due to losing things
like noalias on pointer arguments).

This is mostly to avoid fighting the optimizer's treatment of
aggregate load/store. SROA and instcombine both turn aggregate loads
and stores into a long sequence of element loads and stores, rather
than the optimizable memcpy I would expect in this situation. Now an
explicit memcpy will be introduced up-front which is better understood
and helps eliminate the alloca in more situations.

This skips using byref in the case where HIP kernel pointer arguments
in structs are promoted to global pointers. At minimum an additional
patch is needed to allow coercion with indirect arguments. This also
skips using it for OpenCL due to the current workaround used to
support kernels calling kernels. Distinct function bodies would need
to be generated up front instead of emitting an illegal call.
2020-08-06 15:52:26 -04:00
Yaxun (Sam) Liu
45f2a56856 [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc
nvcc supports accessing file-scope static device variables in host code by host APIs
like cudaMemcpyToSymbol etc.

CUDA/HIP let users access device variables in host code by shadow variables. In host compilation,
clang emits a shadow variable for each device variable, and calls __*RegisterVariable to
register it in init function. The address of the shadow variable and the device side mangled
name of the device variable is passed to __*RegisterVariable. Runtime looks up the symbol
by name in the device binary  to find the address of the device variable.

The problem with static device variables is that they have internal linkage, therefore their
name may be changed by the linker if there are multiple symbols with the same name. Also
they end up as local symbols in the elf file, whereas the runtime only looks up the global symbols.

Another reason for making the static device variables external linkage is that they may be
initialized externally by host code and their final value may be accessed by host code
after kernel execution, therefore they actually have external linkage. Giving them internal
linkage will cause incorrect optimizations on them.

To support accessing static device var in host code for -fno-gpu-rdc mode, change the intnernal
linkage to external linkage. The name does not need change since there is only one TU for
-fno-gpu-rdc mode. Also the externalization is done only if the device static var is referenced
by host code.

Differential Revision: https://reviews.llvm.org/D80858
2020-08-05 07:57:38 -04:00
Yaxun (Sam) Liu
1eaad01046 [CUDA][HIP] Let lambda be host device by default
This patch let lambda be host device by default and adds diagnostics for
capturing host variable by reference in device lambda.

Differential Revision: https://reviews.llvm.org/D78655
2020-07-08 13:10:26 -04:00
Michael Liao
471c806a45 [hip] Refine clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu
- Require target x86 being enabled as well.
2020-06-25 23:57:08 -04:00
Michael Liao
0723b1891f [hip] Re-enable clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu
- Require amdgpu target being enabled.
2020-06-25 22:29:27 -04:00
Michael Liao
d3f437d351 [hip] Disable test temporarily due to failures on build servers. 2020-06-25 22:04:20 -04:00
Michael Liao
dccfaacf93 [InferAddressSpaces] Handle the pair of ptrtoint/inttoptr.
Summary:
- `ptrtoint` and `inttoptr` are defined as no-op casts if the integer
  value as the same size as the pointer value. The pair of
  `ptrtoint`/`inttoptr` is in fact a no-op cast sequence between
  different address spaces. Teach `infer-address-spaces` to handle them
  like a `bitcast`.

Reviewers: arsenm, chandlerc

Subscribers: jvesely, wdng, nhaehnle, hiraditya, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D81938
2020-06-25 20:46:56 -04:00
Yaxun (Sam) Liu
049d860707 [CUDA][HIP] Fix constexpr variables for C++17
constexpr variables are compile time constants and implicitly const, therefore
they are safe to emit on both device and host side. Besides, in many cases
they are intended for both device and host, therefore it makes sense
to emit them on both device and host sides if necessary.

In most cases constexpr variables are used as rvalue and the variables
themselves do not need to be emitted. However if their address is taken,
then they need to be emitted.

For C++14, clang is able to handle that since clang emits them with
available_externally linkage together with the initializer.

However for C++17, the constexpr static data member of a class or template class
become inline variables implicitly. Therefore they become definitions with
linkonce_odr or weak_odr linkages. As such, they can not have available_externally
linkage.

This patch fixes that by adding implicit constant attribute to
file scope constexpr variables and constexpr static data members
in device compilation.

Differential Revision: https://reviews.llvm.org/D79237
2020-06-03 21:56:52 -04:00
Yaxun (Sam) Liu
04abbb3a78 [HIP] Change default --gpu-max-threads-per-block value to 1024
Differential Revision: https://reviews.llvm.org/D76795
2020-06-03 11:09:22 -04:00
John McCall
8a8d703be0 Fix how cc1 command line options are mapped into FP options.
Canonicalize on storing FP options in LangOptions instead of
redundantly in CodeGenOptions.  Incorporate -ffast-math directly
into the values of those LangOptions rather than considering it
separately when building FPOptions.  Build IR attributes from
those options rather than a mix of sources.

We should really simplify the driver/cc1 interaction here and have
the driver pass down options that cc1 directly honors.  That can
happen in a follow-up, though.

Patch by Michele Scandale!
https://reviews.llvm.org/D80315
2020-06-01 22:00:30 -04:00
Yaxun (Sam) Liu
361e4f14e3 Fix debug info for NoDebug attr
NoDebug attr does not totally eliminate debug info about a function when
inlining is enabled. This is inconsistent with when inlining is disabled.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D79967
2020-05-21 09:02:56 -04:00
Eli Friedman
62f3ef2b53 [CGCall] Annotate references with "align" attribute.
If we're going to assume references are dereferenceable, we should also
assume they're aligned: otherwise, we can't actually dereference them.

See also D80072.

Differential Revision: https://reviews.llvm.org/D80166
2020-05-19 20:21:30 -07:00
Yaxun (Sam) Liu
1b7bf1bd75 [HIP] Do not emit debug info for stub function
The stub function is generated by compiler and its instructions have nothing
to do with the kernel source code.

Currently clang generates debug info for the stub function, which causes
confusion for the HIP debugger. For example, when users set break point
on a line of a kernel, the debugger should break on that line when the kernel is
executed and reaches that line, but instead the debugger breaks in the stub function.

This patch disables debug info for stub function for HIP.

Differential Revision: https://reviews.llvm.org/D79866
2020-05-13 17:55:40 -04:00
Michael Liao
9142c0b46b [clang][codegen] Hoist parameter attribute setting in function prolog.
Summary:
- If the coerced type is still a pointer, it should be set with proper
  parameter attributes, such as `noalias`, `nonnull`, and etc. Hoist
  that (pointer) parameter attribute setting so that the coerced pointer
  parameter could be marked properly.

Depends on D79394

Reviewers: rjmccall, kerbowa, yaxunl

Subscribers: jvesely, nhaehnle, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D79395
2020-05-05 15:31:51 -04:00
Michael Liao
612720db87 [hip] Remove test using hip_pinned_shadow attribute. NFC. 2020-04-27 16:44:59 -04:00
Yaxun (Sam) Liu
2c31aa2de1 Speed up deferred diagnostic emitter
Move function emitDeferredDiags from Sema to DeferredDiagsEmitter since it
is only used by DeferredDiagsEmitter.

Also skip visited functions to avoid exponential compile time.

Differential Revision: https://reviews.llvm.org/D77028
2020-04-06 13:07:43 -04:00
Michael Liao
b952d799ca [cuda][hip] Fix RegisterVar function prototype.
Summary:
- `RegisterVar` has `void` return type and `size_t` in its variable size
  parameter in HIP or CUDA 9.0+.

Reviewers: tra, yaxunl

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77398
2020-04-03 12:57:09 -04:00