Commit Graph

299 Commits

Author SHA1 Message Date
Rafael Espindola
922f2aa9b2 Bring r325915 back.
The tests that failed on a windows host have been fixed.

Original message:

Start setting dso_local for COFF.

With this there are still some GVs where we don't set dso_local
because setGVProperties is never called. I intend to fix that in
followup commits. This is just the bare minimum to teach
shouldAssumeDSOLocal what it should do for COFF.

llvm-svn: 325940
2018-02-23 19:30:48 +00:00
Alexey Sotkin
20f65928e1 [OpenCL] Add '-cl-uniform-work-group-size' compile option
Summary:
OpenCL 2.0 specification defines '-cl-uniform-work-group-size' option,
which requires that the global work-size be a multiple of the work-group
size specified to clEnqueueNDRangeKernel and allows optimizations that
are made possible by this restriction.

The patch introduces the support of this option.

To keep information about whether an OpenCL kernel has uniform work
group size or not, clang generates 'uniform-work-group-size' function
attribute for every kernel:
- "uniform-work-group-size"="true" for OpenCL 1.2 and lower,
- "uniform-work-group-size"="true" for OpenCL 2.0 and higher if
 '-cl-uniform-work-group-size' option was specified,
- "uniform-work-group-size"="false" for OpenCL 2.0 and higher if no
 '-cl-uniform-work-group-size' options was specified.

If the function is not an OpenCL kernel, 'uniform-work-group-size'
attribute isn't generated.

Patch by: krisb

Reviewers: yaxunl, Anastasia, b-sumner

Reviewed By: yaxunl, Anastasia

Subscribers: nhaehnle, yaxunl, Anastasia, cfe-commits

Differential Revision: https://reviews.llvm.org/D43570

llvm-svn: 325771
2018-02-22 11:54:14 +00:00
Yaxun Liu
f8ad59d99d Clean up AMDGCN tests
Differential Revision: https://reviews.llvm.org/D43340

llvm-svn: 325279
2018-02-15 19:12:41 +00:00
Yaxun Liu
fa13d015a3 [OpenCL] Fix __enqueue_block for block with captures
The following test case causes issue with codegen of __enqueue_block

void (^block)(void) = ^{ callee(id, out); };

enqueue_kernel(queue, 0, ndrange, block);
Clang first does codegen for block expression in the first line and deletes its block info.
Clang then tries to do codegen for the same block expression again for the second line,
and fails because the block info is gone.

The fix is to do normal codegen for both lines. Introduce an API to OpenCL runtime to
record llvm block invoke function and llvm block literal emitted for each AST block
expression, and use the recorded information for generating the wrapper kernel.

The EmitBlockLiteral APIs are cleaned up to minimize changes to the normal codegen
of blocks.

Another minor issue is that some clean up AST expression is generated for block
with captures, which can be stripped by IgnoreImplicit.

Differential Revision: https://reviews.llvm.org/D43240

llvm-svn: 325264
2018-02-15 16:39:19 +00:00
Yaxun Liu
651bd73c02 [AMDGPU] Change constant addr space to 4
Differential Revision: https://reviews.llvm.org/D43171

llvm-svn: 325031
2018-02-13 18:01:21 +00:00
Matt Arsenault
e7da136a74 AMDGPU: Update for datalayout change
llvm-svn: 324748
2018-02-09 16:58:41 +00:00
Matt Arsenault
935574a490 Fix crash on array initializer with non-0 alloca addrspace
llvm-svn: 324641
2018-02-08 19:37:09 +00:00
Daniil Fukalov
da2a0558ea Recommit rL323890: [AMDGPU] Add ds_fadd, ds_fmin, ds_fmax builtins functions
Fixed asserts in tests.

llvm-svn: 324201
2018-02-04 22:32:07 +00:00
Yaxun Liu
f5f45e5e63 [AMDGPU] Switch to the new addr space mapping by default
This requires corresponding llvm change.

Differential Revision: https://reviews.llvm.org/D40956

llvm-svn: 324102
2018-02-02 16:08:24 +00:00
Daniil Fukalov
07df4ffae7 Revert "[AMDGPU] Add ds_fadd, ds_fmin, ds_fmax builtins functions"
This reverts https://reviews.llvm.org/rL323890

This reverts commit 251524ebd8c346a936f0e74b09d609d49fbaae4a.

llvm-svn: 323896
2018-01-31 18:49:49 +00:00
Daniil Fukalov
e72cde57d2 [AMDGPU] Add ds_fadd, ds_fmin, ds_fmax builtins functions
Reviewed by arsenm

Differential Revision: https://reviews.llvm.org/D42578

llvm-svn: 323890
2018-01-31 16:55:09 +00:00
Daniel Neilson
6e938effaa Change memcpy/memove/memset to have dest and source alignment attributes (Step 1).
Summary:
  Upstream LLVM is changing the the prototypes of the @llvm.memcpy/memmove/memset
intrinsics. This change updates the Clang tests for this change.

  The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.

 This change removes the alignment argument in favour of placing the alignment
attribute on the source and destination pointers of the memory intrinsic call.

 For example, code which used to read:
   call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
   call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)

 At this time the source and destination alignments must be the same (Step 1).
Step 2 of the change, to be landed shortly, will relax that contraint and allow
the source and destination to have different alignments.

llvm-svn: 322964
2018-01-19 17:12:54 +00:00
Yaxun Liu
c325d30d2c CodeGen: Fix invalid bitcasts for memcpy
CreateCoercedLoad/CreateCoercedStore assumes pointer argument of
memcpy is in addr space 0, which is not correct and causes invalid
bitcasts for triple amdgcn---amdgiz.

It is fixed by using alloca addr space instead.

Differential Revision: https://reviews.llvm.org/D40806

llvm-svn: 320000
2017-12-07 01:39:52 +00:00
Alexey Bader
bed400957b [OpenCL] Fix code generation of function-scope constant samplers.
Summary:
Constant samplers are handled as static variables and clang's code generation
library, which leads to llvm::unreachable. We bypass emitting sampler variable
as static since it's translated to a function call later.

Reviewers: yaxunl, Anastasia

Reviewed By: yaxunl, Anastasia

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D34342

llvm-svn: 318290
2017-11-15 11:38:17 +00:00
Matt Arsenault
a5888a730d OpenCL: Assume inline asm is convergent
Already done for CUDA.

llvm-svn: 318098
2017-11-13 22:40:55 +00:00
Yaxun Liu
e45b3d5dad CodeGen: Fix missing debug loc due to alloca
Builder save/restores insertion pointer when emitting addr space cast
for alloca, but does not save/restore debug loc, which causes verifier
failure for certain call instructions.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D39069

llvm-svn: 316484
2017-10-24 19:14:43 +00:00
Yaxun Liu
bd3823618d CodeGen: Fix invalid bitcast in partial initialization of automatic arrary variable
Differential Revision: https://reviews.llvm.org/D39184

llvm-svn: 316353
2017-10-23 17:49:26 +00:00
Yaxun Liu
f2127d1728 [AMDGPU] Fix bug in enqueued block codegen due to an extra line
llvm-svn: 316165
2017-10-19 15:56:13 +00:00
Yaxun Liu
8ab5ab066a CodeGen: Fix invalid bitcasts for atomic builtins
Currently clang assumes the temporary variables emitted during
codegen of atomic builtins have address space 0, which
is not true for target triple amdgcn---amdgiz and causes invalid
bitcasts.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D38966

llvm-svn: 316000
2017-10-17 14:19:29 +00:00
Yaxun Liu
c2a87a05f1 [OpenCL] Emit enqueued block as kernel
In OpenCL the kernel function and non-kernel function has different calling conventions.
For certain targets they have different argument ABIs. Also kernels have special function
attributes and metadata for runtime to launch them.

The blocks passed to enqueue_kernel is supposed to be executed as kernels. As such,
the block invoke function should be emitted as kernel with proper calling convention and
argument ABI.

This patch emits enqueued block as kernel. If a block is both called directly and passed
to enqueue_kernel, separate functions will be generated.

Differential Revision: https://reviews.llvm.org/D38134

llvm-svn: 315804
2017-10-14 12:23:50 +00:00
Yaxun Liu
d029234fc6 Fix regression of test/CodeGenOpenCL/address-spaces.cl on ppc
llvm-svn: 315678
2017-10-13 13:53:06 +00:00
Yaxun Liu
b7318e02c1 [OpenCL] Add LangAS::opencl_private to represent private address space in AST
Currently Clang uses default address space (0) to represent private address space for OpenCL
in AST. There are two issues with this:

Multiple address spaces including private address space cannot be diagnosed.
There is no mangling for default address space. For example, if private int* is emitted as
i32 addrspace(5)* in IR. It is supposed to be mangled as PUAS5i but it is mangled as
Pi instead.

This patch attempts to represent OpenCL private address space explicitly in AST. It adds
a new enum LangAS::opencl_private and adds it to the variable types which are implicitly
private:

automatic variables without address space qualifier

function parameter

pointee type without address space qualifier (OpenCL 1.2 and below)

Differential Revision: https://reviews.llvm.org/D35082

llvm-svn: 315668
2017-10-13 03:37:48 +00:00
Matt Arsenault
f12e3b848a AMDGPU: Add read_exec_lo/hi builtins
llvm-svn: 315238
2017-10-09 20:06:37 +00:00
Matt Arsenault
cbe0dd13d2 AMDGPU: Fix missing declaration for __builtin_amdgcn_dispatch_ptr
llvm-svn: 315219
2017-10-09 17:44:18 +00:00
Matt Arsenault
a1cf61b6fc OpenCL: Assume functions are convergent
This was done for CUDA functions in r261779, and for the same
reason this also needs to be done for OpenCL. An arbitrary
function could have a barrier() call in it, which in turn
requires the calling function to be convergent.

llvm-svn: 315094
2017-10-06 19:34:40 +00:00
Yaxun Liu
10712d9203 [OpenCL] Clean up and add missing fields for block struct
Currently block is translated to a structure equivalent to

struct Block {
  void *isa;
  int flags;
  int reserved;
  void *invoke;
  void *descriptor;
};
Except invoke, which is the pointer to the block invoke function,
all other fields are useless for OpenCL, which clutter the IR and
also waste memory since the block struct is passed to the block
invoke function as argument.

On the other hand, the size and alignment of the block struct is
not stored in the struct, which causes difficulty to implement
__enqueue_kernel as library function, since the library function
needs to know the size and alignment of the argument which needs
to be passed to the kernel.

This patch removes the useless fields from the block struct and adds
size and align fields. The equivalent block struct will become

struct Block {
  int size;
  int align;
  generic void *invoke;
 /* custom fields */
};
It also changes the pointer to the invoke function to be
a generic pointer since the address space of a function
may not be private on certain targets.

Differential Revision: https://reviews.llvm.org/D37822

llvm-svn: 314932
2017-10-04 20:32:17 +00:00
Anastasia Stulova
89a947da72 [OpenCL] Fixed CL version in failing test.
llvm-svn: 314317
2017-09-27 17:03:35 +00:00
Anastasia Stulova
0a72ed40d3 [OpenCL] Handle address space conversion while setting type alignment.
Added missing addrspacecast case in alignment computation
logic of pointer type emission in IR generation.

Differential Revision: https://reviews.llvm.org/D37804

llvm-svn: 314304
2017-09-27 14:37:00 +00:00
Yaxun Liu
1f33d8eb19 Add more tests for OpenCL atomic builtin functions
Add tests for different address spaces and insert some blank lines to make them more readable.

Differential Revision: https://reviews.llvm.org/D37742

llvm-svn: 313172
2017-09-13 18:56:25 +00:00
Yaxun Liu
e37793545e [AMDGPU] Change addr space of clk_event_t, queue_t and reserve_id_t to global
Differential Revision: https://reviews.llvm.org/D37703

llvm-svn: 313171
2017-09-13 18:50:42 +00:00
Jan Vesely
31ecb4bf60 [OpenCL] Add half load and store builtins
This enables load/stores of half type, without half being a legal type.

Differential Revision: https://reviews.llvm.org/D37231

llvm-svn: 312742
2017-09-07 19:39:10 +00:00
Yaxun Liu
29a5ee358e [OpenCL] Do not use vararg in emitted functions for enqueue_kernel
Not all targets support vararg (e.g. amdgpu). Instead of using vararg in the emitted functions for enqueue_kernel,
this patch creates a temporary array of size_t, stores the size arguments in the temporary array
and passes it to the emitted functions for enqueue_kernel.

Differential Revision: https://reviews.llvm.org/D36678

llvm-svn: 312441
2017-09-03 13:52:24 +00:00
Adrian Prantl
9e83fb0838 Adapt testcases to LLVM change r312144 in DIGlobalVariableExpression
llvm-svn: 312148
2017-08-30 18:22:23 +00:00
Reid Kleckner
6d353348e5 Parse and print DIExpressions inline to ease IR and MIR testing
Summary:
Most DIExpressions are empty or very simple. When they are complex, they
tend to be unique, so checking them inline is reasonable.

This also avoids the need for CodeGen passes to append to the
llvm.dbg.mir named md node.

See also PR22780, for making DIExpression not be an MDNode.

Reviewers: aprantl, dexonsmith, dblaikie

Subscribers: qcolombet, javed.absar, eraman, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D37075

llvm-svn: 311594
2017-08-23 20:31:27 +00:00
Yaxun Liu
504d4e2403 Attempt to fix failure in CodeGenOpenCL/atomic-ops.cl again
llvm-svn: 310937
2017-08-15 17:59:26 +00:00
Yaxun Liu
adcfe5d881 Attempt to fix failure in CodeGenOpenCL/atomic-ops.cl
llvm-svn: 310932
2017-08-15 17:16:44 +00:00
Yaxun Liu
99d56d291f Remove -finclude-default-header in OpenCL atomic tests
Differential Revision: https://reviews.llvm.org/D36676

llvm-svn: 310927
2017-08-15 16:30:31 +00:00
Yaxun Liu
30d652a447 [OpenCL] Support variable memory scope in atomic builtins
Differential Revision: https://reviews.llvm.org/D36580

llvm-svn: 310924
2017-08-15 16:02:49 +00:00
Sven van Haastregt
efb4d4c78c [OpenCL] Allow targets to select address space per type
Generalize getOpenCLImageAddrSpace into getOpenCLTypeAddrSpace, such
that targets can select the address space per type.

No functional changes intended.

Initial patch by Simon Perretta.

Differential Revision: https://reviews.llvm.org/D33989

llvm-svn: 310911
2017-08-15 09:38:18 +00:00
Matt Arsenault
3fe7395fbc AMDGPU: Use direct struct returns and arguments
This is an improvement over always using byval for
structs.

This will use registers until ~16 are used, and then
switch back to byval. This needs more work, since I'm
not sure it ever really makes sense to use byval. If
the register limit is exceeded, the arguments still
end up passed on the stack, but with a different ABI.
It also may make sense to base this on number of
registers used for non-struct arguments, rather than
just arguments that appear first in the argument list.

llvm-svn: 310527
2017-08-09 21:44:58 +00:00
Yaxun Liu
39195062c2 Add OpenCL 2.0 atomic builtin functions as Clang builtin
OpenCL 2.0 atomic builtin functions have a scope argument which is ideally
represented as synchronization scope argument in LLVM atomic instructions.

Clang supports translating Clang atomic builtin functions to LLVM atomic
instructions. However it currently does not support synchronization scope
of LLVM atomic instructions. Without this, users have to use LLVM assembly
code to implement OpenCL atomic builtin functions.

This patch adds OpenCL 2.0 atomic builtin functions as Clang builtin
functions, which supports generating LLVM atomic instructions with
synchronization scope operand.

Currently only constant memory scope argument is supported. Support of
non-constant memory scope argument will be added later.

Differential Revision: https://reviews.llvm.org/D28691

llvm-svn: 310082
2017-08-04 18:16:31 +00:00
Joey Gouly
fa76b49cef [OpenCL] Add missing subgroup builtins
This adds get_kernel_max_sub_group_size_for_ndrange and
get_kernel_sub_group_count_for_ndrange.

llvm-svn: 309678
2017-08-01 13:27:09 +00:00
Joey Gouly
53160cdc45 [OpenCL] Enable subgroup extension in tests
This fixes the test, so that it can be run on different hosts that may have
different OpenCL extensions enabled.

llvm-svn: 309571
2017-07-31 15:50:27 +00:00
Joey Gouly
84ae3364df [OpenCL] Add extension Sema check for subgroup builtins
Check the subgroup extension is enabled, before doing other Sema checks.

llvm-svn: 309567
2017-07-31 15:15:59 +00:00
Alexey Sotkin
7d7f0dc08b [OpenCL] Fix access qualifiers metadata for kernel arguments with typedef
Subscribers: cfe-commits, yaxunl, Anastasia

Differential Revision: https://reviews.llvm.org/D35420

llvm-svn: 309155
2017-07-26 18:49:54 +00:00
Egor Churaev
53f9a30543 [OpenCL] Added extended tests on metadata generation for half data type and arrays.
Reviewers: Anastasia

Reviewed By: Anastasia

Subscribers: bader, cfe-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D35000

llvm-svn: 308266
2017-07-18 06:04:01 +00:00
Yaxun Liu
25d1b4341f [AMDGPU] Fix size and alignment of size_t and pointer types
Differential Revision: https://reviews.llvm.org/D34995

llvm-svn: 307121
2017-07-05 04:58:24 +00:00
Yaxun Liu
3ba4a720ad [AMDGPU] Fix regressions on mesa/clover with libclc due to address space
Currently AMDGPUTargetInfo does not initialize AddrSpaceMap in constructor, which causes regressions in mesa/clover with libclc.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D34987

llvm-svn: 307105
2017-07-04 19:57:18 +00:00
Yaxun Liu
e9e5c4f975 CodeGen: Fix invalid bitcast for coerced function argument
Clang assumes coerced function argument is in address space 0, which is not always true and results in invalid bitcasts.

This patch fixes failure in OpenCL conformance test api/get_kernel_arg_info with amdgcn---amdgizcl triple, where non-zero alloca address space is used.

Differential Revision: https://reviews.llvm.org/D34777

llvm-svn: 306721
2017-06-29 18:47:45 +00:00
Alexey Bader
364a11651e [OpenCL] Fix OpenCL and SPIR version metadata generation.
Summary: OpenCL and SPIR version metadata must be generated once per module instead of once per mangled global value.

Reviewers: Anastasia, yaxunl

Reviewed By: Anastasia

Subscribers: ahatanak, cfe-commits

Differential Revision: https://reviews.llvm.org/D34235

llvm-svn: 305796
2017-06-20 14:30:18 +00:00