Currently clang emits file-scope asm during *both* host and device
compilation modes which is usually a wrong thing to do.
There's no way to attach any attribute to an __asm statement, so
there's no way to differentiate between host-side and device-side
file-scope asm. This patch makes clang to match nvcc behavior and
emit file-scope-asm only during host-side compilation.
Differential Revision: http://reviews.llvm.org/D9270
llvm-svn: 235905
Emit the following code for 'taskwait' directive within tied task:
call i32 @__kmpc_omp_taskwait(<loc>, i32 <thread_id>);
Differential Revision: http://reviews.llvm.org/D9245
llvm-svn: 235836
Emit a code for reduction clause. Next code should be emitted for reductions:
static kmp_critical_name lock = { 0 };
void reduce_func(void *lhs[<n>], void *rhs[<n>]) {
*(Type0*)lhs[0] = ReductionOperation0(*(Type0*)lhs[0], *(Type0*)rhs[0]);
...
*(Type<n>-1*)lhs[<n>-1] =
ReductionOperation<n>-1(*(Type<n>-1*)lhs[<n>-1],
*(Type<n>-1*)rhs[<n>-1]);
}
...
void *RedList[<n>] = {&<RHSExprs>[0], ..., &<RHSExprs>[<n>-1]};
switch (__kmpc_reduce{_nowait}(<loc>, <gtid>, <n>, sizeof(RedList), RedList, reduce_func, &<lock>)) {
case 1:
<LHSExprs>[0] = ReductionOperation0(*<LHSExprs>[0], *<RHSExprs>[0]);
...
<LHSExprs>[<n>-1] = ReductionOperation<n>-1(*<LHSExprs>[<n>-1], *<RHSExprs>[<n>-1]);
__kmpc_end_reduce{_nowait}(<loc>, <gtid>, &<lock>);
break;
case 2:
Atomic(<LHSExprs>[0] = ReductionOperation0(*<LHSExprs>[0], *<RHSExprs>[0]));
...
Atomic(<LHSExprs>[<n>-1] = ReductionOperation<n>-1(*<LHSExprs>[<n>-1], *<RHSExprs>[<n>-1]));
break;
default:;
}
Reduction variables are a kind of a private variables, they have private copies, but initial values are chosen in accordance with the reduction operation.
If sections directive has only single section, then original shared variables are used instead with barrier at the end of the directive.
Differential Revision: http://reviews.llvm.org/D9242
llvm-svn: 235835
#pragma omp sections lastprivate(<var>)
<BODY>;
This construct is translated into something like:
<last_iter> = alloca i32
<init for lastprivates>;
<last_iter> = 0
; No initializer for simple variables or a default constructor is called for objects.
; For arrays perform element by element initialization by the call of the default constructor.
...
OMP_FOR_START(...,<last_iter>, ..); sets <last_iter> to 1 if this is the last iteration.
<BODY>
...
OMP_FOR_END
if (<last_iter> != 0) {
<final copy for lastprivate>; Update original variable with the lastprivate value.
}
call __kmpc_cancel_barrier() ; an implicit barrier to avoid possible data race.
If there is only one section, there is no special code generation, original shared variables are used + barrier is emitted at the end of the directive.
Differential Revision: http://reviews.llvm.org/D9240
llvm-svn: 235834
If there are 2 or more sections in a 'section' directive the following code is generated:
<default init for privates>
@__kmpc_for_static_init_4();
<BODY for sections directive>
@__kmpc_for_static_fini()
If there is only one section, the following code is generated:
if (@__kmpc_single()) {
<default init for privates>
@__kmpc_end_single();
}
Differential Revision: http://reviews.llvm.org/D9239
llvm-svn: 235833
Emit the following code for 'single' directive with 'private' clause:
if (@__kmpc_single()) {
<default init for privates>
@__kmpc_end_single();
}
Differential Revision: http://reviews.llvm.org/D9238
llvm-svn: 235832
Fixes rdar://20621065.
A more elegant fix would preclude this case by defining the
rules such that zero-size classes are always formally empty.
I believe the only extensions which create zero-size classes
right now are flexible arrays and zero-length arrays; it's
not abstractly unreasonable to say that those don't count
as members for the purposes of emptiness, just as zero-width
bitfields don't count. But that's an ABI-affecting change
and requires further discussion; in the meantime, let's not
assert / miscompile.
llvm-svn: 235815
This fixes a crash when we're emitting coverage and a macro appears
between two binary conditional operators, ie, "foo ?: MACRO ?: bar",
and fixes the interaction of macros and conditional operators in
general.
llvm-svn: 235793
Emit the following code for 'single' directive with 'firtstprivate' clause:
if (@__kmpc_single()) {
<init for firstprivates>
@__kmpc_end_single();
}
@__kmpc_cancel_barrier(); // To avoid data race in firstprivate init
Differential Revision: http://reviews.llvm.org/D9223
llvm-svn: 235694
Runtime function for 'copyprivate' directive generates implicit barriers, so no need to emit it.
Differential Revision: http://reviews.llvm.org/D9215
llvm-svn: 235692
If there are 2 or more sections in a 'section' directive the following code is generated:
<init for firstprivates>
@__kmpc_cancel_barrier();// To avoid data race in firstprivate init
@__kmpc_for_static_init_4();
<BODY for sections directive>
@__kmpc_for_static_fini()
If there is only one section, the following code is generated:
if (@__kmpc_single()) {
<init for firstprivates>
@__kmpc_end_single();
}
@__kmpc_cancel_barrier(); // To avoid data race in firstprivate init
Differential Revision: http://reviews.llvm.org/D9214
llvm-svn: 235691
The RegionCounter type does a lot of legwork, but most of it is only
meaningful within the implementation of CodeGenPGO. The uses elsewhere
in CodeGen generally just want to increment or read counters, so do
that directly.
llvm-svn: 235664
In r235553, Clang started emitting lifetime markers more often. This
caused false negative in MSan, because MSan only poisons all allocas
once at function entry. Eventually, MSan should poison allocas at
lifetime start and probably also lifetime end, but until then, let's not
emit markers that aren't going to be useful.
llvm-svn: 235613
Adds codegen for 'atomic capture' constructs with the following forms of expressions/statements:
v = x binop= expr;
v = x++;
v = ++x;
v = x--;
v = --x;
v = x = x binop expr;
v = x = expr binop x;
{v = x; x = binop= expr;}
{v = x; x++;}
{v = x; ++x;}
{v = x; x--;}
{v = x; --x;}
{x = x binop expr; v = x;}
{x binop= expr; v = x;}
{x++; v = x;}
{++x; v = x;}
{x--; v = x;}
{--x; v = x;}
{x = x binop expr; v = x;}
{x = expr binop x; v = x;}
{v = x; x = expr;}
If x and expr are integer and binop is associative or x is a LHS in a RHS of the assignment expression, and atomics are allowed for type of x on the target platform atomicrmw instruction is emitted.
Otherwise compare-and-swap sequence is emitted.
Update of 'v' is not required to be be atomic with respect to the read or write of the 'x'.
bb:
...
atomic load <x>
cont:
<expected> = phi [ <x>, label %bb ], [ <new_failed>, %cont ]
<desired> = <expected> binop <expr>
<res> = cmpxchg atomic &<x>, desired, expected
<new_failed> = <res>.field1;
br <res>field2, label %exit, label %cont
exit:
atomic store <old/new x>, <v>
...
Differential Revision: http://reviews.llvm.org/D9049
llvm-svn: 235573
Summary:
Make sure signed overflow in "x--" is checked with
llvm.ssub.with.overflow intrinsic and is reported as:
"-2147483648 - 1 cannot be represented in type 'int'"
instead of:
"-2147483648 + -1 cannot be represented in type 'int'"
, like we do for unsigned overflow.
Test Plan: clang + compiler-rt regression test suite
Reviewers: rsmith
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D8236
llvm-svn: 235568
We try to use the member variable "FuncName" here, but we've also used
that name as a parameter. This ends with us getting the length of the
function name wrong when we generate the coverage data.
llvm-svn: 235565
These extra endcatch markers aren't helping identify regions to outline,
so let's get rid of them. LLVM outlines (more or less) from begincatch
to endcatch. Any unwind edge from an enclosed invoke is a transition to
a new exception handler, which has it's own outlining markers.
llvm-svn: 235562
This reverts commit r234700. It turns out that the lifetime markers
were not the cause of Chromium failing but a bug which was uncovered by
optimizations exposed by the markers.
llvm-svn: 235553
Otherwise -fno-omit-frame-pointer and other flags like it aren't
applied.
Basic idea taken from Gao's patch, thanks!
Differential Revision: http://reviews.llvm.org/D9203
llvm-svn: 235537
If condition evaluates to true, the code executes task by calling @__kmpc_omp_task() runtime function.
If condition evaluates to false, the code executes serial version of the code by executing the following code:
call void @__kmpc_omp_task_begin_if0(<loc>, <threadid>, <task_t_ptr, returned by @__kmpc_omp_task_alloc()>);
proxy_task_entry(<gtid>, <task_t_ptr, returned by @__kmpc_omp_task_alloc()>);
call void @__kmpc_omp_task_complete_if0(<loc>, <threadid>, <task_t_ptr, returned by @__kmpc_omp_task_alloc()>);
Also it checks if the condition is constant and if it is constant it evaluates its value and then generates either parallel version of the code (if the condition evaluates to true), or the serial version of the code (if the condition evaluates to false).
Differential Revision: http://reviews.llvm.org/D9143
llvm-svn: 235507
This patch generates helper variables which used as a private copies of the corresponding original variables inside an OpenMP 'for' directive. These generated variables are initialized by default (with the default constructor, if any). In OpenMP region references to original variables are replaced by the references to these private helper variables.
Differential Revision: http://reviews.llvm.org/D9106
llvm-svn: 235503
Patch fixes bugs in codegen for loops with unsigned counters and zero trip count. Previously preconditions for all loops were built using logic (Upper - Lower) > 0. But if the loop is a loop with zero trip count, then Upper - Lower is < 0 only for signed integer, for unsigned we're running into an underflow situation.
In this patch we're using original Lower<Upper condition to check that loop body can be executed at least once. Also this allows to skip code generation for loops, if it is known that preconditions for the loop are always false.
Differential Revision: http://reviews.llvm.org/D9103
llvm-svn: 235500
Add codegen for 'ordered' directive:
__kmpc_ordered(ident_t *, gtid);
<associated statement>;
__kmpc_end_ordered(ident_t *, gtid);
Also for 'for' directives with the dynamic scheduling and an 'ordered' clause added a call to '__kmpc_dispatch_fini_(4|8)[u]()' function after increment expression for loop control variable:
while(__kmpc_dispatch_next(&LB, &UB)) {
idx = LB;
while (idx <= UB) { BODY; ++idx;
__kmpc_dispatch_fini_(4|8)[u](); // For ordered loops only.
} // inner loop
}
Differential Revision: http://reviews.llvm.org/D9070
llvm-svn: 235496
- Changed CUDALaunchBounds arguments from integers to Expr* so they can
be saved in AST for instantiation.
- Added support for template instantiation of launch_bounds attrubute.
- Moved evaluation of launch_bounds arguments to NVPTXTargetCodeGenInfo::
SetTargetAttributes() where it can be done after template instantiation.
- Added a warning on negative launch_bounds arguments.
- Amended test cases.
Differential Revision: http://reviews.llvm.org/D8985
llvm-svn: 235452
An upcoming LLVM commit will remove the `DIArray` and `DITypeArray`
typedefs that shadow `DebugNodeArray` and `MDTypeRefArray`,
respectively. Use those types directly.
llvm-svn: 235412
Code in CodeGenModule::GetOrCreateLLVMGlobal that sets up GlobalValue
object for LLVM external symbols has this comment:
// FIXME: This code is overly simple and should be merged with other global
// handling.
One part does seems to be "overly simple" currently is that this code
never sets any alignment info on the GlobalValue, so that the emitted
IR does not have any align attribute on external globals. This can
lead to unnecessarily inefficient code generation.
This patch adds a GV->setAlignment call to set alignment info.
llvm-svn: 235396
Prepare for the deletion in LLVM of the subclasses of (the already
deleted) `DIScope` by using the raw pointers they were wrapping
directly.
llvm-svn: 235355
Subclasses of (the already deleted) `DIType` will be deleted by an
upcoming LLVM commit. Remove references.
While `DICompositeType` wraps `MDCompositeTypeBase` and `DIDerivedType`
wraps `MDDerivedTypeBase`, most uses of each really meant the more
specific `MDCompositeType` and `MDDerivedType`. I updated accordingly.
llvm-svn: 235350
Something like { void*, void * } would be passed to a function as a [2 x i64], but returned as an i128. This patch unifies the 2 behaviours so that we also return it as a [2 x i64].
This is better for the quality of the IR, and the size of the final LLVM binary as we tend to want to insert/extract values from these types and do so with the insert/extract instructions is less IR than shifting, truncating, and or'ing values.
Reviewed by Tim Northover.
llvm-svn: 235231
LLVM r235111 changed the `DIBuilder` API to stop using `DIDescriptor`
and its subclasses. Rolled into this was some tightening up of types:
- Scopes: `DIDescriptor` => `MDScope*`.
- Generic debug nodes: `DIDescriptor` => `DebugNode*`.
- Subroutine types: `DICompositeType` => `MDSubroutineType*`.
- Composite types: `DICompositeType` => `MDCompositeType*`.
Note that `DIDescriptor` wraps `MDNode`, and `DICompositeType` wraps
`MDCompositeTypeBase`.
It's this new type strictness that requires changes here.
llvm-svn: 235112
Emits the following code for the clause at the beginning of the outlined function for implicit threads:
if (<not a master thread>) {
...
<thread local copy of var> = <master thread local copy of var>;
...
}
<sync point>;
Checking for a non-master thread is performed by comparing of the address of the thread local variable with the address of the master's variable. Master thread always uses original variables, so you always know the address of the variable in the master thread.
Differential Revision: http://reviews.llvm.org/D9026
llvm-svn: 235075
#pragma omp for lastprivate(<var>)
for (i = a; i < b; ++b)
<BODY>;
This construct is translated into something like:
<last_iter> = alloca i32
<lastprivate_var> = alloca <type>
<last_iter> = 0
; No initializer for simple variables or a default constructor is called for objects.
; For arrays perform element by element initialization by the call of the default constructor.
...
OMP_FOR_START(...,<last_iter>, ..); sets <last_iter> to 1 if this is the last iteration.
<BODY>
...
OMP_FOR_END
if (<last_iter> != 0) {
<var> = <lastprivate_var> ; Update original variable with the lastprivate value.
}
call __kmpc_cancel_barrier() ; an implicit barrier to avoid possible data race.
Differential Revision: http://reviews.llvm.org/D8658
llvm-svn: 235074
Things can't both be in comdats and have common linkage, so never give things
in comdats common linkage. Common linkage is only used in .c files, and the
only thing that can trigger a comdat in c is selectany from what I can tell.
Fixes PR23243.
Also address an over-the-shoulder review comment from rnk by moving the
hasAttr<SelectAnyAttr>() in Decl.cpp around a bit. It only makes a minor
difference for selectany on global variables, so it goes well with the rest of
this patch.
http://reviews.llvm.org/D9042
llvm-svn: 235053