Commit Graph

191 Commits

Author SHA1 Message Date
Siddharth Bhat
34eeabbca3 [PPCGCodeGeneration] Compute element size in bytes for arrays correctly.
Previously, we used to compute this with `elementSizeInBits / 8`. This
would yield an element size of 0 when the array had element size < 8 in
bits.

To fix this, ask data layout what the size in bytes should be.

Differential Revision: https://reviews.llvm.org/D36459

llvm-svn: 310448
2017-08-09 08:29:16 +00:00
Siddharth Bhat
71dfb3eb07 [Polly] [PPCGCodeGeneration] Handle failing of invariant load hoisting gracefully.
To do this, we replicate what `CodeGeneration` does. We expose
`markNodeUnreachable` from `CodeGeneration` to `PPCGCodeGeneration`.

Differential Revision: https://reviews.llvm.org/D36457

llvm-svn: 310350
2017-08-08 12:00:59 +00:00
Tobias Grosser
d70ea7fed0 [GPGPU] Remove redundant constructors
llvm-svn: 310284
2017-08-07 19:20:57 +00:00
Tobias Grosser
61bd3a4840 [ScopInfo] Move Scop::getPwAffOnly to isl++ [NFC]
llvm-svn: 310231
2017-08-06 21:42:38 +00:00
Tobias Grosser
31df6f31c0 [ScopInfo] Move Scop::getDomains to isl++ [NFC]
llvm-svn: 310230
2017-08-06 21:42:25 +00:00
Tobias Grosser
b65ccc4302 [ScopInfo] Translate Scop::getParamSpace to isl++ [NFC]
llvm-svn: 310224
2017-08-06 20:11:59 +00:00
Tobias Grosser
8ea1fc19b3 [ScopInfo] Translate Scop::getContext to isl++ [NFC]
llvm-svn: 310221
2017-08-06 19:52:38 +00:00
Tobias Grosser
9a63570b13 [ScopInfo] Translate Scop::getIdForParam to isl++ [NFC]
llvm-svn: 310220
2017-08-06 19:31:27 +00:00
Tobias Grosser
5ab39ff224 [ScopInfo] Move get*Writes/getReads/getAccesses to isl++
llvm-svn: 310219
2017-08-06 19:22:27 +00:00
Tobias Grosser
dcf8d696ff Move ScopInfo::getDomain(), getDomainSpace(), getDomainId() to isl++
llvm-svn: 310209
2017-08-06 16:39:52 +00:00
Tobias Grosser
b99c11710c [GPGPU] Make sure managed arrays are prepared at the beginning of the scop
Summary:
This resolves some "instruction does not dominate use" errors, as we used to
prepare the arrays at the location of the first kernel, which not necessarily
dominated all other kernel calls.

Reviewers: Meinersbur, bollu, singam-sanjay

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Differential Revision: https://reviews.llvm.org/D36372

llvm-svn: 310196
2017-08-06 11:10:38 +00:00
Tobias Grosser
5b307cdb8a [GPGPU] Rename all, not only the first libdevice function
llvm-svn: 310194
2017-08-06 03:04:15 +00:00
Siddharth Bhat
e53c924b0f [Polly] [PPCGCodeGeneration] Deal with loops outside the Scop correctly in PPCGCodeGeneration.
A Scop with a loop outside it is not handled currently by
PPCGCodeGeneration. The test case is such that the Scop has only one inner loop
that is detected. This currently breaks codegen.

The fix is to reuse the existing mechanism in `IslNodeBuilder` within
`GPUNodeBuilder.

Differential Revision: https://reviews.llvm.org/D36290

llvm-svn: 310193
2017-08-06 02:39:05 +00:00
Siddharth Bhat
638316da5b [PPCGCodeGeneration] [NFC] Log every location from which PPCGCodegen bails.
This is useful when trying to understand why no GPU code was produced.

Differential Revision: https://reviews.llvm.org/D36318

llvm-svn: 310103
2017-08-04 19:36:40 +00:00
Tobias Grosser
b5563c6817 Make sure that all parameter dimensions are set in schedule
Summary:
In case the option -polly-ignore-parameter-bounds is set, not all parameters
will be added to context and domains. This is useful to keep the size of the
sets and maps we work with small. Unfortunately, for AST generation it is
necessary to ensure all parameters are part of the schedule tree. Hence,
we modify the GPGPU code generation to make sure this is the case.

To obtain the necessary information we expose a new function
Scop::getFullParamSpace(). We also make a couple of functions const to be
able to make SCoP::getFullParamSpace() const.

Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg

Subscribers: nemanjai, kbarton, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36243

llvm-svn: 309939
2017-08-03 13:51:15 +00:00
Siddharth Bhat
eadf76d34a [PPCGCodeGeneration] Construct isl_multi_pw_aff of PPCGArray.bounds even when polly-ignore-parameter-bounds is turned on.
When we have `-polly-ignore-parameter-bounds`, `Scop::Context` does not contain
all the paramters present in the program.

The construction of the `isl_multi_pw_aff` requires all the indivisual `pw_aff`
to have the same parameter dimensions. To achieve this, we used to realign
every `pw_aff` with `Scop::Context`. However, in conjunction with
`-polly-ignore-parameter-bounds`, this is now incorrect, since `Scop::Context`
does not contain all parameters.

We set this up correctly by creating a space that has all the parameters
used by all the `isl_pw_aff`. Then, we realign all `isl_pw_aff` to this space.

llvm-svn: 309934
2017-08-03 12:09:33 +00:00
Siddharth Bhat
edf9581e4c [PPCGCodeGeneration] Correct usage of llvm::Value with getLatestValue.
It is possible that the `HostPtr` that coresponds to an array could be
invariant load hoisted. Make sure we use the invariant load hoisted
value by using `IslNodeBuilder::getLatestValue`.

Differential Revision: https://reviews.llvm.org/D36001

llvm-svn: 309681
2017-08-01 14:26:39 +00:00
Siddharth Bhat
4d5820d171 [NFC] [PPCGCodeGeneration] Convert GPUNodeBuilder::getGridSizes to isl++.
llvm-svn: 309671
2017-08-01 10:45:41 +00:00
Siddharth Bhat
ccbf4b509c [NFC] [PPCGCodeGeneration] Convert GPUNodeBuilder::getArrayOffset to isl++.
llvm-svn: 309669
2017-08-01 09:58:55 +00:00
Tobias Grosser
8fc6cdfb1c [GPGPU] Add support for NVIDIA libdevice
Summary:
This allows us to map functions such as exp, expf, expl, for which no
LLVM intrinsics exist. Instead, we link to NVIDIA's libdevice which provides
high-performance implementations of a wide range of (math) functions. We
currently link only a small subset, the exp, cos and copysign functions. Other
functions will be enabled as needed.

Reviewers: bollu, singam-sanjay

Reviewed By: bollu

Subscribers: tstellar, tra, nemanjai, pollydev, mgorny, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35703

llvm-svn: 309560
2017-07-31 14:03:16 +00:00
Siddharth Bhat
4ebeb3568a [PPCGCodeGeneration] Check that invariant load hoisting succeeded.
If we fail, throw an error for now. We can gracefully handle this later.

llvm-svn: 309387
2017-07-28 14:48:32 +00:00
Tobias Grosser
25271b91b2 [GPGPU] Do not require the Scop::Context to have information about all parameters
llvm-svn: 309368
2017-07-28 06:49:44 +00:00
Tobias Grosser
30caae6d23 [GPGPU] Fix compilation issue with latest CUDA upgrade to i128
llvm-svn: 309366
2017-07-28 06:38:49 +00:00
Siddharth Bhat
43f178bbc9 [PPCGCodeGeneration] Skip arrays with empty extent.
Invariant load hoisted scalars, and arrays whose size we can statically compute
to be 0 do not need to be allocated as arrays.

Invariant load hoisted scalars are sent to the kernel directly as parameters.

Earlier, we used to allocate `0` bytes of memory for these because our
computation of size from `PPCGCodeGeneration::getArraySize` would result in `0`.

Now, since we don't invariant loads as arrays in PPCGCodeGeneration, this
problem does not occur anymore.

Differential Revision: https://reviews.llvm.org/D35795

llvm-svn: 308971
2017-07-25 12:35:36 +00:00
Tobias Grosser
206e9e3b3b Move ScopArrayInfo::getFromAccessFunction and getFromId to isl++
llvm-svn: 308892
2017-07-24 16:22:27 +00:00
Siddharth Bhat
f7face4bc4 Convert GPUNodeBuilder::getArraySize to islcpp.
Note: PPCGCodeGeneration::pollyBuildAstExprForStmt is at
      https://reviews.llvm.org/D35770

Differential Revision: https://reviews.llvm.org/D35771

llvm-svn: 308870
2017-07-24 09:08:21 +00:00
Siddharth Bhat
35de900917 [NFC] Move PPCGCodeGeneration::pollyBuildAstExprForStmt to isl++.
Differential Revision: https://reviews.llvm.org/D35771

llvm-svn: 308869
2017-07-24 08:34:24 +00:00
Tobias Grosser
6a87036e0f Move MemoryAccess::getAddressFunction to isl++
llvm-svn: 308841
2017-07-23 04:08:45 +00:00
Tobias Grosser
1515f6b937 Move MemoryAccess::NewAccessRelation to isl++
We also move related accessor functions

llvm-svn: 308840
2017-07-23 04:08:38 +00:00
Tobias Grosser
fe46c3ff3a Move MemoryAccess::id to isl++
llvm-svn: 308836
2017-07-23 04:08:11 +00:00
Tobias Grosser
77eef90f50 Move ScopArrayInfo to isl++
This moves the full ScopArrayInfo class to isl++

llvm-svn: 308801
2017-07-21 23:07:56 +00:00
Philipp Schaad
2f3073b5cb [Polly][GPGPU] Added SPIR Code Generation and Corresponding Runtime Support for Intel
Summary:
Added SPIR Code Generation to the PPCG Code Generator. This can be invoked using
the polly-gpu-arch flag value 'spir32' or 'spir64' for 32 and 64 bit code respectively.
In addition to that, runtime support has been added to execute said SPIR code on Intel
GPU's, where the system is equipped with Intel's open source driver Beignet (development
version). This requires the cmake flag 'USE_INTEL_OCL' to be turned on, and the polly-gpu-runtime
flag value to be 'libopencl'.
The transformation of LLVM IR to SPIR is currently quite a hack, consisting in part of regex
string transformations.
Has been tested (working) with Polybench 3.2 on an Intel i7-5500U (integrated graphics chip).

Reviewers: bollu, grosser, Meinersbur, singam-sanjay

Reviewed By: grosser, singam-sanjay

Subscribers: pollydev, nemanjai, mgorny, Anastasia, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35185

llvm-svn: 308751
2017-07-21 16:11:06 +00:00
Siddharth Bhat
a0fb8b23e1 [NFC] [PPCGCodeGeneration] Print verifyModule failure to debug stream.
If verifyModule fails, it is helpful to know why it failed. Add a log to
the debug stream that prints the failure.

llvm-svn: 308727
2017-07-21 11:21:44 +00:00
Tobias Grosser
018103d34e Fix typo in function name Bllock -> Block
llvm-svn: 308715
2017-07-21 06:00:38 +00:00
Tobias Grosser
54491db687 Support fabs and copysign in Polly-ACC
llvm-svn: 308649
2017-07-20 18:26:34 +00:00
Siddharth Bhat
9e3db2b756 [PPCGCodeGen] [3/3] Update PPCGCodeGen + tests to latest ppcg.
This commit *WILL COMPILE*.

1. `PPCG` now uses `isl_multi_pw_aff` instead of an array of `pw_aff`.
   This needs us to adjust how we index array bounds and how we construct
   array bounds.

2. `PPCG` introduces two new kinds of nodes: `init_device` and `clear_device`.
   We should investigate what the correct way to handle these are.

3. `PPCG` has gotten smarter with its use of live range reordering, so some of
   the tests have a qualitative improvement.

4. `PPCG` changed its output style, so many test cases need to be updated to
   fit the new style for `polly-acc-dump-code` checks.

Differential Revision: https://reviews.llvm.org/D35677

llvm-svn: 308625
2017-07-20 15:48:36 +00:00
Siddharth Bhat
edfef5ae8e [NFC] [PPCGCodeGeneration] cleanup kills related code.
We extended kills in Polly to handle both `phi` nodes and scalars that
    are not used within the Scop. Update the comments and choice of
    variable names to reflect this.

llvm-svn: 308279
2017-07-18 09:15:16 +00:00
Siddharth Bhat
233d717ec1 [PPCGCodeGeneration] Generate invariant loads before trying to generate IR.
- We should call `preloadInvariantLoads` to make sure that code is
   generated for invariant loads in the kernel.

Differential Revision: https://reviews.llvm.org/D35410

llvm-svn: 308187
2017-07-17 15:57:01 +00:00
Siddharth Bhat
03346c2701 [PPCGCodeGeneration] Fix runtime check adjustments since they make assumptions about BB layout.
- There is a conditional branch that is used to switch between the old
   and new versions of the code.

- If we detect that the build was unsuccessful, `PPCGCodeGeneration` will
  change the runtime check to be always set to false.

- To actually *reach* this runtime check instruction, `PPCGCodeGeneration`
  was using assumptions about the layout of the BBs.

- However, invariant load hoisting violates this assumption by inserting
  an extra basic block in the middle.

- Fix the assumption on the layout by having `createScopConditionally`
   return the conditional branch instruction.

- Use this reference to set to always-false.

llvm-svn: 308010
2017-07-14 10:00:25 +00:00
Siddharth Bhat
a1b2086a33 [Invariant Loads] Do not consider invariant loads to have dependences.
We need to relax constraints on invariant loads so that they do not
create fake RAW dependences. So, we do not consider invariant loads as
scalar dependences in a region.

During these changes, it turned out that we do not consider `llvm::Value`
replacements correctly within `PPCGCodeGeneration` and `ISLNodeBuilder`.
The replacements dictated by `ValueMap` were not being followed in all
places. This was fixed in this commit. There is no clean way to decouple
this change because this bug only seems to arise when the relaxed
version of invariant load hoisting was enabled.

Differential Revision: https://reviews.llvm.org/D35120

llvm-svn: 307907
2017-07-13 12:18:56 +00:00
Singapuram Sanjay Srivallabh
1abd9ffa37 [PPCGCodeGen] Differentiate kernels based on their parent Scop
Summary:
Add a sequence number that identifies a ptx_kernel's parent Scop within a function to it's name to differentiate it from other kernels produced from the same function, yet different Scops.

Kernels produced from different Scops can end up having the same name. Consider a function with 2 Scops and each Scop being able to produce just one kernel. Both of these kernels have the name "kernel_0". This can lead to the wrong kernel being launched when the runtime picks a kernel from its cache based on the name alone. This patch supplements D33985, by differentiating kernels across Scops as well.

Previously (even before D33985) while profiling kernels generated through JIT e.g. Julia, [[ https://groups.google.com/d/msg/polly-dev/J1j587H3-Qw/mR-jfL16BgAJ | kernels associated with different functions, and even different SCoPs within a function, would be grouped together due to the common name ]]. This patch prevents this grouping and the kernels are reported separately.

Reviewers: grosser, bollu

Reviewed By: grosser

Subscribers: mehdi_amini, nemanjai, pollydev, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35176

llvm-svn: 307814
2017-07-12 16:46:19 +00:00
Siddharth Bhat
761e5b9310 [Polly] [PPCGCodeGeneration] Teach must_kills to kill scalars that are local to the scop.
- By definition, we can pass something as a `kill` to PPCG if we know
that no data can flow across a kill.
- This is useful for more complex examples where we have scalars that
are local to a scop.
- If the local is only used within a scop, we are free to kill it.

Differential Revision: https://reviews.llvm.org/D35045

llvm-svn: 307260
2017-07-06 13:42:42 +00:00
Singapuram Sanjay Srivallabh
79f13b9a80 Prefix the name of the calling host function in the name of callee GPU kernel
Summary:
Provide more context to the name of a GPU kernel by prefixing its name with the host function that calls it. E.g. The first kernel called by `gemm` would be `FUNC_gemm_KERNEL_0`.

Kernels currently follow the "kernel_#" (# = 0,1,2,3,...) nomenclature. This patch makes it easier to map host caller and device callee, especially when there are many kernels produced by Polly-ACC.

Reviewers: grosser, Meinersbur, bollu, philip.pfaffe, kbarton!

Reviewed By: grosser

Subscribers: nemanjai, pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D33985

llvm-svn: 307173
2017-07-05 16:48:21 +00:00
Siddharth Bhat
a82f2d264a [PPCGCodeGeneration] Teach Polly to start using live range reordering.
Polly did not use PPCG's live range reordering feature. Teach
PPCGCodeGeneration to use this.

Documentation on this is sparse, so much of the code is conservative.

We currently kill all phi nodes in a Scop by appending them to the
must_kill map we pass to PPCG. I do not have a proof of correctness,
but it seems to be intuitively correct.

We also do not handle `array_order`, which, quoting PPCG, is:
PPCG/gpu.h: "Order dependences on non-scalars."
It seems to consist of RAW dependences between arrays. We need to
pass this information for more complex privatization cases.

Differential Revision: https://reviews.llvm.org/D34941

llvm-svn: 307163
2017-07-05 14:57:04 +00:00
Singapuram Sanjay Srivallabh
02ca346e48 Introduce a hybrid target to generate code for either the GPU or CPU
Summary:
Introduce a "hybrid" `-polly-target` option to optimise code for either the GPU or CPU.

When this target is selected, PPCGCodeGeneration will attempt first to optimise a Scop. If the Scop isn't modified, it is then sent to the passes that form the CPU pipeline, i.e. IslScheduleOptimizerPass, IslAstInfoWrapperPass and CodeGeneration.

In case the Scop is modified, it is marked to be skipped by the subsequent CPU optimisation passes.

Reviewers: grosser, Meinersbur, bollu

Reviewed By: grosser

Subscribers: kbarton, nemanjai, pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D34054

llvm-svn: 306863
2017-06-30 19:42:21 +00:00
Siddharth Bhat
65d7f72f2c [PPCGCodeGeneration] Add flag to allow polly to fail in GPU kernel fails.
- This is useful for debugging GPU code.

llvm-svn: 306290
2017-06-26 14:56:56 +00:00
Siddharth Bhat
f291c8d510 [PPCGCodeGeneration] Allow intrinsics within kernels.
- In D33414, if any function call was found within a kernel, we would bail out.

- This is an over-approximation. This patch changes this by allowing the
  `llvm.sqrt.*` family of intrinsics.

- This introduces an additional step when creating a separate llvm::Module
  for a kernel (GPUModule). We now copy function declarations from the
  original module to new module.

- We also populate IslNodeBuilder::ValueMap so it replaces the function
  references to the old module to the ones in the new module
  (GPUModule).

Differential Revision: https://reviews.llvm.org/D34145

llvm-svn: 306284
2017-06-26 13:12:06 +00:00
Andreas Simbuerger
256070d85c [NFC] Return both polly.start and polly.exiting from executeScopConditionally.
This commit returns both the start and the exit block that are created
by executeScopConditionally.

In a future commit we will make use of the exit block. Before we would
have to use the implicit property that there won't be any code generated
between polly.start and polly.exiting at the time of use to find the
correct block ('polly.exiting').

All usage location are semantically unchanged.

llvm-svn: 306283
2017-06-26 12:17:11 +00:00
Siddharth Bhat
a12f807f33 [PPCGCodeGeneration] Enable GPU code generation with invariant loads.
The condition that disallowed code generation in PPCGCodeGeneration with
invariant loads is not required. I haven't been able to construct a
counterexample where this generates invalid code.

Differential Revision: https://reviews.llvm.org/D34604

llvm-svn: 306245
2017-06-25 14:48:24 +00:00
Siddharth Bhat
bccaea57c0 [Polly] [PPCGCodeGeneration] Skip Scops which contain function pointers.
In `PPCGCodeGeneration`, we try to take the references of every `Value`
that is used within a Scop to offload to the kernel. This occurs in
`GPUNodeBuilder::createLaunchParameters`.

This breaks if one of the values is a function pointer, since one of
these cases will trigger:

1. We try to to take the references of an intrinsic function, and this
breaks at `verifyModule`, since it is illegal to take the reference of
an intrinsic.

2. We manage to take the reference to a function, but this fails at
`verifyModule` since the function will not be present in the module that
is created in the kernel.

3. Even if `verifyModule` succeeds (which should not occur), we would
then try to call a *host function* from the *device*, which is
illegal runtime behaviour.

So, we disable this entire range of possibilities by simply not allowing
function references within a `Scop` which corresponds to a kernel.

However, note that this is too conservative. We *can* allow intrinsics
within kernels if the backend can lower the intrinsic correctly. For
example, an intrinsic like `llvm.powi.*` can actually be lowered by the `NVPTX`
backend.

We will now gradually whitelist intrinsics which are known to be safe.

Differential Revision: https://reviews.llvm.org/D33414

llvm-svn: 305185
2017-06-12 11:41:09 +00:00