Commit Graph

161 Commits

Author SHA1 Message Date
Siddharth Bhat
e2950f46c6 [PPCGCodeGen] Document pre-composition with Zero in getExtent. [NFC]
It's weird at first glance that we do this, so I wrote up some
documentation on why we need to perform this process.

llvm-svn: 312715
2017-09-07 11:57:33 +00:00
Siddharth Bhat
56572c6a5e [PPCGCodeGen] Convert intrinsics to libdevice functions whenever possible.
This is useful when we face certain intrinsics such as `llvm.exp.*`
which cannot be lowered by the NVPTX backend while other intrinsics can.

So, we would need to keep blacklists of intrinsics that cannot be
handled by the NVPTX backend. It is much simpler to try and promote
all intrinsics to libdevice versions.

This patch makes function/intrinsic very uniform, and will always try to use
a libdevice version if it exists.

Differential Revision: https://reviews.llvm.org/D37056

llvm-svn: 312239
2017-08-31 13:03:37 +00:00
Michael Kruse
a4f447c2a4 [PM] Properly require and preserve OptimizationRemarkEmitter. NFCI.
Properly require and preserve the OptimizationRemarkEmitter for use in
ScopPass. Previously one had to get the ORE from ScopDetection because
CodeGeneration did not mark it as preserved. It would need to be
recomputed which results in the legacy PM to throw away all previous
SCoP analysis.

This also changes the implementation of ScopPass::getAnalysisUsage to
not unconditionally preserve all passes, but only those needed to be
preserved by any SCoP pass (at least when using the legacy PM). This
allows invalidating DependenceInfo (and IslAstInfo) in case the pass
would cause them to change (e.g. OpTree, DeLICM, MaximalArrayExpansion)

JSONImporter should also invalidate the DependenceInfo. In this patch
it marks DependenceInfo as preserved anyway because some regression
tests depend on it.

Differential Revision: https://reviews.llvm.org/D37010

llvm-svn: 311888
2017-08-28 14:07:33 +00:00
Siddharth Bhat
78027437e6 [Polly] [PPCGCodeGeneration] Mild refactoring of checking validity of functions in a kernel.
This is a stylistic change to make the function a little more readable.
Also add a debug print to show what instruction contains a use of a
function we don't understand in the kernel.

Differential Revision: https://reviews.llvm.org/D37058

llvm-svn: 311648
2017-08-24 09:54:15 +00:00
Michael Kruse
3044dc51cf [PPCGCodeGen] Fix compiler warning: '<': signed/unsigned mismatch. NFC.
MSVC warns about comparison between a signed and unsigned integer.
The rules of C(++) define that an unsigned comparison has to be
carried-out in this case. This is unlikely to be intended.

Fix by assigning the loop's upper bound to a signed integer first.
This also avoids repeated evaluation of the invariant upper bound.

llvm-svn: 311548
2017-08-23 12:45:25 +00:00
Siddharth Bhat
7b9f5ca27e [PPCGCodeGeneration] Enable polly-codegen-perf-monitoring for PPCGCodegen.
This feature was not enabled for `PPCGCodeGeneration`. Now that this is
enabled, we can benchmark Scops that have been optimised with
`-polly-codegen-ppcg` with the `-polly-codegen-perf-monitoring` option.

Differential Revision: https://reviews.llvm.org/D36934

llvm-svn: 311328
2017-08-21 11:44:01 +00:00
Tobias Grosser
b09bd74da8 [GPGPU] Add llvm.powi to the libdevice supported functions
These intrinsics are used in COSMO.

llvm-svn: 311324
2017-08-21 09:52:08 +00:00
Tobias Grosser
5170b6627a [GPGPU] Add log / logf to the libdevice supported functions
These two functions are used in COSMO

llvm-svn: 311322
2017-08-21 09:00:31 +00:00
Tobias Grosser
e32498c9c3 Revert "[GPGPU] Simplify PPCGSCop to reduce compile time [NFC]"
We still see some issues with parameter space mismatches. Revert this to get
a clean baseline. We will recommit after these issues have been resolved.

This reverts commit 0e360a14194f722ded7aa2bc9d4be2ed2efeeb49.

llvm-svn: 311268
2017-08-19 23:49:26 +00:00
Tobias Grosser
ecb94a0392 [GPGPU] Correctly initialize array order and fixed_element information
Summary:
This information is necessary for PPCG to perform correct life range reordering.
With these changes applied we can live-range reorder some of the important
kernels in COSMO.

We also update and rename one test case, which previously could not be optimized
and now is optimized thanks to live-range reordering. To preserve test coverage
we add a new test case scalar-writes-in-scop-requires-abort.ll, which exercises
our automatic abort in case of scalar writes in the kernel.

Reviewers: Meinersbur, bollu, singam-sanjay

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36929

llvm-svn: 311259
2017-08-19 20:21:22 +00:00
Philipp Schaad
50139f0f38 [PPCG] Only add Kernel argument sizes for OpenCL, not CUDA runtime
Kernel argument sizes now only get appended to the kernel launch parameter list if the OpenCL runtime is selected, not if CUDA runtime is chosen.

Differential revision: D36925

llvm-svn: 311248
2017-08-19 17:04:57 +00:00
Tobias Grosser
43df2020e7 [GPGPU] Collect parameter dimension used in MemoryAccesses
When using -polly-ignore-integer-wrapping and -polly-acc-codegen-managed-memory
we add parameter dimensions lazily to the domains, which results in PPCG not
including parameter dimensions that are only used in memory accesses in the
kernel space. To make sure these parameters are still passed to the kernel, we
collect these parameter dimensions and align the kernel's parameter space
before code-generating it.

llvm-svn: 311239
2017-08-19 12:58:28 +00:00
Tobias Grosser
ec02acfb98 [GPGPU] Simplify PPCGSCop to reduce compile time [NFC]
Summary:
Drop unused parameter dimensions to reduce the size of the sets we are working
with. Especially the computed dependences tend to accumulate a lot of parameters
that are present in the input memory accesses, but often not necessary to
express the actual dependences. As isl represents maps and sets with dense
matrices, reducing the dimensionality of isl sets commonly reduces code
generation performance.

This reduces compile time from 17 to 11 seconds for our test case. While this is
not impressive, this patch helped me to identify the previous two performance
improvements and additionally also increases readability of the isl data
structures we use.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: bollu

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36869

llvm-svn: 311161
2017-08-18 13:38:12 +00:00
Siddharth Bhat
656e629572 [Polly] [PPCGCodeGeneration] Print current Scop and loop depth in PPCGCodeGen. [NFC]
Differential Revision: https://reviews.llvm.org/D36871

llvm-svn: 311158
2017-08-18 13:16:58 +00:00
Tobias Grosser
861a387fac [GPGPU] Do not create copy statements when targetting managed memory
Summary:
They are not used and consequently do not even need to be computed. This reduces
the overall compile time for our kernel from 1m33s to 17s.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: bollu

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36868

llvm-svn: 311157
2017-08-18 13:11:05 +00:00
Tobias Grosser
62acb344d0 [GPGPU] Synchronize after each kernel, not each copy out
Summary:
This change reduces the overall number of synchronize calls for kernels with
a lot of output data at the cost of additional synchronize calls for kernels
launched in sequence without any device to host transfers in between. As the
latter pattern is a lot less frequent, this seems a better tradeoff.

Even though the above motivation would be motivation enough, this is just
a step towards enabling ppcg to not compute to and from device copy calls
at all, which would be incorrect in case we still relied on these calls to
place our synchronization statements.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: bollu

Subscribers: nemanjai, kbarton, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36867

llvm-svn: 311155
2017-08-18 12:55:58 +00:00
Tobias Grosser
fa03cb7687 [GPGPU] Only collect the access that belong to an array [NFC]
This avoid the construction of very large sets and in many cases also keeps the
number of parameters low. As a result, we see a compile time reduction from 5
minutes to only slightly above 1 minute for one of our larger test cases.

llvm-svn: 311127
2017-08-17 22:04:53 +00:00
Tobias Grosser
d2e57981fd [GPGPU] Move getExtend to C++ [NFC]
llvm-svn: 311123
2017-08-17 21:20:28 +00:00
Tobias Grosser
cff9696e11 [GPGPU] Make the ast_build available to block generator
This is necessary for partial writes (as used by delicm) to work.

llvm-svn: 310553
2017-08-10 08:00:56 +00:00
Siddharth Bhat
c4a4af47f3 [ManagedMemoryRewrite] Introduce a new pass to rewrite modules to use managed memory.
This pass is useful to automatically convert a codebase that uses malloc/free
to use their managed memory counterparts.

Currently, rewrite malloc and free to the `polly_{malloc,free}Managed` variants.

A future patch will teach ManagedMemoryRewrite to rewrite global arrays
as pointers to globally allocated managed memory.

Differential Revision: https://reviews.llvm.org/D36513

llvm-svn: 310471
2017-08-09 12:59:23 +00:00
Siddharth Bhat
34eeabbca3 [PPCGCodeGeneration] Compute element size in bytes for arrays correctly.
Previously, we used to compute this with `elementSizeInBits / 8`. This
would yield an element size of 0 when the array had element size < 8 in
bits.

To fix this, ask data layout what the size in bytes should be.

Differential Revision: https://reviews.llvm.org/D36459

llvm-svn: 310448
2017-08-09 08:29:16 +00:00
Siddharth Bhat
71dfb3eb07 [Polly] [PPCGCodeGeneration] Handle failing of invariant load hoisting gracefully.
To do this, we replicate what `CodeGeneration` does. We expose
`markNodeUnreachable` from `CodeGeneration` to `PPCGCodeGeneration`.

Differential Revision: https://reviews.llvm.org/D36457

llvm-svn: 310350
2017-08-08 12:00:59 +00:00
Tobias Grosser
d70ea7fed0 [GPGPU] Remove redundant constructors
llvm-svn: 310284
2017-08-07 19:20:57 +00:00
Tobias Grosser
61bd3a4840 [ScopInfo] Move Scop::getPwAffOnly to isl++ [NFC]
llvm-svn: 310231
2017-08-06 21:42:38 +00:00
Tobias Grosser
31df6f31c0 [ScopInfo] Move Scop::getDomains to isl++ [NFC]
llvm-svn: 310230
2017-08-06 21:42:25 +00:00
Tobias Grosser
b65ccc4302 [ScopInfo] Translate Scop::getParamSpace to isl++ [NFC]
llvm-svn: 310224
2017-08-06 20:11:59 +00:00
Tobias Grosser
8ea1fc19b3 [ScopInfo] Translate Scop::getContext to isl++ [NFC]
llvm-svn: 310221
2017-08-06 19:52:38 +00:00
Tobias Grosser
9a63570b13 [ScopInfo] Translate Scop::getIdForParam to isl++ [NFC]
llvm-svn: 310220
2017-08-06 19:31:27 +00:00
Tobias Grosser
5ab39ff224 [ScopInfo] Move get*Writes/getReads/getAccesses to isl++
llvm-svn: 310219
2017-08-06 19:22:27 +00:00
Tobias Grosser
dcf8d696ff Move ScopInfo::getDomain(), getDomainSpace(), getDomainId() to isl++
llvm-svn: 310209
2017-08-06 16:39:52 +00:00
Tobias Grosser
b99c11710c [GPGPU] Make sure managed arrays are prepared at the beginning of the scop
Summary:
This resolves some "instruction does not dominate use" errors, as we used to
prepare the arrays at the location of the first kernel, which not necessarily
dominated all other kernel calls.

Reviewers: Meinersbur, bollu, singam-sanjay

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Differential Revision: https://reviews.llvm.org/D36372

llvm-svn: 310196
2017-08-06 11:10:38 +00:00
Tobias Grosser
5b307cdb8a [GPGPU] Rename all, not only the first libdevice function
llvm-svn: 310194
2017-08-06 03:04:15 +00:00
Siddharth Bhat
e53c924b0f [Polly] [PPCGCodeGeneration] Deal with loops outside the Scop correctly in PPCGCodeGeneration.
A Scop with a loop outside it is not handled currently by
PPCGCodeGeneration. The test case is such that the Scop has only one inner loop
that is detected. This currently breaks codegen.

The fix is to reuse the existing mechanism in `IslNodeBuilder` within
`GPUNodeBuilder.

Differential Revision: https://reviews.llvm.org/D36290

llvm-svn: 310193
2017-08-06 02:39:05 +00:00
Siddharth Bhat
638316da5b [PPCGCodeGeneration] [NFC] Log every location from which PPCGCodegen bails.
This is useful when trying to understand why no GPU code was produced.

Differential Revision: https://reviews.llvm.org/D36318

llvm-svn: 310103
2017-08-04 19:36:40 +00:00
Tobias Grosser
b5563c6817 Make sure that all parameter dimensions are set in schedule
Summary:
In case the option -polly-ignore-parameter-bounds is set, not all parameters
will be added to context and domains. This is useful to keep the size of the
sets and maps we work with small. Unfortunately, for AST generation it is
necessary to ensure all parameters are part of the schedule tree. Hence,
we modify the GPGPU code generation to make sure this is the case.

To obtain the necessary information we expose a new function
Scop::getFullParamSpace(). We also make a couple of functions const to be
able to make SCoP::getFullParamSpace() const.

Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg

Subscribers: nemanjai, kbarton, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36243

llvm-svn: 309939
2017-08-03 13:51:15 +00:00
Siddharth Bhat
eadf76d34a [PPCGCodeGeneration] Construct isl_multi_pw_aff of PPCGArray.bounds even when polly-ignore-parameter-bounds is turned on.
When we have `-polly-ignore-parameter-bounds`, `Scop::Context` does not contain
all the paramters present in the program.

The construction of the `isl_multi_pw_aff` requires all the indivisual `pw_aff`
to have the same parameter dimensions. To achieve this, we used to realign
every `pw_aff` with `Scop::Context`. However, in conjunction with
`-polly-ignore-parameter-bounds`, this is now incorrect, since `Scop::Context`
does not contain all parameters.

We set this up correctly by creating a space that has all the parameters
used by all the `isl_pw_aff`. Then, we realign all `isl_pw_aff` to this space.

llvm-svn: 309934
2017-08-03 12:09:33 +00:00
Siddharth Bhat
edf9581e4c [PPCGCodeGeneration] Correct usage of llvm::Value with getLatestValue.
It is possible that the `HostPtr` that coresponds to an array could be
invariant load hoisted. Make sure we use the invariant load hoisted
value by using `IslNodeBuilder::getLatestValue`.

Differential Revision: https://reviews.llvm.org/D36001

llvm-svn: 309681
2017-08-01 14:26:39 +00:00
Siddharth Bhat
4d5820d171 [NFC] [PPCGCodeGeneration] Convert GPUNodeBuilder::getGridSizes to isl++.
llvm-svn: 309671
2017-08-01 10:45:41 +00:00
Siddharth Bhat
ccbf4b509c [NFC] [PPCGCodeGeneration] Convert GPUNodeBuilder::getArrayOffset to isl++.
llvm-svn: 309669
2017-08-01 09:58:55 +00:00
Tobias Grosser
8fc6cdfb1c [GPGPU] Add support for NVIDIA libdevice
Summary:
This allows us to map functions such as exp, expf, expl, for which no
LLVM intrinsics exist. Instead, we link to NVIDIA's libdevice which provides
high-performance implementations of a wide range of (math) functions. We
currently link only a small subset, the exp, cos and copysign functions. Other
functions will be enabled as needed.

Reviewers: bollu, singam-sanjay

Reviewed By: bollu

Subscribers: tstellar, tra, nemanjai, pollydev, mgorny, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35703

llvm-svn: 309560
2017-07-31 14:03:16 +00:00
Siddharth Bhat
4ebeb3568a [PPCGCodeGeneration] Check that invariant load hoisting succeeded.
If we fail, throw an error for now. We can gracefully handle this later.

llvm-svn: 309387
2017-07-28 14:48:32 +00:00
Tobias Grosser
25271b91b2 [GPGPU] Do not require the Scop::Context to have information about all parameters
llvm-svn: 309368
2017-07-28 06:49:44 +00:00
Tobias Grosser
30caae6d23 [GPGPU] Fix compilation issue with latest CUDA upgrade to i128
llvm-svn: 309366
2017-07-28 06:38:49 +00:00
Siddharth Bhat
43f178bbc9 [PPCGCodeGeneration] Skip arrays with empty extent.
Invariant load hoisted scalars, and arrays whose size we can statically compute
to be 0 do not need to be allocated as arrays.

Invariant load hoisted scalars are sent to the kernel directly as parameters.

Earlier, we used to allocate `0` bytes of memory for these because our
computation of size from `PPCGCodeGeneration::getArraySize` would result in `0`.

Now, since we don't invariant loads as arrays in PPCGCodeGeneration, this
problem does not occur anymore.

Differential Revision: https://reviews.llvm.org/D35795

llvm-svn: 308971
2017-07-25 12:35:36 +00:00
Tobias Grosser
206e9e3b3b Move ScopArrayInfo::getFromAccessFunction and getFromId to isl++
llvm-svn: 308892
2017-07-24 16:22:27 +00:00
Siddharth Bhat
f7face4bc4 Convert GPUNodeBuilder::getArraySize to islcpp.
Note: PPCGCodeGeneration::pollyBuildAstExprForStmt is at
      https://reviews.llvm.org/D35770

Differential Revision: https://reviews.llvm.org/D35771

llvm-svn: 308870
2017-07-24 09:08:21 +00:00
Siddharth Bhat
35de900917 [NFC] Move PPCGCodeGeneration::pollyBuildAstExprForStmt to isl++.
Differential Revision: https://reviews.llvm.org/D35771

llvm-svn: 308869
2017-07-24 08:34:24 +00:00
Tobias Grosser
6a87036e0f Move MemoryAccess::getAddressFunction to isl++
llvm-svn: 308841
2017-07-23 04:08:45 +00:00
Tobias Grosser
1515f6b937 Move MemoryAccess::NewAccessRelation to isl++
We also move related accessor functions

llvm-svn: 308840
2017-07-23 04:08:38 +00:00
Tobias Grosser
fe46c3ff3a Move MemoryAccess::id to isl++
llvm-svn: 308836
2017-07-23 04:08:11 +00:00