Splitting basic blocks into multiple statements if there are now
additional scalar dependencies gives more freedom to the scheduler, but
more statements also means higher compile-time complexity. Switch to
finer statement granularity, the additional compile time should be
limited by the number of operations quota.
The regression tests are written for the -polly-stmt-granularity=bb
setting, therefore we add that flag to those tests that break with the
new default. Some of the tests only fail because the statements are
named differently due to a basic block resulting in multiple statements,
but which are removed during simplification of statements without
side-effects. Previous commits tried to reduce this effect, but it is
not completely avoidable.
Differential Revision: https://reviews.llvm.org/D42151
llvm-svn: 324169
When collecting base pointers that need to be made available in parallel
subfunctions, use the base pointer associated with the latest
ScopArrayInfo, instead of the original one.
llvm-svn: 316983
This test XFAILs two test that start to fail when verifying DT's
DFS numbers, as per Tobias' suggestion.
Related VerifyDFSNumbers patch: D38331.
llvm-svn: 314800
Codegen with -polly-parallel queried the unmapped MemoryAccess, but only
the MemoryKind after mapping is relevant for codegen.
This should fix various fails of the
perf-x86_64-penryn-O3-polly-parallel-fast buildbot.
llvm-svn: 310466
Today Polly generates induction variable in this way:
polly.indvar = phi 0, polly.indvar.next
...
polly.indvar.next = polly.indvar + stide
polly.loop_cond = predicate polly.indvar, (UB - stride)
Instead of:
polly.indvar = phi 0, polly.indvar.next
...
polly.indvar.next = polly.indvar + stide
polly.loop_cond = predicate polly.indvar.next, UB
The way Polly generate induction variable cause some problem in the indvar simplify pass.
This patch make polly generate the later form, by assuming the induction variable never overflow
Differential Revision: https://reviews.llvm.org/D33089
llvm-svn: 302866
In commit r219005 lifetime markers have been introduced to mark the lifetime of
the OpenMP context data structure. However, their use seems incorrect and
recently caused a miscompile in ASC_Sequoia/CrystalMk after r298053 which was
not at all related to r298053. r298053 only caused a change in the loop order,
as this change resulted in a different isl internal representation which caused
the scheduler to derive a different schedule. This change then caused the IR to
change, which apparently created a pattern in which LLVM exploites the lifetime
markers. It seems we are using the OpenMP context outside of the lifetime
markers. Even though CrystalMk could probably be fixed by expanding the scope of
the lifetime markers, it is not clear what happens in case the OpenMP function
call is in a loop which will cause a sequence of starting and ending lifetimes.
As it is unlikely that the lifetime markers give any performance benefit, we
just drop them to remove complexity.
llvm-svn: 298192
Only when load-hoisted we can be sure the base pointer is invariant
during the SCoP's execution. Most of the time it would be added to
the required hoists for the alias checks anyway, except with
-polly-ignore-aliasing, -polly-use-runtime-alias-checks=0 or if
AliasAnalysis is already sure it doesn't alias with anything
(for instance if there is no other pointer to alias with).
Two more parts in Polly assume that this load-hoisting took place:
- setNewAccessRelation() which contains an assert which tests this.
- BlockGenerator which would use to the base ptr from the original
code if not load-hoisted (if the access expression is regenerated)
Differential Revision: https://reviews.llvm.org/D30694
llvm-svn: 297195
There is no point in optimizing unreachable code, hence our test cases should
always return.
This commit is part of a series that makes Polly more robust on the presence of
unreachables.
llvm-svn: 297147
This will make it easier to switch the default of Polly's invariant load
hoisting strategy and also makes it very clear that these test cases
indeed require invariant code hoisting to work.
llvm-svn: 278667
The recent expression type changes still need more discussion, which will happen
on phabricator or on the mailing list. The precise list of commits reverted are:
- "Refactor division generation code"
- "[NFC] Generate runtime checks after the SCoP"
- "[FIX] Determine insertion point during SCEV expansion"
- "Look through IntToPtr & PtrToInt instructions"
- "Use minimal types for generated expressions"
- "Temporarily promote values to i64 again"
- "[NFC] Avoid unnecessary comparison for min/max expressions"
- "[Polly] Fix -Wunused-variable warnings (NFC)"
- "[NFC] Simplify min/max expression generation"
- "Simplify the type adjustment in the IslExprBuilder"
Some of them are just reverted as we would otherwise get conflicts. I will try
to re-commit them if possible.
llvm-svn: 272483
We now use the minimal necessary bit width for the generated code. If
operations might overflow (add/sub/mul) we will try to adjust the types in
order to ensure a non-wrapping computation. If the type adjustment is not
possible, thus the necessary type is bigger than the type value of
--polly-max-expr-bit-width, we will use assumptions to verify the computation
will not wrap. However, for run-time checks we cannot build assumptions but
instead utilize overflow tracking intrinsics.
llvm-svn: 271878
We utilize assumptions on the input to model IR in polyhedral world.
To verify these assumptions we version the code and guard it with a
runtime-check (RTC). However, since the RTCs are themselves generated
from the polyhedral representation we generate them under the same
assumptions that they should verify. In other words, the guarantees
that we try to provide with the RTCs do not hold for the RTCs
themselves. To this end it is necessary to employ a different check
for the RTCs that will verify the assumptions did hold for them too.
Differential Revision: http://reviews.llvm.org/D20165
llvm-svn: 269299
This reverts commit 2879c53e80e05497f408f21ce470d122e9f90f94.
Additionally, it adds SDiv and SRem instructions to the set of values
discovered by the findValues function even if we add the operands to
be able to recompute the SCEVs. In subfunctions we do not want to
recompute SDiv and SRem instructions but pass them instead as they
might have been created through the IslExprBuilder and are more
complicated than simple SDiv/SRem instructions in the code.
llvm-svn: 265873
So far we separated constant factors from multiplications, however,
only when they are at the outermost level of a parameter SCEV. Now,
we also separate constant factors from the parameter SCEV if the
outermost expression is a SCEVAddRecExpr. With the changes to the
SCEVAffinator we can now improve the extractConstantFactor(...)
function at will without worrying about any other code part. Thus,
if needed we can implement a more comprehensive
extractConstantFactor(...) function that will traverse the SCEV
instead of looking only at the outermost level.
Four test cases were affected. One did not change much and the other
three were simplified.
llvm-svn: 260859
When introducing separate control flow for the original and optimized code we
introduce now a special 'ExitingBlock':
\ /
EnteringBB
|
SplitBlock---------\
_____|_____ |
/ EntryBB \ StartBlock
| (region) | |
\_ExitingBB_/ ExitingBlock
| |
MergeBlock---------/
|
ExitBB
/ \
This 'ExitingBlock' contains code such as the final_reloads for scalars, which
previously were just added to whichever statement/loop_exit/branch-merge block
had been generated last. Having an explicit basic block makes it easier to
find these constructs when looking at the CFG.
llvm-svn: 255107
If a (assumed) invariant location is loaded multiple times we
generated a parameter for each location. However, this caused compile
time problems for several benchmarks (e.g., 445_gobmk in SPEC2006 and
BT in the NAS benchmarks). Additionally, the code we generate is
suboptimal as we preload the same location multiple times and perform
the same checks on all the parameters that refere to the same value.
With this patch we consolidate the invariant loads in three steps:
1) During SCoP initialization required invariant loads are put in
equivalence classes based on their pointer operand. One
representing load is used to generate a parameter for the whole
class, thus we never generate multiple parameters for the same
location.
2) During the SCoP simplification we remove invariant memory
accesses that are in the same equivalence class. While doing so
we build the union of all execution domains as it is only
important that the location is at least accessed once.
3) During code generation we only preload one element of each
equivalence class with the unified execution domain. All others
are mapped to that preloaded value.
Differential Revision: http://reviews.llvm.org/D13338
llvm-svn: 249853
These flags are now always passed to all tests and need to be disabled if
not needed. Disabling these flags, rather than passing them to almost all
tests, significantly simplfies our RUN: lines.
llvm-svn: 249422
If a value is globally mapped (IslNodeBuilder::ValueMap) and
referenced in the code that will be put into a subfunction, we hand
down the new value to the subfunction.
This patch also removes code that handed down all invariant loads to
the subfunction. Instead, only needed invariant loads are given to the
subfunction. There are two possible reasons for an invariant load to
be handed down:
1) The invariant load is used in a block that is placed in the
subfunction but which is not the parent of the load. In this
case, the scalar access that will read the loaded value, will
cause its base pointer (the preloaded value) to be handed down to
the subfunction.
2) The invariant load is defined and used in a block that is placed
in the subfunction. With this patch we will hand down the
preloaded value to the subfunction as the invariant load is
globally mapped to that value.
llvm-svn: 249126
Instructions which we can synthesis from a SCEV expression are not
generated directly, but only when they are used as an operand of
another instruction. This avoids generating unnecessary instructions
and works more reliably than first inserting them and then deleting
them later on.
This commit was reverted in r248860 due to a remaining miscompile, where
we forgot to synthesis the operand values that were referenced from scalar
writes. test/Isl/CodeGen/scalar-store-from-same-bb.ll tests that we do this
now correctly.
llvm-svn: 248900
This reverts commit 07830c18d789ee72812d5b5b9b4f8ce72ebd4207.
The commit broke at least one test in lnt,
MultiSource/Benchmarks/Ptrdist/bc/number.c
was miss compiled and the test produced a wrong result.
One Polly test case that was added later was adjusted too.
llvm-svn: 248860
Instructions which we can synthesis from a SCEV expression are not generated
directly, but only when they are used as an operand of another instruction. This
avoids generating unnecessary instruction and works more reliably than first
inserting them and then deleting them later on.
Suggested-by: Johannes Doerfert <doerfert@cs.uni-saarland.de>
Differential Revision: http://reviews.llvm.org/D13208
llvm-svn: 248712
After having generated a new user statement a couple of inefficient or trivially
dead instructions may remain. This commit runs instruction simplification over
the newly generated blocks to ensure unneeded instructions are removed right
away.
This commit does not yet add simplification for non-affine subregions.
llvm-svn: 248681
This commit basically reverts r246427 but still solves the issue
tackled by that commit. Instead of emitting initialization code in the
beginning of the start block we now generate parallel code in its own
block and thereby guarantee separation. This is necessary as we cannot
generate code for hoisted loads prior to the start block but it still
needs to be placed prior to everything else.
llvm-svn: 248674
At some point we build loop trip counts using this method. It was replaced by
a simpler trick that works only for affine (e.g., not modulo) constraints and
relies on the removal of unbounded parts. In order to allow modulo constrains
again we go back to the former, more accurate method.
llvm-svn: 247540
This patch replaces the last legacy part of the domain generation, namely the
ScalarEvolution part that was used to obtain loop bounds. We now iterate over
the loops in the region and propagate the back edge condition to the header
blocks. Afterwards we propagate the new information once through the whole
region. In this process we simply ignore unbounded parts of the domain and
thereby assume the absence of infinite loops.
+ This patch already identified a couple of broken unit tests we had for
years.
+ We allow more loops already and the step to multiple exit and multiple back
edges is minimal.
+ It allows to model the overflow checks properly as we actually visit
every block in the SCoP and know where which condition is evaluated.
- It is currently not compatible with modulo constraints in the
domain.
Differential Revision: http://reviews.llvm.org/D12499
llvm-svn: 247279
Certain backends, e.g. NVPTX, do not support '.' in function names. Hence,
we ensure all '.' are replaced by '_' when generating function names for
subfunctions. For the current OpenMP code generation, this is not strictly
necessary, but future uses cases (e.g. GPU offloading) need this issue to be
fixed.
llvm-svn: 246980
When computing the index expressions for new, multi-dimensional memory accesses
these new index expressions may reference original llvm::Values that are not
transfered into the OpenMP subfunction. Using GlobalMap we now replace
references to such values with the rewritten values that have e.g. been passed
to the OpenMP subfunction.
llvm-svn: 246923
Our OpenMP code generation generated part of its launching code directly into
the start basic block and without this change the scalar initialization was
run _after_ the OpenMP threads have been launched. This resulted in
uninitialized scalar values to be used.
llvm-svn: 246427
In order to compute domain conditions for conditionals we will now
traverse the region in the ScopInfo once and build the domains for
each block in the region. The SCoP statements can then use these
constraints when they build their domain.
The reason behind this change is twofold:
1) This removes a big chunk of preprocessing logic from the
TempScopInfo, namely the Conditionals we used to build there.
Additionally to moving this logic it is also simplified. Instead
of walking the dominance tree up for each basic block in the
region (as we did before), we now traverse the region only
once in order to collect the domain conditions.
2) This is the first step towards the isl based domain creation.
The second step will traverse the region similar to this step,
however it will propagate back edge conditions. Once both are in
place this conditional handling will allow multiple exit loops
additional logic.
Reviewers: grosser
Differential Revision: http://reviews.llvm.org/D12428
llvm-svn: 246398
To make alias scope metadata generation work in OpenMP mode we now provide
the ScopAnnotator with information about the base pointer rewrite that happens
when passing arrays into the OpenMP subfunction.
llvm-svn: 245451
Modified two test cases to adjust to the above change in renaming.
These two files were causing the buildbot failure in Polly, #30204 for example.
Details in http://reviews.llvm.org/D9483
This checkin goes with r237150 and r237151
llvm-svn: 237203
Besides class, function and file names, we also change the command line option
from -polly-codegen-isl to just -polly-codegen. The isl postfix is a leftover
from the times when we still had the CLooG based -polly-codegen. Today it is
just redundant and we drop it.
llvm-svn: 237099
I just learned that target triples prevent test cases to be run on other
architectures. Polly test cases are until now sufficiently target independent
to not require any target triples. Hence, we drop them.
llvm-svn: 235384
Scops that only read seem generally uninteresting and scops that only write are
most likely initializations where there is also little to optimize. To not
waste compile time we bail early.
Differential Revision: http://reviews.llvm.org/D7735
llvm-svn: 229820
Schedule dimensions that have the same constant value accross all statements do
not carry any information, but due to the increased dimensionality of the
schedule cost compile time. To not pay this cost, we remove constant dimensions
if possible.
llvm-svn: 225067