as long as possible.
** Context **
Each time the dominance information is modified, the dominator tree analysis
switches in a slow query mode. After a few queries without any modification on
the dominator tree, it performs an expensive update of its internal structure to
provide fast queries again.
** Problem **
Prior to this patch, the MachineSink pass was splitting the critical edges on
demand while relying heavy on the dominator tree information. In some cases,
this leads to pathological behavior where:
- We end up in the slow query mode right after splitting an edge.
- We update the dominance information.
- We break the dominance information again, thus ending up in the slow query
mode and so on.
** Proposed Solution **
To mitigate this effect, this patch postpones all the splitting of the edges at
the end of each iteration of the main loop.
The benefits are:
- The dominance information is valid for the life time of an iteration.
- This simplifies the code as we do not have to special treat instructions that
are sunk on critical edges. Indeed, the related block will be available
through the next iteration.
The downside is that when edges splitting is required, this incurs an additional
iteration of the main loop compared to the previous scheme.
** Performance **
Thanks to this patch, the motivating example compiles in 6+ minutes instead of
10+ minutes. No test case added as the motivating example as nothing special but
being huge!
I have measured only noise for both the compile time and the runtime on the llvm
test-suite + SPECs with Os and O3.
Note: The current implementation of MachineBasicBlock::SplitCriticalEdge also
uses the dominance information and therefore, hits this problem. A subsequent
patch will address that.
<rdar://problem/17894619>
llvm-svn: 215410
This patch adds a new property: isRegSequence and the related target hooks:
TargetIntrInfo::getRegSequenceInputs and
TargetInstrInfo::getRegSequenceLikeInputs to specify that a target specific
instruction is a (kind of) REG_SEQUENCE.
<rdar://problem/12702965>
llvm-svn: 215394
buildLocationLists easier to read.
The previous implementation conflated the merging of individual pieces
and the merging of entire DebugLocEntries.
By splitting this functionality into two separate functions the intention
of the code should be clearer.
llvm-svn: 215383
be propagated to all its users, and this propagation could increase the
probability of finding common subexpressions. If the COPY has only one user,
the COPY itself can be removed.
llvm-svn: 215344
That broke the build:
/data/buildslave/clang-amd64-freebsd/src-llvm/lib/CodeGen/PeepholeOptimizer.cpp:729:46: error: non-const lvalue reference to type 'SmallPtrSet<[...], 8>' cannot bind to a value of unrelated type 'SmallPtrSet<[...], 16>'
Changed |= optimizeExtInstr(MI, MBB, LocalMIs);
^~~~~~~~
/data/buildslave/clang-amd64-freebsd/src-llvm/lib/CodeGen/PeepholeOptimizer.cpp:265:49: note: passing argument to parameter 'LocalMIs' here
SmallPtrSet<MachineInstr*, 8> &LocalMIs) {
^
llvm-svn: 215341
Follow up to r214266. Add missing case in ScalarizeVectorResult() for
cttz_zero_undef.
Differential Revision: http://reviews.llvm.org/D4813
llvm-svn: 215330
floating point exceptions, added use of flag to fold potentially exception
raising floating point math in selection DAG. No functionality change, as
targets have to explicitly ask for this behavior and none does today.
llvm-svn: 215222
__stack_chk_guard.
Handle the case where the pointer operand of the load instruction that loads the
stack guard is not a global variable but instead a bitcast.
%StackGuard = load i8** bitcast (i64** @__stack_chk_guard to i8**)
call void @llvm.stackprotector(i8* %StackGuard, i8** %StackGuardSlot)
Original test case provided by Ana Pazos.
This fixes PR20558.
llvm-svn: 215167
Due to an unnecessary special case, inlined arguments that happened to
be from the same function as they were inlined into were misclassified
as non-inline arguments and would overwrite the non-inlined arguments.
Assert that we never overwrite a function's arguments, and stop
misclassifying inlined arguments as non-inline arguments to fix this
issue.
Excuse the rather crappy test case - handcrafted IR might do better, or
someone who understands better how to tickle the inliner to create a
recursive inlining situation like this (though it may also be necessary
to tickle the variable in a particular way to cause it to be recorded in
the MMI side table and go down this particular path for location
information).
llvm-svn: 215157
be deleted. This will be reapplied as soon as possible and before
the 3.6 branch date at any rate.
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reverts commits r215111, 215115, 215116, 215117, 215136.
llvm-svn: 215154
Re-commit of r214832,r21469 with a work-around that
avoids the previous problem with gcc build compilers
The work-around is to use SmallVector instead of ArrayRef
of basic blocks in preservesResourceLen()/MachineCombiner.cpp
llvm-svn: 215151
BranchFolderPass was not correctly setting the basic block branch weights when
tail-merging created or merged blocks. This patch recomutes the weights of
tail-merged blocks using the following formula:
branch_weight(merged block to successor j) =
sum(block_frequency(bb) * branch_probability(bb -> j))
bb is a block that is in the set of merged blocks.
<rdar://problem/16256423>
llvm-svn: 215135
I am sure we will be finding bits and pieces of dead code for years to
come, but this is a good start.
Thanks to Lang Hames for making MCJIT a good replacement!
llvm-svn: 215111
to get the subtarget and that's accessible from the MachineFunction
now. This helps clear the way for smaller changes where we getting
a subtarget will require passing in a MachineFunction/Function as
well.
llvm-svn: 214988
In r210492 the logic of calculateDbgValueHistory was changed to end
register variable live ranges at the end of MBB conditionally on
the fact that the register was or not clobbered by the function body.
This requires an initial scan of all the operands of the function
to collect all clobbered registers. In a second pass over all
instructions, we compare this set with the set of clobbered
registers for the current MachineInstruction. This modification
incurred a compilation time regression on some benchmarks: the
debug info emission phase takes ~10% more time.
While a small performance hit is unavoidable due to the initial
scan requirement, we can improve the situation by avoiding to
create too many temporary sets and just use lambdas to work directly
on the result of the initial scan.
Fixes <rdar://problem/17884104>
Patch by Frederic Riss!
llvm-svn: 214987
The handling of the epilogue is best expressed as an early exit and
there is no reason to look for register defs in DbgValue MIs.
Patch by Frederic Riss!
llvm-svn: 214986
Otherwise we can end up with an argument frame size that is not a
multiple of stack slot size, which is very awkward.
This fixes PR20547, which was a bug in x86_64 Sys V vararg handling.
However, it's much easier to test this with x86 callee-cleanup
functions, which previously ended in "retl $6" instead of "retl $8".
This does affect behavior of all backends, but it presumably fixes the
same bug in all of them.
llvm-svn: 214980
This patch addresses 2 FIXME comments that I added to CriticalAntiDepBreaker while fixing PR20020.
Initialize an MCSubRegIterator and an MCRegAliasIterator to include the self reg.
Assuming that works as advertised, there should be functional difference with this patch, just less code.
Also, remove the associated asserts - we're setting those values just before, so the asserts don't do anything meaningful.
Differential Revision: http://reviews.llvm.org/D4566
llvm-svn: 214973
This was coming in weird debug info that had variables (and hence
debug_locs) but was in GMLT mode (because it was missing the 13th field
of the compile_unit metadata) so no ranges were constructed. We should
always have at least one range for any CU with a debug_loc in it -
because the range should cover the debug_loc.
The assertion just ensures that the "!= 1" range case inside the
subsequent loop doesn't get entered for the case where there are no
ranges at all, which should never reach here in the first place.
llvm-svn: 214939
This simplifies construction and usage while making the data structure
smaller. It was a holdover from the days when we didn't have a separate
DebugLocList and all we had was a flat list of DebugLocEntries.
llvm-svn: 214933
Allow vector fabs operations on bitcasted constant integer values to be optimized
in the same way that we already optimize scalar fabs.
So for code like this:
%bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000
%fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast)
%ret = bitcast <2 x float> %fabs to i64
Instead of generating something like this:
movabsq (constant pool loadi of mask for sign bits)
vmovq (move from integer register to vector/fp register)
vandps (mask off sign bits)
vmovq (move vector/fp register back to integer return register)
We should generate:
mov (put constant value in return register)
I have also removed a redundant clause in the first 'if' statement:
N0.getOperand(0).getValueType().isInteger()
is the same thing as:
IntVT.isInteger()
Testcases for x86 and ARM added to existing files that deal with vector fabs.
One existing testcase for x86 removed because it is no longer ideal.
For more background, please see:
http://reviews.llvm.org/D4770
And:
http://llvm.org/bugs/show_bug.cgi?id=20354
Differential Revision: http://reviews.llvm.org/D4785
llvm-svn: 214892
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.
Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.
llvm-svn: 214838
This code is completely wrong. It is also dead, as if it were to *ever*
run, it would crash. Fortunately, after my work to the combiner, it is
at least *possible* to reach the code, and llvm-stress has found a test
case. Thanks to Patrick for reporting.
It would be really good if anyone who remembers how this code works and
what it was intended to do could add some more obvious test coverage
instead of my completely contrived and reduced test case. My test case
was so brittle I left a bread crumb comment in it to help the next
person to stumble on it and not know what it was actually testing for.
llvm-svn: 214785
Originally reverted in r213432 with flakey failures on an ASan self-host
build. After reduction it seems to be the same issue fixed in r213805
(ArgPromo + DebugInfo: Handle updating debug info over multiple
applications of argument promotion) and r213952 (by having
LiveDebugVariables strip dbg_value intrinsics in functions that are not
described by debug info). Though I cannot explain why this failure was
flakey...
llvm-svn: 214761