Now local value sinking only scans and numbers instructions added
between the current flush point and the last flush point. This ensures
that ISel is overall linear in the size of the BB.
Fixes PR37010 and re-enables local value sinking by default.
llvm-svn: 331087
Some of the bots were failing in a different way to the others. These were
unable to compare tuples. Fix this by changing to a struct, thereby avoiding
the quirks of tuples.
llvm-svn: 331081
Extend the live-in check for all aliased registers so that we can
allow sinking Copy instructions when only implicit def is in successor's
live-in.
llvm-svn: 331072
Summary:
Currently only the memory size is supported but others can be added as
needed.
narrowScalar for G_LOAD and G_STORE now correctly update the
MachineMemOperand and will refuse to legalize atomics since those need more
careful expansions to maintain atomicity.
Reviewers: ab, aditya_nandakumar, bogner, rtereshin, aemerson, javed.absar
Reviewed By: aemerson
Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D45466
llvm-svn: 331071
The main goal of this change is to make it much easier to track which
rules are actually covered by Testgen'erated regression tests.
Reviewers: aemerson, dsanders
Differential Revision: https://reviews.llvm.org/D46095
llvm-svn: 330988
For local variables the first DW_OP_deref is consumed by turning the
location kind into a memeory location, but that only makes sense for
values that are in a register to begin with, which cannot happen for
global variables that are attached to a symbol.
rdar://problem/39741860
llvm-svn: 330970
As noted, the attribute name is subject to change once we have
the clang side implemented, but it's clear that we need some
kind of attribute-based predication here based on the discussion
for:
rL330437
llvm-svn: 330951
As discussed in the post-review comments for rL330437,
we need to guard this fold to allow existing code to
keep working with the undefined behavior that they've
come to rely on.
That would mean duplicating more code than we already
have, so let's fix that first.
llvm-svn: 330947
Debug var, expr and loc were only supported for non-fixed stack objects.
This patch adds the following fields to the "fixedStack:" entries, and
renames the ones from "stack:" to:
* debug-info-variable
* debug-info-expression
* debug-info-location
Differential Revision: https://reviews.llvm.org/D46032
llvm-svn: 330859
We were previously prefering ZEXTLOAD over EXTLOAD if it is legal. This triggers during X86's promotion of i16->i32. Not sure about other targets.
Using ZEXTLOAD can prevent folding it to SEXTLOAD later if we were to promote a sign extended operand like we would need for SRA. However, X86 doesn't currently promote i16 SRA. I was looking into doing that which is how I found this issue.
This is also blocking our ability to fold 4 byte aligned EXTLOADs with "loadi32". This is what caused most of the test changes here.
Differential Revision: https://reviews.llvm.org/D45585#inline-402825
llvm-svn: 330781
Previously, _any_ store or load instruction was considered to be
operating on a spill if it had a frameindex as an operand, and thus
was fair game for optimisations such as "StackSlotColoring". This
usually works, except on architectures where spills can be partially
restored, for example on X86 where a spilt vector can have a single
component loaded (zeroing the rest of the target register). This can be
mis-interpreted and the zero extension unsoundly eliminated, see
pr30821.
To avoid this, this commit optionally provides the caller to
isLoadFromStackSlot and isStoreToStackSlot with the number of bytes
spilt/loaded by the given instruction. Optimisations can then determine
that a full spill followed by a partial load (or vice versa), for
example, cannot necessarily be commuted.
Patch by Jeremy Morse!
Differential Revision: https://reviews.llvm.org/D44782
llvm-svn: 330778
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.
It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.
The second part is platform independent and ensures that:
* CFI instructions do not affect code generation (they are not counted as
instructions when tail duplicating or tail merging)
* Unwind information remains correct when a function is modified by
different passes. This is done in a late pass by analyzing information
about cfa offset and cfa register in BBs and inserting additional CFI
directives where necessary.
Added CFIInstrInserter pass:
* analyzes each basic block to determine cfa offset and register are valid
at its entry and exit
* verifies that outgoing cfa offset and register of predecessor blocks match
incoming values of their successors
* inserts additional CFI directives at basic block beginning to correct the
rule for calculating CFA
Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.
CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.
Patch by Violeta Vukobrat.
Differential Revision: https://reviews.llvm.org/D42848
llvm-svn: 330706
Summary:
The pass is supposed to scalarize such intrinsics if the target does not support
them natively, so if the scalarization does not happen instruction selection
crashes due to inability to lower these intrinsics.
Reviewers: andrew.w.kaylor, craig.topper
Reviewed By: andrew.w.kaylor
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45947
llvm-svn: 330700
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].
[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andl`+`andn`/`andps`+`andnps` / `bic`/`bsl` would be generated. (see `@out`)
Now, they would no longer be generated (see `@in`).
So we need to make sure that they are still generated.
If the mask is constant, we do nothing. InstCombine should have unfolded it.
Else, i use `hasAndNot()` TLI hook.
For now, only handle scalars.
https://rise4fun.com/Alive/bO6
----
I *really* don't like the code i wrote in `DAGCombiner::unfoldMaskedMerge()`.
It is super fragile. Is there something like IR Pattern Matchers for this?
Reviewers: spatel, craig.topper, RKSimon, javed.absar
Reviewed By: spatel
Subscribers: andreadb, courbet, kristof.beyls, javed.absar, rengolin, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D45733
llvm-svn: 330646
This helps debug issues where selection-dag assigns the wrong location
to an instruction.
Differential Revision: https://reviews.llvm.org/D45913
llvm-svn: 330618
Summary:
This just refactors the lowering of the atomic memory intrinsics to more
closely match the code patterns used in the lowering of the non-atomic
memory intrinsics. Specifically, we encapsulate the lowering in
SelectionDAG::getAtomicMem*() functions rather than embedding
the code directly in the SelectionDAGBuilder code.
llvm-svn: 330603
In certain cases, the compiler might try to merge __stack_chk_guard with
another global variable. (Or someone could theoretically define
__stack_chk_guard as an alias.) In that case, make sure we don't crash.
Differential Revision: https://reviews.llvm.org/D45746
llvm-svn: 330495
This was originally committed at rL328921 and reverted at rL329920 to
investigate failures in Chrome. This time I've added to the ReleaseNotes
to warn users of the potential of exposing UB and let me repeat that
here for more exposure:
Optimization of floating-point casts is improved. This may cause surprising
results for code that is relying on undefined behavior. Code sanitizers can
be used to detect affected patterns such as this:
int main() {
float x = 4294967296.0f;
x = (float)((int)x);
printf("junk in the ftrunc: %f\n", x);
return 0;
}
$ clang -O1 ftrunc.c -fsanitize=undefined ; ./a.out
ftrunc.c:5:15: runtime error: 4.29497e+09 is outside the range of
representable values of type 'int'
junk in the ftrunc: 0.000000
Original commit message:
fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC,
so replace a pair of casts with the equivalent node. We don't have to account for
special cases (NaN, INF) because out-of-range casts are undefined.
Differential Revision: https://reviews.llvm.org/D44909
llvm-svn: 330437
This moves the EnableLinkOnceODROutlining flag from TargetPassConfig.cpp into
MachineOutliner.cpp. It also removes OutlineFromLinkOnceODRs from the
MachineOutliner constructor. This is now handled by the moved command-line
flag.
llvm-svn: 330373
Summary:
Patch adds initial emission of the debug info for NVPTX target.
Currently, only .file and .loc directives are emitted, everything else is
commented out to not break the compilation of Cuda.
Reviewers: echristo, jlebar, tra, jholewinski
Subscribers: mgorny, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D41827
llvm-svn: 330271
Summary:
A change to use divergence analysis in the AMDGPU backend was getting formal
arguments incorrect (not tagged as divergent) unless they were VGPR0, VGPR1 or
VGPR2
For graphics shaders it is possible to have more than these passed in as VGPR
Modified the checking code to check for any VGPR registers passed in as formal
arguments.
Also, some intrinsics that are sources of divergence may have been lowered
during instruction selection and are missed on subsequent calls to
isSDNodeSourceOfDivergence - added the relevant AMDGPUISD checks as well.
Finally, the FunctionLoweringInfo tracks virtual registers that are live across
basic block boundaries. This is used to check for divergence of CopyFromRegister
registers using the DivergenceAnalysis analysis. For multiple blocks the lazily
evaluated inverted map VirtReg2Value was not cleared when the ValueMap map was.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D45372
Change-Id: I112f3bd6dfe0f62e63ce9b43b893982778e4bee3
llvm-svn: 330257
Summary:
Previously we crashed for the combination of the two features because we
tried to reference the dwo CU from the main object file. The fix
consists of two items:
- reference the skeleton CU from the name index (the consumer is
expected to use the skeleton CU to find the real data).
- use the main object file string pool for the strings in the index
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45566
llvm-svn: 330249
Summary:
Add an LLVM intrinsic for type discriminated event logging with XRay.
Similar to the existing intrinsic for custom events, but also accepts
a type tag argument to allow plugins to be aware of different types
and semantically interpret logged events they know about without
choking on those they don't.
Relies on a symbol defined in compiler-rt patch D43668. I may wait
to submit before I can see demo everything working together including
a still to come clang patch.
Reviewers: dberris, pelikan, eizan, rSerge, timshen
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45633
llvm-svn: 330219
When emitting CodeView debug information, compiler-generated thunk routines
should be emitted using S_THUNK32 symbols instead of S_GPROC32_ID symbols so
Visual Studio can properly step into the user code. This initial support only
handles standard thunk ordinals.
Differential Revision: https://reviews.llvm.org/D43838
llvm-svn: 330132
TargetSchedModel now always delegates to MCSchedModel the computation of
instruction latency and reciprocal throughput.
No functional change intended.
llvm-svn: 330099
This is a transform that I limited in instcombine in rL329821 because it was
creating more instructions in IR when the cast has multiple uses.
But if the cast is free, then we can do the transform regardless of other
uses because it improves the potential throughput of the calculation by
removing a dependency on the fneg.
Differential Revision: https://reviews.llvm.org/D45598
llvm-svn: 330098
This change is exposing UB in source code - as was warned/predicted. :)
See D44909 for discussion. Reverting while we figure out how to fix things.
llvm-svn: 329920
There are cases when individual NodeSets can be equal with respect to
the ordering criteria. Since they are stored in an ordered container,
use stable_sort to preserve the relative order of equal NodeSets.
This should remove non-determinism discovered by shuffling done in
llvm::sort with expensive checks enabled.
llvm-svn: 329915