Add support for MIR serialization of PowerPC-specific operand target flags
(based on the generic infrastructure added in r244185 and r245383).
I won't even pretend that this is good test coverage, but this includes the
regression test associated with r246372. Adding an MIR test for that fix is far
superior to adding an IR-level test because particular instruction-scheduling
decisions are necessary in order to expose the bug, and using an MIR test we
can start the pipeline post-scheduling.
llvm-svn: 246373
This is especially visible in softfp mode, for example in the implementation of libm fabs/fneg functions. If we have:
%1 = vmovdrr r0, r1
%2 = fabs %1
then move the fabs before the vmovdrr:
%1 = and r1, #0x7FFFFFFF
%2 = vmovdrr r0, r1
This is never a lose, and could be a serious win because the vmovdrr may be followed by a vmovrrd, which would enable us to remove the conversion into FPRs completely.
We already do this for f32, but not for f64. Tests are added for both.
llvm-svn: 246360
The VOP3 encoding of these allows any SGPR pair for the i1
output, but this was forced before to always use vcc.
This doesn't yet try to use this, but does add the operand
to the definitions so the main change is adding vcc to the
output of the VOP2 encoding.
llvm-svn: 246358
Without a memory operand, mayLoad or mayStore instructions
are treated as hasUnorderedMemRef, which results in much worse
scheduling.
We really should have a verifier check that any
non-side effecting mayLoad or mayStore has a memory operand.
There are a few instructions (interp and images) which I'm
not sure what / where to add these.
llvm-svn: 246356
I'm working on adding !dbg attachments to functions (PR23367), which
we'll use to determine the canonical subprogram for a function (instead
of the `subprograms:` array in the compile units). This updates a few
old tests in preparation.
Transforms/Mem2Reg/ConvertDebugInfo2.ll had an old-style grep+count
based test that would start to fail because I've added an extra line
with `!dbg`. Instead, explicitly `CHECK` for what I think the test
actually cares about.
All three testcases have subprograms with a valid `function:` reference
-- which means my upgrade script will add a `!dbg` attachment -- but
that aren't referenced from any compile unit. I suspect these testcases
were handreduced over-zealously (or have bitrotted?). Add a reference
from the compile unit so that upcoming Verifier checks won't fail here.
llvm-svn: 246351
As a follow-up to r246098, require `DISubprogram` definitions
(`isDefinition: true`) to be 'distinct'. Specifically, add an assembler
check, a verifier check, and bitcode upgrading logic to combat testcase
bitrot after the `DIBuilder` change.
While working on the testcases, I realized that
test/Linker/subprogram-linkonce-weak-odr.ll isn't relevant anymore. Its
purpose was to check for a corner case in PR22792 where two subprogram
definitions match exactly and share the same metadata node. The new
verifier check, requiring that subprogram definitions are 'distinct',
precludes that possibility.
I updated almost all the IR with the following script:
git grep -l -E -e '= !DISubprogram\(.* isDefinition: true' |
grep -v test/Bitcode |
xargs sed -i '' -e 's/= \(!DISubprogram(.*, isDefinition: true\)/= distinct \1/'
Likely some variant of would work for out-of-tree testcases.
llvm-svn: 246327
When combiner AA is enabled, look at stores on the same chain.
Non-aliasing stores are moved to the same chain so the existing
code fails because it expects to find an adajcent store on a consecutive
chain.
Because of how DAGCombiner tries these store combines,
MergeConsecutiveStores doesn't see the correct set of stores on the chain
when it visits the other stores. Each store individually has its chain
fixed before trying to merge consecutive stores, and then tries to merge
stores from that point before the other stores have been processed to
have their chains fixed. To fix this, attempt to use FindBetterChain
on any possibly neighboring stores in visitSTORE.
Suppose you have 4 32-bit stores that should be merged into 1 vector
store. One store would be visited first, fixing the chain. What happens is
because not all of the store chains have yet been fixed, 2 of the stores
are merged. The other 2 stores later have their chains fixed,
but because the other stores were already merged, they have different
memory types and merging the two different sized stores is not
supported and would be more difficult to handle.
llvm-svn: 246307
Summary:
Change the coloring algorithm in WinEHPrepare to visit a funclet's exits
in its parents' contexts and so properly classify the continuations of
nested funclets.
Also change the placement of cloned blocks to be deterministic and to
maintain the relative order of each funclet's blocks.
Add a lit test showing various patterns that require cloning, the last
several of which don't have CHECKs yet because they require cloning
entire funclets which is NYI.
Reviewers: rnk, andrew.w.kaylor, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12353
llvm-svn: 246245
more than 2 instructions.
I introduced this regression a while back and did not noticed it because I
somehow forgot to push the initial test cases for the pass!
Fix that as well!
llvm-svn: 246239
We can now run 32-bit programs with empty catch bodies. The next step
is to change PEI so that we get funclet prologues and epilogues.
llvm-svn: 246235
Fixes PR24602: r245689 introduced an unguarded use of
SelectionDAG::FoldConstantArithmetic, which returns 0 when it fails
because of opaque (hoisted) constants.
llvm-svn: 246217
This is a one-line-change patch that moves the update to UnhandledWeights to the correct position: it should be updated for all clusters instead of just range clusters.
Differential Revision: http://reviews.llvm.org/D12391
llvm-svn: 246129
Summary:
Let NVPTX backend detect integer min and max patterns during isel and emit intrinsics that enable hardware support.
Reviewers: jholewinski, meheff, jingyue
Subscribers: arsenm, llvm-commits, meheff, jingyue, eliben, jholewinski
Differential Revision: http://reviews.llvm.org/D12377
llvm-svn: 246107
Currently, when lowering switch statement and a new basic block is built for jump table / bit test header, the edge to this new block is not assigned with a correct weight. This patch collects the edge weight from all its successors and assign this sum of weights to the edge (and also the other fall-through edge). Test cases are adjusted accordingly.
Differential Revision: http://reviews.llvm.org/D12166#fae6eca7
llvm-svn: 246104
Things of note:
- Other linkage types aren't handled yet. We'll figure it out with dynamic linking.
- Special LLVM globals are either ignored, or error out for now.
- TLS isn't supported yet (WebAssembly will have threads later).
- There currently isn't a syntax for alignment, I left it in a comment so it's easy to hook up.
- Undef is convereted to whatever the type's appropriate null value is.
- assert versus report_fatal_error: follow what other AsmPrinters do, and assert only on what should have been caught elsewhere.
llvm-svn: 246092
This was causing problems when some functions use a GuardReg and some
don't as can happen when mixing SelectionDAG and FastISel generated
functions.
llvm-svn: 246075
If you're going to realign %sp to get object alignment properly (which
the code does), and stack offsets and alignments are calculated going
down from %fp (which they are), then the total stack size had better
be a multiple of the alignment. LLVM did indeed ensure that.
And then, after aligning, the sparc frame code added 96 (for sparcv8)
to the frame size, making any requested alignment of 64-bytes or
higher *guaranteed* to be misaligned. The test case added with r245668
even tests this exact scenario, and asserted the incorrect behavior,
which I somehow failed to notice. D'oh.
This change fixes the frame lowering code to align the stack size
*after* adding the spill area, instead.
Differential Revision: http://reviews.llvm.org/D12349
llvm-svn: 246042
Summary:
This change makes the variable argument intrinsics, `llvm.va_start` and
`llvm.va_copy`, and the `va_arg` instruction behave as they do on Windows
inside a `CallingConv::X86_64_Win64` function. It's needed for a Clang patch
I have to add support for GCC's `__builtin_ms_va_list` constructs.
Reviewers: nadav, asl, eugenis
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1622
llvm-svn: 245990
When lowering switch statement, if bit tests are used then LLVM will always generates a jump to the default statement in the last bit test. However, this is not necessary when all cases in bit tests cover a contiguous range. This is because when generating the bit tests header MBB, there is a range check that guarantees cases in bit tests won't go outside of [low, high], where low and high are minimum and maximum case values in the bit tests. This patch checks if this is the case and then doesn't emit jump to default statement and hence saves a bit test and a branch.
Differential Revision: http://reviews.llvm.org/D12249
llvm-svn: 245976
This is a follow-on from the discussion in http://reviews.llvm.org/D12154.
This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has
generally fast unaligned memory ops.
A motivating use case for this change is a clang invocation that doesn't explicitly set
the CPU, but does target a feature that we know only exists on a CPU that supports fast
unaligned memops. For example:
$ clang -O1 foo.c -mavx
This resolves a difference in lowering noted in PR24449:
https://llvm.org/bugs/show_bug.cgi?id=24449
Before this patch, we used different store types depending on whether the example can be
lowered as a memset or not.
Differential Revision: http://reviews.llvm.org/D12288
llvm-svn: 245950
This fixes two issues in x86 fptoui lowering.
1) Makes conversions from f80 go through the right path on AVX-512.
2) Implements an inline sequence for fptoui i64 instead of a library
call. This improves performance by 6X on SSE3+ and 3X otherwise.
Incidentally, it also removes the use of ftol2 for fptoui, which was
wrong to begin with, as ftol2 converts to a signed i64, producing
wrong results for values >= 2^63.
Patch by: mitch.l.bodart@intel.com
Differential Revision: http://reviews.llvm.org/D11316
llvm-svn: 245924
We might end up with a trivial copy as the addend, and if so, we should ignore
the corresponding FMA instruction. The trivial copy can be coalesced away later,
so there's nothing to do here. We should not, however, assert. Fixes PR24544.
llvm-svn: 245907
Summary: I forgot to squash git commits before doing an svn dcommit of D12219. Reverting, and re-submitting.
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D12298
llvm-svn: 245886
This patch fixes PR24546, which demonstrates a segfault during the VSX
swap removal pass. The problem is that debug value instructions were
not excluded from the list of instructions to be analyzed for webs of
related computation. I've added the test case from the PR as a crash
test in test/CodeGen/PowerPC.
llvm-svn: 245862
The FP16_TO_FP node only uses the bottom 16 bits of its input, so the
following pattern can be optimised by removing the AND:
(FP16_TO_FP (AND op, 0xffff)) -> (FP16_TO_FP op)
This is a common pattern for ARM targets when functions have __fp16
arguments, as they are passed as floats (so that they get passed in the
correct registers), but then bitcast and truncated to ignore the top 16
bits.
llvm-svn: 245832