Make the kind of cost explicit throughout the cost model which,
apart from making the cost clear, will allow the generic parts to
calculate better costs. It will also allow some backends to
approximate and correlate the different costs if they wish. Another
benefit is that it will also help simplify the cost model around
immediate and intrinsic costs, where we currently have multiple APIs.
RFC thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html
Differential Revision: https://reviews.llvm.org/D79002
Summary:
I have fixed several places in getSplatSourceVector and isSplatValue
to work correctly with scalable vectors. I added new support for
the ISD::SPLAT_VECTOR DAG node as one of the obvious cases we can
support with scalable vectors. In other places I have tried to do
the sensible thing, such as bailing out for vector types we don't yet
support or don't intend to support.
It's not possible to add IR test cases to cover these changes, since
they are currently only ever exercised on certain targets, e.g.
only X86 targets use the result of getSplatSourceVector. I've
assumed that X86 tests already exist to test these code paths for
fixed vectors. However, I have added some AArch64 unit tests that
test the specific functions I have changed.
Differential revision: https://reviews.llvm.org/D79083
Register live ranges may have had gaps that after coalescing should be
removed. This is done by adding a new segment to the range, and merging
it with neighboring segments. When doing so, do not assume that each
subrange of the register ended at the same index. If a subrange ended
earlier, adding this segment could make the live range invalid.
Instead, if the subrange is not live at the start of the segment,
extend it first.
Today symbol names generated for machine basic block sections use a
unary encoding to reduce bloat. This is essential when every basic block
in the binary is assigned a symbol. However, with basic block clusters
(rG05192e585ce175b55f2a26b83b4ed7882785c8e6) we only need to generate a
few non-temporary symbols, so we can assign more descriptive names that
are more user friendly. With this change:
Cold cluster section for function foo is named "foo.cold"
Exception cluster section for function foo is named "foo.eh"
Other cluster sections identified by their ids are named "foo.ID"
Using this format works well with existing tools. It will demangle as
expected and works with existing symbolizers, profilers and debuggers
out of the box.
$ c++filt _Z3foov.cold
foo() [clone .cold]
$ c++filt _Z3foov.eh
foo() [clone .eh]
$ c++filt _Z3foov.1234
foo() [clone 1234]
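The naming scheme listed above can be sketched as follows. This is only an illustration in Python, not the LLVM implementation; the function name `cluster_symbol_name` and the `kind` tags are hypothetical.

```python
def cluster_symbol_name(func_symbol, kind, cluster_id=None):
    """Return a symbol name for a basic block cluster section.

    `kind` is a hypothetical tag used only in this sketch:
    "cold", "eh", or "id" (a cluster identified by its numeric id).
    """
    if kind == "cold":
        return func_symbol + ".cold"
    if kind == "eh":
        return func_symbol + ".eh"
    if kind == "id":
        if cluster_id is None:
            raise ValueError("cluster_id is required for kind 'id'")
        return f"{func_symbol}.{cluster_id}"
    raise ValueError(f"unknown cluster kind: {kind!r}")
```

Because the suffix is appended to the already mangled function symbol, the results have the same shape as the names passed to c++filt above.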
Tests for basicblock-sections are updated with some cleanup where
appropriate.
Differential Revision: https://reviews.llvm.org/D79221
Before this patch, global variables didn't have their namespace prepended in the CodeView debug symbol stream. This prevented Visual Studio from displaying them in the debugger (they appeared as 'unspecified error').
Differential Revision: https://reviews.llvm.org/D79028
We allocated a suitably aligned frame index so we know that all the values
have ABI alignment.
For MIPS this avoids using a pair of lwl + lwr instructions instead of a
single lw. I found this when compiling CHERI pure capability code where
we can't use the lwl/lwr unaligned loads/stores and were falling
back to a byte load + shift + or sequence.
This should save a few instructions for MIPS and possibly other backends
that don't have fast unaligned loads/stores.
It also improves code generation for CodeGen/X86/pr34653.ll and
CodeGen/WebAssembly/offset.ll since they can now use aligned loads.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78999
The two code paths have the same goal, legalizing a load of a non-byte-sized vector by loading the "flattened" representation in memory, slicing off each single element and then building a vector out of those pieces.
The technique employed by `ExpandLoad` is slightly more convoluted and produces slightly better codegen on ARM, AMDGPU and x86 but suffers from some bugs (D78480) and is wrong for BE machines.
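The shared idea of both code paths (load the flattened bits, then slice off each element) can be sketched as below. This is an illustration in Python, not the legalizer code. It assumes a little-endian layout, where element 0 occupies the least significant bits; the function name `load_subbyte_vector` is hypothetical.

```python
def load_subbyte_vector(memory, num_elts, elt_bits):
    """Rebuild a vector of `num_elts` elements of `elt_bits` bits each.

    Assumption for this sketch: little-endian layout, so element 0
    lives in the least significant bits of the flattened integer.
    """
    total_bits = num_elts * elt_bits
    num_bytes = (total_bits + 7) // 8  # round up to whole bytes
    flat = int.from_bytes(memory[:num_bytes], "little")
    mask = (1 << elt_bits) - 1
    # Slice off one element at a time and collect the pieces.
    return [(flat >> (i * elt_bits)) & mask for i in range(num_elts)]
```

For example, a `<4 x i2>` vector stored in the single byte `0b11100100` comes back as the elements 0, 1, 2, 3. A big-endian target needs a different element order, which this sketch does not model.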
Differential Revision: https://reviews.llvm.org/D79096
rL368553 added SimplifyMultipleUseDemandedBits handling for ISD::TRUNCATE to SimplifyDemandedBits so we don't need to duplicate this (and it gets rid of another GetDemandedBits call which is slowly being replaced with SimplifyMultipleUseDemandedBits anyhow).
Also fix some cost tables for vXi1 types to match the cost entries for the types they will be promoted to.
Differential Revision: https://reviews.llvm.org/D79045
X86 matches several 'shift+xor' funnel shift patterns:
fold (or (srl (srl x1, 1), (xor y, 31)), (shl x0, y)) -> (fshl x0, x1, y)
fold (or (shl (shl x0, 1), (xor y, 31)), (srl x1, y)) -> (fshr x0, x1, y)
fold (or (shl (add x0, x0), (xor y, 31)), (srl x1, y)) -> (fshr x0, x1, y)
These patterns are also what we end up with under the proposed expansion changes in D77301.
This patch moves these to DAGCombine's generic MatchFunnelPosNeg.
All existing X86 test cases still pass, and we just have a small codegen change in pr32282.ll.
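The equivalence behind the three folds can be checked with a small model. This is a sketch in Python over 32-bit values, assuming the shift amount y is in the range [0, 31]; the helper names are hypothetical and this is not the DAGCombine code.

```python
BITS = 32
MASK = (1 << BITS) - 1

def shl(x, n):
    return (x << n) & MASK

def srl(x, n):
    return (x & MASK) >> n

def fshl(x0, x1, y):
    # Concatenate x0:x1, shift left by y, keep the high 32 bits.
    y %= BITS
    return ((((x0 << BITS) | x1) << y) >> BITS) & MASK

def fshr(x0, x1, y):
    # Concatenate x0:x1, shift right by y, keep the low 32 bits.
    y %= BITS
    return (((x0 << BITS) | x1) >> y) & MASK

def pattern_fshl(x0, x1, y):
    # (or (srl (srl x1, 1), (xor y, 31)), (shl x0, y))
    return srl(srl(x1, 1), y ^ 31) | shl(x0, y)

def pattern_fshr_shl(x0, x1, y):
    # (or (shl (shl x0, 1), (xor y, 31)), (srl x1, y))
    return shl(shl(x0, 1), y ^ 31) | srl(x1, y)

def pattern_fshr_add(x0, x1, y):
    # (or (shl (add x0, x0), (xor y, 31)), (srl x1, y))
    return shl((x0 + x0) & MASK, y ^ 31) | srl(x1, y)
```

The extra shift by 1 is what makes y = 0 safe: `xor y, 31` is then 31, and the combined shift of 32 positions clears the value instead of relying on an out-of-range shift amount.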
Reviewed By: @spatel
Differential Revision: https://reviews.llvm.org/D78935
Summary:
This patch tries to ensure that we do something sensible when
generating code for the ISD::INSERT_VECTOR_ELT DAG node when operating
on scalable vectors. Previously we always returned 'undef' when
inserting an element into an out-of-bounds lane index, whereas now
we only do this for fixed length vectors. For scalable vectors it
is assumed that the backend will do the right thing in the same way
that we have to deal with variable lane indices.
In this patch I have permitted a few basic combinations for scalable
vector types where it makes sense, but in general avoided most cases
for now as they currently require the use of BUILD_VECTOR nodes.
This patch includes tests for all scalable vector types when inserting
into lane 0, but I've only included one or two vector types for other
cases such as variable lane inserts.
Differential Revision: https://reviews.llvm.org/D78992
Prior to D69446 I had done some NFC cleanup to make landing an iterative
outliner a cleaner, more straightforward patch. Since then, it seems that has
landed but I noticed some ways it could be cleaned up. Specifically:
1) doOutline was meant to be the re-runnable function, but instead
runOnceOnModule was created that just calls doOutline.
2) In D69446 we discussed that the flag allowing the re-run of the
outliner should be a flag to tell how many additional times to run
the outliner again, not the total number of times. I don't think it
makes sense to introduce a flag, but print an error if the flag is
set to 0.
This is an NFCi, the i being that I get rid of the way that the
machine-outline-runs flag could be used to tell the outliner to not run
at all, and because I renamed the flag to '-machine-outliner-reruns'.
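The intended flag semantics (a count of additional runs rather than a total) can be sketched as follows. This is an illustration in Python, not the outliner code; `run_outliner` is a hypothetical name, and stopping early once a run outlines nothing is an assumption of this sketch.

```python
def run_outliner(do_outline, reruns):
    """Call `do_outline` once, then up to `reruns` additional times.

    `do_outline` is a callable returning True if it outlined anything.
    Returns the number of times it was called. Stopping as soon as a
    run outlines nothing is an assumption made for this sketch.
    """
    if reruns < 0:
        raise ValueError("reruns must be non-negative")
    calls = 1
    if not do_outline():
        return calls
    for _ in range(reruns):
        calls += 1
        if not do_outline():
            break
    return calls
```

With this shape a value of 0 means "run once", so there is no setting that disables the outliner entirely.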
Differential Revision: https://reviews.llvm.org/D79070
Call getNegatedExpression(Cost) and check the Cost to make the code more clear.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D78347
There are several different types of cost that TTI tries to provide
explicit information for: throughput, latency, code size along with
a vague 'intersection of code-size cost and execution cost'.
The vectorizer is a keen user of RecipThroughput and there's at least
'getInstructionThroughput' and 'getArithmeticInstrCost' designed to
help with this cost. The latency cost has a single use and a single
implementation. The intersection cost appears to cover most of the
rest of the API.
getUserCost is explicitly called from within TTI when the user has
been explicit in wanting the code size (also only one use) as well
as a few passes which are concerned with a mixture of size and/or
a relative cost. In many cases these costs are closely related, such
as when multiple instructions are required, but one evident diverging
cost in this function is for div/rem.
This patch adds an argument so that the cost required is explicit,
so that we can make the important distinction when necessary.
Differential Revision: https://reviews.llvm.org/D78635
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups in here. For example, remove the use
of getElementType on a pointer when we could just use getFunctionType
from the call.
Differential Revision: https://reviews.llvm.org/D78882
The code assumed that zero-extending the integer constant to the
designated alloc size would be fine even for BE targets, but that's not
the case as that pulls in zeros from the MSB side while we actually
expect the padding zeros to go after the LSB.
I've changed the codepath handling the constant integers to use the
store size for both small(er than u64) and big constants and then add
zero padding right after that.
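The difference between the two approaches can be seen with a small byte-level model. This is a sketch in Python, not the AsmPrinter code; both function names are hypothetical.

```python
def emit_zext_to_alloc(value, alloc_size, byteorder):
    """Old approach: zero-extend the integer to the alloc size, then emit."""
    return value.to_bytes(alloc_size, byteorder)

def emit_store_then_pad(value, store_size, alloc_size, byteorder):
    """New approach: emit store-size bytes, then append zero padding."""
    payload = value.to_bytes(store_size, byteorder)
    return payload + bytes(alloc_size - store_size)
```

For a 24-bit constant 0x123456 with a store size of 3 and an alloc size of 4, both functions produce `56 34 12 00` on a little-endian target. On a big-endian target the zero-extended form is `00 12 34 56`, with the zero byte in front of the value, while the padded form is `12 34 56 00`.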
Differential Revision: https://reviews.llvm.org/D78011
like .cfi_restore"
Insert .cfi_offset/.cfi_register when IncomingCSRSaved of current block
is larger than OutgoingCSRSaved of its previous block.
Original commit message:
https://reviews.llvm.org/D42848 only handled CFA related cfi directives but
didn't handle CSR related cfi. The patch adds the CSR part. Basically it reuses
the framework created in D42848. For each basicblock, the patch tracks which
CSR set has been saved at its CFG predecessors' exits, and compares that CSR
set with the set at its previous basicblock's exit (the previous block is the
block laid out before the current block). If the saved CSR set at its previous
basicblock's exit is larger, .cfi_restore will be inserted.
The patch also generates proper .cfi_restore in epilogue to make sure the
saved CSR set is consistent for the incoming edges of each block.
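The comparison described above can be sketched with plain sets. This is an illustration in Python, not the CFIInstrInserter code: the function name and the dictionary inputs are hypothetical, "larger" is modeled as a set difference, and directive operands such as offsets are omitted.

```python
def csr_cfi_fixups(layout, incoming_saved, outgoing_saved):
    """Compute per-block CFI fixups from saved-CSR sets.

    layout: block names in layout order.
    incoming_saved[b]: CSRs saved on entry to b (from its CFG predecessors).
    outgoing_saved[b]: CSRs saved at the exit of b.
    Returns {block: [(directive, register), ...]} for blocks needing fixups.
    """
    fixups = {}
    for prev, cur in zip(layout, layout[1:]):
        before = outgoing_saved[prev]   # state left by the layout predecessor
        wanted = incoming_saved[cur]    # state expected along the CFG edges
        directives = [(".cfi_restore", r) for r in sorted(before - wanted)]
        # Stands in for .cfi_offset/.cfi_register; operands omitted here.
        directives += [(".cfi_offset", r) for r in sorted(wanted - before)]
        if directives:
            fixups[cur] = directives
    return fixups
```

A block that follows an early-return epilogue in layout order, but is reached from a block where the registers are still saved, is the typical case that needs the save state re-established.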
Differential Revision: https://reviews.llvm.org/D74303
Summary:
Reviewing failures identified in D78586, I was finding the identifiers
for these iterators hard to read.
Reviewers: efriedma, MaskRay, jyknight
Reviewed By: MaskRay
Subscribers: hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78849
The tl;dr story is that this causes jumps in the emitted line
tables, even at `-O0`. We could at some point consider more fancy
solutions to preserve locations, but it doesn't seem to be worth
the effort for now.
<rdar://problem/62460788>
Differential Revision: https://reviews.llvm.org/D78947
Summary:
When generating code for the LLVM IR zeroinitializer operation, if
the vector type is scalable we should be using SPLAT_VECTOR instead
of BUILD_VECTOR.
Differential Revision: https://reviews.llvm.org/D78636
This is an NFC patch for D77319. The idea is to hide getNegatibleCost inside getNegatedExpression()
so that it returns null if the negation is expensive, and to add some helper functions for ease of use. Also
rename the old getNegatedExpression to negateExpression to avoid the semantic conflict.
Reviewed By: RKSimon
Differential revision: https://reviews.llvm.org/D78291
Summary:
Instead of adding a ".unlikely" or ".eh" suffix for machine basic blocks,
this change updates the behaviour to use an appropriate prefix
instead. This allows lld to group basic block sections together
when -z,keep-text-section-prefix is specified and matches the behaviour
observed in gcc.
Reviewers: tmsriram, mtrofin, efriedma
Reviewed By: tmsriram, efriedma
Subscribers: eli.friedman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78742
Follow-up of D78082 and D78590.
Otherwise, because xray_instr_map is now read-only, the absolute
relocation used for Sled.Function will cause a text relocation.
A previous bug fix for varargs introduced a regression where we would
incorrectly widen some stores to memory when passing i8/i16 parameters on the
stack. This seemingly didn't show up because it only happens when there is
no signext/zeroext parameter attribute, which I think clang adds for Darwin.
Swift however seems to be a different story, and a plain anyext on the parameter
triggered the bug.
To fix this, I've added a new ValueHandler::assignValueToAddress type override
which lets us distinguish between varargs and fixed args (we still need this
widening behaviour for varargs to fix the original bug in 2018).
rdar://61353552
Summary:
Add a check to make sure that MachineInstr::mayAlias returns prematurely if at least one of its instruction parameters does not access memory. This prevents calls to TargetInstrInfo::areMemAccessesTriviallyDisjoint with incompatible instructions.
A side effect of this change is to render the mayAlias helper in the AArch64 load/store optimizer obsolete. We can now directly call the MachineInstr::mayAlias member function.
Reviewers: hfinkel, t.p.northover, mcrosier, eli.friedman, efriedma
Reviewed By: efriedma
Subscribers: efriedma, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78823
Follow-up of D78082 (x86-64).
This change avoids dynamic relocations in `xray_instr_map` for ARM/AArch64/powerpc64le.
MIPS64 cannot use 64-bit PC-relative addresses because R_MIPS_PC64 is not defined.
Because MIPS32 shares the same code, for simplicity, we don't use PC-relative addresses for MIPS32 as well.
Tested on AArch64 Linux and ppc64le Linux.
Reviewed By: ianlevesque
Differential Revision: https://reviews.llvm.org/D78590
Summary:
Given a VL=14 that is enveloped by a proper VL=16, splitting the
masked load using the enveloping halving VL=8/8 should
eventually yield VL=8/6. This fixes various assert failures
in getHalfNumVectorElementsVT() and IncrementMemoryAddress().
Note, I suspect similar fixes will be needed for other masked
operations, but for now I send out a fix for masked load only.
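The split arithmetic can be sketched as follows. This is an illustration in Python, not the legalizer code; `split_vector_length` is a hypothetical name, and the sketch assumes the enveloping length is the next power of two.

```python
def split_vector_length(vl):
    """Split `vl` lanes into (low, high) using the enveloping power of two.

    The low part takes half of the enveloping length; the high part
    takes whatever lanes remain. Requires vl >= 2.
    """
    if vl < 2:
        raise ValueError("need at least two lanes to split")
    envelope = 1 << (vl - 1).bit_length()  # smallest power of two >= vl
    half = envelope // 2
    low = min(vl, half)
    return low, vl - low
```

The point of the sketch is that the high half can be shorter than the low half, so code that assumes two equal halves cannot be applied to the original vector type directly.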
Bugzilla issue 45563
https://bugs.llvm.org/show_bug.cgi?id=45563
Reviewers: craig.topper, mehdi_amini, nicolasvasilache
Reviewed By: craig.topper
Subscribers: hiraditya, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78608
Using getValueType() is not correct for architectures extended with CHERI since
we need a pointer type and not the value that is loaded. While stack
protector is useless when you have CHERI (since CHERI provides much
stronger security guarantees), we still have a test to check that we can
generate correct code for checks. Merging b281138a1b
into our tree broke this test. Fix by using TLI.getFrameIndexTy().
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D77785