The recently announced IBM z16 processor implements the architecture
already supported as "arch14" in LLVM. This patch adds support for
"z16" as an alternate architecture name for arch14.
This patch boosts the inlining threshold for a particular type of functions
that are using an incoming argument only as a memcpy source.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D121341
This patch adds support for inline assembly address operands using the "p"
constraint on X86 and SystemZ.
This was in fact broken on X86 (see example at
https://reviews.llvm.org/D110267, Nov 23).
These operands should probably be treated the same as memory operands by
CodeGenPrepare, which have been commented with "TODO" there.
Review: Xiang Zhang and Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D122220
Place PersistentId declaration under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to
reduce memory usage when it is not needed.
Differential Revision: https://reviews.llvm.org/D120714
This patch extends support for generating huge stack frames on 64-bit XPLINK by implementing the ABI-mandated call to the stack extension routine.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D120450
Add support for va intrinsics for the XPLINK ABI.
Only the extended vararg variant, which uses a pointer to next
argument, is supported. The standard variant will build on this.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D120148
The tablegen lines that specify the XPLINK64 calling convention for promoting an f32 vararg to an f64 are effectively overwritten by the following tablegen line which bitcast an f64 vararg to an i64 (so that it can be used in the GPRs). It becomes a bitcast from f32 to i64.
Since we don't handle a bitcast for f32s this caused an assertion.
With XPLINK, dynamic stack allocations requires calling
a runtime function, which allocates the stack memory,
moves the register save area, and returns the new
stack pointer.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D119732
With XPLINK, a no-op with information about the call type is emitted
after each call instruction. Centralizing it has the advantage that it is
easy to document all cases, and it makes it easier to extend it later
(e.g. dynamic stack allocation, 32 bit mode).
Also add a test checking the call types emitted so far.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D119557
Return false from ShouldShrinkFPConstant(), so that these constants are stored
in their full size on the constant pool, even if they could have been shrunk
and used with an extending load.
This is better since LD is faster than LDE, and it also enables reg/mem opcodes.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D117927
By reordering the objects on the stack frame after looking at the users, a
better utilization of displacement operands will result. This means less
needed Load Address instructions for the accessing of these objects.
This is important for very large functions where otherwise small changes
could cause a lot more/less accesses go out of range.
Note: this is not yet enabled for SystemZXPLINKFrameLowering, but should be.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D115690
In D115311, we're looking to modify clang to emit i constraints rather
than X constraints for callbr's indirect destinations. Prior to doing
so, update all of the existing tests in llvm/ to match.
Reviewed By: void, jyknight
Differential Revision: https://reviews.llvm.org/D115410
This patch adds support for prologue and epilogue generation for the z/OS target under the XPLINK64 ABI for functions with a stack size of less than 1048576 bytes (huge stack frames).
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D114457
This patch adds support for prologue and epilogue generation for
the z/OS target under the XPLINK64 ABI for functions with a stack
size of less than 1048576 bytes (huge stack frames).
Reviewed by: uweigand, Kai
Differential Revision: https://reviews.llvm.org/D114457
Memset with a constant length was implemented with a single store followed by
a series of MVC:s. This patch changes this so that one store of the byte is
emitted for each MVC, which avoids data dependencies between the MVCs. An
MVI/STC + MVC(len-1) is done for each block.
In addition, memset with a variable length is now also handled without a
libcall. Since the byte is first stored and then MVC is used from that
address, a length of two must now be subtracted instead of one for the loop
and EXRL. This requires an extra check for the one-byte case, which is
handled in a special block with just a single MVI/STC (like GCC).
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D112004
Currently we create register mappings for registers used only once in current
MBB. For registers with multiple uses, when all the uses are in the current MBB,
we can also create mappings for them similarly according to the last use.
For example
%reg101 = ...
= ... reg101
%reg103 = ADD %reg101, %reg102
We can create mapping between %reg101 and %reg103.
Differential Revision: https://reviews.llvm.org/D113193
It was discovered that an extra register COPY remained when expanding a
(variable length) memory operation with a loop and there was another use of
the involved address register(s) afterwards.
A simple fix for this is to COPY the address registers before the loop and
use that new vreg instead.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D112065
All instructions must have a correct size value close to emission when
SystemZLongBranch runs, or a necessary branch relaxation may be missed.
This patch also adds an assert for instruction sizes in SystemZLongBranch.
Review: Ulrich Weigand
- This patch provides the initial implementation for lowering a call on z/OS according to the XPLINK64 calling convention
- A series of changes have been made to SystemZCallingConv.td to account for these additional XPLINK64 changes including adding a new helper function to shadow the stack along with allocation of a register wherever appropriate
- For the cases of copying a f64 to a gr64 and a f128 / 128-bit vector type to a gr64, a `CCBitConvertToType` has been added and has been bitcasted appropriately in the lowering phase
- Support for the ADA register (R5) will be provided in a later patch.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D111662
This patch fixes the bug that consisted of treating variable / immediate
length mem operations (such as memcpy, memset, ...) differently. The variable
length case needs to have the length minus 1 passed due to the use of EXRL
target instructions. However, the DAGCombiner can convert a register length
argument into a constant one, and whenever that happened one byte too little
would end up being performed.
This is also a refactorization by reducing the number of opcodes and variants
involved. For any opcode (variable or constant length), only the length minus
one is passed on to the ISD node. The rest of the logic is now instead
handled during isel pseudo expansion.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D111729
This PR implements the save of the XPLINK callee-saved registers
on z/OS.
Reviewed By: uweigand, Kai
Differential Revision: https://reviews.llvm.org/D111653
This patch contains following enhancements to SrcRegMap and DstRegMap:
1 In findOnlyInterestingUse not only check if the Reg is two address usage,
but also check after commutation can it be two address usage.
2 If a physical register is clobbered, remove SrcRegMap entries that are
mapped to it.
3 In processTiedPairs, when create a new COPY instruction, add a SrcRegMap
entry only when the COPY instruction is coalescable. (The COPY src is
killed)
With these enhancements isProfitableToCommute can do better commute decision,
and finally more register copies are removed.
Differential Revision: https://reviews.llvm.org/D108731
Fix the calculation of ReplacedAllUntiedUses when any of the tied defs
are early-clobber. The effect of this is to fix the placement of kill
flags on an instruction like this (from @f2 in
test/CodeGen/SystemZ/asm-18.ll):
INLINEASM &"stepb $1, $2" [attdialect], $0:[regdef-ec:GRH32Bit], def early-clobber %3:grh32bit, $1:[reguse tiedto:$0], killed %4:grh32bit(tied-def 3), $2:[reguse:GRH32Bit], %4:grh32bit
After TwoAddressInstruction without this patch:
%3:grh32bit = COPY killed %4:grh32bit
INLINEASM &"stepb $1, $2" [attdialect], $0:[regdef-ec:GRH32Bit], def early-clobber %3:grh32bit, $1:[reguse tiedto:$0], %3:grh32bit(tied-def 3), $2:[reguse:GRH32Bit], %4:grh32bit
Note that the COPY kills %4, even though there is a later use of %4 in
the INLINEASM. This fails machine verification if you force it to run
after TwoAddressInstruction (currently it is disabled for other
reasons).
After TwoAddressInstruction with this patch:
%3:grh32bit = COPY %4:grh32bit
INLINEASM &"stepb $1, $2" [attdialect], $0:[regdef-ec:GRH32Bit], def early-clobber %3:grh32bit, $1:[reguse tiedto:$0], %3:grh32bit(tied-def 3), $2:[reguse:GRH32Bit], %4:grh32bit
Differential Revision: https://reviews.llvm.org/D110848
Seem to cause test failures in compiler-rt.
Revert "[SystemZ] Implement memcmp of variable length with CLC."
This reverts commit 7a4e9a0c73.
Revert "[SystemZ] Implement memcpy of variable length with MVC."
This reverts commit c6c13c58ee.
Following the same pattern of memset/memcpy, this patch implements a variable
length memcmp with a CLC loop followed by an EXRL instruction.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D107380
Instead of making a memcpy libcall, emit an MVC loop and an EXRL instruction
the same way as is already done for memset 0.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D106874
In TwoAddressInstructionPass::processTiedPairs, update subranges of the
live interval for RegB as well as the main range.
This is a small step towards switching TwoAddressInstructionPass over
from LiveVariables to LiveIntervals. Currently this path is only tested
if you explicitly enable -early-live-intervals.
Differential Revision: https://reviews.llvm.org/D110526
The type legalizer has by default no method of doing this bitcast other than
storing and reloading the value from stack.
This patch implements a custom lowering of this operation using extractions
of subregs (z13 and earlier using FP128 register pairs), or of vector
elements (with 'vector enhancements 1' using VR128 FP registers).
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D110346
SystemZ adds the EXRL target instructions in the end of each file. This must
be done before debug info emission since that may end the text section, and
therefore this is now done in emitConstantPools() (instead of in
emitEndOfAsmFile).
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D109513
- This patch consists of the bare basic code needed in order to generate some assembly for the z/OS target.
- Only the .text and the .bss sections are added for now.
- The relevant MCSectionGOFF/Symbol interfaces have been added. This enables us to print out the GOFF machine code sections.
- This patch enables us to add simple lit tests wherever possible, and contribute to the testing coverage for the z/OS target
- Further improvements and additions will be made in future patches.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D106380
This patch adds support for the next-generation arch14
CPU architecture to the SystemZ backend.
This includes:
- Basic support for the new processor and its features.
- Detection of arch14 as host processor.
- Assembler/disassembler support for new instructions.
- New LLVM intrinsics for certain new instructions.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining __VEC__ == 10304.
Note: No currently available Z system supports the arch14
architecture. Once new systems become available, the
official system name will be added as supported -march name.
Don't use a local MachineOperand copy in SystemZAsmPrinter::PrintAsmOperand()
and change the register as it may break the MRI tracking of register
uses. Use an MCOperand instead.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D105757
We already have reassociation code for Adds and Ors separately in DAG
combiner, this adds it for the combination of the two where Ors act like
Adds. It reassociates (add (or (x, c), y) -> (add (add (x, y), c)) where
we know that the Ors operands have no common bits set, and the Or has
one use.
Differential Revision: https://reviews.llvm.org/D104765
This will use the python that LLVM was configured to use rather than
python from PATH.
Reviewed By: serge-sans-paille
Differential Revision: https://reviews.llvm.org/D105224
The odd register of a (128 bit) register pair is accessed with the 'N' code
with an inline assembly operand.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D105502