Commit Graph

28987 Commits

Author SHA1 Message Date
Matt Arsenault
7cd8a0256d GlobalISel: Legalize G_FPOWI 2020-07-21 18:13:04 -04:00
Matt Arsenault
7941dc5041 GlobalISel: Translate llvm.powi intrinsic
There are a few questionable things about this intrinsic and existing
DAG implementation. For some reason the intrinsic hardcodes the second
operand to be scalar-only i32, and SelectionDAG builder makes a
legalization decision based on whether the operand is constant.
2020-07-21 18:13:04 -04:00
Matt Arsenault
f659c44016 CodeGen: Add support for lowering byref attribute 2020-07-21 17:38:15 -04:00
Matt Arsenault
2fe0ea8261 DAG: Handle expanding strict_fsub into fneg and strict_fadd
The AMDGPU handling of f16 vectors is terrible still since it gets
scalarized even when the vector operation is legal.

The code is is essentially duplicated between the non-strict and
strict case. Apparently no other expansions are currently trying to do
this. This is mostly because I found the behavior of
getStrictFPOperationAction to be confusing. In the ARM case, it would
expand strict_fsub even though it shouldn't due to the later check. At
that point, the logic required to check for legality was more complex
than just duplicating the 2 instruction expansion.
2020-07-21 16:17:10 -04:00
Guozhi Wei
28759e9fcc [MBP] Use profile count to compute tail dup cost if it is available
Current tail duplication in machine block placement pass uses block frequency
information in cost model. But frequency number has only relative meaning
compared to other basic blocks in the same function. A large frequency number
doesn't mean it is hot and a small frequency number doesn't mean it is cold.

To overcome this problem, this patch uses profile count in cost model if it's
available. So we can tail duplicate real hot basic blocks.

Differential Revision: https://reviews.llvm.org/D83265
2020-07-21 11:18:06 -07:00
David Blaikie
38fbba4cb8 DebugInfo: Move getMD5AsBytes from DwarfUnit to DwarfDebug
It wasn't using any state from DwarfUnit anyway.
2020-07-20 19:21:39 -07:00
Matt Arsenault
1ef3ed0eb4 GlobalISel: Rewrite getLCMType
Try to make the behavior more consistent with getGCDType, and bias
towards returning something closer to the source type whenever there's
an ambiguity.
2020-07-20 21:06:30 -04:00
Matt Arsenault
12d5bec8c7 GlobalISel: Handle more cases in getGCDType
Try harder to find a canonical unmerge type when trying to cover the
desired target type. Handle finding a compatible unmerge type for two
vectors with different element types. This will return the largest
multiple of the source vector element that will evenly divide the
target vector type.

Also make the handling mixing scalars and vectors, and prefer the
source element type as the unmerge target type.
2020-07-20 20:53:35 -04:00
Eli Friedman
b8f765a1e1 [AArch64][SVE] Add support for trunc to <vscale x N x i1>.
This isn't a natively supported operation, so convert it to a
mask+compare.

In addition to the operation itself, fix up some surrounding stuff to
make the testcase work: we need concat_vectors on i1 vectors, we need
legalization of i1 vector truncates, and we need to fix up all the
relevant uses of getVectorNumElements().

Differential Revision: https://reviews.llvm.org/D83811
2020-07-20 13:11:02 -07:00
Yuanfang Chen
efcb8a1903 [NFC] remove unneeded TargetLoweringObjectFile init after 85c30f3374 2020-07-20 10:43:28 -07:00
Yuanfang Chen
589c646a7e [llc] (almost) remove --print-machineinstrs
Its effect could be achieved by
`-stop-after`,`-print-after`,`-print-after-all`. But a few tests need to
print MIR after ISel which could not be done with
`-print-after`/`-stop-after` since isel pass does not have commandline name.
That's the reason `--print-machineinstrs` is downgraded to
`--print-after-isel` in this patch. `--print-after-isel` could be
removed after we switch to new pass manager since isel pass would have a
commandline text name to use `print-after` or equivalent switches.

The motivation of this patch is to reduce tests dependency on
would-be-deprecated feature.

Reviewed By: arsenm, dsanders

Differential Revision: https://reviews.llvm.org/D83275
2020-07-20 10:43:28 -07:00
Alok Kumar Sharma
2d10258a31 [DebugInfo] Support for DW_AT_associated and DW_AT_allocated.
Summary:
This support is needed for the Fortran array variables with pointer/allocatable
attribute. This support enables debugger to identify the status of variable
whether that is currently allocated/associated.

  for pointer array (before allocation/association)
  without DW_AT_associated

(gdb) pt ptr
type = integer (140737345375288:140737354129776)
(gdb) p ptr
value requires 35017956 bytes, which is more than max-value-size

  with DW_AT_associated

(gdb) pt ptr
type = integer (:)
(gdb) p ptr
$1 = <not associated>

  for allocatable array (before allocation)

  without DW_AT_allocated

(gdb) pt arr
type = integer (140737345375288:140737354129776)
(gdb) p arr
value requires 35017956 bytes, which is more than max-value-size

  with DW_AT_allocated

(gdb) pt arr
type = integer, allocatable (:)
(gdb) p arr
$1 = <not allocated>

    Testing
- unit test cases added
- check-llvm
- check-debuginfo

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D83544
2020-07-20 19:54:35 +05:30
Petar Avramovic
6a1030aa0e AMDGPU/GlobalISel: Legalize s16->s64 G_FPEXT
Legalize using narrowScalar as s16->s32 G_FPEXT
followed by s32->s64 G_FPEXT.

Differential Revision: https://reviews.llvm.org/D84030
2020-07-20 16:12:19 +02:00
Matt Arsenault
5cbd4e415e GlobalISel: Don't handle widenScalar for vector G_INSERT
This handling didn't make any sense for vectors.
2020-07-20 10:06:18 -04:00
Matt Arsenault
a679f27e98 GlobalISel: Consistently get TII from MIRBuilder 2020-07-20 10:06:18 -04:00
Petar Avramovic
ba938f6388 AMDGPU/GlobalISel: Legalize s16->s64 G_FPTOSI/G_FPTOUI
Add narrowScalarFor action.
Add narrow scalar for typeIndex == 0 for G_FPTOSI/G_FPTOUI.
Legalize using narrowScalarFor as s16->s32 G_FPTOSI/G_FPTOUI
followed by s32->s64 G_SEXT/G_ZEXT.

Differential Revision: https://reviews.llvm.org/D84010
2020-07-20 11:06:11 +02:00
Evgeny Leviant
24089928be [CodeGen][TargetPassConfig] Add TargetTransformInfo pass correctly
Patch adds tti pass directly enforcing its execution with correctly set
TargetTransformInfo.

Differential revision: https://reviews.llvm.org/D84047
2020-07-18 14:11:40 +03:00
Aditya Nandakumar
63c081e73d [GISel: Add support for CSEing SrcOps which are immediates
https://reviews.llvm.org/D84072

Add G_EXTRACT to CSEConfigFull and add unit test as well.
2020-07-17 16:04:24 -07:00
Sam Tebbs
6c348e4067 [HWLoops] Stop converting to a while loop when it would be unsafe to
There were cases where a do-while loop would be converted to a while
loop before finding out that it would be unsafe to expand the SCEV in
this situation and then bailing out of hardware loop conversion.

This patch checks if it would be unsafe to expand the SCEV and if so stops converting the do-while into a while, allowing conversion to a hardware loop.

Differential Revision: https://reviews.llvm.org/D83953
2020-07-17 11:47:08 +01:00
Jay Foad
62fd7f767c [MachineScheduler] Fix the TopDepth/BotHeightReduce latency heuristics
tryLatency compares two sched candidates. For the top zone it prefers
the one with lesser depth, but only if that depth is greater than the
total latency of the instructions we've already scheduled -- otherwise
its latency would be hidden and there would be no stall.

Unfortunately it only tests the depth of one of the candidates. This can
lead to situations where the TopDepthReduce heuristic does not kick in,
but a lower priority heuristic chooses the other candidate, whose depth
*is* greater than the already scheduled latency, which causes a stall.

The fix is to apply the heuristic if the depth of *either* candidate is
greater than the already scheduled latency.

All this also applies to the BotHeightReduce heuristic in the bottom
zone.

Differential Revision: https://reviews.llvm.org/D72392
2020-07-17 11:02:13 +01:00
Florian Hahn
e297006d6f [ScheduleDAG] Move DBG_VALUEs after first term forward.
MBBs are not allowed to have non-terminator instructions after the first
terminator. Currently in some cases (see the modified test),
EmitSchedule can add DBG_VALUEs after the last terminator, for example
when referring a debug value that gets folded into a TCRETURN
instruction on ARM.

This patch updates EmitSchedule to move inserted DBG_VALUEs just before
the first terminator. I am not sure if there are terminators produce
values that can in turn be used by a DBG_VALUE. In that case, moving the
DBG_VALUE might result in referencing an undefined register. But in any
case, it seems like currently there is no way to insert a proper DBG_VALUEs
for such registers anyways.

Alternatively it might make sense to just remove those extra DBG_VALUES.

I am not too familiar with the details of debug info in the backend and
would appreciate any suggestions on how to address the issue in the best
possible way.

Reviewers: vsk, aprantl, jpaquette, efriedma, paquette

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D83561
2020-07-17 10:27:43 +01:00
Igor Kudrin
f76a0cd97a [DebugInfo] Fix a misleading usage of DWARF forms with DIEExpr. NFCI.
For now, DIEExpr is used only in two places:

 1) in the debug info library unit test suite to emit
    a DW_AT_str_offsets_base attribute with the DW_FORM_sec_offset
    form, see dwarfgen::DIE::addStrOffsetsBaseAttribute();

 2) in DwarfCompileUnit::addLocationAttribute() to generate the location
    attribute for a TLS variable.

The later case used an incorrect DWARF form of DW_FORM_udata, which
implies storing an uleb128 value, not a 4/8 byte constant. The generated
result was as expected because DIEExpr::SizeOf() did not handle the used
form, but returned the size of the code pointer by default.

The patch fixes the issue by using more appropriate DWARF forms for
the problematic case and making DIEExpr::SizeOf() more straightforward.

Differential Revision: https://reviews.llvm.org/D83958
2020-07-17 13:49:27 +07:00
Denis Antrushin
e04fe9aefd [Statepoint] Fix bug found by sanitaizer.
Statepoint has no static operands, so it cannot be verified
against MCInstrDescr. Revert NumDefs change introduced by ef658ebd62.
2020-07-16 23:06:53 +03:00
Nadav Rotem
a394aa1b97 [LiveVariables] Replace std::vector with SmallVector.
Replace std::vector with SmallVector to reduce the number of mallocs.
This method is frequently executed, and the number of elements in the
vector is typically small.

https://reviews.llvm.org/D83920
2020-07-16 11:39:54 -07:00
Matt Arsenault
9d3e56e2ee DAG: Try scalarizing when expanding saturating add/sub
In an upcoming AMDGPU patch, the scalar cases will be legal and vector
ops should be scalarized, rather than producing a long sequence of
vector ops which will also need to be scalarized.

Use a lazy heuristic that seems to work and improves the thumb2 MVE
test.
2020-07-16 14:05:16 -04:00
Denis Antrushin
ef658ebd62 MIR Statepoint refactoring. Part 1: Basic MI level changes.
Basic support for variadic-def MIR Statepoint:
- Change TableGen STATEPOINT description to variadic out list
  (For self-documentation purpose; by itself it does not affect
  code generation in any way).
- Update StatepointOpers helper class to handle variadic defs.
- Update MachineVerifier to properly handle them, too.

With this change, new Statepoint instruction can be passed through
backend (excluding ISEL) without errors.

Full change set is available at D81603.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D81645
2020-07-17 00:57:21 +07:00
Matt Arsenault
023883a834 IR: Rename Argument::hasPassPointeeByValueAttr to prepare for byref
When the byref attribute is added, there will need to be two similar
functions for the existing cases which have an associate value copy,
and byref which does not. Most, but not all of the existing uses will
use the existing version.

The associated size function added by D82679 also needs to
contextually differ, and will help eliminate a few places still
relying on pointee element types.
2020-07-16 13:50:49 -04:00
Petar Avramovic
6850033ca6 AMDGPU/GlobalISel: Legalize s64->s16 G_SITOFP/G_UITOFP
Add widenScalar for TypeIdx == 0 for G_SITOFP/G_UITOFP.
Legailize, using widenScalar, as s64->s32 G_SITOFP/G_UITOFP
followed by s32->s16 G_FPTRUNC.

Differential Revision: https://reviews.llvm.org/D83880
2020-07-16 16:31:57 +02:00
James Y Knight
60433c63ac Remove TwoAddressInstructionPass::sink3AddrInstruction.
This function has a bug which will incorrectly reschedule instructions
after an INLINEASM_BR (which can branch). (The bug may also allow
scheduling past a throwing-CALL, I'm not certain.)

I could fix that bug, but, as the removed FIXME notes, it's better to
attempt rescheduling before converting to 3-addr form, as that may
remove the need to convert in the first place. In fact, the code to do
such reordering was added to this pass only a few months later, in
2011, via the addition of the function rescheduleMIBelowKill. That
code does not contain the same bug.

The removal of the sink3AddrInstruction function is not a no-op: in
some cases it would move an instruction post-conversion, when
rescheduleMIBelowKill would not move the instruction pre-converison.
However, this does not appear to be important: the machine instruction
scheduler can reorder the after-conversion instructions, in any case.

This patch fixes a kernel panic 4.4 LTS x86_64 Linux kernels, when
built with clang after 4b0aa5724f.

Link: https://github.com/ClangBuiltLinux/linux/issues/1085

Differential Revision: https://reviews.llvm.org/D83708
2020-07-16 10:02:52 -04:00
Kerry McLaughlin
2762da0a16 [SVE][CodeGen] Legalisation of masked loads and stores
Summary:
This patch modifies IncrementMemoryAddress to use a vscale
when calculating the new address if the data type is scalable.

Also adds tablegen patterns which match an extract_subvector
of a legal predicate type with zip1/zip2 instructions

Reviewers: sdesmalen, efriedma, david-arm

Reviewed By: efriedma, david-arm

Subscribers: tschuett, hiraditya, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83137
2020-07-16 10:55:45 +01:00
Quentin Colombet
294be6b5d3 [CalcSpillWeights] Propagate the fact that a live-interval is not spillable
When we calculate the weight of a live-interval, add some code to
check if the original live-interval was markied as not spillable and
if so, progagate that information down to the new interval.

Previously we would just recompute a weight for the new interval,
thus, we could in theory just spill live-intervals marked as not
spillable by just splitting them. That goes against the spirit of
a non-spillable live-interval.

E.g., previously we could do:
v1 =  // v1 must not be spilled
...
= v1

Split:
v1 = // v1 must not be spilled
...
v2 = v1 // v2 can be spilled
...
v3 = v2 // v3 can be spilled
= v3

There's no test case for that one as we would need to split a
non-spillable live-interval without using LiveRangeEdit to see this
happening.
RegAlloc inserts non-spillable intervals only as part of the spilling
mechanism, thus at this point the intervals are not splittable anymore.
On top of that, RegAlloc uses the LiveRangeEdit API, which already
properly propagate that information.

In other words, this could only happen if a target was to mark
a live-interval as not spillable before register allocation and
split it without using LRE, e.g., through
LiveIntervals::splitSeparateComponent.
2020-07-15 17:57:36 -07:00
Hiroshi Yamauchi
f233b92f92 [PGO][PGSO] Add profile guided size optimization to LegalizeDAG.
Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83333
2020-07-15 10:03:38 -07:00
Cameron McInally
ae51a70030 [Legalize] Hoist invariant condition in ExpandVectorBuildThroughStack(...)
The operands of a BUILD_VECTOR must all have the same type, so we can hoist this invariant condition out of the loop.

Differential Revision: https://reviews.llvm.org/D83882
2020-07-15 11:05:20 -05:00
Tim Northover
37b96d51d0 CodeGenPrep: remove AssertingVH references before deleting dead instructions.
CodeGenPrepare keeps fairly close track of various instructions it's
seen, particularly GEPs, in maps and vectors. However, sometimes those
instructions become dead and get removed while it's still executing.
This triggers AssertingVH references to them in an asserts build and
could lead to miscompiles in a release build (I've only seen a later
segfault though).

So this patch adds a callback to
RecursivelyDeleteTriviallyDeadInstructions which can make sure the
instruction about to be deleted is removed from CodeGenPrepare's data
structures.
2020-07-15 15:19:21 +01:00
Tim Northover
5165b2b5fd AArch64+ARM: make LLVM consider system registers volatile.
Some of the system registers readable on AArch64 and ARM platforms
return different values with each read (for example a timer counter),
these shouldn't be hoisted outside loops or otherwise interfered with,
but the normal @llvm.read_register intrinsic is only considered to read
memory.

This introduces a separate @llvm.read_volatile_register intrinsic and
maps all system-registers on ARM platforms to use it for the
__builtin_arm_rsr calls. Registers declared with asm("r9") or similar
are unaffected.
2020-07-15 09:47:36 +01:00
Roger Ferrer Ibanez
14bc5e149d [DAGCombiner] Rebuild (setcc x, y, ==) from (xor (xor x, y), 1)
The existing code already considered this case. Unfortunately a typo in
the condition prevents it from triggering. Also the existing code, had
it run, forgot to do the folding.

This fixes PR42876.

Differential Revision: https://reviews.llvm.org/D65802
2020-07-15 07:34:22 +00:00
Krzysztof Pszeniczny
c3e6555616 Call Frame Information (CFI) Handling for Basic Block Sections
This patch handles CFI with basic block sections, which unlike DebugInfo does
not support ranges. The DWARF standard explicitly requires emitting separate
CFI Frame Descriptor Entries for each contiguous fragment of a function. Thus,
the CFI information for all callee-saved registers (possibly including the
frame pointer, if necessary) have to be emitted along with redefining the
Call Frame Address (CFA), viz. where the current frame starts.

CFI directives are emitted in FDE’s in the object file with a low_pc, high_pc
specification. So, a single FDE must point to a contiguous code region unlike
debug info which has the support for ranges. This is what complicates CFI for
basic block sections.

Now, what happens when we start placing individual basic blocks in unique
sections:

* Basic block sections allow the linker to randomly reorder basic blocks in the
address space such that a given basic block can become non-contiguous with the
original function.
* The different basic block sections can no longer share the cfi_startproc and
cfi_endproc directives. So, each basic block section should emit this
independently.
* Each (cfi_startproc, cfi_endproc) directive will result in a new FDE that
caters to that basic block section.
* Now, this basic block section needs to duplicate the information from the
entry block to compute the CFA as it is an independent entity. It cannot refer
to the FDE of the original function and hence must duplicate all the stuff that
is needed to compute the CFA on its own.
* We are working on a de-duplication patch that can share common information in
FDEs in a CIE (Common Information Entry) and we will present this as a follow up
patch. This can significantly reduce the duplication overhead and is
particularly useful when several basic block sections are created.
* The CFI directives are emitted similarly for registers that are pushed onto
the stack, like callee saved registers in the prologue. There are cfi
directives that emit how to retrieve the value of the register at that point
when the push happened. This has to be duplicated too in a basic block that is
floated as a separate section.

Differential Revision: https://reviews.llvm.org/D79978
2020-07-14 12:54:12 -07:00
Logan Smith
a19461d9e1 [NFC] Add 'override' keyword where missing in include/ and lib/.
This fixes warnings raised by Clang's new -Wsuggest-override, in preparation for enabling that warning in the LLVM build. This patch also removes the virtual keyword where redundant, but only in places where doing so improves consistency within a given file. It also removes a couple unnecessary virtual destructor declarations in derived classes where the destructor inherited from the base class is already virtual.

Differential Revision: https://reviews.llvm.org/D83709
2020-07-14 09:47:29 -07:00
Paul Walker
6e198aae1d [SelectionDAG] Prevent warnings when extracting fixed length vector from scalable.
ComputeNumSignBits and computeKnownBits both trigger "Scalable flag
may be dropped" warnings when a fixed length vector is extracted
from a scalable vector.  This patch assumes nothing about the
demanded elements thus matching the behaviour when extracting a
scalable vector from a scalable vector.

Differential Revision: https://reviews.llvm.org/D83642
2020-07-14 11:12:56 +00:00
Sam Elliott
1d15bbb9d9 Revert "[RISCV] Avoid Splitting MBB in RISCVExpandPseudo"
This reverts commit 97106f9d80.

This is based on feedback from https://reviews.llvm.org/D82988#2147105
2020-07-14 11:15:01 +01:00
David Sherwood
3b8eaf26db [SVE][CodeGen] Fix implicit TypeSize->uint64_t conversion in TransformFPLoadStorePair
In DAGCombiner::TransformFPLoadStorePair we were dropping the scalable
property of TypeSize when trying to create an integer type of equivalent
size. In fact, this optimisation makes no sense for scalable types
since we don't know the size at compile time. I have changed the code
to bail out when encountering scalable type sizes.

I've added a test to

  llvm/test/CodeGen/AArch64/sve-fp.ll

that exercises this code path. The test already emits an error if it
encounters warnings due to implicit TypeSize->uint64_t conversions.

Differential Revision: https://reviews.llvm.org/D83572
2020-07-14 08:07:30 +01:00
Amara Emerson
64eb3a4915 [AArch64][GlobalISel] Add post-legalize combine for sext_inreg(trunc(sextload)) -> copy
On AArch64 we generate redundant G_SEXTs or G_SEXT_INREGs because of this.

Differential Revision: https://reviews.llvm.org/D81993
2020-07-13 20:27:45 -07:00
zuojian lin
fefe6a6642 Fix undefined behavior in DWARF emission
Caused by uninitialized load of llvm::DwarfDebug::PrevCU:
llvm::DwarfCompileUnit::addRange () at ../lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:276
llvm::DwarfDebug::endFunctionImpl () at ../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:1586
llvm::DebugHandlerBase::endFunction () at ../lib/CodeGen/AsmPrinter/DebugHandlerBase.cpp:319
llvm::AsmPrinter::EmitFunctionBody () at ../lib/CodeGen/AsmPrinter/AsmPrinter.cpp:1230
llvm::ARMAsmPrinter::runOnMachineFunction () at ../lib/Target/ARM/ARMAsmPrinter.cpp:161

Most of the DebugInfo tests under `LLVM_LIT_ARGS:STRING=-sv --vg` prior to this fix, and pass with the fix applied.

Reviewed By: aprantl, dblaikie

Differential Revision: https://reviews.llvm.org/D81631
2020-07-13 18:32:36 -07:00
Matt Arsenault
23ec773d19 GlobalISel: Implement fewerElementsVector for saturating add/sub 2020-07-13 14:46:40 -04:00
Matt Arsenault
6a8c11a11f GlobalISel: Implement widenScalar for saturating add/sub
Add a placeholder legality rule for AMDGPU until the rest of the
actions are handled.
2020-07-13 14:46:40 -04:00
Sanjay Patel
8779b11410 [DAGCombiner] rot i16 X, 8 --> bswap X
We have this generic transform in IR (instcombine),
but as shown in PR41098:
http://bugs.llvm.org/PR41098
...the pattern may emerge in codegen too.

x86 has a potential refinement/reversal opportunity here,
but that should come later or needs a target hook to
avoid the transform. Converting to bswap is the more
specific form, so we should use it if it is available.
2020-07-13 12:01:53 -04:00
Sanjay Patel
2df46a5743 [DAGCombiner] allow load/store merging if pairs can be rotated into place
This carves out an exception for a pair of consecutive loads that are
reversed from the consecutive order of a pair of stores. All of the
existing profitability/legality checks for the memops remain between
the 2 altered hunks of code.

This should give us the same x86 base-case asm that gcc gets in
PR41098 and PR44895:
http://bugs.llvm.org/PR41098
http://bugs.llvm.org/PR44895

I think we are missing a potential subsequent conversion to use "movbe"
if the target supports that. That might be similar to what AArch64
would use to get "rev16".

Differential Revision: https://reviews.llvm.org/D83567
2020-07-13 08:57:00 -04:00
Sanjay Patel
f1bbf3acb4 Revert "[DAGCombiner] allow load/store merging if pairs can be rotated into place"
This reverts commit 591a3af5c7.
The commit message was cut off and failed to include the review citation.
2020-07-13 08:55:29 -04:00
Sanjay Patel
591a3af5c7 [DAGCombiner] allow load/store merging if pairs can be rotated into place
This carves out an exception for a pair of consecutive loads that are
reversed from the consecutive order of a pair of stores. All of the
existing profitability/legality checks for the memops remain between
the 2 altered hunks of code.

This should give us the same x86 base-case asm that gcc gets in
PR41098 and PR44895:i
http://bugs.llvm.org/PR41098
http://bugs.llvm.org/PR44895

I think we are missing a potential subsequent conversion to use "movbe"
if the target supports that. That might be similar to what AArch64
would use to get "rev16".

Differential Revision:
2020-07-13 08:53:06 -04:00
Kerry McLaughlin
afcc9a81d2 [SVE][Codegen] Add a helper function for pointer increment logic
Summary:
Helper used when splitting load & store operations to calculate
the pointer + offset for the high half of the split

Reviewers: efriedma, sdesmalen, david-arm

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83577
2020-07-13 10:53:40 +01:00