4922 Commits

Author SHA1 Message Date
Jessica Paquette
bd22a99c57 Add missing REQUIRES: asserts to combine-icmp-to-lhs-known-bits.mir 2021-09-03 09:25:37 -07:00
Amara Emerson
6d9505b8e0 [AArch64][GlobalISel] Support for folding G_ROTR as shifted operands.
This allows selection like: eor w0, w1, w2, ror #8

Saves 500 bytes on ClamAV -Os, which is 0.1%.

Differential Revision: https://reviews.llvm.org/D109206
2021-09-02 21:37:24 -07:00
Jessica Paquette
844d8e0337 [GlobalISel] Combine icmp eq/ne x, 0/1 -> x when x == 0 or 1
This adds the following combines:

```
x = ... 0 or 1
c = icmp eq x, 1

->

c = x
```

and

```
x = ... 0 or 1
c = icmp ne x, 0

->

c = x
```

When the target's true value for the relevant types is 1.
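
As a rough illustration of why the fold above is sound, here is a minimal Python sketch. The `icmp` helper and its `true_value` parameter are hypothetical stand-ins for a target's boolean representation, not part of GlobalISel; the sketch only models the arithmetic.

```python
def icmp(pred, lhs, rhs, true_value=1):
    """Model an integer compare that yields `true_value` or 0 (hypothetical helper)."""
    if pred == "eq":
        result = lhs == rhs
    elif pred == "ne":
        result = lhs != rhs
    else:
        raise ValueError(f"unsupported predicate: {pred}")
    return true_value if result else 0


def fold_is_valid(true_value):
    """Check both combines for every x known to be 0 or 1."""
    return all(
        icmp("eq", x, 1, true_value) == x and icmp("ne", x, 0, true_value) == x
        for x in (0, 1)
    )
```

Under this model, `fold_is_valid(1)` holds, while a target whose true value is all-ones (`-1`) fails the check, which is why the combine is restricted to a true value of 1.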

This showed up in the following situation:

https://godbolt.org/z/M5jKexWTW

SDAG currently supports the `ne` case, but not the `eq` case. This can probably
be further generalized, but I don't feel like thinking that hard right now.

This gives some minor code size improvements across the board on CTMark at
-Os for AArch64. (0.1% for 7zip and pairlocalalign in particular.)

Differential Revision: https://reviews.llvm.org/D109130
2021-09-02 15:05:31 -07:00
Bradley Smith
14e1a4a6ee [AArch64][SVE] Workaround incorrect types when lowering fixed length gather/scatter
When lowering a fixed length gather/scatter the index type is assumed to
be the same as the memory type, this is incorrect in cases where the
extension of the index has been folded into the addressing mode.

For now add a temporary workaround to fix the codegen faults caused by
this by preventing the removal of this extension. At a later date the
lowering for SVE gather/scatters will be redesigned to improve the way
addressing modes are handled.

As a short term side effect of this change, the addressing modes
generated for fixed length gather/scatters will not be optimal.

Differential Revision: https://reviews.llvm.org/D109145
2021-09-02 15:07:24 +00:00
Roman Lebedev
3f1f08f0ed Revert @llvm.isnan intrinsic patchset.
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)

TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.

While the end result of discussion may lead back to the current design,
it may also not lead to the current design.

Therefore I take it upon myself
to revert the tree back to last known good state.

This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
2021-09-02 13:53:56 +03:00
Ben Shi
14500628b6 [AArch64][test] Add new tests for (mul (add x, c0), c1)
Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D108870
2021-09-02 10:39:49 +08:00
Jon Roelofs
9237eda304 Revert "[AArch64][GlobalISel] Legalize bswap <2 x i16>"
This reverts commit 5cd63e9ec2.

https://bugs.llvm.org/show_bug.cgi?id=51707

The sequence feeding in/out of the rev32/ushr isn't quite right:

 _swap:
         ldr     h0, [x0]
         ldr     h1, [x0, #2]
-        mov     v0.h[1], v1.h[0]
+        mov     v0.s[1], v1.s[0]
         rev32   v0.8b, v0.8b
         ushr    v0.2s, v0.2s, #16
-        mov     h1, v0.h[1]
+        mov     s1, v0.s[1]
         str     h0, [x0]
         str     h1, [x0, #2]
         ret
2021-09-01 16:49:20 -07:00
Roman Lebedev
f5753125f0 [Codegen][TLI][X86] SimplifyMultipleUseDemandedBits(): 0'th vec subreg widening is free, try to perform it earlier
I believe the profitability reasoning here is correct:
the "sub"reg is already located within the 0'th subreg of the wider reg,
so if we have a subvector insertion at index 0 into undef,
then it's always free to do.

After this, D109065 finally avoids the regression in D108382.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D109074
2021-09-02 00:54:05 +03:00
Amara Emerson
a86bbe1e31 [AArch64][GlobalISel] Handle any-extending FPR loads in manual selection code.
When we have an any-extending FPR bank load, none of the tablegen patterns
match and we fall back to the C++ selector. Like with the truncating stores
that were fixed recently, the C++ wasn't able to handle it and ended up
generating invalid copies between different size regclasses.

This change adds handling for this case, splitting the load into a regular
load and a SUBREG_TO_REG to extend it into the original wide destination reg.
2021-09-01 10:19:22 -07:00
Jessica Paquette
94d3ff09cf [GlobalISel] Don't use G_FPTOSI in G_ISNAN legalization
As noted in the comments in D108227, using G_FPTOSI produces wrong results for
G_ISNAN. Drop the G_FPTOSI and perform the operation on integer types.

Elsewhere in LLVM, a bitcast would be the appropriate choice (as it is in SDAG).
GlobalISel does not distinguish between integer and FP types, so a bitcast would
be meaningless here.
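
One common way to test for NaN purely on the integer bit pattern is sketched below in Python for IEEE-754 single precision. This is an assumption about what "perform the operation on integer types" can look like, not a transcription of the legalizer's actual sequence; `isnan_bits` is a hypothetical name.

```python
import struct


def isnan_bits(value):
    """Return True if `value`, rounded to float32, is a NaN, using integer ops only."""
    # Reinterpret the float32 encoding as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # Clear the sign bit; a NaN has an all-ones exponent and a nonzero mantissa,
    # so its magnitude bits compare strictly greater than those of infinity.
    return (bits & 0x7FFFFFFF) > 0x7F800000
```

The comparison against `0x7F800000` (the encoding of +infinity) never converts the value to an integer numerically, which is the kind of conversion a G_FPTOSI would have introduced.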
2021-08-31 10:26:42 -07:00
Owen Anderson
db9de22f2b Teach the AArch64 backend patterns to generate the EOR3 instruction.
Adds patterns to match the EOR3 instruction.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D108793
2021-08-30 20:01:08 +00:00
Ellis Hoag
47b239eb5a [DIBuilder] Do not replace empty enum types
It looks like this array was missed in 4276d4a8d0

Fixed tests that expected `elements` to be empty or depended on the order of the empty DINode.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D107024
2021-08-30 12:33:03 -07:00
Jun Ma
15b2a8e7fa [AArch64][SVE] Optimize ptrue predicate pattern with known sve register width.
For vectors that are exactly equal to getMaxSVEVectorSizeInBits, just use
AArch64SVEPredPattern::all, which can enable the use of unpredicated ptrue when available.

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D108706
2021-08-27 20:03:48 +08:00
Jessica Paquette
2363a20001 [AArch64][GlobalISel] Optimize G_BUILD_VECTOR of undef + 1 elt -> SUBREG_TO_REG
This pattern

```
%elt = ... something ...
%undef = G_IMPLICIT_DEF
%vec = G_BUILD_VECTOR %elt, %undef, %undef, ... %undef
```

Can be selected to a SUBREG_TO_REG, assuming `%elt` and `%vec` have the same
register bank. We don't care about any of the bits in `%vec` aside from those
in `%elt`, which just happens to be the 0th element.

This is preferable to emitting `mov` instructions for every index.

This gives minor code size improvements on the test suite at -Os.

Differential Revision: https://reviews.llvm.org/D108773
2021-08-26 11:45:11 -07:00
Simon Wallis
c4dc81eeab [AArch64] provide strictfp attributes in test file
A post-commit review comment on https://reviews.llvm.org/D107452 pointed out that
https://llvm.org/docs/LangRef.html
says:
"In a function that uses the constrained intrinsics the strictfp attribute is required on all function calls."

Although there are several files across several test directories which don't follow this guidance, it is straightforward to provide this attribute.

Reviewed By: kpn

Differential Revision: https://reviews.llvm.org/D107567
2021-08-26 16:56:43 +01:00
Jacob Bramley
05f3219b38 [AArch64] Lower fpto*i.sat intrinsics for NEON.
Following on from D102353, extend the fpto*i.sat intrinsics to use NEON
fcvt* instructions.

Differential Revision: https://reviews.llvm.org/D108460
2021-08-26 15:37:00 +01:00
Andrew Wei
99c4336374 [LoopDataPrefetch] Add missed LoopSimplify dependence for prefetch pass
SCEVExpander::expandCodeFor may expand add recurrences for loop with a preheader,
so we should make LoopDataPrefetch dependent on LoopSimplify.
This patch will try to fix : https://bugs.llvm.org/show_bug.cgi?id=43784

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D108448
2021-08-26 21:01:59 +08:00
David Green
6ffc6951a3 [AArch64] Remove unpredictable from narrowing instructions.
Like other similar instructions the xtn2 family do not have side
effects, and explicitly marking them as such can help improve scheduling
freedom.
2021-08-26 09:43:44 +01:00
Nicholas Guy
36fcf47fc8 [AArch64] Generate SMOV in place of sext(fmov(...))
A single smov instruction is capable of moving from a vector register while performing
the sign-extend during said move, rather than each step being performed by separate instructions.

Differential Revision: https://reviews.llvm.org/D108633
2021-08-25 15:23:22 +01:00
Peilin Guo
4c4dbeeeea [DAGCombine] Check the legality of the index of EXTRACT_SUBVECTOR
For ISD::EXTRACT_SUBVECTOR, its second operand must be a constant
multiple of the known-minimum vector length of the result type.
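
The constraint stated above can be sketched as a small Python predicate. The function name and parameters are hypothetical and only model the rule as worded here, not the DAGCombine code itself.

```python
def is_legal_extract_subvector_index(index, result_min_num_elts):
    """Return True if `index` is a multiple of the result's known-minimum length."""
    if result_min_num_elts <= 0:
        raise ValueError("result type must have at least one element")
    return index >= 0 and index % result_min_num_elts == 0
```

For example, extracting a 4-element result at index 4 passes the check, while index 2 does not.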

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107795
2021-08-25 19:33:39 +08:00
Amara Emerson
2ed8053d46 Revert "[AArch64][GlobalISel] Don't contract cross-bank copies into truncating stores."
This reverts commit 67bf3ac744.

The reason is that this change is now superseded by 04fb9b729a which fixes the
underlying problem in the selector. Now it's fine to generate truncating FP stores
since the selector code will just generate subreg copies to handle them.
2021-08-24 16:26:56 -07:00
Amara Emerson
04fb9b729a [AArch64][GlobalISel] Fix incorrect handling of fp truncating stores.
When the tablegen patterns fail to select a truncating scalar FPR store,
our manual selection code also failed to handle it silently, trying to
generate an invalid copy. Fix this by adding support in the manual code
to generate a proper subreg copy before selecting a non-truncating store.
2021-08-24 16:07:00 -07:00
Jessica Paquette
ef8707574b [AArch64][GlobalISel] Legalize narrow scalar FP arithmetic
Widen narrow fp arithmetic ops (e.g. G_FADD). When we don't have full FP16
support, widen to s32. Otherwise widen to s16.

https://godbolt.org/z/TbT9Pqa7e

Differential Revision: https://reviews.llvm.org/D108660
2021-08-24 13:54:28 -07:00
Eli Friedman
09dcf31d74 [NFC] Add tests for i128 fshl on a few targets.
In preparation for D108058.
2021-08-24 11:43:35 -07:00
Jessica Paquette
db232de193 [AArch64][GlobalISel] Legalize + select v2p0 -> v2s64 G_PTRTOINT
1) Just mark this case as legal because it can just be a copy.

2) Ensure the copy in the existing code actually gets selected. Without doing
this, we'll crash because the destination won't have a register class.

This fell back 35 times in a build of clang with GISel for AArch64.

Differential Revision: https://reviews.llvm.org/D108610
2021-08-24 11:02:01 -07:00
Jessica Paquette
67d4dd5c07 [AArch64][GlobalISel] Select @llvm.aarch64.neon.ld4.*
Reuse the selection code from the ld2 case. This is similar to how SDAG handles
things in AArch64ISelDAGToDAG. (See SelectLoad)

This fell back ~100 times while building clang with GISel enabled for AArch64.

Factoring out the gross subreg copy part ought to make selecting the rest of
this family fairly easy.

Differential Revision: https://reviews.llvm.org/D108600
2021-08-24 09:03:49 -07:00
Petar Avramovic
2bf4eeeeb6 [GlobalISel] Avoid creating COPY in LegalizationArtifactCombiner
When Src and Dst used in buildAnyExtOrTrunc or buildSExtOrTrunc
have the same type (creates COPY) use Src register directly or
use replaceRegOrBuildCopy instead.

Differential Revision: https://reviews.llvm.org/D108306
2021-08-24 11:09:56 +02:00
Jessica Paquette
2ec2b25fba [AArch64][GlobalISel] Select @llvm.aarch64.neon.ld2.*
This is pretty similar to the ST2 selection code in
`AArch64InstructionSelector::selectIntrinsicWithSideEffects`.

This is a GISel equivalent of the ld2 case in `AArch64DAGToDAGISel::Select`.
There's some weirdness there that appears here too (e.g. using ld1 for scalar
cases, which are 1-element vectors in SDAG.)

It's a little gross that we have to create the copy and then select it right
after, but I think we'd need to refactor the existing copy selection code
quite a bit to do better.

This was falling back while building llvm-project with GISel for AArch64.

Differential Revision: https://reviews.llvm.org/D108590
2021-08-23 17:15:53 -07:00
Stanislav Mekhanoshin
401a45c61b Fix late rematerialization operands check
D106408 enables rematerialization of instructions with virtual
register uses. That has uncovered the bug in the allUsesAvailableAt
implementation: https://bugs.llvm.org/show_bug.cgi?id=51516.

In the majority of cases canRematerializeAt() is called to check if
an instruction can be rematerialized before the given UseIdx.
However, SplitEditor::enterIntvAtEnd() calls it to rematerialize
an instruction at the end of a block passing LIS.getMBBEndIdx()
into the check. In the testcase from the bug it has attempted to
rematerialize ADDXri after STRXui in bb.17. The use operand %55
of the ADD is killed by the STRX but that is undetected by the check
because it adjusts passed UseIdx to the reg slot, before the kill.
The value is dead at the index passed to the check however.

This change uses the later of the passed UseIdx and its reg slot. This
should be correct because if we are checking the availability of operands
before an instruction, that instruction cannot be the one defining
these operands. If we are checking for late rematerialization, we
are really interested in whether the operands live past the instruction.

The bug is not exploitable without D106408, but the fix is needed to
reland the reverted D106408.

Differential Revision: https://reviews.llvm.org/D108475
2021-08-23 12:23:58 -07:00
Jessica Paquette
a2c8e17658 [AArch64][GlobalISel] Add regbankselect support for G_LLROUND
Same as G_LROUND: destination should always be a GPR, source should always be
a FPR.

Differential Revision: https://reviews.llvm.org/D108566
2021-08-23 10:32:20 -07:00
Jessica Paquette
fe51f9098b [AArch64][GlobalISel] Legalize G_LLROUND for s64 + s32
Same as G_LROUND.

Also add a TODO for full fp16 legalization.

Differential Revision: https://reviews.llvm.org/D108564
2021-08-23 09:45:23 -07:00
Jessica Paquette
6760e2a7bc [GlobalISel] Translate @llvm.llround.* -> G_LLROUND
Translate it using `IRTranslator::translateSimpleIntrinsic`.

Differential Revision: https://reviews.llvm.org/D108563
2021-08-23 09:42:53 -07:00
Amara Emerson
3187a4f3f1 [AArch64][GlobalISel] Add legalizer support for the @llvm.get.dynamic.area.offset intrinsic.
This is just 0 on AArch64.
2021-08-20 17:13:34 -07:00
Amara Emerson
67bf3ac744 [AArch64][GlobalISel] Don't contract cross-bank copies into truncating stores.
Truncating stores with GPR bank sources shouldn't be mutated into using FPR bank
sources, since those aren't supported.

Ideally this should be a selection failure in the tablegen patterns, but for now
avoid generating them.
2021-08-20 16:36:23 -07:00
Jessica Paquette
9e9d70591e [AArch64][GlobalISel] Legalize non-register-sized scalar G_BITREVERSE
Clamp types to [s32, s64] and make them a power of 2.
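
A minimal Python sketch of the size computation, assuming the clamp is applied first and the result is then rounded up to a power of 2. The helper name is hypothetical, and the sketch only computes the target bit width, not how the legalizer widens or narrows the operation.

```python
def legal_scalar_bits(bits, lo=32, hi=64):
    """Clamp a scalar width to [lo, hi], then round up to the next power of 2."""
    if bits <= 0:
        raise ValueError("bit width must be positive")
    clamped = min(max(bits, lo), hi)
    # (n - 1).bit_length() gives the exponent of the smallest power of 2 >= n.
    return 1 << (clamped - 1).bit_length()
```

Under these assumptions an s8 or s24 input maps to 32 bits, s33 maps to 64 bits, and s128 is clamped down to 64 bits.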

This matches SDAG's behaviour.

https://godbolt.org/z/vTeGqf4vT

Differential Revision: https://reviews.llvm.org/D108344
2021-08-20 14:44:03 -07:00
Jessica Paquette
7e91c59844 [AArch64][GlobalISel] Legalize 32-bit + narrow G_SMULO + G_UMULO
SDAG lowers 32-bit and 64-bit G_SMULO + G_UMULO. We were missing the 32-bit
case.

For other sizes, make the 0th type a power of 2 and clamp it to either 32 bits
or 64 bits.

Right now, this will allow us to handle narrow types (e.g. s4, s24, etc.). The
LegalizerHelper doesn't support narrowing G_SMULO or G_UMULO right now. I think
we want clamping behaviour either way, so we might as well include it now to
be explicit.

Differential Revision: https://reviews.llvm.org/D108240
2021-08-20 14:37:46 -07:00
Jessica Paquette
16caf6321c [AArch64][GlobalISel] Clamp vectors of p0 when legalizing G_LOAD/G_STORE
We had a rule for <n x s64> but not one for <n x p0>. As a result, we'd fall
back on like <5 x p0> or whatever.

Differential Revision: https://reviews.llvm.org/D108484
2021-08-20 14:34:49 -07:00
Jessica Paquette
470c74f181 [AArch64][GlobalISel] Add regbankselect support for G_LROUND
Destination is always a GPR, since the result is always an integer.

Source is always a FPR, since the source is always floating point.

Differential Revision: https://reviews.llvm.org/D108419
2021-08-20 14:31:14 -07:00
Jessica Paquette
44bf0dc625 [AArch64][GlobalISel] Mark G_LROUND as legal for s64 dst + s32/s64 src.
Matches SDAG's behaviour for these types.

Differential Revision: https://reviews.llvm.org/D108420
2021-08-20 14:22:58 -07:00
Jessica Paquette
af8e09d4bb [GlobalISel] Add G_LLROUND
Basically the same as G_LROUND. Handles the llvm.llround family of intrinsics.

Also add a helper function to the MachineVerifier for checking if all of the
(virtual register) operands of an instruction are scalars. Seems like a useful
thing to have.

Differential Revision: https://reviews.llvm.org/D108429
2021-08-20 14:07:21 -07:00
Tim Northover
3d41ef68e7 AArch64: don't form indexed paired ops if base reg overlaps operands.
The registers involved might not be identical, but can still overlap (e.g.
"str w0, [x0, #4]!").
2021-08-20 11:39:38 +01:00
Jessica Paquette
3207ed196c [GlobalISel] Add IRTranslator support for @llvm.lround.* -> G_LROUND
Translate the `@llvm.lround.*` family to G_LROUND via
`IRTranslator::translateSimpleIntrinsic`.

Differential Revision: https://reviews.llvm.org/D108418
2021-08-19 17:08:08 -07:00
Jessica Paquette
3118926483 [GlobalISel] Add a G_LROUND instruction
Meant to represent the `@llvm.lround.*` family.

Add the opcode, docs, and verification.

Differential Revision: https://reviews.llvm.org/D108417
2021-08-19 17:06:24 -07:00
Amara Emerson
95ac3d15e9 [AArch64][GlobalISel] Add G_VECREDUCE fewerElements support for full scalarization.
For some reductions like G_VECREDUCE_OR on AArch64, we need to scalarize
completely if the source is <= 64b. This change adds support for that in
the legalizer. If the source has a pow-2 num elements, then we can do
a tree reduction using the scalar operation in the individual elements.
Otherwise, we just create a sequential chain of operations.
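
The two scalarization strategies described above can be sketched in Python as follows. Pairing adjacent elements at each tree level is an assumption made for illustration; the legalizer may pair elements differently. `reduce_scalarized` is a hypothetical name.

```python
from functools import reduce


def reduce_scalarized(elts, op):
    """Reduce `elts` with the scalar `op`: tree for pow-2 counts, else a chain."""
    n = len(elts)
    if n == 0:
        raise ValueError("cannot reduce an empty vector")
    if n & (n - 1) == 0:
        # Power-of-2 element count: combine pairs level by level.
        level = list(elts)
        while len(level) > 1:
            level = [op(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        return level[0]
    # Otherwise fall back to a sequential chain of operations.
    return reduce(op, elts)
```

Using a string-building `op` such as `lambda a, b: f"({a}{b})"` makes the shape visible: four elements give a balanced tree, three give a left-leaning chain.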

For AArch64, we only need to scalarize if the input is <64b. If it's greater than
64b then we can first do a fewerElements step to 64b, taking advantage of vector
instructions until we reach the point of scalarization.

I also had to relax the verifier checks for reductions because the intrinsics
support <1 x EltTy> types, which we lower to scalars for GlobalISel.

Differential Revision: https://reviews.llvm.org/D108276
2021-08-19 16:38:52 -07:00
Amara Emerson
a0051f7149 [AArch64][GlobalISel] Fix miscompile of <16 x s8> G_EXTRACT_VECTOR_ELT.
When support for copying vector s8 lanes was added recently, this also
had the side effect of fixing a fallback for <16 x s8> extracts since
both used the same helper. However, there was a bug in another helper
to get the regclass for a specific FPR-native type, which was assigning
FPR16 to s8 instead of FPR8.
2021-08-19 16:22:32 -07:00
Tim Northover
edab411ee6 AArch64: copy all parts of the mem operand across when combining a store
In particular we were dropping volatility, which can lead to unwanted
transformations.
2021-08-19 18:26:39 +01:00
Owen Anderson
06a4c85890 Use v16i8 rather than v2i64 as the VT for memset expansion on AArch64.
This allows the instruction selector to realize that it can directly
broadcast the low byte of the memset value, rather than replicating
it to a 64-bit GPR before broadcasting.

This fixes PR50985.

Differential Revision: https://reviews.llvm.org/D108354
2021-08-19 16:54:07 +00:00
David Green
d10f23a25d [ISel] Expand saddsat and ssubsat via asr and xor
This changes the lowering of saddsat and ssubsat. Instead of using:
  r,o = saddo x, y
  c = setcc r < 0
  s = c ? INTMAX : INTMIN
  ret o ? s : r
it now uses asr and xor to materialize the INTMAX/INTMIN constants:
  r,o = saddo x, y
  s = ashr r, BW-1
  x = xor s, INTMIN
  ret o ? x : r
https://alive2.llvm.org/ce/z/TYufgD
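
The equivalence of the two saddsat sequences can be checked exhaustively at a small bit width with the Python sketch below. The 8-bit width and all helper names are assumptions chosen for illustration; values are modelled as unsigned bit patterns.

```python
BW = 8                      # bit width, small enough to enumerate (assumption)
MASK = (1 << BW) - 1
INTMIN = 1 << (BW - 1)      # bit pattern of the minimum signed value
INTMAX = INTMIN - 1         # bit pattern of the maximum signed value


def to_signed(v):
    """Interpret a BW-bit pattern as a two's complement integer."""
    v &= MASK
    return v - (1 << BW) if v & INTMIN else v


def saddo(x, y):
    """Return (wrapped sum as a bit pattern, signed-overflow flag)."""
    exact = to_signed(x) + to_signed(y)
    r = exact & MASK
    return r, to_signed(r) != exact


def saddsat_select(x, y):
    """Old sequence: pick INTMAX or INTMIN with a compare and a select."""
    r, o = saddo(x, y)
    s = INTMAX if to_signed(r) < 0 else INTMIN
    return s if o else r


def saddsat_ashr_xor(x, y):
    """New sequence: arithmetic shift right by BW-1, then xor with INTMIN."""
    r, o = saddo(x, y)
    s = (to_signed(r) >> (BW - 1)) & MASK   # all ones if r < 0, else 0
    sat = s ^ INTMIN                        # INTMAX if r < 0, else INTMIN
    return sat if o else r


def sequences_agree():
    """Compare both sequences over every pair of BW-bit inputs."""
    return all(
        saddsat_select(x, y) == saddsat_ashr_xor(x, y)
        for x in range(1 << BW)
        for y in range(1 << BW)
    )
```

The key observation is that on overflow the wrapped result has the opposite sign of the true sum, so shifting its sign bit across the word and flipping the top bit yields exactly the saturation constant.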

This seems to reduce the instruction count in most testcases across most
architectures. X86 has some custom lowering added to compensate for
cases where it can increase instruction count.

Differential Revision: https://reviews.llvm.org/D105853
2021-08-19 16:08:07 +01:00
Jessica Paquette
3d91d5b757 [AArch64][GlobalISel] Mark G_FMINNUM/G_FMAXNUM as floating point opcodes
We need to ensure that these end up on FPR to allow imported patterns to
select them.

This will also ensure that we get good regbank selection when dealing with
instructions like G_PHI/G_LOAD/G_STORE which deduce their banks from their
uses/users.

Differential Revision: https://reviews.llvm.org/D108260
2021-08-18 13:32:19 -07:00
Jessica Paquette
45e1a6bd25 [AArch64][GlobalISel] Legalize scalar G_FMINNUM + G_FMAXNUM
For subtargets with full FP16, this is legal for s16, s32, and s64. Without
full FP16, it's legal for s32 and s64.

For s128, this is a libcall.

We also support some vector types, but for now, let's just support scalars.

Differential Revision: https://reviews.llvm.org/D108259
2021-08-18 13:30:03 -07:00