Commit Graph

61313 Commits

Author SHA1 Message Date
Carl Ritson
e5b0b434f6 [AMDGPU] Refactor MIMG tables to better handle hardware variants
Add mimgopc object to represent the opcode allowing different
opcodes for different hardware variants.
This enables image_atomic_fcmpswap, image_atomic_fmin, and
image_atomic_fmax on GFX10

Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D96309
2021-02-11 13:22:41 +09:00
Craig Topper
5189c5b940 [X86] Simplify patterns for avx512 vpcmp. NFC
This removes the commuted PatFrags that only existed to carry
an SDNodeXForm in its OperandTransform field. We know all the places
that need to use the commuted SDNodeXForm and there is one transform
shared by signed and unsigned compares. So just hardcode the
the SDNodeXForm where it is needed and use the non commuted PatFrag
in the pattern.

I think when I wrote this I thought the SDNodeXForm name had to
match what is in the PatFrag that is being used. But that's not
true. The OperandTransform is only used when the PatFrag is used
in an instruction pattern and not a separate Pat pattern. All
the commuted cases are Pat patterns.
2021-02-10 19:24:27 -08:00
Jessica Clarke
ca606dc988 [RISCV] More whitespace and comment typo fixes in RISCVInstrInfoC.td 2021-02-11 02:32:36 +00:00
Jessica Clarke
0973ce8596 [RISCV] Fix whitespace in RISCVInstrInfoC.td 2021-02-11 02:23:09 +00:00
Craig Topper
350ab4e617 [RISCV] Use OperandTransform field of ImmLeaf to slightly simplify a couple bitmanip patterns. NFC
This binds the SDNodeXForm to the ImmLeaf so we only need to mention
the ImmLeaf in both the input and output pattern.
2021-02-10 17:52:07 -08:00
Jessica Paquette
1514f3b2c8 [AArch64][GlobalISel] Don't perform the mul const combine with G_PTR_ADD
A G_MUL + G_PTR_ADD can also be folded into a madd. So, conservatively, we
shouldn't combine when the G_MUL is used by a G_PTR_ADD either.

Differential Revision: https://reviews.llvm.org/D96457
2021-02-10 15:30:45 -08:00
Jessica Paquette
5f7a4d8d05 [AArch64][GlobalISel] Perform load/store extended reg folding with optsize
GlobalISel was only doing this with minsize. SDAG does this with optsize.

(See: `SelectionDAG::shouldOptForSize()`)

This is a 0.3% code size improvement for CTMark at -Os.

(Best: 1.1% improvements on lencod + pairlocalalign)

Differential Revision: https://reviews.llvm.org/D96451
2021-02-10 14:42:25 -08:00
Jessica Paquette
9283058abb [AArch64][GlobalISel] Fold G_ADD into the cset for G_ICMP
When we have a G_ADD which is fed by a G_ICMP on one side, we can fold it into
the cset for the G_ICMP.

e.g. Given

```
%cmp = G_ICMP ... %x, %y
%add = G_ADD %cmp, %z
```

We would normally emit a cmp, cset, and add.

However, `%add` is either `%z` or `%z + 1`. So, we can just use `%z` as the
source of the cset rather than wzr, saving an instruction.

This would probably be cleaner in AArch64PostLegalizerLowering, but we'd need
to change the way we represent G_ICMP to do that, I think. For now, it's
easiest to implement in selection.

This is a 0.1% code size improvement on CTMark/pairlocalalign at -Os.

Example: https://godbolt.org/z/7KdrP8

Differential Revision: https://reviews.llvm.org/D96388
2021-02-10 13:28:01 -08:00
Craig Topper
fc4d780eaf [RISCV] Remove superfluous semicolon. NFC 2021-02-10 11:20:29 -08:00
Nick Desaulniers
68945a8686 [Thumb2] support movs pc, lr alias for subs pc, lr, #0/eret
This is used by the Linux kernel built with CONFIG_THUMB2_KERNEL.

Because different operands are not permitted to `movs`, the diagnostics now provide multiple suggestions along the lines of using a non-pc destination operand or lr source operand.

Forked from D95586.

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D96304
2021-02-10 11:00:42 -08:00
Craig Topper
cb161b3a88 [RISCV] Add support for matching .vf forms of fadd/fsub/fmul/fdiv/fma for fixed vectors.
fma+neg will come in a different patch since I haven't done it for .vv
yet either.

Differential Revision: https://reviews.llvm.org/D96375
2021-02-10 10:16:27 -08:00
Craig Topper
0c254b4a69 [RISCV] Add support for selecting vrgather.vx/vi for fixed vector splat shuffles.
The test cases extract a fixed element from a vector and splat it
into a vector. This gets DAG combined into a splat shuffle.

I've used some very wide vectors in the test to make sure we have
at least a couple tests where the element doesn't fit into the
uimm5 immediate of vrgather.vi so we fall back to vrgather.vx.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96186
2021-02-10 10:01:56 -08:00
Jay Foad
2114b458b0 [AMDGPU] Fix comments in SILoadStoreOptimizer::offsetsCanBeCombined 2021-02-10 14:49:33 +00:00
Daniel Cederman
ad3b023c88 [Sparc] Support relocatable expressions in the assembler
Allow assembler expressions to start with an identifier. This allows for expressions such as
```
b symbol + 4
```
and
```
mov symEnd - symStart, %g1
```

The patch builds upon https://reviews.llvm.org/D47136.

Reviewed By: joerg

Differential Revision: https://reviews.llvm.org/D47458
2021-02-10 14:52:44 +01:00
Fraser Cormack
a3c74d6d53 [RISCV] Add support for selecting vid.v from build_vector
This patch optimizes a build_vector "index sequence" and lowers it to
the existing custom RISCVISD::VID node. This pattern is common in
autovectorized code.

The custom node was updated to allow it to be used by both scalable and
fixed-length vectors, thus avoiding pattern duplication.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96332
2021-02-10 10:58:40 +00:00
Simon Pilgrim
eb31c3c5cb Revert rGe1172959226689a "[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - merge VPERMILPD ops with different low/high masks."
Revert this while I investigate a downstream breakage report.
2021-02-10 10:26:44 +00:00
Sam Parker
9d81ccc02f [WebAssembly] Enable loop unrolling
Enable partial and runtime unrolling with a threshold of 30, which
was derived from a large number of kernels running on node and
wasmtime for amd64 and aarch64.

Unrolling is enabled by default at -O2 and -O3 and is disabled at
-Oz and -Os. Compiling with -Os is recommended if the wasm binary
size is the most important factor.

Differential Revision: https://reviews.llvm.org/D95125
2021-02-10 08:25:46 +00:00
Jessica Paquette
7eee858585 [AArch64][GlobalISel] Fold selects fed by G_PTR_ADD
Similar to the case for G_ADD.

There was a function in CTMark/pairlocalalign which was missing this case,
causing GlobalISel to emit a add + csel when a csinc is all that is necessary.

https://godbolt.org/z/ax69E9

Minor code size improvements on CTMark at -Os.

Differential Revision: https://reviews.llvm.org/D96390
2021-02-10 00:03:13 -08:00
Jessica Paquette
0e85d63486 [AArch64][GlobalISel] Allow vector load legalization into 128-bit-wide types
Similar to 3d25fdc5c2

This fixes bad codegen in cases like so:

https://godbolt.org/z/hePhz1

Differential Revision: https://reviews.llvm.org/D96296
2021-02-09 13:35:59 -08:00
Artem Belevich
2aa01ccec3 [CUDA, NVPTX] Allow targeting sm_86 GPUs.
The patch only plumbs through the option necessary for targeting sm_86 GPUs w/o
adding any new functionality.

Differential Revision: https://reviews.llvm.org/D95974
2021-02-09 11:01:10 -08:00
Matt Arsenault
f4ca6d8289 AMDGPU: Fix verifier error with argument passed in CSR SGPR
We need to avoid setting the kill flag on the CSR spill if there's an
additional use of the register after the spill.

This does rely on consistency between the entry block liveins and the
MRI's function live ins, which is not something the verifier checks
now.
2021-02-09 13:49:44 -05:00
Matt Arsenault
b72a23650f GlobalISel: Fix using wrong calling convention for callees
This was taking the calling convention from the parent function,
instead of the callee. Avoids regressions in a future patch when the
caller and callee have different type breakdowns.

For some reason AArch64's lowerFormalArguments seems to intentionally
ignore the parent isVarArg.
2021-02-09 13:48:56 -05:00
Craig Topper
18ff7e045a [RISCV] Make the min and max vector width command line options more consistent and check their relationship to each other. 2021-02-09 10:47:23 -08:00
Craig Topper
fd5adae02c [RISCV] Remove SRO* and SLO* instructions from bitmanip.
As of the current draft these are no longer being considered
for the bitmanip spec. It wasn't clear what sub extension they
belonged in in the 0.93 spec.

So remove them. They can always be added back if something changes.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96157
2021-02-09 09:35:05 -08:00
Nico Weber
de1966e542 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 4a64d8fe39.
Makes clang crash when buildling trivial iOS programs, see comment
after https://reviews.llvm.org/D92808#2551401
2021-02-09 11:06:32 -05:00
Simon Pilgrim
89d9ff8229 [X86][SSE] foldShuffleOfHorizOp - add SHUFPS v4f32 handling
Fold shufps(hop(x,y),hop(z,w)) -> permute(hop(x,z)) - this is very similar to the equivalent unpack fold.

I did start trying to convert foldShuffleOfHorizOp to handle generic shuffle masks but we're relying on a lot of special cases at the moment.
2021-02-09 14:18:45 +00:00
Nemanja Ivanovic
a5222aa085 [DAGCombine] Do not remove masking argument to FP16_TO_FP for some targets
As of commit 284f2bffc9, the DAG Combiner gets rid of the masking of the
input to this node if the mask only keeps the bottom 16 bits. This is because
the underlying library function does not use the high order bits. However, on
PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits
from the register. Therefore, the library implementation of __gnu_h2f_ieee will
return an incorrect result if the bits aren't cleared.

This combine is desired for ARM (and possibly other targets) so this patch adds
a query to Target Lowering to check if this zeroing needs to be kept.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092

Differential revision: https://reviews.llvm.org/D96283
2021-02-09 06:33:48 -06:00
Nemanja Ivanovic
f6e4b9fc06 [RISCV] Fix shared libs build
Commit a2d19bad07 introduced a
dependency in the RISCV disassembler on two additional libraries
(MC, RISCVDesc) which wasn't added to the CMakeLists.txt. This
causes shared library builds to break. This patch just adds them
to fix failures seen on some bots, such as the PPC64LE Multistage.
2021-02-09 06:14:25 -06:00
Dylan McKay
2ccb941740 [AVR] Fix global references to function symbols
References to functions are in program memory and need a `pm()` fixup. This should fix trait objects for Rust on AVR.

Differential Revision: https://reviews.llvm.org/D87631

Patch by Alex Mikhalev.
2021-02-10 00:40:49 +13:00
Hsiangkai Wang
a2d19bad07 [RISCV] Use whole register load/store for generic load/store.
In vector v0.10, there are whole vector register load/store
instructions. I suggest to use the whole register load/store
instructions for generic load/store for scalable vector types. It could
save up vset{i}vl{i} for these load/store.

For fractional LMUL, I keep to use vle{eew}.v/vse{eew}.v instructions to
load/store partial vector registers.

Differential Revision: https://reviews.llvm.org/D95853
2021-02-09 15:52:04 +08:00
Jinsong Ji
9202806241 Revert "[CostModel] Remove VF from IntrinsicCostAttributes"
This reverts commit 502a67dd7f.

This expose a failure in test-suite build on PowerPC,
revert to unblock buildbot first,
Dave will re-commit in https://reviews.llvm.org/D96287.

Thanks Dave.
2021-02-09 02:14:14 +00:00
LemonBoy
45e33e8ba9 [SPARC] Recognize and handle the %lm(sym) operator
Reviewed By: joerg

Differential Revision: https://reviews.llvm.org/D77737
2021-02-08 19:25:33 -05:00
Hsiangkai Wang
a5b07a221a [RISCV] Initial support of LoopVectorizer for RISC-V Vector.
Define an option -riscv-vector-bits-max to specify the maximum vector
bits for vectorizer. Loop vectorizer will use the value to check if it
is safe to use the whole vector registers to vectorize the loop.

It is not the optimum solution for loop vectorizing for scalable vector.
It assumed the whole vector registers will be used to vectorize the code.
If it is possible, we should configure vl to do vectorize instead of
using whole vector registers.

We only consider LMUL = 1 in this patch.

This patch just an initial work for loop vectorizer for RISC-V Vector.

Differential Revision: https://reviews.llvm.org/D95659
2021-02-09 06:32:18 +08:00
Matt Arsenault
bcf723b2fd AMDGPU: Stop adding stack passed wide arguments to call conv handler
The generated calling convention code shouldn't see these types since
we split large types into 32-bit chunks before the calling convention
code is triggered.

GlobalISel ends up directly calls the generated CC code before
checking for the register count breakdown. Arguably this difference is
a bug, but this was dead code for the DAG anyway.
2021-02-08 17:09:28 -05:00
Arthur Eubanks
e84a4650eb [NVPTX][NewPM] Re-enable NVVMReflectPass
Disabled alongside NVVMIntrRangePass in https://reviews.llvm.org/D96166,
but turns out NVVMIntrRangePass was the issue.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D96291
2021-02-08 13:58:17 -08:00
David Green
0c7e044a7f [ARM] One-off identity shuffle
A One-Off Identity mask is a shuffle that is mostly an identity mask
from as single source but contains a single element out-of-place, either
from a different vector or from another position in the same vector. As
opposed to lowering this via a ARMISD::BUILD_VECTOR we can generate an
extract/insert pair directly. Under ARM with individually accessible
lane elements this often becomes a simple lane move.

This also alters the LowerVECTOR_SHUFFLEUsingMovs code to use v4f32 (not
v4i32), a more natural type for lane moves.

Differential Revision: https://reviews.llvm.org/D95551
2021-02-08 21:24:32 +00:00
Amara Emerson
ec41ed5b1b [AArch64][GlobalISel] Support the 'returned' parameter attribute.
On AArch64 (which seems to be the only target that supports it), this
attribute allows codegen to avoid saving/restoring the value in x0
across a call.

Gives a 0.1% geomean -Os code size improvement on CTMark.

Differential Revision: https://reviews.llvm.org/D96099
2021-02-08 12:47:39 -08:00
Martin Storsjö
71c29b4cf3 [AArch64] Use '//' as comment string for MSVC assembly
As the actual MSVC toolset doesn't use the GAS-style assembly that
Clang/LLVM produces and consumes, there's no reference for what
string to use for e.g. comments when building with a MSVC triple.

This frees up the use of semicolon as separator string, just like
was done for GNU targets in 2341319564.
(Previously, both the separator and comment strings were set to
the same, a semicolon.)

Compiler-rt extensively uses separator chars in its assembly,
and that assembly should be buildable with clang-cl for MSVC too.

Differential Revision: https://reviews.llvm.org/D96259
2021-02-08 22:30:14 +02:00
Craig Topper
b49aaed8c7 [RISCV] Use _COMMUTABLE fma pseudos for fixed vectors.
This matches what we do in the VLMAX SDNode patterns.
2021-02-08 11:27:23 -08:00
Craig Topper
8d8cafa32e [RISCV] Add support for splat fixed length build_vectors using RVV.
Building on the fixed vector support from D95705

I've added ISD nodes for vmv.v.x and vfmv.v.f and switched to
lowering the intrinsics to it. This allows us to share the same
isel patterns for both.

This doesn't handle splats of i64 on RV32 yet. The build_vector
gets converted to a vXi32 build_vector+bitcast during type
legalization. Not sure the best way to handle this at the moment.

Differential Revision: https://reviews.llvm.org/D96108
2021-02-08 11:12:56 -08:00
Craig Topper
b8d719fbe8 [RISCV] Add support for fixed vector FMA.
Follow up to D95705. Does not include the commuting support from D95800.

Differential Revision: https://reviews.llvm.org/D96103
2021-02-08 11:12:56 -08:00
Craig Topper
a719b667a9 [RISCV] Add initial support for converting fixed vectors to scalable vectors during lowering to use RVV instructions.
This is an alternative to D95563.

This is modeled after a similar feature for AArch64's SVE that uses
predicated scalable vector instructions.a

Rather than use predication, this patch uses an explicit VL operand.
I've limited it to always use LMUL=1 for now, but we can improve this
in the future.

This requires a bunch of new ISD opcodes to carry the VL operand.
I think we can probably lower intrinsics to these ISD opcodes to
cut down on the size of the isel table. Which is why I've added
patterns for all integer/float types and not just LMUL=1.

I'm only testing one vector width right now, but the width is
programmable via the command line.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D95705
2021-02-08 10:41:30 -08:00
Craig Topper
b7b4f4cbc3 [RISCV] Make scalable vector FMA commutable for register allocation.
This adds support for commuting operands and converting between
vfmadd and vfmacc to avoid register copies.

To avoid messing up intrinsic behavior, I've added new pseudo
instructions that have the isCommutable flag set. These pseudos also
force a tail agnostic policy. The intrinsic version still use
the tail undisturbed policy.

For best results it looks like we need to start with fmadd and only
pick fmacc if its beneficial. MachineCSE commutes without contraining
the operands and then commutes back if it didn't help with CSE. So
I've made sure that when the operand choice isn't constrained, we
will keep fmadd for MachineCSE and when it does the second commute,
we get back the original instruction.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D95800
2021-02-08 10:05:33 -08:00
Craig Topper
cc2c45dc54 [RISCV] Use SplatPat/SplatPat_simm5 to handle PseudoVMV_V_X_/PseudoVMV_V_I_ selection as well.
This ensures that we'll match immediates consistently regardless
of whether we match them as a standalone splat or as part of
another operation.

While I was there I added complexities to the simm5/uimm5 patterns so
we didn't have to assume that the 1 on the non-immediate was lower
than what tablegen inferred.

I had to make a minor tweak to tablegen to fix one place that
didn't expect to see a ComplexPattern that wasn't a "leaf".

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96199
2021-02-08 09:48:27 -08:00
Jay Foad
a4b1df8af3 [AMDGPU] Use named unified buffer format constant. NFC. 2021-02-08 17:34:36 +00:00
Sander de Smalen
981a38baf4 [AArch64AsmParser] Fix type-limits warning for VectorIndex.
Making VectorIndex an `int` instead of `unsigned`, silences the warning:
  comparison of unsigned expression in ‘>= 0’ is always true

in:
  template <int Min, int Max>
  DiagnosticPredicate isVectorIndex() const {
    ...
    if (VectorIndex.Val >= Min && VectorIndex.Val <= Max)
      return DiagnosticPredicateTy::Match;
    ...
  }

when Min is 0.
2021-02-08 15:35:30 +00:00
Tim Northover
c93d50dd71 AArch64: use a constpool for blockaddress(...) on MachO
More MachO madness for everyone. MachO relocations are only 32-bits, which
means the ARM64_RELOC_ADDEND one only actually has 24 (signed) bits for the
actual addend. This is a problem when calculating the address of a basic block;
because it has no symbol of its own, the sequence

	adrp x0, Ltmp0@PAGE
	add x0, x0, x0 Ltmp0@PAGEOFF

is represented by relocation with an addend that contains the offset from the
function start to Ltmp, and so the largest function where this is guaranteed to
work is 8MB. That's not quite big enough that we can call it user error (IMO).

So this patch puts the any blockaddress into a constant-pool, where the addend
is instead stored in the (x)word being relocated, which is obviously big enough
for any function.
2021-02-08 15:13:29 +00:00
Mikael Holmen
eb8c27c60c [RISCV] Use std::make_tuple to make some toolchains happy again
My toolchain (LLVM 8.0, libstdc++ 5.4.0) complained with:

12:38:19 ../lib/Target/RISCV/RISCVISelLowering.cpp:1717:12: error: chosen constructor is explicit in copy-initialization
12:38:19     return {RISCVISD::VECREDUCE_FADD, Op.getOperand(0),
12:38:19            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
12:38:19 /proj/flexasic/app/llvm/8.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here
12:38:19         constexpr tuple(_UElements&&... __elements)
12:38:19                   ^
12:38:19 ../lib/Target/RISCV/RISCVISelLowering.cpp:1720:12: error: chosen constructor is explicit in copy-initialization
12:38:19     return {RISCVISD::VECREDUCE_SEQ_FADD, Op.getOperand(1), Op.getOperand(0)};
12:38:19            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
12:38:19 /proj/flexasic/app/llvm/8.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here
12:38:19         constexpr tuple(_UElements&&... __elements)
12:38:19                   ^
12:38:19 2 errors generated.

This commit adds explicit calls to std::make_tuple to work around
the problem.
2021-02-08 14:37:25 +01:00
Nicholas Guy
cd880442ae [CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize default threshold
Different targets might handle branch performance differently, so this patch allows for
targets to specify the TailDuplicateSize threshold. Said threshold defines how small a branch
can be and still be duplicated to generate straight-line code instead.
This patch also specifies said override values for the AArch64 subtarget.

Differential Revision: https://reviews.llvm.org/D95631
2021-02-08 13:28:00 +00:00
Thomas Symalla
f89f6d1e5d [AMDGPU]: Fixes an invalid clamp selection pattern.
When running the tests on PowerPC and x86, the lit test GlobalISel/trunc.ll fails at the memory sanitize step. This seems to be due to wrong invalid logic (which matches even if it shouldn't) and likely missing variable initialisation."

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D95878
2021-02-08 13:06:30 +01:00