Commit Graph

115 Commits

Author SHA1 Message Date
Matt Arsenault
538b73b797 AMDGPU/GlobalISel: Handle more G_INSERT cases
Start manually writing a table to get the subreg index. TableGen
should probably generate this, but I'm not sure what it looks like in
the arbitrary case where subregisters are allowed to not fully cover
the super-registers.

llvm-svn: 373947
2019-10-07 19:16:26 +00:00
Matt Arsenault
0b2ea91d6d AMDGPU/GlobalISel: Use S_MOV_B64 for inline constants
This hides some defects in SIFoldOperands when the immediates are
split.

llvm-svn: 373943
2019-10-07 19:07:19 +00:00
Matt Arsenault
b4cbf9862c AMDGPU/GlobalISel: Select more G_INSERT cases
At minimum handle the s64 insert type, which are emitted in real cases
during legalization.

We really need TableGen to emit something to emit something like the
inverse of composeSubRegIndices do determine the subreg index to use.

llvm-svn: 373938
2019-10-07 18:43:31 +00:00
Matt Arsenault
27269054d2 GlobalISel: Add target pre-isel instructions
Allows targets to introduce regbankselectable
pseudo-instructions. Currently the closet feature to this is an
intrinsic. However this requires creating a public intrinsic
declaration. This litters the public intrinsic namespace with
operations we don't necessarily want to expose to IR producers, and
would rather leave as private to the backend.

Use a new instruction bit. A previous attempt tried to keep using enum
value ranges, but it turned into a mess.

llvm-svn: 373937
2019-10-07 18:43:29 +00:00
Matt Arsenault
e59296a051 AMDGPU/GlobalISel: Fall back on weird G_EXTRACT offsets
llvm-svn: 373842
2019-10-06 01:41:22 +00:00
Matt Arsenault
412e0bf8f3 AMDGPU/GlobalISel: Select G_PTRTOINT
llvm-svn: 373715
2019-10-04 08:35:37 +00:00
Piotr Sobczak
265e94e657 [AMDGPU] Extend buffer intrinsics with swizzling
Summary:
Extend cachepolicy operand in the new VMEM buffer intrinsics
to supply information whether the buffer data is swizzled.
Also, propagate this information to MIR.

Intrinsics updated:
int_amdgcn_raw_buffer_load
int_amdgcn_raw_buffer_load_format
int_amdgcn_raw_buffer_store
int_amdgcn_raw_buffer_store_format
int_amdgcn_raw_tbuffer_load
int_amdgcn_raw_tbuffer_store
int_amdgcn_struct_buffer_load
int_amdgcn_struct_buffer_load_format
int_amdgcn_struct_buffer_store
int_amdgcn_struct_buffer_store_format
int_amdgcn_struct_tbuffer_load
int_amdgcn_struct_tbuffer_store

Furthermore, disable merging of VMEM buffer instructions
in SI Load/Store optimizer, if the "swizzled" bit on the instruction
is on.

The default value of the bit is 0, meaning that data in buffer
is linear and buffer instructions can be merged.

There is no difference in the generated code with this commit.
However, in the future it will be expected that front-ends
use buffer intrinsics with correct "swizzled" bit set.

Reviewers: arsenm, nhaehnle, tpr

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68200

llvm-svn: 373491
2019-10-02 17:22:36 +00:00
Matt Arsenault
86f864dace AMDGPU/GlobalISel: Use getIntrinsicID helper
llvm-svn: 373417
2019-10-02 01:02:27 +00:00
Matt Arsenault
fdea5e02ce AMDGPU/GlobalISel: Select s1 src G_SITOFP/G_UITOFP
llvm-svn: 373298
2019-10-01 02:23:20 +00:00
Matt Arsenault
59b91aa93e AMDGPU/GlobalISel: Add support for init.exec intrinsics
TThe existing wave32 behavior seems broken and incomplete, but this
reproduces it.

llvm-svn: 373296
2019-10-01 02:07:25 +00:00
Matt Arsenault
54167ea316 AMDGPU/GlobalISel: Select G_UADDO/G_USUBO
llvm-svn: 373288
2019-10-01 01:23:13 +00:00
Matt Arsenault
76f44f6b53 AMDGPU/GlobalISel: Avoid getting MRI in every function
Store it in AMDGPUInstructionSelector to avoid boilerplate in nearly
every select function.

llvm-svn: 373139
2019-09-28 03:41:13 +00:00
Bjorn Pettersson
169cb63478 [AMDGPU] Use std::make_tuple to make some toolchains happy again
My toolchain stopped working (LLVM 8.0 , libstdc++ 5.4.0) after
r372338.

The same problem was seen in clang-cuda-build buildbots:

clang-cuda-build/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:763:12:
error: chosen constructor is explicit in copy-initialization
    return {Reg, 0, nullptr};
           ^~~~~~~~~~~~~~~~~
/usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/tuple:479:19:
note: explicit constructor declared here
        constexpr tuple(_UElements&&... __elements)
                  ^

This commit adds explicit calls to std::make_tuple to work around
the problem.

llvm-svn: 372384
2019-09-20 12:13:12 +00:00
Matt Arsenault
3ecab8e455 Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)

This was missing one switch to getTargetConstant in an untested case.

llvm-svn: 372338
2019-09-19 16:26:14 +00:00
Hans Wennborg
13bdae8541 Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This broke the Chromium build, causing it to fail with e.g.

  fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>

See llvm-commits thread of r372285 for details.

This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.

> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.

llvm-svn: 372314
2019-09-19 12:33:07 +00:00
Matt Arsenault
494243597b AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store.format
This needs special handling due to some subtargets that have a
nonstandard register layout for f16 vectors

Also reject some illegal types on other targets.

llvm-svn: 372293
2019-09-19 02:35:08 +00:00
Matt Arsenault
67f1f6ff8c AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store
llvm-svn: 372292
2019-09-19 02:30:27 +00:00
Matt Arsenault
d8399d12cd GlobalISel: Don't materialize immarg arguments to intrinsics
Encode them directly as an imm argument to G_INTRINSIC*.

Since now intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.

This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.

SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.

Most of the work here is to cleanup target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.

The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, simlar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.

This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to uses "imm" in an output with a "timm" pattern source.

llvm-svn: 372285
2019-09-19 01:33:14 +00:00
Matt Arsenault
fb51e64eac AMDGPU/GlobalISel: Fail select of G_INSERT non-32-bit source
This was producing an illegal copy which would hit an assert
later. Error on selection for now until this is implemented.

llvm-svn: 371993
2019-09-16 14:26:14 +00:00
Matt Arsenault
3b7ffc6ae7 AMDGPU/GlobalISel: Fix assert on multi-return side effect intrinsics
llvm.amdgcn.else hits this.

llvm-svn: 371812
2019-09-13 04:12:12 +00:00
Matt Arsenault
77e3e9cafd AMDGPU/GlobalISel: Select llvm.amdgcn.class
Also fixes missing SubtargetPredicate on f16 class instructions.

llvm-svn: 371436
2019-09-09 18:29:45 +00:00
Matt Arsenault
d6c1f5bb15 AMDGPU/GlobalISel: Select fmed3
llvm-svn: 371435
2019-09-09 18:29:37 +00:00
Matt Arsenault
c34b4036ff AMDGPU/GlobalISel: Select G_PTR_MASK
llvm-svn: 371412
2019-09-09 15:46:13 +00:00
Matt Arsenault
2dd088ec7d AMDGPU/GlobalISel: Use known bits for selection
llvm-svn: 371409
2019-09-09 15:39:32 +00:00
Matt Arsenault
d50f937378 AMDGPU/GlobalISel: Try generated matcher before add/sub code
This will allow optimization patterns which fold adds away to work.

llvm-svn: 371406
2019-09-09 15:20:44 +00:00
Matt Arsenault
d51a3746d0 AMDGPU/GlobalISel: Fix assert on load from constant address
llvm-svn: 371006
2019-09-05 02:20:25 +00:00
Matt Arsenault
a8bbcbd006 AMDGPU/GlobalISel: Fix constraining scalar and/or/xor
If the result register already had a register class assigned, the
sources may not have been properly constrained.

llvm-svn: 370150
2019-08-28 02:11:03 +00:00
Daniel Sanders
0c47611131 Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor

Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&

Depends on D65919

Reviewers: arsenm, bogner, craig.topper, RKSimon

Reviewed By: arsenm

Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65962

llvm-svn: 369041
2019-08-15 19:22:08 +00:00
Amara Emerson
e14c91b71a [GlobalISel] Make the InstructionSelector instance non-const, allowing state to be maintained.
Currently we can't keep any state in the selector object that we get from
subtarget. As a result we have to plumb through all our variables through
multiple functions. This change makes it non-const and adds a virtual init()
method to allow further state to be captured for each target.

AArch64 makes use of this in this patch to cache a call to hasFnAttribute()
which is expensive to call, and is used on each selection of G_BRCOND.

Differential Revision: https://reviews.llvm.org/D65984

llvm-svn: 368652
2019-08-13 06:26:59 +00:00
Daniel Sanders
2bea69bf65 Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC
llvm-svn: 367633
2019-08-01 23:27:28 +00:00
Matt Arsenault
57495268ac AMDGPU/GlobalISel: Remove manual store select code
This regresses the weird types that are newly treated as legal load
types, but fixes incorrectly using flat instrucions on SI.

llvm-svn: 367512
2019-08-01 03:52:40 +00:00
Matt Arsenault
26cb53b260 AMDGPU/GlobalISel: Handle G_ATOMICRMW_FADD
llvm-svn: 367509
2019-08-01 03:33:15 +00:00
Matt Arsenault
da5b9bfa95 AMDGPU/GlobalISel: Allow selection of DS atomicrmw
llvm-svn: 367507
2019-08-01 03:29:01 +00:00
Matt Arsenault
3baf4d3418 AMDGPU/GlobalISel: Select simple local stores
llvm-svn: 367504
2019-08-01 03:09:15 +00:00
Matt Arsenault
3594011de0 AMDGPU/GlobalISel: Select local loads
llvm-svn: 367498
2019-08-01 00:53:38 +00:00
Matt Arsenault
0e7d8698b5 AMDGPU/GlobalISel: Don't assume instruction can be erased when selecting exts
The G_ANYEXT handling can end up reaching selectCOPY, which mutates
the instruction in place.

llvm-svn: 366915
2019-07-24 16:05:53 +00:00
Matt Arsenault
937d0ee5d8 AMDGPU/GlobalISel: Remove unnecessary code
The minnum/maxnum case are dead, and the cvt is handled by the
default.

llvm-svn: 366685
2019-07-22 13:05:25 +00:00
Matt Arsenault
7161fb0be5 AMDGPU/GlobalISel: Select private loads
llvm-svn: 366248
2019-07-16 19:22:21 +00:00
Matt Arsenault
dad1f89210 AMDGPU/GlobalISel: Select flat stores
llvm-svn: 366246
2019-07-16 18:42:53 +00:00
Matt Arsenault
35c96598b1 AMDGPU/GlobalISel: Select flat loads
Now that the patterns use the new PatFrag address space support, the
only blocker to importing most load patterns is the addressing mode
complex patterns.

llvm-svn: 366237
2019-07-16 18:05:29 +00:00
Matt Arsenault
22c4a147a9 AMDGPU/GlobalISel: Fix test failures in release build
Apparently the check for legal instructions during instruction
select does not happen without an asserts build, so these would
successfully select in release, and fail in debug.

Make s16 and/or/xor legal. These can just be selected directly
to the 32-bit operation, as is already done in SelectionDAG, so just
make them legal.

llvm-svn: 366210
2019-07-16 14:28:30 +00:00
Matt Arsenault
c8291c94f8 AMDGPU/GlobalISel: Select G_AND/G_OR/G_XOR
llvm-svn: 366121
2019-07-15 19:50:07 +00:00
Matt Arsenault
ad19b50c00 AMDGPU/GlobalISel: Don't constrain source register of VCC copies
This is a hack until I come up with a better way of dealing with the
pseudo-register banks used for boolean values. If the use instruction
constrains the register, the selector for the def instruction won't
see that the bank was VCC. A 1-bit SReg_32 is could ambiguously have
been SCCRegBank or VCCRegBank in wave32.

This is necessary to successfully select branches with and and/or/xor
condition.

llvm-svn: 366120
2019-07-15 19:48:36 +00:00
Matt Arsenault
e1b52f4180 AMDGPU/GlobalISel: Fix selecting vcc->vcc bank copies
The extra test change is correct, although how it arrives there is a
bug that needs work. With wave32, the test for isVCC ambiguously
reports true for an SCC or VCC source. A new allocatable pseudo
register class for SCC may be necesssary.

llvm-svn: 366119
2019-07-15 19:46:48 +00:00
Matt Arsenault
3bfdb54d88 AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC
llvm-svn: 366118
2019-07-15 19:45:49 +00:00
Matt Arsenault
18b7133843 AMDGPU/GlobalISel: Fix handling of sgpr (not scc bank) s1 to VCC
This was emitting a copy from a 32-bit register to a 64-bit.

llvm-svn: 366117
2019-07-15 19:44:07 +00:00
Matt Arsenault
5dfd466032 AMDGPU/GlobalISel: Fix G_ICMP for wave32
llvm-svn: 366114
2019-07-15 19:39:31 +00:00
Matt Arsenault
53fa759ff5 AMDGPU/GlobalISel: Handle llvm.amdgcn.if.break
llvm-svn: 366102
2019-07-15 18:25:24 +00:00
Matt Arsenault
b390121efb AMDGPU/GlobalISel: Select llvm.amdgcn.end.cf
llvm-svn: 366099
2019-07-15 18:18:46 +00:00
Matt Arsenault
a65913e752 AMDGPU/GlobalISel: Select easy cases for G_BUILD_VECTOR
llvm-svn: 366087
2019-07-15 17:26:43 +00:00