Commit Graph

953 Commits

Author SHA1 Message Date
Mirko Brkusanin
cf40db21af [AMDGPU][GlobalISel] Fix G_AMDGPU_TBUFFER_STORE_FORMAT mapping
Add missing mappings and tablegen definitions for TBUFFER_STORE_FORMAT.

Differential Revision: https://reviews.llvm.org/D83240
2020-07-10 11:32:32 +02:00
Matt Arsenault
fdde69aac9 AMDGPU/GlobalISel: Work around verifier error in test
The unfortunate split between finalizeLowering and the selector pass
means there's a point where the verifier fails. The DAG selector pass
skips the verifier, but this seems to not work when using the
GlobalISel fallback.
2020-07-09 10:24:16 -04:00
Matt Arsenault
74a148ad39 GlobalISel: Verify G_BITCAST changes the type
Updated the AArch64 tests the best I could with my vague, inferred
understanding of AArch64 register banks. As far as I can tell, there
is only one 32-bit/64-bit type which will use the gpr register bank,
so we have to use the fpr bank for the other operand.
2020-07-08 17:16:27 -04:00
Jay Foad
a8816ebee0 [AMDGPU] Fix and simplify AMDGPULegalizerInfo::legalizeUDIV_UREM32Impl
Use the algorithm from AMDGPUCodeGenPrepare::expandDivRem32.

Differential Revision: https://reviews.llvm.org/D83383
2020-07-08 19:14:49 +01:00
Jay Foad
f4bd01c191 [AMDGPU] Fix and simplify AMDGPUCodeGenPrepare::expandDivRem32
Fix the division/remainder algorithm by adding a second quotient
refinement step, which is required in some cases like
0xFFFFFFFFu / 0x11111111u (https://bugs.llvm.org/show_bug.cgi?id=46212).

Also document, rewrite and simplify it by ensuring that we always have a
lower bound on inv(y), which simplifies the UNR step and the quotient
refinement steps.

Differential Revision: https://reviews.llvm.org/D83381
2020-07-08 19:14:48 +01:00
Matt Arsenault
23157f3bdb GlobalISel: Handle EVT argument lowering correctly
handleAssignments was assuming every argument type is an MVT, and
assignArg would always fail. This fixes one of the hacks in the
current AMDGPU calling convention code that pre-processes the
arguments.
2020-07-07 16:36:14 -04:00
Matt Arsenault
42bb481442 AMDGPU/GlobalISel: Fix skipping unused kernel arguments
The tests in a5b9ad7e9a actually failed
the verifier, which for some reason is not the default. Also add tests
for 0-sized function arguments, which do not add entries to the
expected register lists.
2020-07-07 16:36:13 -04:00
Matt Arsenault
a5b9ad7e9a AMDGPU/GlobalISel: Don't emit code for unused kernel arguments 2020-07-06 09:04:06 -04:00
Matt Arsenault
581f1823cd AMDGPU/GlobalISel: Fix hardcoded register number checks in test 2020-07-06 09:01:59 -04:00
Matt Arsenault
bcff3deaa1 AMDGPU/GlobalISel: Add some missing return tests 2020-07-06 09:01:18 -04:00
Jay Foad
6f1694759c [AMDGPU] Fix formatting in MIR tests 2020-07-02 10:27:34 +01:00
Petar Avramovic
4b9ae1b7e5 AMDGPU/GlobalISel: Select init_exec intrinsic
Change imm with timm in pattern for SI_INIT_EXEC_LO and
remove regbank mappings for non register operands.

Differential Revision: https://reviews.llvm.org/D82885
2020-07-01 11:50:59 +02:00
Matt Arsenault
291ece0efa AMDGPU/GlobalISel: Remove some selection tests which should be invalid
These use undef generic virtual register operands, which should be
rejected by the verifier.
2020-06-30 19:18:01 -04:00
Petar Avramovic
d717382633 AMDGPU/GlobalISel: Select icmp intrinsic
Select into corresponding V_CMP instruction based on CmpInst predicate,
stored as immediate, in last operand.

Differential Revision: https://reviews.llvm.org/D82652
2020-06-30 10:57:41 +02:00
Petar Avramovic
4b980cc9ca [GlobalISel][InlineAsm] Add support for matching input constraints
Find def operand that corresponds to matching constraint and
tie input to that operand.

Differential Revision: https://reviews.llvm.org/D82651
2020-06-30 10:49:05 +02:00
Matt Arsenault
443556c18f AMDGPU/GlobalISel: Fix some legalization of < dword vector stores
This avoids many instances of failing to legalize a vector truncstore
of <4 x s8> to 2 bytes. We don't perfectly handle every truncstore
yet, largely because the given set of legalization actions can't
actually differentiate between changing the result type and changing
the memory type.
2020-06-26 18:07:39 -04:00
Matt Arsenault
431daedee4 AMDGPU/GlobalISel: Fix legacy clover kernel argument ABI
This had an extra attempt to align the pointer, which only did
anything with a base kernel argument offset which only clover used to
use.
2020-06-26 10:03:05 -04:00
Matt Arsenault
54573528ae AMDGPU/GlobalISel: Add baseline checks for legacy clover kernel ABI
I'm not sure we actually need to support this now, since I think
clover always explicitly uses amdgcn-mesa-mesa3d now, not the
ill-defined amdgcn-- behavior.
2020-06-26 10:03:05 -04:00
Matt Arsenault
b1cfa64cb1 AMDGPU/GlobalISel: Uncomment some fixed tests 2020-06-26 10:03:05 -04:00
Matt Arsenault
a448670752 AMDGPU/GlobalISel: Legalize 64-bit G_SDIV/G_SREM
Now all the divisions should be complete, although we should fix
emitting the entire common part for div/rem when you use both.
2020-06-24 11:39:45 -04:00
Matt Arsenault
a162048a47 AMDGPU/GlobalISel: Fix fixed ABI special VGPR function arguments
I forgot to copy the new fixed function ABI into GlobalISel, so this
was mismatched with the DAG compiled calling function. This was
allocating part of the argument list to v31, which was supposed to be
reserved for the workitem IDs.
2020-06-23 21:21:35 -04:00
Your Name
cc9d693856 [AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size
Summary:
Make use of both the - (1) clustered bytes and (2) cluster length, to decide on
the max number of mem ops that can be clustered. On an average, when loads
are dword or smaller, consider `5` as max threshold, otherwise `4`. This
heuristic is purely based on different experimentation conducted, and there is
no analytical logic here.

Reviewers: foad, rampitec, arsenm, vpykhtin

Reviewed By: rampitec

Subscribers: llvm-commits, kerbowa, hiraditya, t-tye, Anastasia, tpr, dstuttard, yaxunl, nhaehnle, wdng, jvesely, kzhuravl, thakis

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82393
2020-06-24 00:39:41 +05:30
Matt Arsenault
db777eaea3 AMDGPU/GlobalISel: Fix asserts on non-s32 sitofp/uitofp sources
The combine to form cvt_f32_ubyte0 was assuming the source type was
always 32-bit, but this needs to tolerate any legal source type.
2020-06-23 10:00:35 -04:00
Carl Ritson
8f3b2c8aa3 AMDGPU/GlobalISel: Remove selection of MAD/MAC when not available
Add code to respect mad-mac-f32-insts target feature.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D81990
2020-06-19 10:30:19 +09:00
Matt Arsenault
3b34f3fcca AMDGPU/GlobalISel: Fix obvious bug in ported 32-bit udiv/urem
This was hidden by the IR expansion in AMDGPUCodeGenPrepare, which I
forgot to turn off.
2020-06-16 22:46:35 -04:00
Matt Arsenault
c5c58fd6b5 AMDGPU: Remove intermediate DAG node for trig_preop intrinsic
We weren't doing anything with this, and keeping it would just add
more boilerplate for GlobalISel.
2020-06-16 21:06:25 -04:00
Stanislav Mekhanoshin
9ee272f13d [AMDGPU] Add gfx1030 target
Differential Revision: https://reviews.llvm.org/D81886
2020-06-15 16:18:05 -07:00
Matt Arsenault
1a7f115dce AMDGPU/GlobalISel: Extend load/store workaround to i128 vectors 2020-06-15 14:55:11 -04:00
Matt Arsenault
362eedcbb4 AMDGPU/GlobalISel: Correct memory size in test 2020-06-15 14:12:28 -04:00
Matt Arsenault
2ca552322c AMDGPU/GlobalISel: Fix 8-byte aligned, 96-bit scalar loads
These are legal since we can do a 96-bit load on some subtargets, but
this is only for vector loads. If we can't widen the load, it needs to
be broken down once known scalar. For 16-byte alignment, widen to a
128-bit load.
2020-06-15 11:33:16 -04:00
Matt Arsenault
dae9554b2b AMDGPU/GlobalISel: Workaround some load/store type selection patterns
The logic is written for what loads/stores should be selectable. There
are a set of cases that should be selectable, but due to missing MVTs
and/or selection patterns, will fail to select. I think eventually
load/store select patterns should ignore the type and only look at the
value size, but until that happens, bitcast these to equivalent i32
vectors.
2020-06-15 07:42:20 -04:00
Matt Arsenault
96229606f9 AMDGPU/GlobalISel: Use less artifical example to avoid abort=0
These were failing due to an unlegalizable G_CONCAT_VECTORS due to
registers with types that are naturally illegal.
2020-06-15 07:37:15 -04:00
Matt Arsenault
33e9086501 GlobalISel: Support lowering vector->vector G_BITCAST
Extract subvectors and cast to the result element type before
remerging.
2020-06-15 07:36:30 -04:00
Matt Arsenault
fb51d508ee AMDGPU/GlobalISel: Select general case for G_PTRMASK 2020-06-14 13:12:29 -04:00
Matt Arsenault
350ee7fb3f GlobalISel: Fix not erasing old instruction in sitofp/uitofp lowering 2020-06-12 10:33:23 -04:00
Sebastian Neubauer
29a6ad94fd [AMDGPU] Add G16 support to image instructions
Add G16 feature for GFX10 and support A16 and G16 in GlobalISel.

Differential Revision: https://reviews.llvm.org/D76836
2020-06-12 11:26:31 +02:00
Matt Arsenault
7d913becfc AMDGPU/GlobalISel: Fix select of private <2 x s16> load 2020-06-11 19:25:25 -04:00
Matt Arsenault
27f8bd94cb AMDGPU/GlobalISel: Fix select of <8 x s64> scalar load 2020-06-11 19:09:43 -04:00
Matt Arsenault
2247072b65 AMDGPU/GlobalISel: Set insert point when emitting control flow pseudos
This was implicitly assuming the branch instruction was the next after
the pseudo. It's possible for another non-terminator instruction to be
inserted between the intrinsic and the branch, so adjust the insertion
point. Fixes a non-terminator after terminator verifier error (which
without the verifier, manifested itself as an infinite loop in
analyzeBranch much later on).
2020-06-11 18:53:26 -04:00
Petar Avramovic
bd3d951b8b AMDGPU/GlobalISel: Fix lower for f64->f16 G_FPTRUNC
Put AND before ADD in LegalizerHelper::lowerFPTRUNC_F64_TO_F16
in order to match algorithm from AMDGPUTargetLowering::LowerFP_TO_FP16.

Differential Revision: https://reviews.llvm.org/D81666
2020-06-11 18:19:27 +02:00
Matt Arsenault
19b3b886b7 AMDGPU/GlobalISel: Fix porting error in 32-bit division
The baffling thing is this passed the OpenCL conformance test for
32-bit integer divisions, but only failed in the 32-bit path of
BypassSlowDivisions for the 64-bit tests.
2020-06-10 21:48:58 -04:00
Stanislav Mekhanoshin
09d325b20c AMDGPU/GlobalISel: cmp/select method for insert element
Differential Revision: https://reviews.llvm.org/D80754
2020-06-10 13:12:54 -07:00
Matt Arsenault
ea1bd95411 AMDGPU/GlobalISel: Make G_IMPLICIT_DEF legality more consistent
Makes <6 x s16> legal, <4 x s8> illegal, and clamps the maximum size
to 1024.
2020-06-10 11:05:59 -04:00
Matt Arsenault
44b355f34b AMDGPU/GlobalISel: Add new baseline tests for bitcast legalization 2020-06-09 15:46:53 -04:00
hsmahesha
7410571ce9 Revert "[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size"
This reverts commit 40a632a335.
2020-06-09 19:27:17 +05:30
hsmahesha
40a632a335 [AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size
Summary:
Make use of both the - (1) clustered bytes and (2) cluster length, to decide on
the max number of mem ops that can be clustered. On an average, when loads
are dword or smaller, consider `5` as max threshold, otherwise `4`. This heuristic
is purely based on different experimentation conducted, and there is no analytical
logic here.

Reviewers: foad, rampitec, arsenm, vpykhtin

Reviewed By: foad, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, Anastasia, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81085
2020-06-09 14:09:14 +05:30
Matt Arsenault
67b700480b AMDGPU/GlobalISel: Precommit regenerated check lines
The update_*test_checks scripts miss new stuff added at the end of
lines. Regenerate checks so the new mode register operands don't show
up in the diff of a future patch.
2020-06-08 12:47:45 -04:00
Matt Arsenault
38fb446fc7 AMDGPU/GlobalISel: Fix test failure in release build
The annoying behavior where the output is different due to the
legality check struck again, plus the subtarget predicate wasn't
really correctly set for DS FP atomics.

Some of the FP min/max instructions seem to be in the gfx6/gfx7
manuals, but IIRC this might have been one of the cases where the
manual got ahead of the actual hardware support, but I've left these
as-is for now since the assembler tests seem to expect them.
2020-06-06 11:01:18 -04:00
Matt Arsenault
bc20bdb9f9 AMDGPU/GlobalISel: Start rewriting load/store legality rules
The current set is an incomprehensible mess riddled with ordering
hacks for various limitations in the legalizer at the time of writing,
many of which have been fixed. This takes a very small step in
correcting this.

The core first change is to start checking for fully legal cases
first, rather than trying to figure out all of the actions that could
need to be performed. It's recommended to check the legal cases first
for faster legality checks in the common case. This still has a table
listing some common cases, but it needs measuring whether this really
helps or not.

More significantly, stop trying to allow any arbitrary type with a
legal bitwidth as a legal memory type, and start using the bitcast
legalize action for them. Allowing loads of these weird vector types
produced new burdens we don't need for handling all of the
legalization artifacts. Unlike the SelectionDAG handling, this is
still not casting 64 or 16-bit element vectors to 32-bit
vectors. These cases should still be handled by increasing/decreasing
the number of 16-bit elements. This is primarily to fix 8-bit element
vectors.

Another change is to stop trying to handle the load-widening based on
a higher alignment. We should still do this, but the way it was
handled wasn't really correct. We really need to modify the MMO's size
at the same time, and not just increase the result type. The
LegalizerHelper does not do this, and I think this would really
require a separate WidenMemory action (or to add a memory action
payload to the LegalizeMutation). These will now fail to legalize.

The structure of the legalizer rules makes writing concise rules here
difficult. It would be easier if the same function could answer the
query the query, and report the action to perform at the same
time. Instead these two are split into distinct predicate and action
functions. This is mostly tolerable for other cases, but the
load/store rules get pretty complicated so it's difficult to keep two
versions of these functions in sync.
2020-06-06 09:59:46 -04:00
Stanislav Mekhanoshin
5d62606f90 AMDGPU/GlobalISel: cmp/select method for extract element
Differential Revision: https://reviews.llvm.org/D80749
2020-06-05 12:57:40 -07:00