Commit Graph

4279 Commits

Author SHA1 Message Date
Jay Foad
b5f3383152 [AMDGPU] Add another test case for combining DS reads 2021-02-10 14:59:49 +00:00
Matt Arsenault
f4ca6d8289 AMDGPU: Fix verifier error with argument passed in CSR SGPR
We need to avoid setting the kill flag on the CSR spill if there's an
additional use of the register after the spill.

This does rely on consistency between the entry block liveins and the
MRI's function live ins, which is not something the verifier checks
now.
2021-02-09 13:49:44 -05:00
Matt Arsenault
e855cc6d04 AMDGPU/GlobalISel: Remove dead check prefixes 2021-02-08 17:09:28 -05:00
Jay Foad
d8323b1a86 [AMDGPU] Generate test checks and add GFX10 test coverage
Differential Revision: https://reviews.llvm.org/D96143
2021-02-08 12:57:51 +00:00
Thomas Symalla
f89f6d1e5d [AMDGPU]: Fixes an invalid clamp selection pattern.
When running the tests on PowerPC and x86, the lit test GlobalISel/trunc.ll fails at the memory sanitize step. This seems to be due to wrong invalid logic (which matches even if it shouldn't) and likely missing variable initialisation."

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D95878
2021-02-08 13:06:30 +01:00
Wen-Heng (Jack) Chung
04766c401b [AMDGPU] Add Fiji target in fptosi/fptoui instruction-select MIR tests.
In response to review comments in D95964, add a target with f16 instructions.

Differential Revision: https://reviews.llvm.org/D96061
2021-02-05 11:33:54 -06:00
Wen-Heng (Jack) Chung
50578cf339 [AMDGPU] Add f16 to i1 CodeGen patterns.
Follow patterns used for f32 and f64 types.

Differential Revision: https://reviews.llvm.org/D95964
2021-02-04 11:44:18 -06:00
Jay Foad
d84e5fdac1 [AMDGPU][GlobalISel] Fix v2s16 right shifts
When widening, each half of the v2s16 operands needs to be sign extended
for G_ASHR or zero extended for G_LSHR.

Differential Revision: https://reviews.llvm.org/D96048
2021-02-04 17:04:32 +00:00
Jay Foad
b3bb5c3efc [AMDGPU][GlobalISel] Use scalar min/max instructions
SALU min/max s32 instructions exist so use them. This means that
regbankselect can handle min/max much like add/sub/mul/shifts.

Differential Revision: https://reviews.llvm.org/D96047
2021-02-04 17:04:32 +00:00
Konstantin Zhuravlyov
6054a456da AMDGPU: Add support for amdgpu-unsafe-fp-atomics attribute
If amdgpu-unsafe-fp-atomics is specified, allow {flat|global}_atomic_add_f32 even if atomic modes don't match.

Differential Revision: https://reviews.llvm.org/D95391
2021-02-04 08:09:34 -05:00
Sebastian Neubauer
6c59dc474d [AMDGPU] Save all lanes for reserved VGPRs
When SGPRs are spilled to VGPRs, they can overwrite any lane. We need
to preserve the value of inactive lanes in function calls, so we save
the register even if it is marked as caller saved.

Also, teach buildPrologSpill to work when no registers are free like in
CodeGen/AMDGPU/pei-scavenge-vgpr-spill.mir and update the comment on
findScratchNonCalleeSaveRegister as it is not used anymore to realign
the stack pointer since D95865.

Differential Revision: https://reviews.llvm.org/D95946
2021-02-04 09:56:36 +01:00
Amara Emerson
1a13ee1efb [GlobalISel] Add sext(constant) -> constant artifact combine.
This is the G_SEXT counterpart to the existing G_ZEXT/G_ANYEXT combines.

Differential Revision: https://reviews.llvm.org/D95729
2021-02-03 14:10:08 -08:00
Matt Arsenault
39fbb5c3e3 RegisterCoalescer: Fix not setting undef on coalesced subregister uses
This was only adding undef to the use if the copy itself had a
subregister index. It did not consider the subrange liveness if the
use had a subreg index to begin with.
2021-02-03 13:54:43 -05:00
Matt Arsenault
d886da042c RegisterCoalescer: Prune undef subranges from copy pairs in loops
If we had a pair of copies inside a loop which introduced new liveness
to a subregister which was undef before the loop, we would have a
dummy phi-only segment remaining across the loop body. Later, this
false segment would confuse RenameIndependentSubregs causing it to
introduce IMPLICIT_DEFs with broken value numbering.

It seems always adding the lanes to ShrinkMask is OK, so any
conditions should be purely a compile time filter.
2021-02-03 13:42:53 -05:00
Matt Arsenault
477e3fe4f8 Revert "AMDGPU: Don't consider global pressure when bundling soft clauses"
This reverts commit 1e377a273f.

A regression was reported.
2021-02-03 13:25:05 -05:00
Stanislav Mekhanoshin
6038d68baf [AMDGPU] Added -mcpu to couple more tests. NFC. 2021-02-03 10:20:18 -08:00
Juneyoung Lee
06829034ca Revert "[ConstantFold] Fold more operations to poison"
This reverts commit 53040a968d due to its
bad interaction with select i1 -> and/or i1 transformation.

This fixes:
https://bugs.llvm.org/show_bug.cgi?id=49005
https://bugs.llvm.org/show_bug.cgi?id=48435
2021-02-04 00:24:02 +09:00
Matt Arsenault
9719f17011 AMDGPU: Move handling of allocation of fixed ABI inputs
For the fixed ABI, set this in the initial argument constructor,
rather than relying on the allocation logic to set the values. Also
stop passing them for amdgpu_gfx, since the DAG path seems to skip
these. I'm unclear on what amdgpu_gfx's expectations are.  This will
allow moving the special input registers out of the normal argument
range.
2021-02-03 09:27:59 -05:00
Sebastian Neubauer
d49efdc969 Revert "[AMDGPU] Add a new Clamp Pattern to the GlobalISel Path."
This reverts commits 62af0305b7cc..677a3529d3e6 from D93708.
They cause failures in the sanitizer builds because of uninitialized
values.

A fix is in D95878, but it might take some time until this is pushed,
so reverting the changes for now.
2021-02-03 11:03:34 +01:00
Matt Arsenault
af2cbe8eff AMDGPU: Fix adding extra operands for i128 asm constraints
We don't register i128 as a legal type with addRegisterClass, but it
appears in the list of legal register types. This inconsistency
resulted in the asm constraint lowering trying to use 2 128-bit
registers for these operands. This would leave behind a dead def that
would waste registers.

Regresses GlobalISel tests for i128 load/store, but these aren't very
important right now. Ideally these would not depend on the list of
register types.
2021-02-02 19:01:04 -05:00
Matt Arsenault
1e377a273f AMDGPU: Don't consider global pressure when bundling soft clauses
This should only consider whether the pressure impact of the bundle at
the given point in the program will decrease the occupancy. High VGPR
pressure was incorrectly blocking the formation of scalar bundles, and
vice versa. This was also blocking bundling from high pressure
situations at other points in the program.
2021-02-02 19:00:14 -05:00
Sebastian Neubauer
8b898b19a8 [AMDGPU] Remove unused tmp register
The temporary register is only used to compute the frame pointer.
The frame pointer is overwritten and not used in between, so we
can reuse the frame pointer for the computation, saving one register.

Differential Revision: https://reviews.llvm.org/D95865
2021-02-02 17:17:54 +01:00
Sebastian Neubauer
6b6ae583cf [AMDGPU] Save fp/bp after csr saves
Saving callee-save registers happens in whole wave mode. Exec is saved
to a free register, which can be reused to save the frame pointer.
Therefore, saving the fp needs to happen after saving csrs.

Differential Revision: https://reviews.llvm.org/D95861
2021-02-02 17:17:54 +01:00
Sebastian Neubauer
b91afa474e [AMDGPU] Mark epilog restores as frame-destroy
I guess instructions were marked as frame-setup by accident, they are
restores as part of the epilog.

Differential Revision: https://reviews.llvm.org/D95783
2021-02-02 10:24:37 +01:00
Thomas Symalla
fa3e840d3d Removed the generic virtual register creations. Reworked the tests. 2021-02-02 09:14:54 +01:00
Thomas Symalla
6604d81e1b Added and used new target pseudo for v_cvt_pk_i16_i32, changes due to code review. 2021-02-02 09:14:53 +01:00
Thomas Symalla
79e729bdf1 Fixed tests. 2021-02-02 09:14:53 +01:00
Thomas Symalla
3a46502264 Move step to PreLegalizer 2021-02-02 09:14:53 +01:00
Thomas Symalla
cdfd9b3bf5 Move Combiner to PreLegalize step 2021-02-02 09:14:53 +01:00
Thomas Symalla
f2ef2fbc69 Renamed identifiers in lit 2021-02-02 09:14:53 +01:00
Thomas Symalla
dae85e4671 Fixed the lit tests and a bug in the implementation. 2021-02-02 09:14:52 +01:00
Thomas Symalla
d41b7fa9bf Renames 2021-02-02 09:14:52 +01:00
Thomas Symalla
62af0305b7 Added clamp i64 to i16 global isel pattern. 2021-02-02 09:14:52 +01:00
Matt Arsenault
41877b82f0 AMDGPU: Fix dbg_value handling when forming soft clause bundles
DBG_VALUES placed between memory instructions would change
codegen. Skip over these and re-insert them after the bundle instead
of giving up on bundling.
2021-02-01 22:16:35 -05:00
Austin Kerbow
0397dca021 [AMDGPU] Fix crash with sgpr spills to vgpr disabled
This would assert with amdgpu-spill-sgpr-to-vgpr disabled when trying to
spill the FP.

Fixes: SWDEV-262704

Reviewed By: RamNalamothu

Differential Revision: https://reviews.llvm.org/D95768
2021-02-01 08:35:25 -08:00
Matt Arsenault
1801e2aa24 RegAlloc: Fix assert if all registers in class reserved
With a context instruction, this would produce a context
error. However, it would continue on and do an out of bounds access of
the empty allocation order array.
2021-01-31 11:10:04 -05:00
Roman Lebedev
a78d8feb48 [LowerConstantIntrinsics] Preserve Dominator Tree, if avaliable 2021-01-30 01:14:50 +03:00
Carl Ritson
0824694d68 [AMDGPU] Fix WMM Entry SCC preservation
SCC was not correctly preserved when entering WWM.
Current lit test was unable to detect this as entry block is
handled differently.
Additionally fix an issue where SCC was unnecessarily preserved
when exiting from WWM to Exact mode.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D95500
2021-01-29 10:05:36 +09:00
Carl Ritson
0e8f50595e [AMDGPU] Mark V_SET_INACTIVE as defining SCC
V_SET_INACTIVE is implemented with S_NOT which clobbers SCC.
Mark sure it is marked appropriately.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D95509
2021-01-29 09:46:41 +09:00
Cassie Jones
f22f4557a7 [GlobalISel] Implement widenScalar for carry-in add/sub
These are widened to a wider UADDE/USUBE, with the overflow value
unused, and with the same synthesis of a new overflow value as for the
O operations.

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D95326
2021-01-28 17:06:24 -05:00
Jay Foad
39ef0965df [AMDGPU] Simplify some RUN lines. NFC. 2021-01-28 17:57:55 +00:00
Mirko Brkusanin
3c979ae9ec [AMDGPU][GlobalISel] Remove redundant cmp when copying constant to vcc
Differential Revision: https://reviews.llvm.org/D95540
2021-01-28 11:20:09 +01:00
Mirko Brkusanin
4b422708ba [AMDGPU][GlobalISel] Handle G_PTR_ADD when looking for constant offset
Look throught G_PTRTOINT and G_PTR_ADD nodes when looking for constant
offset for buffer stores. This also helps with merging of these instructions
later on.

Differential Revision: https://reviews.llvm.org/D95242
2021-01-28 11:20:09 +01:00
Piotr Sobczak
fc8e741121 [AMDGPU] Avoid an illegal operand in si-shrink-instructions
Before the patch it was possible to trigger a constant bus
violation when folding immediates into a shrunk instruction.

The patch adds a check to enforce the legality of the new operand.

Differential Revision: https://reviews.llvm.org/D95527
2021-01-28 08:49:21 +01:00
Carl Ritson
2b9ed4fca6 [AMDGPU][NFC] Pre-commit test for D95509 2021-01-28 12:37:58 +09:00
Carl Ritson
8d8be87979 [AMDGPU][NFC] Generate llvm.amdgcn.set.inactive tests
This is a pre-commit for D95509.
2021-01-28 11:43:36 +09:00
Stanislav Mekhanoshin
d91ee2f782 [AMDGPU] Do not reassign spilled registers
We cannot call LRM::unassign() if LRM::assign() was never called
before, these are symmetrical calls. There are two ways of
assigning a physical register to virtual, via LRM::assign() and
via VRM::assignVirt2Phys(). LRM::assign() will call the VRM to
assign the register and then update LiveIntervalUnion. Inline
spiller calls VRM directly and thus LiveIntervalUnion never gets
updated. A call to LRM::unassign() then asserts about inconsistent
liveness.

We have to note that not all callers of the InlineSpiller even
have LRM to pass, RegAllocPBQP does not have it, so we cannot
always pass LRM into the spiller.

The only way to get into that spiller LRE_DidCloneVirtReg() call
is from LiveRangeEdit::eliminateDeadDefs if we split an LI.

This patch refuses to reassign a LiveInterval created by a split
to workaround the problem. In fact we cannot reassign a spill
anyway as all registers of the needed class are occupied and we
are spilling.

Fixes: SWDEV-267996

Differential Revision: https://reviews.llvm.org/D95489
2021-01-27 16:29:05 -08:00
Fangrui Song
4d28f0a6a4 [llc] Add reportError helper and canonicalize error messages 2021-01-26 15:33:37 -08:00
Jessica Paquette
f36007e811 [GlobalISel] Implement computeKnownBits for G_SEXT_INREG
Just use the existing `Known.sextInReg` implementation.

- Update KnownBitsTest.cpp.
- Update combine-redundant-and.mir for a more concrete example.

Differential Revision: https://reviews.llvm.org/D95484
2021-01-26 15:01:38 -08:00
Austin Kerbow
2291bd137d [AMDGPU] Update subtarget features for new target ID support
Support for XNACK and SRAMECC is not static on some GPUs. We must be able
to differentiate between different scenarios for these dynamic subtarget
features.

The possible settings are:

- Unsupported: The GPU has no support for XNACK/SRAMECC.
- Any: Preference is unspecified. Use conservative settings that can run anywhere.
- Off: Request support for XNACK/SRAMECC Off
- On: Request support for XNACK/SRAMECC On

GCNSubtarget will track the four options based on the following criteria. If
the subtarget does not support XNACK/SRAMECC we say the setting is
"Unsupported". If no subtarget features for XNACK/SRAMECC are requested we
must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the
feature string when initializing the subtarget, the settings are "On/Off".

The defaults are updated to be conservatively correct, meaning if no setting
for XNACK or SRAMECC is explicitly requested, defaults will be used which
generate code that can be run anywhere. This corresponds to the "Any" setting.

Differential Revision: https://reviews.llvm.org/D85882
2021-01-26 11:25:51 -08:00