Commit Graph

530 Commits

Author SHA1 Message Date
Matt Arsenault
3d0350b762 AMDGPU: Add MF independent version of getImplicitParameterOffset 2023-06-07 08:26:31 -04:00
Matt Arsenault
bc61bc8d6a AMDGPU: Use available subtarget member 2023-06-07 08:26:31 -04:00
Matt Arsenault
eece6ba283 IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.

Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
2023-06-06 17:07:18 -04:00
Jay Foad
a4a3ac10cb [AMDGPU] Remove extract_subvector patterns
Removing them seems to slightly increase code quality as well as
simplifying both the tablegen and C++ parts of the code.

Differential Revision: https://reviews.llvm.org/D149853
2023-06-06 14:04:50 +01:00
Krzysztof Drewniak
faa2c678aa [AMDGPU] Add buffer intrinsics that take resources as pointers
In order to enable the LLVM frontend to better analyze buffer
operations (and to potentially enable more precise analyses on the
backend), define versions of the raw and structured buffer intrinsics
that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their
rsrc arguments.

The new intrinsics are named by replacing `buffer.` with `buffer.ptr`.

One advantage to these intrinsic definitions is that, instead of
specifying that a buffer load/store will read/write some memory, we
can indicate that the memory read or written will be based on the
pointer argument. This means that, for example, a read from a
`noalias` buffer can be pulled out of a loop that is modifying a
distinct buffer.

In the future, we will define custom PseudoSourceValues that will
allow us to package up the (buffer, index, offset) triples that buffer
intrinsics contain and allow for more precise backend analysis.

This work also enables creating address space 7, which represents
manipulation of raw buffers using native LLVM load and store
instructions.

Where tests simply used a buffer intrinsic while testing some other
code path (such as the tests for VGPR spills), they have been updated
to use the new intrinsic form. Tests that are "about" buffer
intrinsics (for instance, those that ensure that they codegen as
expected) have been duplicated, either within existing files or into
new ones.

Depends on D145441

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D147547
2023-06-05 16:59:07 +00:00
Elliot Goodrich
ac73c48e09 [llvm] Reduce ComplexDeinterleavingPass.h includes
Remove the unnecessary `"llvm/IR/PatternMatch.h"` include directive from
`ComplexDeinterleavingPass.h` and move it to the corresponding source
file.

Add missing includes that were transitively included by this header to 3
other source files.

This reduces the total number of preprocessing tokens across the LLVM
source files in `lib` from (roughly) 1,964,876,961 to 1,935,091,611 - a
reduction of ~1.52%. This should result in a small improvement in
compilation time.
2023-05-20 17:49:18 +01:00
Thomas Symalla
91a7aa4c9b [AMDGPU] Improve abs modifier usage
If a call to the llvm.fabs intrinsic has users in another reachable
BB, SelectionDAG will not apply the abs modifier to these users and
instead generate a v_and ..., 0x7fffffff instruction.
For fneg instructions, the issue is similar.
This patch implements `AMDGPUIselLowering::shouldSinkOperands`,
which allows CodegenPrepare to call `tryToSinkFreeOperands`.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D150347
2023-05-19 12:02:21 +02:00
Philip Reames
0dc0c27989 [TLI] Add IsZero parameter to storeOfVectorConstantIsCheap [nfc]
Make the decision to consider zero constant stores cheap target specific.  Will be used in an upcoming change for RISCV.
2023-05-17 09:19:01 -07:00
Nicolai Hähnle
ef13308b26 AMDGPU/SDAG: Improve {extract,insert}_subvector lowering for 16-bit vectors
v2:
- simplify the escape to TableGen patterns

Differential Revision: https://reviews.llvm.org/D149841
2023-05-05 10:55:18 +02:00
Sergei Barannikov
e744e51b12 [SelectionDAG] Rename ADDCARRY/SUBCARRY to UADDO_CARRY/USUBO_CARRY (NFC)
This will make them consistent with other overflow-aware nodes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D148196
2023-04-29 21:59:58 +03:00
Changpeng Fang
1ab8b9ae15 AMDGPU: Define sub-class of SGPR_64 for tail call return
Summary:
  Registers for tail call return should not be clobbered by callee.
So we need a sub-class of SGPR_64 (excluding callee saved registers (CSR)) to hold
the tail call return address.

Because GFX and C calling conventions have different CSR, we need to define
the sub-class separately. This work is an extension of D147096 with the
consideration of GFX calling convention.

Based on the calling conventions, different instructions will be selected with
different sub-class of SGPR_64 as the input.

Reviewers: arsenm, cdevadas and sebastian-ne

Differential Revision: https://reviews.llvm.org/D148824
2023-04-27 10:45:11 -07:00
Jay Foad
14a7b2bfff [AMDGPU] Move GCN-specific stuff out of AMDGPUISelLowering. NFC.
Differential Revision: https://reviews.llvm.org/D149145
2023-04-25 13:55:50 +01:00
Matt Arsenault
2fce50e8f5 AMDGPU: Fix assertion with multiple uses of f64 fneg of select
A bitcast needs to be inserted back to the original type. Just
skip the multiple use case for a safer quick fix. Handling
the multiple use case seems to be beneficial in some but not
all cases.
2023-04-20 10:15:18 -04:00
Matt Arsenault
f608ac6286 AMDGPU: Push fneg into bitcast of integer select
Avoids some regressions in the math libraries in a future
patch.
2023-04-12 06:48:58 -04:00
Matt Arsenault
0f59720e1c AMDGPU: Fold fneg into bitcast of build_vector
The math libraries have a lot of code that performs
manual sign bit operations by bitcasting doubles to int2
and doing bithacking on them. This is a bad canonical form
we should rewrite to use high level sign operations directly
on double. To avoid codegen regressions, we need to do a better
job moving fnegs to operate only on the high 32-bits.

This is only halfway to fixing the real case.
2023-04-11 07:12:01 -04:00
Craig Topper
219ff07f72 [Targets] Rename Flag->Glue. NFC
Long long ago Glue was called Flag, and it was never completely
renamed.
2023-04-02 19:28:51 -07:00
Simon Pilgrim
8153b92d9b [DAG] Add SelectionDAG::SplitScalar helper
Similar to the existing SelectionDAG::SplitVector helper, this helper creates the EXTRACT_ELEMENT nodes for the LO/HI halves of the scalar source.

Differential Revision: https://reviews.llvm.org/D147264
2023-03-31 18:35:40 +01:00
Kazu Hirata
1a8668cf0c [Target] Use isAllOnesConstant (NFC) 2023-03-26 22:57:39 -07:00
Simon Pilgrim
9041682d2c [DAG] Remove redundant isZExtFree(SDValue,VT) overrides. NFC.
These implementations both match the TargetLoweringBase.isZExtFree implementation
2023-03-12 15:56:04 +00:00
Jon Chesterfield
d3dda422bf [amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD
Post ISel, LDS variables are absolute values. Representing them as
such is simpler than the frame recalculation currently used to build assembler
tables from their addresses.

This is a precursor to lowering dynamic/external LDS accesses from non-kernel
functions.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144221
2023-03-12 13:47:48 +00:00
Jon Chesterfield
bf579a7049 [amdgpu] Change LDS lowering default to hybrid
Postponed from D139433 until the bug fixed by D139874 could be resolved.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D141852
2023-02-24 15:20:12 +00:00
Matt Arsenault
c177051f60 AMDGPU: Restrict foldFreeOpFromSelect combine based on legal source mods
Provides a small code size savings for some f32 cases.
2023-02-19 22:05:54 -04:00
Matt Arsenault
28d8889d27 AMDGPU: Teach fneg combines that select has source modifiers
We do match source modifiers for f32 typed selects already, but the
combiner code was never informed of this.

A long time ago the documentation lied and stated that source
modifiers don't work for v_cndmask_b32 when they in fact do. We had a
bunch fo code operating under the assumption that they don't support
source modifiers, so we tried to move fnegs around to work around
this.

Gets a few small improvements here and there. The main hazard to watch
out for is infinite loops in the combiner since we try to move fnegs
up and down the DAG. For now, don't fold fneg directly into select.
The generic combiner does this for a restricted set of cases
when getNegatedExpression obviously shows an improvement for both
operands. It turns out to be trickier to avoid infinite looping the
combiner in conjunction with pulling out source modifiers, so
leave this for a later commit.
2023-02-19 20:13:38 -04:00
Jay Foad
8e5a41e827 Revert "AMDGPU: Override getNegatedExpression constant handling"
This reverts commit 11c3cead23.

It was causing infinite loops in the DAG combiner.
2023-02-16 17:11:32 +00:00
Matt Arsenault
9ccc58893b AMDGPU: Fix not adding to depth in getNegatedExpression 2023-02-15 08:32:58 -04:00
Matt Arsenault
11c3cead23 AMDGPU: Override getNegatedExpression constant handling
Ignore the multiple use heuristics of the default
implementation, and report cost based on inline immediates. This
is mostly interesting for -0 vs. 0. Gets a few small improvements.
fneg_fadd_0_f16 is a small regression. We could probably avoid this
if we handled folding fneg into div_fixup.
2023-02-15 05:21:00 -04:00
Matt Arsenault
a4e8347b36 AMDGPU: Refactor isConstantCostlierToNegate 2023-02-15 05:21:00 -04:00
Kazu Hirata
64dad4ba9a Use llvm::bit_cast (NFC) 2023-02-14 01:22:12 -08:00
Matt Arsenault
4f0eb57222 AMDGPU: Teach getNegatedExpression about rcp 2023-02-14 04:02:39 -04:00
Matt Arsenault
149e8abbd9 AMDGPU: Factor out fneg fold predicate function 2023-02-02 22:50:23 -04:00
Matt Arsenault
36cfe26a52 AMDGPU: Try to unfold fneg source when matching legacy fmin/fmax
This is NFC as it stands, since other combines will effectively
prevent this from being reachable. This will avoid regressions in a
future change which tries to make better use of select source
modifiers.

Didn't bother with the GlobalISel part for now, since the baseline
combine doesn't seem to work on the existing test.
2023-02-02 22:50:23 -04:00
Kazu Hirata
e078201835 [Target] Use llvm::count{l,r}_{zero,one} (NFC) 2023-01-28 09:23:07 -08:00
Matt Arsenault
93ec3fa402 AMDGPU: Support atomicrmw uinc_wrap/udec_wrap
For now keep the exising intrinsics working.
2023-01-27 22:17:16 -04:00
Stanislav Mekhanoshin
c8ed36281a [AMDGPU] Cast sub-dword elements to i32 in concat_vectors
This produces better code by avoiding repacking in some cases.

Fixes: SWDEV-373436

Differential Revision: https://reviews.llvm.org/D141329
2023-01-09 15:35:49 -08:00
Leon Clark
daa022ca57 Enable roundeven.
Add support for roundeven and implement appropriate tests.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D137954
2022-12-20 15:40:20 +00:00
Jay Foad
6443c0ee02 [AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call constructors pair and tuple instead of helper
functions make_pair and make_tuple.

Differential Revision: https://reviews.llvm.org/D139828
2022-12-14 13:22:26 +00:00
Pierre van Houtryve
678d8946ba [AMDGPU] Add bf16 storage support
- [Clang] Declare AMDGPU target as supporting BF16 for storage-only purposes on amdgcn
  - Add Sema & CodeGen tests cases.
  - Also add cases that D138651 would have covered as this patch replaces it.
- [AMDGPU] Add BF16 storage-only support
  - Support legalization/dealing with bf16 operations in DAGIsel.
  - bf16 as a type remains illegal and is represented as i16 for storage purposes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D139398
2022-12-13 10:34:26 -05:00
Nico Weber
a862d09a92 Revert "[amdgpu] Reimplement LDS lowering"
This reverts commit 982017240d.
Breaks check-llvm, see https://reviews.llvm.org/D139433#3974862
2022-12-06 12:01:36 -05:00
Jon Chesterfield
982017240d [amdgpu] Reimplement LDS lowering
Renames the current lowering scheme to "module" and introduces two new
ones, "kernel" and "table", plus a "hybrid" that chooses between those three
on a per-variable basis.

Unit tests are set up to pass with the default lowering of "module" or "hybrid"
with this patch defaulting to "module", which will be a less dramatic codegen
change relative to the current. This reflects the sparsity of test coverage for
the table lowering method. Hybrid is better than module in every respect and
will be default in a subsequent patch.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D139433
2022-12-06 16:28:15 +00:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Mateja Marjanovic
595a08847a [AMDGPU] Add support for new LLVM vector types
Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384.

Differential Revision: https://reviews.llvm.org/D138205
2022-11-29 17:02:04 +01:00
Janek van Oirschot
322966f8f8 [AMDGPU] Add llvm.is.fpclass intrinsic to existing SelectionDAG fp
class support and introduce GlobalISel implementation for AMDGPU

Uses existing SelectionDAG lowering of the llvm.amdgcn.class intrinsic
for llvm.is.fpclass
2022-11-28 16:00:36 -05:00
Ivan Kosarev
ec8ede8177 [AMDGPU][CodeGen] Support raw format TFE buffer loads other than byte, short and d16 ones.
Differential Revision: https://reviews.llvm.org/D138215
2022-11-24 10:50:26 +00:00
Stanislav Mekhanoshin
bcaf31ec3f [AMDGPU] Allow finer grain control of an unaligned access speed
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.

A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.

Subsequent patch will start using an actual value of speed in
the load/store vectorizer to compare if a vectorized access going
to be not just fast, but not slower than before.

Differential Revision: https://reviews.llvm.org/D124217
2022-11-17 09:23:53 -08:00
Simon Pilgrim
78739fdb4d [DAG] Enable combineShiftOfShiftedLogic folds after type legalization
This was disabled to prevent regressions, which appear to be just occurring on AMDGPU (at least in our current lit tests), which I've addressed by adding AMDGPUTargetLowering::isDesirableToCommuteWithShift overrides.

Fixes #57872

Differential Revision: https://reviews.llvm.org/D136042
2022-10-29 12:30:04 +01:00
Pierre van Houtryve
824dd811be [AMDGPU][DAG] Fix trunc/shift combine condition
The condition needs to be different for right-shifts, else we may lose information in some cases.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D136059
2022-10-21 06:36:07 +00:00
Leon Clark
6370bc2435 Add f16 nearbyint support.
Enable lowering of FNEARBYINT for f16 and extend existing tests.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135124
2022-10-14 08:05:24 +01:00
Stanislav Mekhanoshin
5a3fe9a039 [AMDGPU] Move SIModeRegisterDefaults to SI MFI
It does not belong to a general AMDGPU MFI.

Differential Revision: https://reviews.llvm.org/D134666
2022-09-28 13:13:40 -07:00
Vitaly Buka
20a80d60a8 Revert "[AMDGPU] Move SIModeRegisterDefaults to SI MFI"
Break msan bots. Details in D134666.

This reverts commit 0ce96e06ee.
2022-09-26 22:22:09 -07:00
Stanislav Mekhanoshin
0ce96e06ee [AMDGPU] Move SIModeRegisterDefaults to SI MFI
It does not belong to a general AMDGPU MFI.

Differential Revision: https://reviews.llvm.org/D134666
2022-09-26 13:20:24 -07:00