Commit Graph

547 Commits

Matt Arsenault
0295513238 AMDGPU: Filter out contract flags when lowering exp
It is unsafe to contract the fsub into the fmul. It also increases
code size by duplicating a constant.
2023-07-20 18:14:24 -04:00
Matt Arsenault
fbe4ff8149 AMDGPU: Partially fix not respecting dynamic denormal mode
The most notable issue was producing v_mad_f32 in functions with the
dynamic mode, since it just ignores the mode. fdiv lowering is still
somewhat broken because it involves a mode switch and we need to query
the original mode.
2023-07-11 15:14:52 -04:00
Amara Emerson
3a80bdb316 [GlobalISel] Remove an erroneous oneuse check in the G_ADD reassociation combine.
This check was unnecessary and incorrect: the target hook's default
implementation was already doing it, and the one in the matcher was
checking for a completely different thing. This change:
 1) Removes the check and updates affected tests which now do some more reassociations.
 2) Modifies the AMDGPU hooks which were stubbed with "return true" to also do the oneuse
    check. Not sure why I didn't do this the first time.
2023-07-10 01:03:12 -07:00
Matt Arsenault
8ee1cc82c9 AMDGPU: Fold out sign bit ops on frexp_exp
The sign bit has no impact on the exponent, so strip these away. Saves
on the source modifier encoding cost. I left the GlobalISel handling
until there's a resolution to issue #62628.

We should do this in instcombine too, but legalization should be
introducing more frexps than it currently is where this would occur.
2023-07-06 10:26:21 -04:00
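The identity behind this fold can be checked numerically. A minimal Python sketch (an illustration of the math, not the LLVM code; the helper name is mine):

```python
import math

def frexp_exp(x: float) -> int:
    """Binary exponent of x, as math.frexp reports it."""
    _, e = math.frexp(x)
    return e

# The sign bit never affects the exponent, so fneg/fabs on the
# source of frexp_exp can be stripped away.
for x in (1.5, 0.0625, 3.0e10, 2.0 ** -40):
    assert frexp_exp(x) == frexp_exp(-x) == frexp_exp(abs(x))
```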
Matt Arsenault
5491666248 AMDGPU: Correctly lower llvm.exp.f32
The library expansion has too many paths for all the permutations of
DAZ, unsafe and the 3 exp functions. It's easier to expand it in the
backend when we know all of these things. The library currently misses
the no-infinity check on the overflow path, which this lowering
optimizes out.

Some of the <3 x half> fast tests regress due to vector widening
dropping flags which will be fixed separately.

Apparently there is no exp10 intrinsic, but there should be. Adds some
deadish code in preparation for adding one while I'm following along
with the current library expansion.
2023-07-05 17:23:49 -04:00
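The core shape of such an expansion is exp(x) = 2^(x * log2(e)), feeding a 2^x primitive like v_exp_f32. A hedged Python sketch of just that identity (the real lowering additionally needs the overflow/underflow and denormal handling discussed above; the names are mine):

```python
import math

LOG2_E = math.log2(math.e)  # 1 / ln(2)

def exp_via_exp2(x: float) -> float:
    """Approximate exp(x) via a 2^x primitive: exp(x) = 2^(x*log2(e)).
    Range checks and input scaling are omitted for clarity."""
    return 2.0 ** (x * LOG2_E)

assert math.isclose(exp_via_exp2(1.0), math.e, rel_tol=1e-12)
assert math.isclose(exp_via_exp2(-3.0), math.exp(-3.0), rel_tol=1e-12)
```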
Matt Arsenault
ed556a1ad5 AMDGPU: Correctly lower llvm.exp2.f32
Previously this did a fast math expansion only.
2023-07-05 17:23:48 -04:00
Matt Arsenault
4e15f378ee AMDGPU: Correctly lower llvm.log.f32 and llvm.log10.f32
Previously we expanded these in a fast-math way and the device
libraries were relying on this behavior. The libraries have a pending
change to switch to the new target intrinsic.

Unlike the library version, this takes advantage of no-infinities on
the result overflow check.
2023-07-05 15:30:35 -04:00
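Both functions reduce to the hardware log2 primitive by a constant multiply: log(x) = log2(x) * ln(2) and log10(x) = log2(x) * log10(2). A Python sketch of the identities only (function names are mine; the real lowering also handles the denormal and overflow cases mentioned above):

```python
import math

LN_2 = math.log(2.0)
LOG10_2 = math.log10(2.0)

def log_via_log2(x: float) -> float:
    """Natural log via a log2 primitive."""
    return math.log2(x) * LN_2

def log10_via_log2(x: float) -> float:
    """Base-10 log via a log2 primitive."""
    return math.log2(x) * LOG10_2

assert math.isclose(log_via_log2(7.5), math.log(7.5), rel_tol=1e-12)
assert math.isclose(log10_via_log2(7.5), math.log10(7.5), rel_tol=1e-12)
```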
Matt Arsenault
89ccfa1b39 AMDGPU: Use correct lowering for llvm.log2.f32
We previously directly codegened to v_log_f32, which is broken for
denormals. The lowering isn't complicated, you simply need to scale
denormal inputs and adjust the result. Note log and log10 are still
not accurate enough, and will be fixed separately.
2023-06-23 08:37:37 -04:00
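The scale-and-adjust trick rests on log2(x) = log2(x * 2^k) - k: prescale an f32-denormal input into the normal range, take the hardware log2, then subtract the scale exponent. A Python illustration of that identity (names and the choice of 2^32 as the scale are my reading of the usual expansion, not the literal code):

```python
import math

F32_MIN_NORMAL = 2.0 ** -126
SCALE_EXP = 32  # scale denormal inputs by 2^32 before taking log2

def log2_with_denormal_scaling(x: float) -> float:
    """log2 via a primitive that mishandles f32 denormals:
    prescale tiny inputs, then subtract the scale exponent."""
    if 0.0 < x < F32_MIN_NORMAL:
        return math.log2(x * 2.0 ** SCALE_EXP) - SCALE_EXP
    return math.log2(x)

x = 2.0 ** -140  # would be denormal in f32
assert math.isclose(log2_with_denormal_scaling(x), -140.0)
```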
Matt Arsenault
d9333e360a Revert "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp"
This reverts commit 1159c670d4.

Accidentally pushed wrong patch
2023-06-16 18:13:07 -04:00
Matt Arsenault
1159c670d4 AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp 2023-06-16 18:06:27 -04:00
Matt Arsenault
28f3edd2be AMDGPU: Add llvm.amdgcn.exp2 intrinsic
Provide direct access to v_exp_f32 and v_exp_f16, so we can start
correctly lowering the generic exp intrinsics.

Unfortunately have to break from the usual naming convention of
matching the instruction name and stripping the v_ prefix. exp is
already taken by the export intrinsic. On the clang builtin side, we
have a choice of maintaining the convention to the instruction name,
or following the intrinsic name.
2023-06-15 07:00:07 -04:00
Matt Arsenault
d0923a7739 AMDGPU: Correct constants used in fast math log expansion
The division between float constants was done with less
precision. Performing the divide in double and truncating to float
provides the same value as used in the library fast math expansion.
2023-06-12 21:11:41 -04:00
Matt Arsenault
eccc89b26c AMDGPU: Add llvm.amdgcn.log intrinsic
This will map directly to the hardware instruction which does not
handle denormals for f32. This will allow moving the generic intrinsic
to be lowered correctly. Also handles selecting the f16 version, but
there's no reason to use it over the generic intrinsic.
2023-06-12 21:10:30 -04:00
Matt Arsenault
abff7668ab AMDGPU: Implement known bits functions for min3/max3/med3 2023-06-10 10:58:44 -04:00
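The key observation making known-bits analysis possible here is that min3/max3/med3 always return one of their operands, so any bit that agrees across all inputs is known in the result. A Python sketch (my own illustration of the principle, not the LLVM implementation):

```python
def med3(a, b, c):
    """Median of three as a min/max composition."""
    return max(min(a, b), min(max(a, b), c))

def common_known_bits(vals, width=8):
    """Bits that are identical across all inputs; since the result
    is always one of the inputs, these bits are known in the result."""
    mask = (1 << width) - 1
    known = mask
    for v in vals[1:]:
        known &= ~(vals[0] ^ v) & mask
    return known

vals = (0b1010_0001, 0b1010_0111, 0b1010_0100)
assert med3(*vals) in vals
# The shared high nibble is known in the med3 result.
assert common_known_bits(vals) & 0b1111_0000 == 0b1111_0000
```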
Matt Arsenault
4e4c351ae5 AMDGPU: Avoid endpgm in middle of block for fallback trap lowering.
This was inserting an s_endpgm in the middle of the block when it has
to be a terminator. Split the block and insert a branch to a new block
with the trap if it's not in a terminator position.

Fixes verifier error on LDS in function with no trap support (and
other trap sources).
2023-06-09 21:04:38 -04:00
Amara Emerson
086601eac2 [GlobalISel] Implement some binary reassociations, G_ADD for now
- (op (op X, C1), C2) -> (op X, (op C1, C2))
- (op (op X, C1), Y) -> (op (op X, Y), C1)

Some code duplication with the G_PTR_ADD reassociations unfortunately but no
easy way to avoid it that I can see.

Differential Revision: https://reviews.llvm.org/D150230
2023-06-08 21:14:58 -07:00
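The two rewrites above are value-preserving for an associative, commutative op like G_ADD; the first folds the two constants together, the second keeps the constant outermost so later combines can see it. A trivial Python rendering of both patterns (helper names are mine):

```python
def reassoc_constants(x: int, c1: int, c2: int) -> int:
    """(x + c1) + c2  ->  x + (c1 + c2): the constants fold."""
    return x + (c1 + c2)

def reassoc_hoist(x: int, c1: int, y: int) -> int:
    """(x + c1) + y  ->  (x + y) + c1: constant stays outermost."""
    return (x + y) + c1

# Both rewrites preserve the value.
assert reassoc_constants(10, 3, 4) == (10 + 3) + 4
assert reassoc_hoist(10, 3, 5) == (10 + 3) + 5
```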
Matt Arsenault
c01f284fbb AMDGPU: Fix regressions in integer mad matching
Undo the canonicalization done in
0cfc651032. Restores some regressed
matching of integer mad. The selection patterns for the actual mads
don't seem to be properly commuting, so some of the commuted cases are
still missed.

Fixes: SWDEV-363009
2023-06-08 16:48:47 -04:00
Matt Arsenault
3d0350b762 AMDGPU: Add MF independent version of getImplicitParameterOffset 2023-06-07 08:26:31 -04:00
Matt Arsenault
bc61bc8d6a AMDGPU: Use available subtarget member 2023-06-07 08:26:31 -04:00
Matt Arsenault
eece6ba283 IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.

Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
2023-06-06 17:07:18 -04:00
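At its core llvm.ldexp is x * 2^n. A hedged Python sketch of a naive expansion (the "somewhat horrible" real inline expansion must clamp the exponent and split the scale to avoid intermediate overflow/underflow, which this deliberately ignores; the name is mine):

```python
import math

def ldexp_inline(x: float, n: int) -> float:
    """Naive ldexp: scale x by 2^n in one multiply.
    No exponent clamping or two-step scaling."""
    return x * (2.0 ** n)

assert ldexp_inline(0.75, 4) == math.ldexp(0.75, 4) == 12.0
assert ldexp_inline(3.0, -2) == 0.75
```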
Jay Foad
a4a3ac10cb [AMDGPU] Remove extract_subvector patterns
Removing them seems to slightly increase code quality as well as
simplify both the tablegen and C++ parts of the code.

Differential Revision: https://reviews.llvm.org/D149853
2023-06-06 14:04:50 +01:00
Krzysztof Drewniak
faa2c678aa [AMDGPU] Add buffer intrinsics that take resources as pointers
In order to enable the LLVM frontend to better analyze buffer
operations (and to potentially enable more precise analyses on the
backend), define versions of the raw and structured buffer intrinsics
that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their
rsrc arguments.

The new intrinsics are named by replacing `buffer.` with `buffer.ptr`.

One advantage to these intrinsic definitions is that, instead of
specifying that a buffer load/store will read/write some memory, we
can indicate that the memory read or written will be based on the
pointer argument. This means that, for example, a read from a
`noalias` buffer can be pulled out of a loop that is modifying a
distinct buffer.

In the future, we will define custom PseudoSourceValues that will
allow us to package up the (buffer, index, offset) triples that buffer
intrinsics contain and allow for more precise backend analysis.

This work also enables creating address space 7, which represents
manipulation of raw buffers using native LLVM load and store
instructions.

Where tests simply used a buffer intrinsic while testing some other
code path (such as the tests for VGPR spills), they have been updated
to use the new intrinsic form. Tests that are "about" buffer
intrinsics (for instance, those that ensure that they codegen as
expected) have been duplicated, either within existing files or into
new ones.

Depends on D145441

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D147547
2023-06-05 16:59:07 +00:00
Elliot Goodrich
ac73c48e09 [llvm] Reduce ComplexDeinterleavingPass.h includes
Remove the unnecessary `"llvm/IR/PatternMatch.h"` include directive from
`ComplexDeinterleavingPass.h` and move it to the corresponding source
file.

Add missing includes that were transitively included by this header to 3
other source files.

This reduces the total number of preprocessing tokens across the LLVM
source files in `lib` from (roughly) 1,964,876,961 to 1,935,091,611 - a
reduction of ~1.52%. This should result in a small improvement in
compilation time.
2023-05-20 17:49:18 +01:00
Thomas Symalla
91a7aa4c9b [AMDGPU] Improve abs modifier usage
If a call to the llvm.fabs intrinsic has users in another reachable
BB, SelectionDAG will not apply the abs modifier to these users and
instead generate a v_and ..., 0x7fffffff instruction.
For fneg instructions, the issue is similar.
This patch implements `AMDGPUIselLowering::shouldSinkOperands`,
which allows CodegenPrepare to call `tryToSinkFreeOperands`.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D150347
2023-05-19 12:02:21 +02:00
Philip Reames
0dc0c27989 [TLI] Add IsZero parameter to storeOfVectorConstantIsCheap [nfc]
Make the decision to consider zero constant stores cheap target-specific. Will be used in an upcoming change for RISCV.
2023-05-17 09:19:01 -07:00
Nicolai Hähnle
ef13308b26 AMDGPU/SDAG: Improve {extract,insert}_subvector lowering for 16-bit vectors
v2:
- simplify the escape to TableGen patterns

Differential Revision: https://reviews.llvm.org/D149841
2023-05-05 10:55:18 +02:00
Sergei Barannikov
e744e51b12 [SelectionDAG] Rename ADDCARRY/SUBCARRY to UADDO_CARRY/USUBO_CARRY (NFC)
This will make them consistent with other overflow-aware nodes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D148196
2023-04-29 21:59:58 +03:00
Changpeng Fang
1ab8b9ae15 AMDGPU: Define sub-class of SGPR_64 for tail call return
Summary:
Registers for tail call return should not be clobbered by the callee,
so we need a sub-class of SGPR_64 (excluding callee-saved registers
(CSR)) to hold the tail call return address.

Because the GFX and C calling conventions have different CSRs, we need
to define the sub-class separately. This work extends D147096 to
account for the GFX calling convention.

Based on the calling conventions, different instructions will be selected with
different sub-class of SGPR_64 as the input.

Reviewers: arsenm, cdevadas and sebastian-ne

Differential Revision: https://reviews.llvm.org/D148824
2023-04-27 10:45:11 -07:00
Jay Foad
14a7b2bfff [AMDGPU] Move GCN-specific stuff out of AMDGPUISelLowering. NFC.
Differential Revision: https://reviews.llvm.org/D149145
2023-04-25 13:55:50 +01:00
Matt Arsenault
2fce50e8f5 AMDGPU: Fix assertion with multiple uses of f64 fneg of select
A bitcast needs to be inserted back to the original type. Just
skip the multiple use case for a safer quick fix. Handling
the multiple use case seems to be beneficial in some but not
all cases.
2023-04-20 10:15:18 -04:00
Matt Arsenault
f608ac6286 AMDGPU: Push fneg into bitcast of integer select
Avoids some regressions in the math libraries in a future
patch.
2023-04-12 06:48:58 -04:00
Matt Arsenault
0f59720e1c AMDGPU: Fold fneg into bitcast of build_vector
The math libraries have a lot of code that performs
manual sign bit operations by bitcasting doubles to int2
and doing bithacking on them. This is a bad canonical form
we should rewrite to use high level sign operations directly
on double. To avoid codegen regressions, we need to do a better
job moving fnegs to operate only on the high 32-bits.

This is only halfway to fixing the real case.
2023-04-11 07:12:01 -04:00
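The int2-style bithack the message describes — negating a double by flipping the sign bit of its high 32-bit word — can be shown directly in Python (illustration only; names are mine):

```python
import struct

SIGN_BIT_HI = 0x8000_0000  # sign bit lives in the high 32-bit word

def fneg_via_high_word(x: float) -> float:
    """Negate a double by XORing the sign bit of its high 32 bits,
    mirroring the bitcast-to-int2 pattern in the math libraries."""
    hi, lo = struct.unpack('>II', struct.pack('>d', x))
    hi ^= SIGN_BIT_HI
    return struct.unpack('>d', struct.pack('>II', hi, lo))[0]

assert fneg_via_high_word(1.5) == -1.5
assert fneg_via_high_word(-0.0) == 0.0
```

Only the high 32 bits change, which is why fnegs that operate solely on the high half avoid codegen regressions.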
Craig Topper
219ff07f72 [Targets] Rename Flag->Glue. NFC
Long long ago Glue was called Flag, and it was never completely
renamed.
2023-04-02 19:28:51 -07:00
Simon Pilgrim
8153b92d9b [DAG] Add SelectionDAG::SplitScalar helper
Similar to the existing SelectionDAG::SplitVector helper, this helper creates the EXTRACT_ELEMENT nodes for the LO/HI halves of the scalar source.

Differential Revision: https://reviews.llvm.org/D147264
2023-03-31 18:35:40 +01:00
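The helper splits a scalar into its LO/HI halves via a pair of EXTRACT_ELEMENT nodes. The integer analogue for a 64-bit value, as a Python sketch (the function name is mine):

```python
def split_scalar_u64(value: int):
    """Split a 64-bit scalar into (lo, hi) 32-bit halves — the
    integer analogue of the EXTRACT_ELEMENT pair the helper builds."""
    lo = value & 0xFFFF_FFFF
    hi = (value >> 32) & 0xFFFF_FFFF
    return lo, hi

lo, hi = split_scalar_u64(0x1122_3344_5566_7788)
assert (lo, hi) == (0x5566_7788, 0x1122_3344)
assert (hi << 32) | lo == 0x1122_3344_5566_7788
```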
Kazu Hirata
1a8668cf0c [Target] Use isAllOnesConstant (NFC) 2023-03-26 22:57:39 -07:00
Simon Pilgrim
9041682d2c [DAG] Remove redundant isZExtFree(SDValue,VT) overrides. NFC.
These implementations both match the TargetLoweringBase.isZExtFree implementation.
2023-03-12 15:56:04 +00:00
Jon Chesterfield
d3dda422bf [amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD
Post ISel, LDS variables are absolute values. Representing them as
such is simpler than the frame recalculation currently used to build assembler
tables from their addresses.

This is a precursor to lowering dynamic/external LDS accesses from non-kernel
functions.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144221
2023-03-12 13:47:48 +00:00
Jon Chesterfield
bf579a7049 [amdgpu] Change LDS lowering default to hybrid
Postponed from D139433 until the bug fixed by D139874 could be resolved.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D141852
2023-02-24 15:20:12 +00:00
Matt Arsenault
c177051f60 AMDGPU: Restrict foldFreeOpFromSelect combine based on legal source mods
Provides a small code size savings for some f32 cases.
2023-02-19 22:05:54 -04:00
Matt Arsenault
28d8889d27 AMDGPU: Teach fneg combines that select has source modifiers
We do match source modifiers for f32 typed selects already, but the
combiner code was never informed of this.

A long time ago the documentation lied and stated that source
modifiers don't work for v_cndmask_b32 when they in fact do. We had a
bunch of code operating under the assumption that they don't support
source modifiers, so we tried to move fnegs around to work around
this.

Gets a few small improvements here and there. The main hazard to watch
out for is infinite loops in the combiner since we try to move fnegs
up and down the DAG. For now, don't fold fneg directly into select.
The generic combiner does this for a restricted set of cases
when getNegatedExpression obviously shows an improvement for both
operands. It turns out to be trickier to avoid infinite looping the
combiner in conjunction with pulling out source modifiers, so
leave this for a later commit.
2023-02-19 20:13:38 -04:00
Jay Foad
8e5a41e827 Revert "AMDGPU: Override getNegatedExpression constant handling"
This reverts commit 11c3cead23.

It was causing infinite loops in the DAG combiner.
2023-02-16 17:11:32 +00:00
Matt Arsenault
9ccc58893b AMDGPU: Fix not adding to depth in getNegatedExpression 2023-02-15 08:32:58 -04:00
Matt Arsenault
11c3cead23 AMDGPU: Override getNegatedExpression constant handling
Ignore the multiple use heuristics of the default
implementation, and report cost based on inline immediates. This
is mostly interesting for -0 vs. 0. Gets a few small improvements.
fneg_fadd_0_f16 is a small regression. We could probably avoid this
if we handled folding fneg into div_fixup.
2023-02-15 05:21:00 -04:00
Matt Arsenault
a4e8347b36 AMDGPU: Refactor isConstantCostlierToNegate 2023-02-15 05:21:00 -04:00
Kazu Hirata
64dad4ba9a Use llvm::bit_cast (NFC) 2023-02-14 01:22:12 -08:00
Matt Arsenault
4f0eb57222 AMDGPU: Teach getNegatedExpression about rcp 2023-02-14 04:02:39 -04:00
Matt Arsenault
149e8abbd9 AMDGPU: Factor out fneg fold predicate function 2023-02-02 22:50:23 -04:00
Matt Arsenault
36cfe26a52 AMDGPU: Try to unfold fneg source when matching legacy fmin/fmax
This is NFC as it stands, since other combines will effectively
prevent this from being reachable. This will avoid regressions in a
future change which tries to make better use of select source
modifiers.

Didn't bother with the GlobalISel part for now, since the baseline
combine doesn't seem to work on the existing test.
2023-02-02 22:50:23 -04:00
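The unfolding relies on min(-a, -b) = -max(a, b) (and symmetrically for max), which lets the fneg migrate to where a source modifier can absorb it. A Python sketch of that identity only (the real fmin/fmax_legacy nodes have NaN semantics this ignores; names are mine):

```python
def unfold_fneg_min(neg_a: float, neg_b: float) -> float:
    """min(-a, -b) rewritten as -max(a, b): the fneg is pulled
    out of both operands and through the min."""
    return -max(-neg_a, -neg_b)

for a, b in [(1.0, 2.5), (-3.0, 0.5), (4.0, 4.0)]:
    assert unfold_fneg_min(-a, -b) == min(-a, -b)
```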
Kazu Hirata
e078201835 [Target] Use llvm::count{l,r}_{zero,one} (NFC) 2023-01-28 09:23:07 -08:00
Matt Arsenault
93ec3fa402 AMDGPU: Support atomicrmw uinc_wrap/udec_wrap
For now, keep the existing intrinsics working.
2023-01-27 22:17:16 -04:00