Combine the logic for fcmp and icmp intrinsics and use operand presence
instead.
Reviewed By: kosarev, foad
Differential Revision: https://reviews.llvm.org/D148716
However the imported rules can not be used for now because Global ISel
selectImpl() seems has some bug/limitation to create a illegl COPY
from VGPR to SGPR. So currently workaround this by not auto selecting these
patterns.
Fixes#61468
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D147780
The inverse ballot intrinsic takes in a boolean mask for all lanes and
returns the boolean for the current lane. See SPIR-V's
`subgroupInverseBallot()` in the [[ https://github.com/KhronosGroup/GLSL/blob/master/extensions/khr/GL_KHR_shader_subgroup.txt | GL_KHR_shader_subgroup extension ]].
This allows decision making via branch and select instructions with a manually
manipulated mask.
Implemented in GlobalISel and SelectionDAG, since currently both are supported.
The SelectionDAG required pseudo instructions to use the custom inserter.
The boolean mask needs to be uniform for all lanes.
Therefore we expect SGPR input. In case the source is in a
VGPR, we insert one or more `v_readfirstlane` instructions.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D146287
Values in SGPR and VGPR register are treated as unsigned by hardware.
When value in 32-bit SGPR or VGPR base can be negative calculate offset
using 32-bit add instructions, otherwise use
sgpr(unsigned) + vgpr(unsigned) + offset.
LoopStrengthReduce.cpp changes offsets to negative and in some
iterations value in SGPR or VGPR register could be negative.
Differential Revision: https://reviews.llvm.org/D144957
Values in VGPR register are treated as unsigned by hardware.
When value in 32-bit VGPR base can be negative calculate offset using
32-bit add instruction, otherwise use vgpr base(unsigned) + offset.
Does not affect case where whole offset comes from VGPR register
(immediate offset is 0).
LoopStrengthReduce.cpp changes offsets to negative and in some
iterations value in VGPR register could be negative.
Differential Revision: https://reviews.llvm.org/D144956
Values in SGPR register are treated as unsigned by hardware.
When value in 32-bit SGPR base can be negative calculate offset using
32-bit add instruction, otherwise use sgpr base(unsigned) + offset.
Does not affect case where whole offset comes from SGPR register
(immediate offset is 0).
LoopStrengthReduce.cpp changes offsets to negative and in some
iterations value in SGPR register could be negative.
Differential Revision: https://reviews.llvm.org/D144955
The matching for V_FMA_MIX was partially implemented with a C++
matcher (for fmas with 32 bit results and 16 bit inputs) and partially
in tablegen (for fmas with 16 bit results). Move the C++ matcher logic
into tablegen to make this more consistent and so we can remove the
duplication between SDAG and GISel.
Differential Revision: https://reviews.llvm.org/D144612
This is only used by CodeGen. Moving it out of AMDGPUBaseInfo simplifies
future changes to make some of it depend on the subtarget.
Differential Revision: https://reviews.llvm.org/D144650
If more registers are needed for VAddr then the NSA format allows then the
final register can act as a contigous set of remaining addresses. Update
legalizer to pack register for this new format and allow instruction
selection to use NSA encoding when number of addresses exceeds max size.
Also update SIShrinkInstructions to handle partial NSA.
Differential Revision: https://reviews.llvm.org/D144034
v_permlane16_b32 and v_permlanex16_b32 should not set abs and neg src
modifiers on any input, but they can set op_sel on src0 or src1 to
represent fi or bc when desired. The ISel patterns were setting
the src_modifier bits to -1, effectively setting abs and neg as well,
whenever it was intended to set op_sel, due to an error in ISel. ISel
should now correctly only set the op_sel bits.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D144519
Pre-GFX10 A16 modifier would imply G16. From GFX10 and onwards there are
separate instructions for 16bit gradients. This fixes the condition for
selecting G16 opcodes. Also stop adding G16 flag to instructions that do not
use gradients for GFX10 onwards.
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
This fixes clang.
C++17 allows us to call constructors pair and tuple instead of helper
functions make_pair and make_tuple.
Differential Revision: https://reviews.llvm.org/D139828
We were only allowing these med3 patterns if the operands were known
to not be NaN, but we should also allow it if the calls to max/min
have the `nnan` or `fast` flags.
Differential Revision: https://reviews.llvm.org/D139506
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
When selectVOP3PMadMixModsImpl fails, it can still create new copy instr
via selectVOP3ModsImpl. When selectG_FMA_FMAD gives up, new copy instr
will remain dead but will not be automatically removed.
InstructionSelect does not check if instructions created during selection
are dead.
Such dead copy doesn't have register class on dst operand and causes crash.
Fix is to build copy when operands are being added to selected instruction.
Differential Revision: https://reviews.llvm.org/D138044
Adds support for selecting the following instructions using GlobalISel:
- v_mad_mix/v_fma_mix
- v_mad_mixhi/v_fma_mixhi
- v_mad_mixlo/v_fma_mixlo
To select those instructions properly, some additional changes were
needed which impacted other tests as well.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D134354
getDefIgnoringCopies and getSrcRegIgnoringCopies should not fail on
valid MIR, so don't bother to check for failure.
Differential Revision: https://reviews.llvm.org/D136238
Preparation patch for D134354 to make V2S16 G_BUILD_VECTOR legal.
Also removes RegBankInfo's scalarization of small BUILD_VECTORs,
replacing it with InstructionSelector logic instead.
This allows for V2S16 BUILD_VECTOR instructions to survive
all the way to ISel so we can select FMA/MAD_MIX instructions
in D134354.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D134433
Remove manual selection for atomic fadd from global-isel.
Stop pre-isel translation to AtomicLoadFAdd/G_ATOMICRMW_FADD
which corresponds to llvm-ir's atomicrmw fadd instruction.
global and flat atomic fadd patterns changes:
Split rtn/no-rtn patterns
Add missing patterns or fix predicates
Remove atomicrmw patterns for v2f16 (atomic rmw doesn't support vectors).
Patterns now check addrspace of pointer, added patterns for flat intrinsic.
with global addrspace pointer that selects into global atomic instruction.
buffer atomic fadd patterns changes:
Rdit patterns to import into global-isel.
Remove gfx6/gfx7 _addr64 and _offset patterns.
Remove patterns that can't be reached (same pattern but different feature).
Differential Revision: https://reviews.llvm.org/D130579
Allows things like `(G_PTR_ADD (G_PTR_ADD a, b), c)` to be
simplified into a single ADD3 instruction instead of two adds.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D131254