Lower G_ instructions that can't be inst-selected with the register bank
assignment from AMDGPURegBankSelect (based on uniformity analysis):
- Lower the instruction to perform it on the assigned register bank
- Put a uniform value in vgpr because the SALU instruction is not available
- Execute a divergent instruction in SALU - "waterfall loop"

Given LLTs on all operands after the legalizer, some register bank
assignments require lowering while others do not.
Note: cases where all register bank assignments would require lowering
are lowered in the legalizer.

AMDGPURegBankLegalize goals:
- Define Rules: when and how to perform lowering
- The goal of defining Rules is to provide a high-level, table-like brief
  overview of how to lower generic instructions based on available
  target features and uniformity info (uniform vs divergent).
- Fast search of Rules; speed depends on how complicated Rule.Predicate is
- For some opcodes there would be too many Rules that are essentially
  all the same, just for different combinations of types and banks.
  For those, write a custom function that handles all cases.
- Rules are made from enum IDs that correspond to each operand.
  Names of IDs are meant to give a brief description of what the lowering
  does for each operand or for the whole instruction.
- AMDGPURegBankLegalizeHelper implements the lowering algorithms

Since this is the first patch that actually enables -new-reg-bank-select,
here is a summary of the regression tests that were added earlier:
- if an instruction is uniform, always select the SALU instruction if
  available
- eliminate back-to-back vgpr-to-sgpr-to-vgpr copies of uniform values
- fast rules: small differences for standard and vector instructions
- enabling a Rule based on a target feature - salu_float
- how to specify the lowering algorithm - vgpr S64 AND to S32
- on G_TRUNC in reg, it is up to the user to deal with truncated bits;
  G_TRUNC in reg is treated as a no-op
- dealing with truncated high bits - ABS S16 to S32
- sgpr S1 phi lowering
- new opcodes for vcc-to-scc and scc-to-vcc copies
- lowering for vgprS1-to-vcc copy (formally this is vgpr-to-vcc G_TRUNC)
- S1 zext and sext lowering to select
- uniform and divergent S1 AND (OR and XOR) lowering - inst-selected into
  SALU instruction
- divergent phi with uniform inputs
- divergent instruction with a temporal divergent use, where the source
  instruction is defined as uniform (AMDGPURegBankSelect) - missing
  temporal divergence lowering
- uniform phi that, because of an undef incoming value, is assigned to
  vgpr; this will be fixed in AMDGPURegBankSelect via another fix in
  machine uniformity analysis
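The Rules idea described in the commit message (a table keyed by opcode and uniformity, gated by a predicate such as a target feature, whose entries are one enum ID per operand) can be illustrated with a small standalone model. This is a sketch under assumptions: `Features`, `OperandID`, `Rule`, `RuleTable`, `buildExampleTable` and the specific IDs are hypothetical names invented here, not the actual AMDGPURegBankLegalize types, and "G_FADD" is only a label.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for subtarget features.
struct Features {
  bool HasSALUFloat = false;
};

// Hypothetical per-operand IDs; the real AMDGPURegBankLegalize enum differs.
enum class OperandID { Sgpr32, Vgpr32, UniInVgpr32 };

enum class Uniformity { Uniform, Divergent };

// One Rule: a predicate saying when it applies, plus one ID per operand.
struct Rule {
  std::function<bool(const Features &)> Predicate;
  std::vector<OperandID> Defs;
  std::vector<OperandID> Uses;
};

class RuleTable {
  // Rules grouped by (opcode, uniformity); within a group the first rule
  // whose predicate holds wins, so lookup cost grows with predicate cost.
  std::map<std::pair<std::string, Uniformity>, std::vector<Rule>> Rules;

public:
  void add(const std::string &Opc, Uniformity U, Rule R) {
    assert(R.Predicate && "every rule needs a predicate");
    Rules[{Opc, U}].push_back(std::move(R));
  }

  // Returns the matching rule, or nullptr if no rule applies.
  const Rule *find(const std::string &Opc, Uniformity U,
                   const Features &F) const {
    auto It = Rules.find({Opc, U});
    if (It == Rules.end())
      return nullptr;
    for (const Rule &R : It->second)
      if (R.Predicate(F))
        return &R;
    return nullptr;
  }
};

// Example table for a hypothetical 32-bit float add.
RuleTable buildExampleTable() {
  auto Always = [](const Features &) { return true; };
  RuleTable T;
  // Uniform, and SALU float instructions exist: keep everything in sgpr.
  T.add("G_FADD", Uniformity::Uniform,
        {[](const Features &F) { return F.HasSALUFloat; },
         {OperandID::Sgpr32},
         {OperandID::Sgpr32, OperandID::Sgpr32}});
  // Uniform without SALU float: compute in vgpr; UniInVgpr32 marks a uniform
  // result that must be copied back to sgpr afterwards.
  T.add("G_FADD", Uniformity::Uniform,
        {Always, {OperandID::UniInVgpr32},
         {OperandID::Vgpr32, OperandID::Vgpr32}});
  // Divergent: everything in vgpr.
  T.add("G_FADD", Uniformity::Divergent,
        {Always, {OperandID::Vgpr32}, {OperandID::Vgpr32, OperandID::Vgpr32}});
  return T;
}
```

Keeping the predicate on each rule and ordering rules within a group means the most specific rule (here, the salu_float one) is listed first. Opcodes whose rules would all look alike for many type and bank combinations are better served by a custom function than by table entries.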
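The "waterfall loop" listed in the commit message handles an operand that must live in an sgpr but holds a divergent value. Each iteration takes the value from one active lane, runs the instruction for every lane that holds that same value, and then retires those lanes. On hardware this roughly corresponds to a readfirstlane, a compare, and exec masking. The sketch below simulates only that control flow on plain arrays. The function name `waterfall`, the 64-lane limit and the callback are illustrative assumptions, not the LLVM implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical scalar model of a waterfall loop over one wave.
// Bit i of Exec set => lane i is active; Values[i] is lane i's divergent
// operand. ScalarOp runs once per distinct value, receiving that value as a
// "scalar" (sgpr-like) argument and the mask of lanes that hold it.
// Returns the number of loop iterations.
unsigned waterfall(uint64_t Exec, const std::vector<uint32_t> &Values,
                   const std::function<void(uint32_t, uint64_t)> &ScalarOp) {
  assert(Values.size() <= 64 && "at most 64 lanes in this model");
  assert((Values.size() == 64 || (Exec >> Values.size()) == 0) &&
         "active lane without a value");
  unsigned Iterations = 0;
  uint64_t Remaining = Exec;
  while (Remaining != 0) {
    // Like readfirstlane: take the value from the lowest remaining lane.
    unsigned First = 0;
    while (!((Remaining >> First) & 1))
      ++First;
    uint32_t Scalar = Values[First];
    // Mask of remaining lanes that hold the same value.
    uint64_t Match = 0;
    for (unsigned Lane = 0; Lane < Values.size(); ++Lane)
      if (((Remaining >> Lane) & 1) && Values[Lane] == Scalar)
        Match |= uint64_t(1) << Lane;
    // Run the instruction with exec restricted to the matching lanes.
    ScalarOp(Scalar, Match);
    // Retire those lanes and loop until no active lane is left.
    Remaining &= ~Match;
    ++Iterations;
  }
  return Iterations;
}
```

The number of iterations equals the number of distinct values among the active lanes. For that reason the commit message prefers selecting a SALU instruction only when the value is uniform, and treats the waterfall loop as the fallback for divergent values.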
//===- AMDGPUGlobalISelUtils -------------------------------------*- C++ -*-==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPUGLOBALISELUTILS_H
#define LLVM_LIB_TARGET_AMDGPU_AMDGPUGLOBALISELUTILS_H

#include "llvm/ADT/DenseSet.h"
#include "llvm/CodeGen/Register.h"
#include <utility>

namespace llvm {

class MachineRegisterInfo;
class GCNSubtarget;
class GISelKnownBits;
class LLT;
class MachineFunction;
class MachineIRBuilder;
class RegisterBankInfo;

namespace AMDGPU {

/// Returns base register and constant offset.
std::pair<Register, unsigned>
getBaseWithConstantOffset(MachineRegisterInfo &MRI, Register Reg,
                          GISelKnownBits *KnownBits = nullptr,
                          bool CheckNUW = false);

// Currently finds S32/S64 lane masks that can be declared as divergent by
// uniformity analysis (all are phis at the moment).
// These are defined as i32/i64 in some IR intrinsics (not as i1).
// Tablegen forces most S32/S64 lane masks to be uniform (by declaring the
// lane mask IR intrinsics uniform), so that they end up with an sgpr register
// class after instruction selection; therefore we don't search for all of
// them.
class IntrinsicLaneMaskAnalyzer {
  SmallDenseSet<Register, 8> S32S64LaneMask;
  MachineRegisterInfo &MRI;

public:
  IntrinsicLaneMaskAnalyzer(MachineFunction &MF);
  bool isS32S64LaneMask(Register Reg) const;

private:
  void initLaneMaskIntrinsics(MachineFunction &MF);
  // This will not be needed when we turn off LCSSA for global-isel.
  void findLCSSAPhi(Register Reg);
};

void buildReadAnyLane(MachineIRBuilder &B, Register SgprDst, Register VgprSrc,
                      const RegisterBankInfo &RBI);

} // namespace AMDGPU
} // namespace llvm

#endif
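`getBaseWithConstantOffset` is declared above with only a one-line comment ("Returns base register and constant offset") and two optional knobs, `KnownBits` and `CheckNUW`. The toy model below shows the general base-plus-constant decomposition on a tiny expression tree instead of machine IR. Several details are assumptions, not taken from the header: the exact patterns the real function matches, that `KnownBits` is used to treat an OR with a constant as an add when the bits cannot overlap, and that `CheckNUW` refuses to fold an add lacking the no-unsigned-wrap flag. `Expr`, `makeLeaf`, `makeConst`, `makeAdd`, `makeOr` and `baseWithConstantOffset` are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Toy node standing in for a register's defining instruction (hypothetical).
struct Expr {
  enum Kind { Leaf, Const, Add, Or };
  Kind K = Leaf;
  uint32_t Imm = 0;        // Const: the value
  uint32_t KnownZero = 0;  // Leaf: bits known to be zero
  bool NUW = false;        // Add: "no unsigned wrap" flag
  std::shared_ptr<Expr> LHS, RHS;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr makeLeaf(uint32_t KnownZero = 0) {
  auto N = std::make_shared<Expr>();
  N->K = Expr::Leaf;
  N->KnownZero = KnownZero;
  return N;
}
ExprPtr makeConst(uint32_t V) {
  auto N = std::make_shared<Expr>();
  N->K = Expr::Const;
  N->Imm = V;
  return N;
}
ExprPtr makeAdd(ExprPtr L, ExprPtr R, bool NUW = false) {
  auto N = std::make_shared<Expr>();
  N->K = Expr::Add;
  N->NUW = NUW;
  N->LHS = std::move(L);
  N->RHS = std::move(R);
  return N;
}
ExprPtr makeOr(ExprPtr L, ExprPtr R) {
  auto N = std::make_shared<Expr>();
  N->K = Expr::Or;
  N->LHS = std::move(L);
  N->RHS = std::move(R);
  return N;
}

// Splits E into (Base, Offset); a null Base means E is a pure constant.
//  - Add(X, Const C) -> (X, C), unless CheckNUW is set and Add lacks nuw.
//  - Or(Leaf X, Const C) -> (X, C) only if UseKnownBits and all bits of C
//    are known zero in X (then the OR cannot carry and equals an add).
//  - anything else -> (E, 0).
std::pair<ExprPtr, uint32_t> baseWithConstantOffset(const ExprPtr &E,
                                                     bool UseKnownBits,
                                                     bool CheckNUW) {
  if (E->K == Expr::Const)
    return {nullptr, E->Imm};
  if (E->K == Expr::Add && E->RHS->K == Expr::Const) {
    if (CheckNUW && !E->NUW)
      return {E, 0};
    return {E->LHS, E->RHS->Imm};
  }
  if (UseKnownBits && E->K == Expr::Or && E->RHS->K == Expr::Const &&
      E->LHS->K == Expr::Leaf &&
      (E->LHS->KnownZero & E->RHS->Imm) == E->RHS->Imm)
    return {E->LHS, E->RHS->Imm};
  return {E, 0};
}
```

Returning `(E, 0)` as the fallback means callers can always use the result as "base + offset" without a separate failure check.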
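`buildReadAnyLane(B, SgprDst, VgprSrc, RBI)` is declared without a comment. Going by its name and the commit message's case of a uniform value placed in vgpr, it is assumed here to copy a *uniform* value from a VGPR into an SGPR by reading it from any active lane. That is an assumption about intent, not a description of the actual implementation. Because SGPRs are 32 bits wide, the sketch also assumes a 64-bit value is read as two 32-bit halves. `readAnyLane32` and `readAnyLane64` are hypothetical names in a plain-array model.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: a VGPR holds one 32-bit value per lane and Exec marks
// the active lanes. For a uniform value every active lane agrees, so reading
// "any" lane is safe; this model picks the lowest active lane.
uint32_t readAnyLane32(uint64_t Exec, const std::vector<uint32_t> &Vgpr) {
  assert(Exec != 0 && "need at least one active lane");
  unsigned Lane = 0;
  while (Lane < 64 && !((Exec >> Lane) & 1))
    ++Lane;
  assert(Lane < Vgpr.size() && "active lane without a value");
#ifndef NDEBUG
  // Precondition in this model: the value is uniform across active lanes.
  for (unsigned L = 0; L < Vgpr.size(); ++L)
    if ((Exec >> L) & 1)
      assert(Vgpr[L] == Vgpr[Lane] && "value is not uniform");
#endif
  return Vgpr[Lane];
}

// A 64-bit uniform value held as two 32-bit VGPRs (Lo, Hi): read each half
// separately and reassemble the SGPR-pair result.
uint64_t readAnyLane64(uint64_t Exec, const std::vector<uint32_t> &Lo,
                       const std::vector<uint32_t> &Hi) {
  uint64_t L = readAnyLane32(Exec, Lo);
  uint64_t H = readAnyLane32(Exec, Hi);
  return (H << 32) | L;
}
```

The uniformity check exists only in debug builds, mirroring the idea that the caller (register bank selection driven by uniformity analysis) is responsible for guaranteeing the value is uniform.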