Files
clang-p2996/llvm/test/TableGen/immarg.td
Pierre van Houtryve 9375962ac9 [TableGen][GlobalISel] Specialize more MatchTable Opcodes (#89736)
The vast majority of the following (very common) opcodes were always
called with identical arguments:

- `GIM_CheckType` for the root
- `GIM_CheckRegBankForClass` for the root
- `GIR_Copy` between the old and new root
- `GIR_ConstrainSelectedInstOperands` on the new root
- `GIR_BuildMI` to create the new root

I added overloaded version of each opcode specialized for the root
instructions. It always saves between 1 and 2 bytes per instance
depending on the number of arguments specialized into the opcode. Some
of these opcodes had between 5 and 15k occurences in the AArch64
GlobalISel Match Table.

Additionally, the following opcodes are almost always used in the same
sequence:

- `GIR_EraseFromParent 0` + `GIR_Done` 
- `GIR_EraseRootFromParent_Done` has been created to do both. Saves 2
bytes per occurence.
- `GIR_IsSafeToFold` was *always* called for each InsnID except 0.
- Changed the opcode to take the number of instructions to check after
`MI[0]`

The savings from these are pretty neat. For `AArch64GenGlobalISel.inc`:
- `AArch64InstructionSelector.cpp.o` goes down from 772kb to 704kb (-10%
code size)
- Self-reported MatchTable size goes from 420380 bytes to 352426 bytes
(~ -17%)

A smaller match table means a faster match table because we spend less
time iterating and decoding.
I don't have a solid measurement methodology for GlobalISel performance
so I don't have precise numbers but I saw a few % of improvements in a
simple testcase.
2024-04-24 09:19:18 +02:00

32 lines
1.3 KiB
TableGen

// RUN: llvm-tblgen -gen-global-isel -optimize-match-table=false -I %p/Common -I %p/../../include %s -o - < %s | FileCheck -check-prefix=GISEL %s
include "llvm/Target/Target.td"
include "GlobalISelEmitterCommon.td"
let TargetPrefix = "mytarget" in {
def int_mytarget_sleep0 : Intrinsic<[], [llvm_i32_ty], [ImmArg<ArgIndex<0>>]>;
def int_mytarget_sleep1 : Intrinsic<[], [llvm_i32_ty], [ImmArg<ArgIndex<0>>]>;
}
// GISEL: GIM_CheckOpcode, /*MI*/0, GIMT_Encode2(TargetOpcode::G_INTRINSIC_W_SIDE_EFFECTS),
// GISEL-NEXT: // MIs[0] Operand 0
// GISEL-NEXT: GIM_CheckIntrinsicID, /*MI*/0, /*Op*/0, GIMT_Encode2(Intrinsic::mytarget_sleep0),
// GISEL-NEXT: // MIs[0] src
// GISEL-NEXT: GIM_CheckIsImm, /*MI*/0, /*Op*/1,
// GISEL-NEXT: // (intrinsic_void {{[0-9]+}}:{ *:[iPTR] }, (timm:{ *:[i32] }):$src) => (SLEEP0 (timm:{ *:[i32] }):$src)
// GISEL-NEXT: GIR_BuildRootMI, /*Opcode*/GIMT_Encode2(MyTarget::SLEEP0),
// GISEL-NEXT: GIR_RootToRootCopy, /*OpIdx*/1, // src
def SLEEP0 : I<(outs), (ins i32imm:$src),
[(int_mytarget_sleep0 timm:$src)]
>;
// Test for situation which was crashing in ARM patterns.
def p_imm : Operand<i32>;
def SLEEP1 : I<(outs), (ins p_imm:$src), []>;
// FIXME: This should not crash, but should it work or be an error?
// def : Pat <
// (int_mytarget_sleep1 timm:$src),
// (SLEEP1 imm:$src)
// >;