For instructions that don't map to a mnemonic string, the implementation
of MCInstPrinter::getMnemonic would return an invalid pointer, because
the position computed for the instruction in the `AsmStrs` table is not
valid. This patch fixes the issue by returning `nullptr` in those cases
instead.
Fixes #74177.
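A minimal sketch of the idea, not the generated code itself: the table layout, the "offset + 1" encoding, and the names `OpInfo` and `getMnemonicSketch` are assumptions made for illustration. Only `AsmStrs` and the `nullptr` result come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical string table: NUL-terminated mnemonics stored back to back.
static const char AsmStrs[] = "add\0" "sub";

// Hypothetical per-opcode table. Each entry is (offset into AsmStrs) + 1,
// so that 0 can mean "this opcode has no mnemonic string".
static const std::uint32_t OpInfo[] = {
    0, // opcode 0: no mnemonic
    1, // opcode 1: "add" starts at offset 0
    5, // opcode 2: "sub" starts at offset 4
};
static const std::size_t NumOpcodes = sizeof(OpInfo) / sizeof(OpInfo[0]);

const char *getMnemonicSketch(unsigned Opcode) {
  assert(Opcode < NumOpcodes && "opcode out of range");
  std::uint32_t Bits = OpInfo[Opcode];
  // Without this check the expression below would yield AsmStrs - 1,
  // which points before the table and is not a valid string.
  if (Bits == 0)
    return nullptr;
  return AsmStrs + Bits - 1;
}
```

Callers then have to test the result for `nullptr` before printing it.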
The byte following `OPC_EmitRegister` is an MVT type, which is
usually i32 or i64.
We add `OPC_EmitRegisterI32` and `OPC_EmitRegisterI64` so that the
type is implied by the opcode, which saves one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 10K.
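The saving can be illustrated with a toy decoder. This is a sketch under assumptions: the enum values, the `EmittedRegister` struct, and the one-byte register number are hypothetical and do not reflect the real matcher table layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcode and type numbering, for illustration only.
enum Opcode : std::uint8_t { EmitRegister, EmitRegisterI32, EmitRegisterI64 };
enum SimpleVT : std::uint8_t { i32 = 1, i64 = 2, f32 = 3 };

struct EmittedRegister {
  SimpleVT VT;
  std::uint8_t RegNo;
};

// Decodes one EmitRegister-style entry starting at Table[Idx] and advances
// Idx past it. The specialized opcodes carry the type implicitly, so their
// encoding is one byte shorter than the generic form.
EmittedRegister decodeEmitRegister(const std::vector<std::uint8_t> &Table,
                                   std::size_t &Idx) {
  SimpleVT VT;
  switch (Table[Idx++]) {
  case EmitRegisterI32:
    VT = i32;
    break;
  case EmitRegisterI64:
    VT = i64;
    break;
  default: // generic form: the type is an explicit operand byte
    VT = static_cast<SimpleVT>(Table[Idx++]);
    break;
  }
  std::uint8_t RegNo = Table[Idx++];
  return {VT, RegNo};
}
```

In this sketch `{EmitRegister, i32, 7}` and `{EmitRegisterI32, 7}` decode to the same result, but the second form is two bytes instead of three.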
- Instead of checking the default ops directly, this change queries the DAG
default operands collected while reading patterns. This not only
simplifies the code but also handles a few cases where integer values are
converted from convertible types, such as 'bits'.
- A test case is added to GlobalISelEmitter.td as a regression test for
default 'bits' values.
This adds a link from the main docs page back to the README where
I have previously added a list of useful resources.
To that list, I've added a link to my recent llvm blog post.
Most users of AddImm and CheckConstantInt only use 1-byte immediates, so
I added opcode variants for those. That way each of those instructions
saves 7 bytes.
Also added an opcode for AddTempRegister for the cases where there are
no register flags.
Space savings:
- AMDGPUGenGlobalISel: 470180 bytes to 422564 (-10%)
- AArch64GenGlobalISel.inc: 383893 bytes to 374046
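Where the 7 bytes come from can be sketched with a toy encoder. The opcode names `AddImmSketch` and `AddImm8Sketch`, the signed 8-bit range, the 8-byte little-endian wide form, and the omission of all other operands are assumptions made for illustration.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical opcodes: a wide form with an 8-byte immediate and a
// compact form with a 1-byte immediate.
enum : std::uint8_t { AddImmSketch, AddImm8Sketch };

void emitAddImm(std::vector<std::uint8_t> &Table, std::int64_t Imm) {
  // Assumption: the compact form holds a signed 8-bit value.
  if (Imm >= std::numeric_limits<std::int8_t>::min() &&
      Imm <= std::numeric_limits<std::int8_t>::max()) {
    Table.push_back(AddImm8Sketch);
    Table.push_back(static_cast<std::uint8_t>(Imm));
    return;
  }
  // Wide form: opcode followed by the 8 immediate bytes, low byte first.
  Table.push_back(AddImmSketch);
  for (int Byte = 0; Byte < 8; ++Byte)
    Table.push_back(static_cast<std::uint8_t>(
        static_cast<std::uint64_t>(Imm) >> (8 * Byte)));
}
```

A small immediate costs 2 bytes in this sketch and a large one costs 9, which is the 7-byte difference per instruction.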
There are a lot of operations that move the current node to its parent and
then move to another child.
So `OPC_MoveSibling` and its space-optimized forms are added to perform
these "move to sibling" operations.
These new operations are generated when optimizing the matcher in
`ContractNodes`. Currently `MoveParent+MoveChild` is optimized
to `MoveSibling`, and the sequence `MoveParent+RecordChild+MoveChild`
is transformed into `MoveSibling+RecordNode`.
Overall this reduces the llc binary size with all in-tree targets by
about 30K.
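The two rewrites can be sketched as a peephole pass over a flat list of operations. This is an illustration only: the real matcher is a linked structure rather than a vector, the `Op` type is hypothetical, and the requirement that `RecordChild` and `MoveChild` name the same child is an assumption of this sketch.

```cpp
#include <cstddef>
#include <vector>

enum class Kind {
  MoveParent, MoveChild, MoveSibling, RecordChild, RecordNode, Other
};

// Hypothetical flat representation of one matcher operation.
struct Op {
  Kind K;
  unsigned ChildNo = 0;
};

std::vector<Op> contractSiblingMoves(const std::vector<Op> &In) {
  std::vector<Op> Out;
  for (std::size_t I = 0; I < In.size();) {
    bool IsParent = In[I].K == Kind::MoveParent;
    if (IsParent && I + 2 < In.size() &&
        In[I + 1].K == Kind::RecordChild && In[I + 2].K == Kind::MoveChild &&
        In[I + 1].ChildNo == In[I + 2].ChildNo) {
      // MoveParent + RecordChild N + MoveChild N -> MoveSibling N + RecordNode
      Out.push_back({Kind::MoveSibling, In[I + 2].ChildNo});
      Out.push_back({Kind::RecordNode});
      I += 3;
    } else if (IsParent && I + 1 < In.size() &&
               In[I + 1].K == Kind::MoveChild) {
      // MoveParent + MoveChild N -> MoveSibling N
      Out.push_back({Kind::MoveSibling, In[I + 1].ChildNo});
      I += 2;
    } else {
      Out.push_back(In[I]);
      ++I;
    }
  }
  return Out;
}
```

A trailing `MoveParent` that is not followed by a move to another child is left untouched.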
If there is only one bit set in EmitNodeInfo, then we can encode it
implicitly to save one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 168K.
The most common types are i32 and i64, so we add `OPC_CheckChildTypeI32`
and `OPC_CheckChildTypeI64` to save one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 70K.
These new opcodes implicitly indicate the RecNo.
The old `OPC_EmitCopyToReg2` is renamed to `OPC_EmitCopyToRegTwoByte`.
Overall this reduces the llc binary size with all in-tree targets by
about 33K (mostly from the RISCV target).
The most common types are i32 and i64, so we add `OPC_CheckTypeI32` and
`OPC_CheckTypeI64` to save one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 29K.
When importing instruction selection patterns into GlobalISel, the
operands matched in the "source" DAG are copied into corresponding
operands of the "destination" DAG according to their names (such as Rd).
If multiple operands in the source DAG share the same name, a
GIM_CheckIsSameOperand predicate makes the instruction selector check the
corresponding operands for equality (at compiler run-time) as part of
matching the source pattern.
The Def operands of the root node of the destination DAG are handled
specially. The operands of the instruction corresponding to the root
node are taken and GIM_CheckRegBankForClass predicates are
tablegen-erated accordingly. If by coincidence the Def operand in
question has the same name as one of the named operands in the pattern,
a GIM_CheckIsSameOperand predicate is automatically added, which is likely
to prevent the source of an otherwise applicable selection pattern from
matching at compiler run-time.
This patch mangles the Def operand names taken from the instruction
corresponding to the root of the destination DAG (for example, "Rd"
becomes "DstI[Rd]"), preventing unexpected name clashes with the pattern's
named operands.
The patch consists of three sets of changes:
* changes to the GlobalISelEmitter.cpp file are the actual fix
* a test case is added to GlobalISelEmitter.td file as a regression test
* everything else is the biggest and least interesting part - updates to
the existing test cases: renames of the form Rd -> DstI[Rd] inside the
inline comments in tablegen-erated code
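The mangling itself is simple. The sketch below assumes operand names are plain strings kept in a set; the helper names `mangleRootDefName` and `clashesWithPatternOperand` are hypothetical and are not taken from GlobalISelEmitter.cpp.

```cpp
#include <set>
#include <string>

// Wraps a Def operand name taken from the root destination instruction,
// e.g. "Rd" -> "DstI[Rd]". Pattern operand names cannot contain brackets,
// which is an assumption of this sketch, so the result cannot collide.
std::string mangleRootDefName(const std::string &Name) {
  return "DstI[" + Name + "]";
}

// Returns true if DefName would be treated as "the same operand" as one of
// the operands named in the pattern.
bool clashesWithPatternOperand(const std::set<std::string> &PatternNames,
                               const std::string &DefName) {
  return PatternNames.count(DefName) != 0;
}
```

With pattern operands named `Rd` and `Rn`, the unmangled Def name `Rd` clashes, while `DstI[Rd]` does not.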
When we originally added unaligned-scalar-mem and
unaligned-vector-mem, they were separated into two parts under the
theory that some processor might implement one, but not the other. At
the moment, we don't have evidence of such a processor. The C/C++ level
interface and the clang driver command lines have settled on a single
unaligned flag which indicates that both scalar and vector accesses
support unaligned addresses. Given that, let's remove the test matrix
complexity for a set of configurations which don't appear useful.
Given these are internal feature names, I don't think we need to provide
any forward compatibility. Anyone disagree?
Note: The immediate trigger for this patch was finding another case
where the unaligned-vector-mem wasn't being properly serialized to IR
from clang which resulted in problems reproducing assembly from clang's
-emit-llvm feature. Instead of fixing this, I decided getting rid of the
complexity was the better approach.
1. Rename the tables to simplify printing
2. Align the abbreviation in the same file Instr -> Inst
3. Clang-format
4. Capitalize the first char of the variable name
Split MatchDataInfo, CXXPredicates and the Pattern hierarchy into their
own files.
This should help with maintenance a bit, and make the API easier to
navigate.
I hope this encourages a bit more experimentation with MIR patterns,
e.g. I'd like to try getting them in ISel at some point.
Currently, this is pretty much only moving code around. There is no
significant refactoring in there.
I want to split the Combiner backend even more at some point though,
e.g. by separating the TableGen parsing logic into yet another file so
other backends could very easily parse patterns themselves.
Note: I moved the responsibility of managing string lifetimes into the
backend instead of the Pattern class.
e.g. Before you'd do `P.addOperand(Name)` but now it's
`P.addOperand(insertStrRef(Name))`.
I verified this was done correctly by running the tests with UBSan/ASan.
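A rough sketch of what backend-owned string storage can look like. It uses `std::string_view` and `std::set` as stand-ins for LLVM's own types, and the class `StringPoolSketch` is hypothetical; only the name `insertStrRef` comes from the description above.

```cpp
#include <set>
#include <string>
#include <string_view>

// Owns every string handed to patterns. std::set is node-based, so the
// stored strings never move and views into them stay valid for the
// lifetime of the pool.
class StringPoolSketch {
  std::set<std::string> Storage;

public:
  // Copies S into the pool (if not already present) and returns a view of
  // the pooled copy, which outlives the caller's temporary.
  std::string_view insertStrRef(std::string_view S) {
    return *Storage.emplace(S).first;
  }
};
```

The point of the design is that a pattern can hold cheap non-owning references while a single object decides how long the underlying strings live.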
1. Remove unused variables, e.g. the X86Subtarget object in performCustomAdjustments
2. Define checkVEXInstPredicate directly instead of generating it b/c
the function is small and it's unlikely we have more instructions to
check the predicate in the future
3. Check the tables are sorted only once for each function
4. Remove some blanks and clang-format code
These two opcodes used to be followed by an MVT operand, which is
always one of i8/i16/i32/i64.
We add variants of `OPC_EmitInteger` and `OPC_EmitStringInteger`
specialized for i8/i16/i32/i64 so that we can save one byte.
We keep the generic `OPC_EmitInteger` and `OPC_EmitStringInteger` in case
we need them someday, though I haven't found any use of them after this
change.
Overall this reduces the llc binary size with all in-tree targets by
about 200K.
PUSH2 and POP2 are two new instructions for (respectively)
pushing/popping 2 GPRs at a time to/from the stack. The opcodes of PUSH2
and POP2 are those of “PUSH r/m” and “POP r/m” from legacy map 0, but we
require ModRM.Mod = 3 in order to disallow a memory operand.
The 1-bit Push-Pop Acceleration hint described in #73092 applies to
PUSH2/POP2 too, which gives us PUSH2P/POP2P.
For AT&T syntax, PUSH2[P] pushes the registers from right to left onto
the stack. POP2[P] pops the stack to registers from right to left. Intel
syntax has the opposite order - from left to right.
The assembly syntax is aligned with GCC & binutils
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637718.html
JMPABS is a 64-bit only ISA extension, and acts as a near-direct branch
with an absolute target. The 64-bit immediate operand is treated as an
absolute effective address, which is subject to canonicality checks. It
is in legacy map 0 and requires REX2 prefix with `REX2.M0=0` and
`REX2.W=0`. All other REX2 payload bits are ignored.
blog: https://kanrobert.github.io/rfc/All-about-APX-JMPABS/
This patch
1. Extends `ExplicitVEXPrefix` to `ExplicitOpPrefix` for instructions
that require an explicit `REX2` or `EVEX` prefix
2. Adds `ATTR_REX2` and `IC_64BIT_REX2` to put `JMPABS` and `MOV EAX,
moffs32` in different tables to avoid an opcode conflict
NOTE:
1. `ExplicitREX2Prefix` can be reused by the following PUSHP/POPP
instructions.
2. `ExplicitEVEXPrefix` will be used by the instructions promoted to
EVEX space for EGPR.
instregex uses an optimization where the constant prefix of the regex
is extracted to perform a binary search first. However, this
optimization currently fails to apply in most cases, because most instregex
uses have an explicit ^ anchor, which gets counted as a meta char and
disables the optimization.
Make sure the anchor is skipped when determining the prefix. Also fix an
implementation bug this exposes, where too long a prefix is picked if the
first meta character is a quantifier.
This cuts the time needed to generate files like X86GenInstrInfo.inc by
half.
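The prefix logic can be sketched as follows. This is an illustration using standard-library types, not the TableGen code: the function names, the metacharacter set, treating `{` as a quantifier, and giving up on a top-level `|` are assumptions of the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Returns the literal prefix that every match of Pattern must start with.
std::string_view constantPrefix(std::string_view Pattern) {
  // Skip a leading ^ anchor so it is not counted as a metacharacter.
  if (!Pattern.empty() && Pattern.front() == '^')
    Pattern.remove_prefix(1);
  std::size_t FirstMeta = Pattern.find_first_of("()^$|*+?.[]\\{}");
  if (FirstMeta == std::string_view::npos)
    return Pattern;
  char Meta = Pattern[FirstMeta];
  if (Meta == '|') // a top-level alternation has no common literal prefix
    return {};
  // A quantifier applies to the character before it, so for "ABC*" only
  // "AB" is guaranteed to appear.
  bool IsQuantifier = Meta == '*' || Meta == '+' || Meta == '?' || Meta == '{';
  if (IsQuantifier && FirstMeta > 0)
    --FirstMeta;
  return Pattern.substr(0, FirstMeta);
}

// Binary-searches a sorted name list for the half-open index range of names
// starting with Prefix; only those need to be run through the full regex.
std::pair<std::size_t, std::size_t>
prefixRange(const std::vector<std::string> &SortedNames,
            std::string_view Prefix) {
  auto Head = [&](const std::string &Name) {
    return std::string_view(Name).substr(0, Prefix.size());
  };
  auto Lo = std::lower_bound(
      SortedNames.begin(), SortedNames.end(), Prefix,
      [&](const std::string &Name, std::string_view P) { return Head(Name) < P; });
  auto Hi = std::upper_bound(
      Lo, SortedNames.end(), Prefix,
      [&](std::string_view P, const std::string &Name) { return P < Head(Name); });
  return {static_cast<std::size_t>(Lo - SortedNames.begin()),
          static_cast<std::size_t>(Hi - SortedNames.begin())};
}
```

An empty prefix selects the whole list, which corresponds to the slow path where every name is matched against the regex.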
This patch adds "#include <set>" to several files that are relying on
transitive includes of <set>. It in turn unblocks the removal of
unnecessary includes of llvm/ADT/SmallSet.h in several other files.
Also disables generation of MutateOpcode. It's almost never used in
combiners anyway.
If we really want to use it, it needs to be investigated & properly
fixed (see TODO)
Fixes #70780
The inference is trivial and leverages the MCOI OperandTypes encoded in
CodeGenInstructions to infer types across patterns in a CombineRule.
It's thus very limited and only supports CodeGenInstructions (but that's the
main use case so it's fine).
We only try to infer untyped operands in apply patterns when they're
temp reg defs, or immediates. Inference always outputs a `GITypeOf<$x>` where
$x is a named operand from a match pattern.
This allows us to drop the `GITypeOf` in most cases without any errors.
This patch references ac1ffd3cac to support
a soft-coded way to identify whether a CPU has the
`unaligned-scalar-mem` feature via `RISCVProcessors.td`.
This patch does not provide a test case since no RISC-V CPU in
upstream LLVM supports `unaligned-scalar-mem` yet.
When this was ported to clang-tblgen for https://reviews.llvm.org/D123682,
some of the refactoring for the clang copy was backported to llvm,
but it used .front instead of .back as clang does.
This means that if you have values "a, b, c" you get
"must be 'a', ' b' or 'a'." instead of "must be 'a', ' b' or 'c'.".
RegBanks are constructed as global objects. Making the constructor
constexpr helps the compiler construct it without a global
constructor.
clang's optimizer seems to figure this out on its own, but at
least gcc 8 does not.
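A minimal sketch of the pattern, with a hypothetical `RegisterBankSketch` class standing in for the real one: when the constructor is `constexpr` and its arguments are constant expressions, the global is constant-initialized and needs no code to run at startup.

```cpp
// Hypothetical stand-in for a register bank description.
class RegisterBankSketch {
  unsigned ID;
  const char *Name;

public:
  constexpr RegisterBankSketch(unsigned ID, const char *Name)
      : ID(ID), Name(Name) {}
  constexpr unsigned getID() const { return ID; }
  constexpr const char *getName() const { return Name; }
};

// Global objects. Because the constructor is constexpr and the arguments
// are constants, these are initialized at compile time rather than by a
// global constructor.
const RegisterBankSketch GPRBankSketch(0, "GPR");
constexpr RegisterBankSketch FPRBankSketch(1, "FPR");

// Compile-time proof that the object is usable in a constant expression.
static_assert(FPRBankSketch.getID() == 1, "constant-initialized");
```

Declaring the variable itself `constexpr`, as with `FPRBankSketch`, makes the compiler reject any initializer that would need run-time work.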
* Introduce field `PositionOrder` for class `Register` and
`RegisterTuples`
* If register A's `PositionOrder` < register B's `PositionOrder`, then A
is placed before B in the enum in X86GenRegisterInfo.inc
* The new order of registers in the enum for X86 will be
1. Registers before AVX512,
2. AVX512 registers (X/YMM16-31, ZMM0-31, K registers)
3. AMX registers (TMM)
4. APX registers (R16-R31)
* Add a new target hook `getNumSupportedRegs()` to return the number of
registers for the function (may overestimate).
* Replace `getNumRegs()` with `getNumSupportedRegs()` in LiveVariables
to eliminate iterations on unsupported registers
This patch can reduce a 0.3% instruction count regression for sqlite3
at compile time (O3) by not iterating over APX registers,
for #67702
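How the ordering and the new hook fit together can be sketched like this. The `RegSketch` struct, the concrete `PositionOrder` values, the use of a stable sort, and the free function `numSupportedRegs` are assumptions for illustration and are not the TableGen or X86 implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct RegSketch {
  std::string Name;
  int PositionOrder; // smaller values are placed earlier in the enum
};

// Orders registers for the generated enum. A stable sort keeps the original
// relative order of registers that share a PositionOrder.
void orderForEnum(std::vector<RegSketch> &Regs) {
  std::stable_sort(Regs.begin(), Regs.end(),
                   [](const RegSketch &A, const RegSketch &B) {
                     return A.PositionOrder < B.PositionOrder;
                   });
}

// With newer register groups at the end, the registers a function can use
// form a prefix of the enum, so a single count describes them.
std::size_t numSupportedRegs(const std::vector<RegSketch> &Ordered,
                             int MaxSupportedOrder) {
  auto End = std::partition_point(Ordered.begin(), Ordered.end(),
                                  [&](const RegSketch &R) {
                                    return R.PositionOrder <= MaxSupportedOrder;
                                  });
  return static_cast<std::size_t>(End - Ordered.begin());
}
```

A pass that loops up to this count instead of over all registers skips the trailing groups entirely, which is the effect described for LiveVariables.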