The DAG has the same instructions: the signed and unsigned absolute
difference of it's input. For AArch64, they map to uabd and sabd for
Neon and SVE. The Neon and SVE instructions will require custom
patterns.
They are pseudo opcodes and are not imported by the IRTranslator. We
need combines to create them.
PowerPC, ARM, and AArch64 have native instructions.
/// i.e trunc(abs(sext(Op0) - sext(Op1))) becomes abds(Op0, Op1)
/// or trunc(abs(zext(Op0) - zext(Op1))) becomes abdu(Op0, Op1)
For GlobalISel, we are going to write the combines in MIR patterns.
see:
llvm/test/CodeGen/AArch64/abd-combine.ll
- [ ] combine into abd
- [ ] legalize and add td patterns
Previously the check comments indicated that [pi][0-9]+ would match as a
type suffix, however the check itself was looking for [pi][0-9]* and
hence an 'i' suffix in isolation was being considered as a type suffix
despite it not having a bitwidth.
This change makes the check consistent with the comment and looks for
[pi][0-9]+
Use uppercase in the subvector description ("32x2" -> "32X4" etc.) - matches what we already do in VBROADCAST??X?, and we try to use uppercase for all x86 instruction mnemonics anyway (and lowercase just for the arg description suffix).
FPCLASS is a unary instruction with an immediate operand - update the naming to match similar instructions (e.g. VPSHUFD) by only using the source reg/mem and immediate in the instruction name
The idea is that by preemptively simplifying the result of `!and` and `!or`, we can fold
some of the conditional operators, like `!if` or `!cond`, as early as
possible.
TableGen builds up a map of "SubRegIdx -> Subclass" where Subclass is
the largest class where all registers have SubRegIdx as a sub-register.
When SubRegIdx (vis-a-vis the sub-register) is artificial it should
still include it in the map. This map is used in various places,
including in the calculation of the Lanemask of a register class, which
otherwise calculates an incorrect lanemask.
When CoveredBySubRegs is true and a sub-register consists of two
parts; a regular subreg and an artificial subreg, then TableGen
should consider only concatenating the non-artificial subregs.
For example, S0_S1 is a concatenated subreg from D0_D1,
but S0_S1_HI should not be considered.
This is to address the latest lit regressions
https://lab.llvm.org/buildbot/#/builders/64/builds/1285 caused by using
the internal lit shell. This change will limit using the internal lit
shell to TableGen on AIX so we do not hit these regressions.
`NumDefs` only counts the number of registers in `(outs)`, not any
implicit defs specified with `Defs = [...]`
This causes patterns with physical register defs to fail to import here
instead of later where implicit defs are rendered.
Add on `ImplicitDefs.size()` to count both and create `DstExpDefs` to
count only explicit defs, used later on.
Check name conflicts between intrinsics caused by mangling suffix.
If the base name of an overloaded intrinsic is a proper prefix of
another intrinsic, check if the other intrinsic name suffix after the
proper prefix can match a mangled type and issue an error if it can.
Some organizations have added operators downstream, and the test match-table.td tends to fail with off-by-n errors (with n being the number of `added operators`) periodically. This patch will increase the test robust and reduce the impact of merge process.
Print assert message after the "assertion failed" message instead of
printing it as a separate note. This makes the assert failure reporting
less verbose and also more useful to see the failure message inline with
the "assertion failed" message.
When record assertions fail, print an error message with the record's
location, so it's easier to see where the record that caused the assert
to fail was instantiated. This is useful when the assert condition in a
class depends on a template parameter, so we need to know the context of
the definition to determine why the assert failed.
Also enhanced the assert.td test to check for these context messages,
and also add checks for some assert failures that were missing in the
test.
Fix source location for anonymous records to be the one of the locations
where that record is instantiated as opposed to the location of the
class that was anonymously instantiated.
Currently, when a record is anonymously instantiated (via
`VarDefInit::instantiate`), we use the location of the class for the
record, which is not correct. Instead, pass in the `SMLoc` for the
location where the anonymous instantiation happens and use that location
when the record is instantiated. If there are multiple anonymous
instantiations with the same parameters, the location for the (single)
record created will be one of these instantiation locations as opposed
to the class location.
Decrease code size of `Intrinsic::getAttributes` function by uniquing
the function and argument attributes separately and using the
`IntrinsicsToAttributesMap` to store argument attribute ID in low 8 bits
and function attribute ID in upper 8 bits.
This reduces the number of cases to handle in the generated switch from
368 to 131, which is ~2.8x reduction in the number of switch cases.
Also eliminate the fixed size array `AS` and `NumAttrs` variable, and
instead call `AttributeList::get` directly from each case, with an
inline array of the <index, AttribueSet> pairs.
Currently, type casts can only be used to pattern match for intrinsics
with a single overloaded return value. For instance:
```
def int_foo : Intrinsic<[llvm_anyint_ty], []>;
def : Pat<(i32 (int_foo)), ...>;
```
This patch extends type casts to support matching intrinsics with
multiple overloaded return values. As an example, the following defines
a pattern that matches only if the overloaded intrinsic call returns an
`i16` for the first result and an `i32` for the second result:
```
def int_bar : Intrinsic<[llvm_anyint_ty, llvm_anyint_ty], []>;
def : Pat<([i16, i32] (int_bar)), ...>;
```
Validate that for target independent intrinsics the second dotted
component of their name (after the `llvm.`) does not match any existing
target names (for which atleast one intrinsic has been defined). Doing
so is invalid as LLVM will search for that intrinsic in that target's
intrinsic table and not find it, and conclude that its an unknown
intrinsic.
Add a !listflatten operator that will transform an input list of type
`list<list<X>>` to `list<X>` by concatenating elements of the
constituent lists of the input argument.
Update llvm's TableGen to emit new explicit symbol visibility macros I
added in https://github.com/llvm/llvm-project/pull/96630 to the function
declarations it creates
The generated functions need to be exported from llvm's shared library
for Clang and some OpenMP tests. @compnerd
This is an implementation of the saturating fp to int conversions for
GlobalISel. On AArch64 the converstion instrctions work this way,
producing saturating results. LegalizerHelper::lowerFPTOINT_SAT is
ported from SDAG.
AArch64 has a lot of existing tests for fptosi_sat, covering a wide
range of types. I have tried to make most of them work all at once, but
a few fall back due to other missing features such as f128 handling for
min/max.
Add PrintError and family overload that accepts a print function. This
avoids constructing potentially long strings for passing into these
print functions.
Fail if we see an intrinsic that returns more than the supported number
of return values.
Intrinsics can return only upto a certain nyumber of values, as defined
by the `IIT_RetNumbers` list in `Intrinsics.td`. Currently, if we define
an intrinsic that exceeds the limit, llvm-tblgen crashes. Instead, read
this limit and fail if it's exceeded with a proper error message.
The InitUndef pass currently uses target-specific pseudo instructions,
with one pseudo per register class.
Instead, add a generic pseudo instruction, which can be used by all
targets and register classes.
Eliminate unused `isTarget` field in Intrinsic record.
Eliminate `isOverloaded`, `Types` and `TypeSig` fields from the record,
as they are already available through the `TypeInfo` field. Change
intrinsic emitter code to look for this info using fields of the
`TypeInfo` record attached to the `Intrinsic` record.
Fix several intrinsic related unit tests to source the `Intrinsic` class
def from Intrinsics.td as opposed to defining a skeleton in the test.
This eliminates some duplication of information in the Intrinsic class,
as well as reduces the memory allocated for record fields, resulting in
~2% reduction (though that's not the main goal).
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:
https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850
This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
- Use formatv() and raw string literals to simplify emission code.
- Use range based for loops and structured bindings to simplify loops.
- Use const Pointers to Records.
- Rename `ComputeFixedEncoding` to `ComputeTypeSignature` to reflect
what the function actually does, cnd change it to return a vector.
- Use reverse() and range based for loop to pack 8 nibbles into 32-bits.
- Rename some variables to follow LLVM coding standards.
- For function memory effects, print human readable effects in comment.