IR intrinsics were already defined, but no codegen support had been
added.
I extracted this code from our downstream. Some of it may have come from
https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.
we need first judge N1.getNumOperands() > 0.
If Lowering Generated SDNode like.
```
v2i32 t20: TargetOpNode.
i32 t21: extract_vector_elt t20 0
i32 t22: extract_vector_elt t20 1
```
will cause a error.
Pass the original MMO instead of different individual values.
getAlign() was used before where actually getOriginalAlign() would have been
better, and this patch has the same effect.
Avoid the pre-truncate of BUILD_VECTOR sources when there is more than
one use. This can avoid using unnecessary movs later down the
instruction selection pipeline.
The followed byte of `OPC_EmitRegister` is a MVT type, which is
usually i32 or i64.
We add `OPC_EmitRegisterI32` and `OPC_EmitRegisterI64` so that we
can reduce one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 10K.
We were only folding cases which remained extloads, but DAG.getExtLoad can also handle the cases which don't need to extend at all (we just can't do truncloads).
reduceLoadWidth can handle this for scalar loads, but not for vectors.
Noticed while triaging D152928
The CheckInteger routine called from TableGen-generated selection logic
uses getSExtValue - which will abort if the underlying APInt does not
fit into an int64_t.
This case is now triggered by the SystemZ back-end since i128 is a legal
type on certain machines. While we do not have any regular instructions
that take 128-bit immediates (like most other platforms), there are
patterns in the .td files that recognize an i128 "xor ..., -1" as a
"not".
These patterns cause code to be generated that calls the CheckInteger
routine on some i128-valued integer, which may trigger the assert.
Fix by using trySExtValue instead.
Fixes https://github.com/llvm/llvm-project/issues/75710
If we're lowering a fixed length vector load or store which happens to
exactly VLEN in size (when VLEN is exactly known), we can use a whole
register load or store instead of the unit strided variants. This
doesn't require a vsetvli in some cases, allows additional flexibility
of vsetvli cases in others, and doesn't have a runtime dependency on the
value of VL.
This ensures we clip the index to be in bounds of the vector we are
inserting into. If the index is out of bounds the results of the insert
element is poison. If we don't clip the index we can write memory that
was not part of the original store.
Fixes#74248#75557.
This is a boring mechanical update to support DPValues that look like
dbg.declares in SelectionDAG.
The tests will become "live" once #74090 lands (see for more info).
There are a lot of operations to move current node to parent and
then move to another child.
So `OPC_MoveSibling` and its space-optimized forms are added to do
this "move to sibling" operations.
These new operations will be generated when optimizing matcher in
`ContractNodes`. Currently `MoveParent+MoveChild` will be optimized
to `MoveSibling` and sequences `MoveParent+RecordChild+MoveChild`
will be transformed into `MoveSibling+RecordNode`.
Overall this reduces the llc binary size with all in-tree targets by
about 30K.
If there is only one bit set in EmitNodeInfo, then we can encode it
implicitly to save one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 168K.
The most common type is i32 or i64 so we add `OPC_CheckChildTypeI32`
and `OPC_CheckChildTypeI64` to save one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 70K.
These new opcodes implicitly indicate the RecNo.
The old `OPC_EmitCopyToReg2` is renamed to `OPC_EmitCopyToRegTwoByte`.
Overall this reduces the llc binary size with all in-tree targets by
about 33K (most are from RISCV target).
The most common type is i32 or i64 so we add `OPC_CheckTypeI32` and
`OPC_CheckTypeI64` to save one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 29K.
`llvm/test/CodeGen/RISCV/llvm.frexp.ll` and
`llvm/test/CodeGen/X86/llvm.frexp.ll` contain a number of disabled tests
for unimplemented functionality. This implements one missing part of it.
This should have been part of 943f3e52 which removed the never completed
migration code which added these. I left them out because I thought there
was more generic SDAG code to cleanup, but I'd forgotten that SystemZ
relied on custom legalizing ATOMIC_LOAD to (atomic) LoadSDNode. As a
result, we still need the various legality checks on combines and the
common infrastructure to suport them.
This option enables an experimental lowering for unordered atomics I worked
on a few years back. It never reached production quality, and hasn't been
worked on in years. So let's rip it out.
This wasn't a crazy idea, but I hit some stumbling block which prevented me
from pushing it across the finish line. From the look of 027aa27, that
change description is probably a good summary. I don't remember the
details any longer.
This ensures we clip the index to be in bounds of the vector we are
inserting into. If the index is out of bounds the results of the insert
element is poison. If we don't clip the index we can write memory that
was not part of the original store.
Fixes#74248.
getOperandLatency has the following behavior: it returns -1 as a special
value, negative numbers other than -1 on some target-specific overrides,
or a valid non-negative latency. This behavior can be surprising, as
some callers do arithmetic on these negative values. Change the
interface of getOperandLatency to return a std::optional<unsigned> to
prevent surprises in callers. While at it, change the interface of
getInstrLatency to return unsigned instead of int.
This change was inspired by a refactoring in
TargetSchedModel::computeOperandLatency.
The combine was implicitly assuming that the index on the outer
insert_subvector meant the same thing when the source was switched to be
the index of the inner insert_subvector. This is not true if the
innermost sub-vector is fixed, and the outer subvector is scalable.
I could do a less restrictive fix here - i.e. allow the case where the
scalability of the subvectors are the same - but there's no test
coverage which shows this transform actually has profit. Given that, go
for the simplest fix.
Cleanup work towards removing the method Type::getPointerTo.
If a call to Type::getPointerTo is used solely to support an unneeded
pointer-cast, remove the call entirely.
These two opcodes are used to be followed by a MVT operand, which is
always one of i8/i16/i32/i64.
We add instantiated `OPC_EmitInteger` and `OPC_EmitStringInteger` with
i8/i16/i32/i64 so that we can reduce one byte.
We reserve `OPC_EmitInteger` and `OPC_EmitStringInteger` in case that
we may need them someday, though I haven't found one usage after this
change.
Overall this reduces the llc binary size with all in-tree targets by
about 200K.
It seems TypeSize is currently broken in the sense that:
TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)
without failing its assert that explicitly tests for this case:
assert(LHS.Scalable == RHS.Scalable && ...);
The reason this fails is that `Scalable` is a static method of class
TypeSize,
and LHS and RHS are both objects of class TypeSize. So this is
evaluating
if the pointer to the function Scalable == the pointer to the function
Scalable,
which is always true because LHS and RHS have the same class.
This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`,
so that it no longer clashes with the variable in
FixedOrScalableQuantity.
The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
DAGCombiner folds (select_cc seteq (and x, y), 0, 0, A) to (and (sra
(shl x)) A) where y has a single bit set. Previously, DAGCombiner relies
on `shouldAvoidTransformToShift` to decide when to do the combine, but
`shouldAvoidTransformToShift` is only about shift cost. This patch
introuduces a specific hook to decide when to do the combine and disable
the combine when Zicond enabled and AndMask <= 1024.
DPValues are the non-intrinsic replacements for dbg.values, and when an
IR function is converted by SelectionDAG we need to convert the variable
location information in the same way. Happily all the information is in
the same format, it's just stored in a slightly different object,
therefore this patch refactors a few things to store the set of
{Variable,Expr,DILocation,Location} instead of just a pointer to a
DbgValueInst.
This also adds a hook in llc that's much like the one I've added to opt
in PR #71937, allowing tests to optionally ask for the use RemoveDIs
mode if support for it is built into the compiler.
I've added that flag to a variety of SelectionDAG debug-info tests to
ensure that we get some coverage on the RemoveDIs / debug-info-iterator
buildbot.
`-debug-only=isel-dump` is the new debug type for printing SelectionDAG
after each ISel phase. This can be furthered filter by
`-filter-print-funcs=<function names>`.
Note that the existing `-debug-only=isel` will take precedence over the
new behavior and print SelectionDAG dumps of every single function
regardless of `-filter-print-funcs`'s values.
Avoids infinite issues in some upcoming patches to help D152928 - x86 sees a number of regressions that are addressed by extending SimplifyDemandedVectorEltsForTargetNode to cover more binop opcodes
This is a follow-up to #68981, and fix for #72630, #72447.
We may end up in SelectionDAG::salvageDebugInfo() with indirect debug
values, and attempting to salvage ADD nodes with non-constant RHS would
lead us to try to turn those indirect debug values variadic, which is
not allowed.
This triggered the following assert in the SDDbgValue constructor:
Assertion `!(IsVariadic && IsIndirect)' failed.
This also adds a lit test for salvaging when having an indirect debug
value and constant RHS, as there seems like there was no such lit test.
However, I am not sure if the use of the stack_value operation is
correct in that case (which is existing behavior before #68981), but
that at least documents the current behavior.
visitJumpTable is called on FinishBasicBlock. At that time, getCurSDLoc
will always return SDLoc without DebugLoc since CurInst was set to
nullptr after visiting each instruction.
This patch passes SDLoc to buildJumpTable when visiting SwitchInst so
that visitJumpTable can use it later.