I'm planning to add HwMode support to SubRegIdxRanges for RISC-V GPR
pairs. The MC layer is currently unaware of the HwMode for registers and
I'd like to keep it that way.
This information is not used by the MC layer so I think it is safe to
move it.
Recommits llvm/llvm-project#80378 which was reverted in
llvm/llvm-project#84330. The problem was that the change in
llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir used
217 as an opcode instead of a regex.
This patch is stacked on
https://github.com/llvm/llvm-project/pull/80372,
https://github.com/llvm/llvm-project/pull/80307, and
https://github.com/llvm/llvm-project/pull/80306.
ShuffleVector on scalable vector types gets IRTranslated to
G_SPLAT_VECTOR, since a ShuffleVector that operates on scalable vectors
is a splat vector whose value is the 0th element of the first operand,
because the index mask operand is the zeroinitializer (undef and poison
are treated as zeroinitializer here).
This is analogous to what happens in SelectionDAG for ShuffleVector.
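As a rough sketch (not the actual IRTranslator code; the builder helper
names here are assumptions), the translation amounts to:

```cpp
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
using namespace llvm;

// A scalable shufflevector whose mask is zeroinitializer splats element 0 of
// the first operand, so it can be built as a G_SPLAT_VECTOR.
static Register translateScalableShuffle(MachineIRBuilder &B, LLT DstTy,
                                         LLT EltTy, Register Src0) {
  auto Idx0 = B.buildConstant(LLT::scalar(64), 0);
  auto Elt0 = B.buildExtractVectorElement(EltTy, Src0, Idx0);
  return B.buildSplatVector(DstTy, Elt0).getReg(0); // emits G_SPLAT_VECTOR
}
```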
`buildSplatVector` is renamed to `buildBuildVectorSplatVector`. I did
not make this a separate patch because it would cause problems to revert
that change without reverting this change too.
This extends the known-bits computation for extending loads that have
range metadata: the range metadata is interpreted on the original memory
type and then extended to the correct bit width.
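As a rough illustration of the idea (the range, widths, and helper calls
here are my own example, not the patch's code):

```cpp
#include "llvm/IR/ConstantRange.h"
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// An i8 load annotated with !range [0, 100) that is zero-extended to i32:
// the range constrains the narrow memory type, and the extension fills the
// remaining high bits with known zeros.
static KnownBits knownBitsOfRangeExtLoad() {
  ConstantRange Range(APInt(8, 0), APInt(8, 100)); // from the range metadata
  KnownBits Narrow = Range.toKnownBits();          // known bits of the i8 value
  return Narrow.zext(32);                          // extend to the loaded width
}
```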
This is another smaller step of #70452, changing the signature of
computeAliasing() from optional<uint64_t> to LocationSize, with
follow-up changes in DAGCombiner::mayAlias(). There are some test
changes due to the previous AA->isNoAlias call incorrectly using an
unknown size (~UINT64_C(0)). This should then be improved again in
#70452 when the types are known to be scalable.
Akin to `llvm::PatternMatch` and `llvm::MIPatternMatch`, the
`llvm::SDPatternMatch` introduced in this patch provides a DSL-like
framework for matching SDValue / SDNode with a more succinct syntax.
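For example, a check that would otherwise need several
getOpcode()/getOperand() comparisons can be written roughly like this (a
sketch; the exact matcher names are assumed to be provided by the new
header):

```cpp
#include "llvm/CodeGen/SDPatternMatch.h"
using namespace llvm;
using namespace llvm::SDPatternMatch;

// Returns true if N is (add X, (mul Y, 2)), binding X and Y on success.
static bool matchAddOfDoubled(SDValue N, SDValue &X, SDValue &Y) {
  return sd_match(N, m_Add(m_Value(X), m_Mul(m_Value(Y), m_SpecificInt(2))));
}
```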
Currently the new PM infra for codegen puts everything into a
MachineFunctionPassManager. The MachineFunctionPassManager owns both
Module passes and MachineFunction passes, and batches adjacent
MachineFunction passes like a typical PassManager.
The current MachineFunctionAnalysisManager also directly references a
module and function analysis manager to get results.
The initial argument was that the codegen pipeline is relatively "flat",
meaning it's mostly machine function passes with a couple of module
passes here and there. However, there are a couple of issues with this
compared to the more structured nesting of the optimization pipeline.
For example, it doesn't allow running function passes and then
machine function passes on a function and its machine function all at
once. It also currently requires the caller to split out the IR passes
into one pass manager and the MIR passes into another pass manager.
This patch rewrites the new pass manager infra for the codegen pipeline
to be more similar to the nesting in the optimization pipeline.
Basically, a Function contains a MachineFunction. So we can have Module
-> Function -> MachineFunction adaptors. It also rewrites the analysis
managers to have inner/outer proxies like the ones in the optimization
pipeline. The new pass managers/adaptors/analysis managers can be seen
in use in PassManagerTest.cpp.
This allows us to consolidate to just having to add to one
ModulePassManager when using the codegen pipeline.
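For illustration, the consolidated setup could look roughly like this
(the adaptor name and placeholders are assumptions; PassManagerTest.cpp
shows the real usage):

```cpp
#include "llvm/CodeGen/MachinePassManager.h"
#include "llvm/IR/PassManager.h"
using namespace llvm;

// Nest IR function passes and MIR passes under a single ModulePassManager.
static void buildPipeline(ModulePassManager &MPM) {
  FunctionPassManager FPM;         // IR function passes go here
  MachineFunctionPassManager MFPM; // MIR passes go here
  // ... add passes to FPM and MFPM ...
  MPM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));
  MPM.addPass(createModuleToMachineFunctionPassAdaptor(std::move(MFPM)));
}
```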
I haven't added the Function -> MachineFunction adaptor in this patch,
but it should be added when we merge AddIRPass/AddMachinePass so that we
can run IR and MIR passes on a function before proceeding to the next
function.
The MachineFunctionProperties infra for MIR verification is still WIP.
With the legacy pass manager, MachineModuleInfoWrapperPass owned the
MachineModuleInfo used in the codegen pipeline. It can do this since
it's an ImmutablePass that doesn't get invalidated.
However, with the new pass manager, it is legal for the
ModuleAnalysisManager to clear all of its analyses, even if an analysis
does not want to be invalidated. So we must move ownership of
the MachineModuleInfo outside of the analysis (this is similar to
PassInstrumentation). For now, make the PassBuilder user register a
MachineModuleAnalysis that returns a reference to a MachineModuleInfo
that the user owns. Perhaps we can find a better place to own the
MachineModuleInfo to make using the codegen pass manager less cumbersome
in the future.
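A minimal sketch of that registration (the MachineModuleAnalysis
constructor shown here is an assumption based on the description above):

```cpp
#include "llvm/CodeGen/MachineModuleInfo.h"
#include "llvm/IR/PassManager.h"
using namespace llvm;

// MMI is owned by the caller; the registered analysis only hands out a
// reference to it instead of owning it like the legacy ImmutablePass did.
static void registerMMI(ModuleAnalysisManager &MAM, MachineModuleInfo &MMI) {
  MAM.registerPass([&] { return MachineModuleAnalysis(MMI); });
}
```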
This function can be called from buildCopyToRegs where at least one of
the types is a scalable vector type. This function crashed because it
did not know how to handle scalable vector types.
This patch extends the functionality of getGCDType to handle when at
least one of the types is a scalable vector. getGCDType between a fixed
and scalable vector is not implemented since the docstring of the
function explains that getGCDType is used to build MERGE/UNMERGE
instructions and we will never build a MERGE/UNMERGE between fixed and
scalable vectors.
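For instance, with two scalable vectors of the same element type the
result would be expected to be the smaller of the two (illustrative
values, not a quote of the tests):

```cpp
#include "llvm/CodeGen/GlobalISel/Utils.h"
using namespace llvm;

// Expected: gcd(<vscale x 4 x s32>, <vscale x 2 x s32>) == <vscale x 2 x s32>
LLT GCD = getGCDType(LLT::scalable_vector(4, 32), LLT::scalable_vector(2, 32));
```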
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
This function can be called from buildCopyToRegs where at least one of
the types is a scalable vector type. This function crashed because it
did not know how to handle scalable vector types.
This patch extends the functionality of getLCMType to handle when at
least one of the types is a scalable vector. getLCMType between a fixed
and scalable vector is not implemented since the docstring of the
function explains that getLCMType is used to build MERGE/UNMERGE
instructions and we will never build a MERGE/UNMERGE between fixed and
scalable vectors.
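Analogously, getLCMType on two scalable vectors of the same element type
would be expected to return the larger one (illustrative values, not a
quote of the tests):

```cpp
#include "llvm/CodeGen/GlobalISel/Utils.h"
using namespace llvm;

// Expected: lcm(<vscale x 2 x s32>, <vscale x 4 x s32>) == <vscale x 4 x s32>
LLT LCM = getLCMType(LLT::scalable_vector(2, 32), LLT::scalable_vector(4, 32));
```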
reserveRegisterTuples is slow because it uses MCRegAliasIterator and
hence ends up reserving the same aliased registers many times. This
patch changes getReservedRegs not to use it for reserving SGPRs, VGPRs
and AGPRs. Instead it iterates through base register classes, which
should come closer to reserving each register once only.
Overall this speeds up the time to run check-llvm-codegen-amdgpu in my
Release build from 18.4 seconds to 16.9 seconds (all timings +/- 0.2).
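The idea, as a hedged sketch (the real change lives in
SIRegisterInfo::getReservedRegs; the helper and the way the base classes
are passed in are illustrative):

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/BitVector.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
using namespace llvm;

// Mark every register of each base class reserved directly, so each register
// is visited roughly once instead of once per aliasing register tuple.
static BitVector reserveBaseClasses(const TargetRegisterInfo &TRI,
                                    ArrayRef<const TargetRegisterClass *> RCs) {
  BitVector Reserved(TRI.getNumRegs());
  for (const TargetRegisterClass *RC : RCs)
    for (MCPhysReg Reg : *RC)
      Reserved.set(Reg);
  return Reserved;
}
```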
TargetSchedule.td explicitly allows the usage of a ProcResource for zero
cycles, in order to represent that the ProcResource must be available
but is not consumed by the instruction. On the other hand,
ResourceSegments explicitly does not allow for a zero-sized interval.
To remedy this, this patch handles the special case of a zero-cycle
resource usage by simply not adding an empty interval.
We ran into this issue downstream, but it makes sense to have
this upstream since it is explicitly allowed by TargetSchedule.td.
Add new pass manager support to `llc`. Users can use
`--passes=pass1,pass2...` to run MIR passes, and `--enable-new-pm` to
run the default codegen pipeline.
This patch is taken from [D83612](https://reviews.llvm.org/D83612); the
original author is @yuanfang-chen.
---------
Co-authored-by: Yuanfang Chen <455423+yuanfang-chen@users.noreply.github.com>
In fact, several backends (e.g. AArch64, AMDGPU) add a module pass after
function passes; this patch removes this constraint.
This patch also adds a simple unit test for `CodeGenPassBuilder`.
This tries to allow libcalls to be tail called, using a method similar
to the DAG's: the types are checked to make sure they match, and if so
the backend, through lowerCall, checks that the tail call is valid for
all arguments.
Commit 1b531d54f6 (#74203) removed the usage of unique_ptrs of arrays
in favour of using vectors, but inadvertently increased peak memory
usage by removing the ability to deallocate vector memory that was no
longer needed mid-LDV.
In that same review, it was pointed out that `FuncValueTable` typedef
could be removed, since it was "just a vector".
This commit addresses both issues by making `FuncValueTable` a real data
structure, capable of mapping BBs to ValueTables and able to free
ValueTables as needed.
This reduces peak memory usage in the compiler by 10% in the benchmarks
flagged by the original review.
As a consequence, we had to remove a handful of instances of the
"declare-then-initialize" antipattern in unittests, as the
FuncValueTable class is no longer default-constructible.
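A rough sketch of the shape of such a structure (the name and members
here are assumptions, not the upstream interface):

```cpp
#include "llvm/ADT/SmallVector.h"
#include <memory>

// Maps basic blocks (by number) to their ValueTables and can release an
// individual table once it is no longer needed mid-LDV.
template <typename ValueTable> class FuncValueTableSketch {
  llvm::SmallVector<std::unique_ptr<ValueTable>, 0> Tables;

public:
  explicit FuncValueTableSketch(unsigned NumBlocks) : Tables(NumBlocks) {}
  void allocate(unsigned BBNum) {
    Tables[BBNum] = std::make_unique<ValueTable>();
  }
  ValueTable &operator[](unsigned BBNum) { return *Tables[BBNum]; }
  void ejectTable(unsigned BBNum) { Tables[BBNum].reset(); } // frees memory
};
```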
These are usually difficult to reason about, and they were being used to
pass raw pointers around with array semantics (i.e., we were using
operator[] on raw pointers). To put it in InstrRef terminology: we were
passing a pointer to a ValueTable but using it as if it were a
FuncValueTable.
These could have easily been SmallVectors, which now allow us to have
reference semantics in some places, as well as simpler initialization.
In the future, we can use even more pass-by-reference with some extra
changes in the code.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
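The mechanical change at call sites looks like this:

```cpp
#include "llvm/ADT/StringRef.h"
using namespace llvm;

static bool isClangTool(StringRef Name) {
  // New spelling, matching std::string_view::starts_with in C++20:
  return Name.starts_with("clang"); // previously: Name.startswith("clang")
}
```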
The functions should not compare results of getExtendedSizeInBits(),
i.e., TypeSize variables, with plain integer values, but should create a
fixed TypeSize object so that the correct operator can be used.
It seems TypeSize is currently broken in the sense that:
TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)
without failing its assert that explicitly tests for this case:
assert(LHS.Scalable == RHS.Scalable && ...);
The reason the assert does not fire is that `Scalable` is a static
method of class TypeSize, and LHS and RHS are both objects of class
TypeSize. So this evaluates whether the pointer to the function Scalable
equals the pointer to the function Scalable, which is always true
because LHS and RHS have the same class.
This patch fixes the issue by renaming `TypeSize::Scalable` to
`TypeSize::getScalable` and `TypeSize::Fixed` to `TypeSize::getFixed`,
so that the names no longer clash with the variable in
FixedOrScalableQuantity.
The new methods also better match the coding standard, which specifies
that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
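A small illustration of the renamed factories (values chosen for the
example):

```cpp
#include "llvm/Support/TypeSize.h"
using namespace llvm;

TypeSize FixedFour = TypeSize::getFixed(4);       // fixed size of 4
TypeSize ScalableFour = TypeSize::getScalable(4); // 4, scaled by vscale
// Adding a fixed and a scalable TypeSize now reliably trips the assert on the
// Scalable flags instead of silently yielding a fixed size of 8.
```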
FileCheck currently compiles a regular expression of the form
`Prefix1|Prefix2|...` and uses it to find the next prefix in the input.
If we had a fast regex implementation, this would be a useful thing to
do, as the regex implementation would be able to match multiple prefixes
more efficiently than a naive approach. However, with our actual regex
implementation, finding the prefixes basically becomes O(InputLen *
RegexLen * LargeConstantFactor), which is a lot worse than a simple
string search.
Replace the regex with StringRef::find(), keeping track of the next
position of each prefix. There are various ways this could be improved
on, but it's already significantly faster than the previous approach.
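A minimal sketch of the search strategy (names are illustrative, not
FileCheck's actual code, and the real implementation additionally caches
the next position of each prefix):

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/StringRef.h"
#include <algorithm>
using namespace llvm;

// Return the position of the earliest next occurrence of any prefix at or
// after From, or StringRef::npos if no prefix occurs again.
static size_t findNextPrefix(StringRef Buffer, ArrayRef<StringRef> Prefixes,
                             size_t From) {
  size_t Best = StringRef::npos;
  for (StringRef P : Prefixes)
    Best = std::min(Best, Buffer.find(P, From));
  return Best;
}
```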
For me, this improves check-llvm time from 138.5s to 132.5s, so by
around 4-5%.
For vector-interleaved-load-i16-stride-7.ll in particular, test time
drops from 5s to 2.5s.
The current isScalable function requires a user to call isVector
beforehand in order to avoid an assertion failure in the case that the
LLT is not a vector.
This patch adds helper functions that allow a user to query whether the
LLT is a fixed or scalable vector without triggering an assertion
failure when the LLT is not a vector in the first place.
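Usage would look roughly like this (assuming the helpers are spelled
isFixedVector/isScalableVector):

```cpp
#include "llvm/CodeGen/LowLevelType.h" // header location varies by LLVM version
using namespace llvm;

static bool hasScalableSize(LLT Ty) {
  // Safe even when Ty is a scalar or pointer: unlike isScalable(), no
  // isVector() pre-check is required; the query just returns false.
  return Ty.isScalableVector();
}
```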
As alluded to in #20571, it would be nice if we could mutate operand
lists of MachineInstrs more safely. Add an insert method that, together
with removeOperand, allows for easier splicing of operands.
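A hedged sketch of the intended splicing pattern (the insert() signature
is assumed to mirror MCInst::insert; the operand choice is just an
example):

```cpp
#include "llvm/CodeGen/MachineInstr.h"
using namespace llvm;

// Replace the register operand at OpIdx with two new operands, e.g. turning a
// register reference into a frame-index + offset pair (an N:M splice).
static void spliceOperands(MachineInstr &MI, unsigned OpIdx, int FrameIdx) {
  MI.removeOperand(OpIdx);
  MI.insert(MI.operands_begin() + OpIdx,
            {MachineOperand::CreateFI(FrameIdx), MachineOperand::CreateImm(0)});
}
```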
Splitting this patch off early to get feedback; I need to either:
- mutate an INLINEASM{_BR} MachineInstr's MachineOperands from being
registers (physical or virtual) to memory
(MachineOperandType::MO_FrameIndex). These are not 1:1 operand
replacements, but N:M operand replacements, i.e. we need to replace 2
MachineOperands in the middle of the operand list with 5 (at least for
x86_64).
- copy, modify, write a new MachineInstr which has its relevant operands
replaced.
Either approach is hazarded by existing references to either the
operands being moved or the instruction being removed+replaced. For my
purposes in regalloc, either seems to work for me, so hopefully
reviewers can help me determine which approach is preferable. The second
would involve no new methods on MachineInstr.
One question I had while looking at this was: "why does MachineInstr
have BOTH a NumOperands member AND an MCInstrDesc member that itself has
a NumOperands member? How many operands can a MachineInstr have? Do I
need to update BOTH (keeping them in sync)?" FWICT, only "variadic"
MachineInstrs have an MCInstrDesc whose NumOperands is set to zero. If
the MCInstrDesc's NumOperands is non-zero, then the NumOperands on the
MachineInstr itself cannot exceed this value (IIUC), or else an assert
will be triggered.
For most non-pseudo instructions (or at least non-variadic
instructions), insert is less likely to be useful.
To run the newly added unittest:
$ pushd llvm/build; ninja CodeGenTests; popd
$ ./llvm/build/unittests/CodeGen/CodeGenTests \
--gtest_filter=MachineInstrTest.SpliceOperands
This is meant to mirror `MCInst::insert`.
This is an attempt at rebooting https://reviews.llvm.org/D28990
I've included AutoUpgrade changes to modify the data layout to satisfy the compatible layout check. But this does mean alloca, loads, stores, etc. in old IR will automatically get this new alignment.
This should fix PR46320.
Reviewed By: echristo, rnk, tmgross
Differential Revision: https://reviews.llvm.org/D86310
The legalizer currently generates lots of G_AND artifacts.
For example between boolean uses and defs there is always a G_AND with a mask of 1, but when the target uses ZeroOrOneBooleanContents, this is unnecessary.
Currently these artifacts have to be removed using post-legalize combines.
Omitting these artifacts at their source in the artifact combiner has a few advantages:
- We know that the emitted G_AND is very likely to be useless, so our KnownBits call is likely worth it.
- The G_AND and G_CONSTANT can interrupt e.g. G_UADDE/... sequences generated during legalization of wide adds which makes it harder to detect these sequences in the instruction selector (e.g. useful to prevent unnecessary reloading of AArch64 NZCV register).
- This cleans up a lot of legalizer output and even improves compilation-times.
AArch64 CTMark geomean: `O0` -5.6% size..text; `O0` and `O3` ~-0.9% compilation-time (instruction count).
Since this introduces KnownBits into code-paths used by `O0`, I reduced the default recursion depth.
This doesn't seem to make a difference in CTMark, but should prevent excessive recursive calls in the worst case.
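The core idea, as a hedged sketch (not the artifact combiner's actual
code; the helper name is illustrative):

```cpp
#include "llvm/CodeGen/GlobalISel/GISelKnownBits.h"
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
using namespace llvm;

// Mask Src down to Width bits, but skip the G_AND (and its G_CONSTANT) when
// known bits already prove the bits above Width are zero.
static Register maskToWidth(MachineIRBuilder &B, GISelKnownBits &KB,
                            Register Src, unsigned Width) {
  LLT Ty = B.getMRI()->getType(Src);
  APInt Mask = APInt::getLowBitsSet(Ty.getScalarSizeInBits(), Width);
  if ((KB.getKnownBits(Src).Zero | Mask).isAllOnes())
    return Src; // the mask would be a no-op; reuse Src directly
  auto Cst = B.buildConstant(Ty, Mask);
  return B.buildAnd(Ty, Src, Cst).getReg(0);
}
```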
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D159140
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
There are missing include and using directives in TextStubTests and
AsmPrinterDwarfTest, and they cause build failures when using vanilla
GoogleTest v1.14.0. This patch fixes this issue.