SubtargetEmitter::GenSchedClassTables takes a CodeGenProcModel, but
calls hasReadOfWrite which loops over all ProcModels. We move
hasReadOfWrite to CodeGenProcModel and remove the loop over all
ProcModels. This leads to a 144% speedup on the RISC-V backend of our
downstream.
This is just the TableGen-side changes, split out as the minimal
testable unit. It doesn't yet transition RVA23 and friends to be
experimental (and add the necessary other changes for this to work).
Although choosing not to emit the SupportedExperimentalProfiles array if
no experimental profiles are present isn't consistent with what we do
for experimental extensions, we need to do this in order to avoid adding
a warning for the empty array when building LLVM for as long as we don't
have any experimental profiles defined.
getAllDerivedDefinitions produces a fatal error if there are no
definitions. In practice this isn't much of a problem for
llvm/lib/Target/RISCV/*.td where it's hard to imagine not having at
least one of the required defitions. But it limits our ability to
structure and maintain tests (which is how I came across this issue).
This commit moves to using getAllDerivedDefinitionsIfDefined and aims to
skip emission of data structures that make no sense if no definitions
were found.
Generate TargetParser extension information from tablegen. This includes FMV extension information. FMV only extensions are represented by a separate tablegen class.
Use MArchName/ArchKindEnumSpelling to avoid renamings.
Cases where there is simply a case difference are handled by
consistently uppercasing the AEK_ name in the emitted code.
Remove some Extensions which were not needed.
These had AEK entries but were never actually used for anything.
They are not present in Extensions[] data.
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
70 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
1. Bitwise operations are used to access HwMode, allowing for the
coexistence of HwMode IDs for different features (such as RegInfo and
EncodingInfo). This will provide better scalability for HwMode.
Currently, most users utilize HwMode primarily for configuring
Register-related information, and few use it for configuring Encoding.
The limited scalability of HwMode has been a significant factor in this
usage pattern.
2. Sink the HwMode Encodings selection logic down to per instruction
level, this makes the logic for choosing encodings clearer and provides
better error messages.
3. Add some HwMode ID conflict detection to the getHwMode() interface.
Constant registers like the zero registers XZR and WZR are treated as
any other register by LLVM-MCA. This can create non existent dependency
chains.
Currently there is no method in MCA to query if a register is constant.
This patch fixes the issue by adding a bool Constant
variable to MCRegisterDesc that is true for constant registers. Since
constant registers do not create dependencies, it makes sense to add
this check to MCA.
- Remove some cases where ULEB128 isn't needed
- Add a fastDecodeULEB128 tailored for GlobalISel which does unchecked
decoding optimized for the common case, which is 1 byte values. We
rarely have >1 byte Inst IDs, OpIdx, etc. and those are the most common
ULEB users by far.
This specific LEB128 decode function generates almost 2x less
instructions than the generic one.
As the list of profiles grow, this will be a more efficient lookup.
Because the profile name is a prefix of the Arch string, we use
upper_bound to find the first profile that definitely comes after the
Arch string. If that isn't the first supported profile, we move back 1
profile and see if that profile is a prefix of our Arch string.
Re-land 61b2a0e333. Some Windows builds
were failing because AArch64TargetParserDef.inc is a generated header
which is included transitively into some clang components, but this
information is not available to the build system and therefore there is
a missing edge in the dependency graph. This patch incorporates the
fixes described in ac1ffd3caca12c254e0b8c847aa8ce8e51b6cfbf/D142403.
Thanks to ExtensionSet::toLLVMFeatureList, all values of ArchExtKind
should correspond to a particular -target-feature. The valid values of
-target-feature are in turn defined by SubtargetFeature defs.
Therefore we can generate ArchExtKind from the tablegen data. This is
done by adding an Extension class which derives from SubtargetFeature.
Because the Has* FieldNames do not always correspond to the AEK_
names ("extensions", as defined in TargetParser), and AEK_ names do
not always correspond to -march strings, some additional enum entries
have been added to remap the names. I have renamed these to make the
naming consistent, but split them into a separate PR to keep the diff
reasonable (#90320)
Thanks to ExtensionSet::toLLVMFeatureList, all values of ArchExtKind
should correspond to a particular -target-feature. The valid values of
-target-feature are in turn defined by SubtargetFeature defs.
Therefore we can generate ArchExtKind from the tablegen data. This is
done by adding an Extension class which derives from SubtargetFeature.
Because the Has* FieldNames do not always correspond to the AEK_
names ("extensions", as defined in TargetParser), and AEK_ names do
not always correspond to -march strings, some additional enum entries
have been added to remap the names. I have renamed these to make the
naming consistent, but split them into a separate PR to keep the diff
reasonable (#90320)
Support patterns like
Pat<(p0 frameindex:$fi), (ADD tframeindex:$fi, 0)>;
in the GlobalISel emitter in TableGen. Currently, using such a pattern
results in an error message.
We should remove the `experimental-` prefix when printing march
string.
We didn't meet this problem because there is no processor containing
experimental extensions.
Reviewers: fpetrogalli, asb, topperc
Reviewed By: topperc, asb
Pull Request: https://github.com/llvm/llvm-project/pull/90185
Previously we had an individiaul global array of implied extensions for
each extension that needed it. This allowed each array to have a
different length. Then we had a sorted table that stored pointers and
size for the indivual arrays keyed by the extension name.
This patch changes the sorted table to use multiple rows if multiple
extensions are implied. We use equal_range instead of lower_bound to
find all the rows that apply to a given extension.
The CombineIntoExts array was also modified to store only the extension
name that need to be combined. This extension name is looked up in the
implied table to find all the extensions it depends on.
In the AMDGPU backend we have some cases where we'd like to mark an
intrinsic as IntrInaccessibleMemOnly to model dependencies, but the
corresponding MachineInstrs use uses/defs of a special physical register
to express the same thing. In this case TableGen would complain:
Pattern doesn't match mayLoad/mayStore = 0
but the error is not useful.
Added GISelShouldIgnore property to class Pattern in TargetSelectionDAG.td; it's similar to FastISelShouldIgnore. This bit can be put on a record to avoid its pattern import within GlobalISelEmitter. This allows one to avoid the record's GISel .td implementation, .inc generation, and any skipped pattern warnings from -warn-on-skipped-patterns.
This generates the SupportedExtensions and ImpliedExts information from
the RISCVExtension records found in RISCVFeatures.td.
Some of the extensions listed in the individual `ImpliedExts*` arrays
may be in a different, but the order in those array doesn't matter. I
manually verified the all the extensions were still present in each
array.
I've added the new information to the existing RISCVTargetParserDef.inc
and RISCVTargetDefEmitter.cpp so we don't need to re-parse the entirety
of RISCV.td a second time for a new file.
Introduce a mechanism to share data between the ARM and AArch64 backends and
TargetParser, to reduce duplication of code. This is similar to the current
RISC-V implementation.
The target tablegen file (in this case `ARM.td` or `AArch64.td`) is
processed during building of `TargetParser` to generate the following
files in the build tree:
- `build/include/llvm/TargetParser/ARMTargetParserDef.inc`
- `build/include/llvm/TargetParser/AArch64TargetParserDef.inc`
For now, the use of these generated files is limited to files _outside_
of `TargetParser`. The main reason for this is that the modifications to
`TargetParser` will require additional data added to the tablegen files,
which I want to split into separate PRs.
The vast majority of the following (very common) opcodes were always
called with identical arguments:
- `GIM_CheckType` for the root
- `GIM_CheckRegBankForClass` for the root
- `GIR_Copy` between the old and new root
- `GIR_ConstrainSelectedInstOperands` on the new root
- `GIR_BuildMI` to create the new root
I added overloaded version of each opcode specialized for the root
instructions. It always saves between 1 and 2 bytes per instance
depending on the number of arguments specialized into the opcode. Some
of these opcodes had between 5 and 15k occurences in the AArch64
GlobalISel Match Table.
Additionally, the following opcodes are almost always used in the same
sequence:
- `GIR_EraseFromParent 0` + `GIR_Done`
- `GIR_EraseRootFromParent_Done` has been created to do both. Saves 2
bytes per occurence.
- `GIR_IsSafeToFold` was *always* called for each InsnID except 0.
- Changed the opcode to take the number of instructions to check after
`MI[0]`
The savings from these are pretty neat. For `AArch64GenGlobalISel.inc`:
- `AArch64InstructionSelector.cpp.o` goes down from 772kb to 704kb (-10%
code size)
- Self-reported MatchTable size goes from 420380 bytes to 352426 bytes
(~ -17%)
A smaller match table means a faster match table because we spend less
time iterating and decoding.
I don't have a solid measurement methodology for GlobalISel performance
so I don't have precise numbers but I saw a few % of improvements in a
simple testcase.
This introduces a new file, RISCVISAUtils.cpp and moves the rest of
RISCVISAInfo to the TargetParser library.
This will allow us to generate part of RISCVISAInfo.cpp using tablegen.
Coverity (a static analysis tool) reported that the emitted 'Features'
variable inside emitComputeAvailableFeatures in TableGen might be
unitialized.
Silence this warning by adding brackets for the default initialization.
Adapt test cases to take additional brackets into account.
Instead of using RISCVISAInfo's extension information, use the extension
found in tblgen after #89326.
We still need to use RISCVISAInfo code to get the sorting rules for the
ISA string.
The ISA string we generate now is not quite the same extension we had
before. No implied extensions are included in the generate string unless
they are explicitly listed in RISCVProcessors.td. This primarily affects
Zicsr being implied by F, V implying Zve*, and Zvl*b implying a smaller
Zvl*b. All of these implication should be picked up when the string is
used by the frontend.
The benefit is that we get a more manageable ISA string for humans to
deal with.
This is a step towards generating RISCVISAInfo's extension list from
tblgen.
Emit a special leaf construct table in DirectiveEmitter.cpp, which will
allow both decomposition of a construct into leafs, and composition of
constituent constructs into a single compound construct (if possible).
The function `getLeafConstructs` is no longer auto-generated, but
implemented in OMP.cpp.
The table contains a row for each directive, and each row has the
following format
`dir_id, num_leafs, leaf1, leaf2, ..., leafN, -1, ...`
The rows are sorted lexicographically with respect to the leaf
constructs. This allows a binary search for the row corresponding to the
given list of leafs.
There is an auxiliary table that for each directive contains the index
of the row corresponding to that directive.
Looking up leaf constructs for a directive `dir_id` is of constant time,
and and consists of two lookups: `LeafTable[Auxiliary[dir_id]]`.
Finding a compound directive given the set of leafs is of time O(logn),
and is roughly represented by
`row = binary_search(LeafTable); return row[0]`.
The functions `getLeafConstructs` and `getCompoundConstruct` use these
lookup methods internally.
This code used to use the PadToColumn feature of formatted_raw_ostream,
but no longer does. formatted_raw_ostream is slower than regular
raw_ostream because it has to keep track of the number of character
since the last new line character.
- If a def operand includes multiple sub-operands, count them when
generating instr info.
- Found issues in x86 and sparc backends, where memory operands of
store or store-like instructions are wrongly placed in the output
list.
Reviewers: jayfoad, arsenm, Pierre-vh
Reviewed By: arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/88972
This is largely a revert of commit
e817966718.
As #88029 shows, there exists hardware that only supports unaligned
scalar.
I'm leaving how this gets exposed to the clang interface to a future
patch.
And use it to print the correct default OpenMP version for flang and
flang -fc1.
This change adds an optional `HelpTextsForVariants` to options. This
allows you to change the help text that gets shown in documentation and
`--help` based on the program its being generated for.
As `OptTable` needs to be constexpr compatible, I have used a std::array
of help text variants. Each entry is:
(list of visibilities) - > help text string
So for the OpenMP version we have (flang, fc1) -> "OpenMP version for
flang is...".
So you can have multiple visibilities use the same string. The number of
entries is currently set to 1, and the number of visibilities per entry
is 2, because that's the maximum we need for now. The code is written so
we can increase these numbers later, and the unused elements will be initialised.
I have not applied this to group descriptions just because I don't know
of one that needs changing. It could easily be enabled for those too if
needed. There are minor changes to them just to get it all to compile.
This approach of storing many help strings per option in the 1 driver
library seemed preferable to making a whole new library for Flang (even
if that would mostly be including stuff from Clang).
We have a check of whether an operand is in the instruction pattern, and
emit an
error if it is not, but we simply continue execution, including directly
dereferencing a point-like object `InVal`, which will be just created
when
accessing the map. It contains a `nullptr` so dereferencing it causes
crash.
This is a very trivial fix.
1. Remove 'AllModes' and 'DefaultMode' suffixes for DecoderTables under
default HwMode.
2. Introduce a less aggressive suppression for HwMode DecoderTable, only
reduce necessary tables duplications. This allows encodings under
different HwModes to retain the original DecoderNamespace.
3. Change 'suppress-per-hwmode-duplicates' command option from bool type
to enum type, allowing users to choose what level of suppression to use.