SubOpAliases maps a sub-operand name to the respective operand's index
and the sub-operand number within this operand. The operand index is
used for the Operands array.
Currently MIOperandNo is used as the operand index, which is not
correct. For example, if there are 2 operands with 3 sub-operands each:
(ins (bdladdr12onlylen4 $B1, $D1, $L1):$BDL1,
(bdladdr12onlylen4 $B2, $D2, $L2):$BDL2)
then B2's operand index will be 3, but the correct value is 1.
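To illustrate the difference between the two numberings, here is a minimal sketch that computes both for the example above. It is not the TableGen code itself: the `Operand` struct and the `findSubOperand` and `flattenedStart` helpers are hypothetical names introduced only for this illustration.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an instruction operand with named sub-operands.
struct Operand {
  std::string Name;
  std::vector<std::string> SubOpNames;
};

// Returns {operand index, sub-operand number} for a sub-operand name,
// i.e. the pair suitable for indexing an Operands array.
std::optional<std::pair<unsigned, unsigned>>
findSubOperand(const std::vector<Operand> &Operands, const std::string &Name) {
  for (unsigned OpIdx = 0; OpIdx < Operands.size(); ++OpIdx) {
    const std::vector<std::string> &Subs = Operands[OpIdx].SubOpNames;
    for (unsigned SubIdx = 0; SubIdx < Subs.size(); ++SubIdx)
      if (Subs[SubIdx] == Name)
        return std::make_pair(OpIdx, SubIdx);
  }
  return std::nullopt;
}

// Returns the flattened position of the first sub-operand of operand OpIdx,
// which is what an MIOperandNo-style numbering would produce.
unsigned flattenedStart(const std::vector<Operand> &Operands, unsigned OpIdx) {
  unsigned N = 0;
  for (unsigned I = 0; I < OpIdx; ++I)
    N += static_cast<unsigned>(Operands[I].SubOpNames.size());
  return N;
}
```

With the two operands from the example, looking up B2 gives operand index 1 and sub-operand number 0, while the flattened position of BDL2 is 3.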
Reviewed By: jyknight
Differential Revision: https://reviews.llvm.org/D155158
In D146869 @arsenm pointed out that the constrained intrinsics aren't
getting the strictfp attribute by default. They should be since they are
required to have it anyway.
TableGen did not know about this attribute until now. This patch adds
strictfp to TableGen, and it uses it on all of the constrained intrinsics.
Differential Revision: https://reviews.llvm.org/D154991
Adds a new backend to power the GISel Combiners using the InstructionSelector's match tables.
This does not depend on any of the data structures created for the current combiner and is intended to replace it entirely.
See the RFC for more details: https://discourse.llvm.org/t/rfc-matchtable-based-globalisel-combiners/71457/6
Note: this would replace D141135.
Reviewed By: aemerson, arsenm
Differential Revision: https://reviews.llvm.org/D153757
Move all of the reusable logic out of `GlobalISelEmitter.cpp` into a `GlobalISelMatchTableExecutorEmitter` class so the future combiner backend can use it as well.
Depends on D153755
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D153756
Makes `InstructionSelector.h`/`InstructionSelectorImpl.h` generic so the match tables can also be used for the combiner.
Some notes:
- Coverage was made an optional parameter of `executeMatchTable`; combines won't use it for now.
- `GIPFP_` -> `GICXXPred_` so it's more generic. Those are just C++ predicates and aren't PatFrag-specific.
- Pass the MatcherState directly to testMIPredicate_MI; the combiner will need it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D153755
While working on DAGISelMatcherEmitter I've hit several runtime errors
caused by accessing TreePatternNode::Types out of bounds. These were
difficult to debug because the switch from std::vector to unique_ptr
removes bounds checking.
I don't think the slight reduction in class size is worth the extra
debugging and memory safety problems, so I suggest we revert this.
This reverts commit d34125a1a8.
Differential Revision: https://reviews.llvm.org/D154781
We don't expect this to be used on RV32 currently, so remove it
to reduce the number of entries in the isel table.
Teach RegisterInfoEmitter.cpp to allow a type to be missing for
a particular HwMode.
When using VSCode it'll default to the Python kernel the first
time you open the notebook. Mention this in the readme, as the fix
is simple but only if you know what to look for.
Previously the kernel.json would always point to `python3` even if you
installed using a python from a virtualenv. This meant that tools like VSCode
would try to run the kernel against the system python and fail.
Added a note to the readme about it. I've also removed the need to
add to PYTHONPATH; it turns out it wasn't needed.
This fixes an issue reported in https://discourse.llvm.org/t/tablegen-the-playground-ipynb-file-is-not-working-as-expected/71745.
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D154351
ParseStatus is slightly more convenient to use due to implicit
conversion from bool, which allows something like:
```
return Error(L, "msg");
```
whereas with MatchOperandResultTy it had to be:
```
Error(L, "msg");
return MatchOperand_ParseFail;
```
It also has a more appropriate name, since parse* methods are not only for
parsing operands.
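The convenience of the implicit conversion can be sketched with a simplified stand-in type. The `Status` class, `reportError` and `parseDigit` below are hypothetical and are not the LLVM declarations; the only assumption carried over is the convention that the error-reporting helper returns true.

```cpp
#include <string>

// Simplified stand-in for a ParseStatus-like result type; not the LLVM class.
class Status {
public:
  enum Kind { Success, Failure, NoMatch };
  // Implicit conversion from bool: true means an error was reported.
  Status(bool Error) : K(Error ? Failure : Success) {}
  Status(Kind K) : K(K) {}
  bool isSuccess() const { return K == Success; }
  bool isFailure() const { return K == Failure; }
  bool isNoMatch() const { return K == NoMatch; }

private:
  Kind K;
};

// Hypothetical diagnostic helper: records the message and returns true,
// following the "return true on error" convention.
inline bool reportError(std::string &Log, const std::string &Msg) {
  Log += Msg;
  return true;
}

// A parse routine can report the error and return in a single statement.
inline Status parseDigit(char C, std::string &Log) {
  if (C < '0' || C > '9')
    return reportError(Log, "expected digit");
  return Status::Success;
}
```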
Reviewed By: kosarev
Differential Revision: https://reviews.llvm.org/D154303
The sort of the elements in the GET_SUBTARGETINFO_MACRO block is done on
the "Name" field of each record. This field is not guaranteed to be unique,
is not guaranteed to even have a value at all, and is not used in the
output anyway. Change to sort on the "FieldName" field which should be
unique.
Problem spotted when lib/Target/PowerPC/PPCGenSubtargetInfo.inc changed
unexpectedly.
Differential Revision: https://reviews.llvm.org/D153371
The clause parser generation was not taking into account the
`isValueList` flag. This patch updates the emitter to generate
the correct code.
Reviewed By: razvanlupusoru
Differential Revision: https://reviews.llvm.org/D153801
SubtargetFeature.h is currently part of MC while it doesn't depend on
anything in MC. Since some LLVM components might have the need to work
with target features without necessarily needing MC, it might be
worthwhile to move SubtargetFeature.h to a different location. This will
reduce the dependencies of said components.
Note that I chose TargetParser as the destination because that's where
Triple lives and SubtargetFeatures feels related to that.
This issue came up during a JITLink review (D149522). JITLink would
like to avoid a dependency on MC while still needing to store target
features.
Reviewed By: MaskRay, arsenm
Differential Revision: https://reviews.llvm.org/D150549
Accidentally copy-pasted them into the .cpp while refactoring the file in D151432.
Those functions are currently only used in the .cpp, so it didn't cause an issue, but it causes an undefined reference if another file attempts to use them.
The word "attribute" has a specific meaning in LLVM. Avoid using it
here to mean something different.
This addresses feedback from D153180.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D153444
Re-landing the code that was reverted because of the buildbot failure
in https://lab.llvm.org/buildbot#builders/9/builds/27319.
Original commit message
======================
The class `ResourceSegments` is used to keep track of the intervals
that represent resource usage of a list of instructions that are
being scheduled by the machine scheduler.
The collection is made of intervals that are closed on the left and
open on the right (represented by the standard notation `[a, b)`).
These collections of intervals can be extended by `add`ing new
intervals accordingly while scheduling a basic block.
Unit tests are added to verify the possible configurations of
intervals, and the relative possibility of scheduling a new
instruction in these configurations. Specifically, the methods
`getFirstAvailableAtFromBottom` and `getFirstAvailableAtFromTop` are
tested to make sure that both bottom-up and top-down scheduling work
when tracking resource usage across the basic block with
`ResourceSegments`.
Note that the scheduler tracks resource usage with two methods:
1. counters (via `std::vector<unsigned> ReservedCycles;`);
2. intervals (via `std::map<unsigned, ResourceSegments> ReservedResourceSegments;`).
This patch can be considered a NFC test for existing scheduling models
because the tracking system that uses intervals is turned off by
default (field `bit EnableIntervals = false;` in the tablegen class
`SchedMachineModel`).
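As a rough illustration of the half-open interval convention, the sketch below checks whether two `[a, b)` intervals intersect and searches for the first free start cycle in the top-down direction. It is a minimal sketch under stated assumptions, not the `ResourceSegments` implementation: `Interval`, `intersects` and `firstFreeStart` are hypothetical names, and the search assumes non-empty intervals and a positive length.

```cpp
#include <utility>
#include <vector>

// A half-open interval [first, second) of cycles.
using Interval = std::pair<long, long>;

// Two half-open intervals intersect only if each starts before the other
// ends, so [1, 3) and [3, 5) do not intersect.
inline bool intersects(const Interval &A, const Interval &B) {
  return A.first < B.second && B.first < A.second;
}

// Returns the smallest cycle >= Start at which an interval of length Len
// (Len > 0) fits without intersecting any interval in Used.
inline long firstFreeStart(const std::vector<Interval> &Used, long Start,
                           long Len) {
  bool Moved = true;
  while (Moved) {
    Moved = false;
    for (const Interval &I : Used) {
      if (intersects({Start, Start + Len}, I)) {
        Start = I.second; // Skip past the conflicting interval.
        Moved = true;
      }
    }
  }
  return Start;
}
```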
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D150312
Reverted because it produces the following buildbot failure at https://lab.llvm.org/buildbot#builders/9/builds/27319:
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/unittests/CodeGen/SchedBoundary.cpp: In member function ‘virtual void ResourceSegments_getFirstAvailableAtFromBottom_empty_Test::TestBody()’:
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/unittests/CodeGen/SchedBoundary.cpp:395:31: error: call of overloaded ‘ResourceSegments(<brace-enclosed initializer list>)’ is ambiguous
395 | auto X = ResourceSegments({});
| ^
This reverts commit dc312f0331.
A function is already emitted in *GenInstrInfo.inc that takes Opcode
number and a set of supported Features and reports fatal error if some
of the required features are missing.
The information about features required by the particular opcode can be
reused by llvm-exegesis, so move its computation into a separate
computeRequiredFeatures() function. Then verifyInstructionPredicates()
can just compare the sets of available and required features computed by
the other functions.
This commit moves the definition of FeatureBitsets[] as well as CEFBS_*
enumerator values (that are indices into FeatureBitsets[] array) inside
the computeRequiredFeatures() function because these are implementation
details of that function. The inclusion of potentially huge
computeRequiredFeatures() function is now controlled by a dedicated
macro that is set for simplicity by TableGen-erated code itself if
`defined(ENABLE_INSTR_PREDICATE_VERIFIER) && !defined(NDEBUG)`.
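The split between computing the required features and verifying them can be sketched as follows. This is only an illustration of the idea, not the generated code: the feature count, the table contents, and the names `computeRequiredFeaturesSketch` and `missingFeatures` are all assumptions made up for the example.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical number of subtarget features.
constexpr std::size_t NumFeatures = 8;
using FeatureSet = std::bitset<NumFeatures>;

// Hypothetical number of opcodes covered by the table below.
constexpr unsigned NumOpcodes = 3;

// Returns the features required by an opcode. The table is an
// implementation detail of this function, so it is kept local to it.
inline FeatureSet computeRequiredFeaturesSketch(unsigned Opcode) {
  static const FeatureSet Required[NumOpcodes] = {
      FeatureSet(0x0), // opcode 0 needs nothing
      FeatureSet(0x3), // opcode 1 needs features 0 and 1
      FeatureSet(0x4), // opcode 2 needs feature 2
  };
  assert(Opcode < NumOpcodes && "unknown opcode");
  return Required[Opcode];
}

// The verifier only has to compare the two sets: anything required but
// not available is missing.
inline FeatureSet missingFeatures(unsigned Opcode,
                                  const FeatureSet &Available) {
  return computeRequiredFeaturesSketch(Opcode) & ~Available;
}
```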
~~
Huawei RRI, OS Lab
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D148516
This patch splits the GlobalISelEmitter.cpp file, which imports DAG ISel patterns for GISel, into separate "GISelMatchTable.h/cpp" files.
The main motive is readability & maintainability. GlobalISelEmitter.cpp was about 6400 lines of mixed code, some bits implementing the match table codegen, some others dedicated to importing DAG patterns.
Now it's down to 2700 + a 2150 header + 2000 impl.
It's a tiny bit more lines overall, but that's to be expected: moving
inline definitions out of line, adding comments in the .cpp, etc. all take additional space, but I think the tradeoff is worth it.
I made as few unrelated code changes as possible. I would say the biggest change is the introduction of the `gi` namespace, used to prevent name conflicts/ODR violations with common type names such as `Matcher`.
It was previously not an issue because all of the code was in an anonymous namespace.
This moves all of the "match table" code out of the file, so predicates,
rules, and actions are all separated now. I believe this helps separate concerns: `GlobalISelEmitter.cpp` is now more focused on importing DAG patterns into GI, instead of also containing the whole match table internals.
Note: the new files have a "GISel" prefix to make them distinct from the other "GI" files in the same folder, which are for the combiner.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D151432
The lists contain differences between register numbers, not the register
numbers themselves. Since a difference can also be negative, this also
changes its type to signed.
Changing the type to signed exposed a "bug". For AMDGPU, which has many
registers, the first element of a sequence could be as big as ~45k.
The value does not fit into int16_t, but fits into uint16_t. The bug
didn't show up because of unsigned wrapping and truncation of the Val
field in the advance() method.
To fix the issue, I changed the way regunit difflists are encoded. The
4-bit 'scale' field of MCRegisterDesc::RegUnit was replaced by 12-bit
number of the first regunit, and the first element of each of the lists
was removed. The higher 20 bits of RegUnit field contain the initial
offset into DiffLists array.
AMDGPU has 1'409 regunits (2^12 = 4'096), and the biggest offset is
80'041 (2^20 = 1'048'576). That is, there is enough room.
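The new layout can be sketched as a pair of pack/unpack helpers. This is a minimal sketch, assuming the first regunit number occupies the low 12 bits (the text above only states that the offset is in the higher 20 bits); the helper names are hypothetical and do not correspond to the MC API.

```cpp
#include <cstdint>

// Assumed layout of the 32-bit field:
//   bits [11:0]  - number of the first regunit
//   bits [31:12] - initial offset into the DiffLists array
constexpr unsigned FirstRegUnitBits = 12;
constexpr uint32_t FirstRegUnitMask = (1u << FirstRegUnitBits) - 1;

constexpr uint32_t packRegUnits(uint32_t FirstRegUnit, uint32_t Offset) {
  return (Offset << FirstRegUnitBits) | (FirstRegUnit & FirstRegUnitMask);
}

constexpr uint32_t firstRegUnit(uint32_t Packed) {
  return Packed & FirstRegUnitMask;
}

constexpr uint32_t diffListOffset(uint32_t Packed) {
  return Packed >> FirstRegUnitBits;
}

// The largest values quoted above fit into the two fields.
static_assert(1409 <= (1u << FirstRegUnitBits), "regunit count fits");
static_assert(80041 < (1u << 20), "offset fits");
static_assert(firstRegUnit(packRegUnits(1408, 80041)) == 1408, "round trip");
static_assert(diffListOffset(packRegUnits(1408, 80041)) == 80041,
              "round trip");
```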
Changing the encoding method also resulted in a smaller array size; the
numbers are below (targets with fewer than 100 elements are omitted).
```
Target  | Before | After | Change
AMDGPU  |  80052 | 78741 |  -1.6%
RISCV   |   6498 |  6297 |  -3.1%
ARM     |   4181 |  3966 |  -5.1%
AArch64 |   2770 |  2592 |  -6.4%
PPC     |   1578 |  1441 |  -8.7%
Hexagon |    994 |   740 | -25.6%
R600    |    508 |   398 | -21.7%
VE      |    471 |   459 |  -2.5%
Sparc   |    381 |   363 |  -4.7%
X86     |    326 |   208 | -36.2%
Mips    |    253 |   200 | -20.9%
SystemZ |    186 |   162 | -12.9%
```
Reviewed By: foad, arsenm
Differential Revision: https://reviews.llvm.org/D151036
This patch adds logic for determining RegisterBank size to RegisterBankInfo, which allows accounting for the HwMode of the target. Individual RegisterBanks cannot be constructed with HwMode information as construction is generated by TableGen, but a RegisterBankInfo subclass can provide the HwMode as a constructor argument. The HwMode is used to select the appropriate RegisterBank size from an array relating sizes to RegisterBanks.
Targets simply need to provide the HwMode argument to the <target>GenRegisterBankInfo constructor. The RISC-V RegisterBankInfo constructor has been updated accordingly (plus an unused argument removed).
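The selection of a size by HwMode can be sketched as a table lookup. Everything below is hypothetical: the table contents, the row-major layout (one row per HwMode), and the `BankInfoSketch` class are assumptions chosen only to show the idea of passing the HwMode to the constructor and using it to index a size array.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counts and sizes (in bits), laid out as one row per HwMode.
constexpr std::size_t NumHwModes = 2;
constexpr std::size_t NumBanks = 2;
constexpr unsigned Sizes[NumHwModes * NumBanks] = {
    32, 64, // HwMode 0: bank 0, bank 1
    64, 64, // HwMode 1: bank 0, bank 1
};

// The banks themselves carry no HwMode; the info object receives it as a
// constructor argument and uses it to pick the right size.
class BankInfoSketch {
  unsigned HwMode;

public:
  explicit BankInfoSketch(unsigned HwMode) : HwMode(HwMode) {
    assert(HwMode < NumHwModes && "unknown HwMode");
  }

  unsigned bankSize(unsigned BankID) const {
    assert(BankID < NumBanks && "unknown bank");
    return Sizes[HwMode * NumBanks + BankID];
  }
};
```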
Reviewed By: simoncook, craig.topper
Differential Revision: https://reviews.llvm.org/D76007
Copying a large object in a range-based for loop calls the copy constructor
on every iteration, which can be avoided by using a reference instead.
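The effect is easy to observe with a type that counts its copies. The `Big` struct and the two helper functions below are made up for the illustration and are not part of the patch.

```cpp
#include <vector>

// Hypothetical large object that counts how often it is copied.
struct Big {
  static int Copies;
  int Data[64] = {};

  Big() = default;
  Big(const Big &Other) {
    for (int I = 0; I < 64; ++I)
      Data[I] = Other.Data[I];
    ++Copies;
  }
};
int Big::Copies = 0;

// Iterating by value copies every element.
inline int sumByValue(const std::vector<Big> &V) {
  int Sum = 0;
  for (Big B : V)
    Sum += B.Data[0];
  return Sum;
}

// Iterating by const reference copies nothing.
inline int sumByRef(const std::vector<Big> &V) {
  int Sum = 0;
  for (const Big &B : V)
    Sum += B.Data[0];
  return Sum;
}
```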
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D150024
Recent changes to RISC-V cause the same predicate to appear in the
predicate list multiple times in some cases. This patch filters the
duplicates to reduce the number of predicate string variations.
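One way such filtering could look is an order-preserving pass that keeps only the first occurrence of each predicate. This is a sketch, not the code from the patch: `uniquePredicates` is a hypothetical helper, and whether the real change preserves order or sorts first is an assumption not confirmed here.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns Preds with later duplicates removed, keeping the original order.
inline std::vector<std::string>
uniquePredicates(const std::vector<std::string> &Preds) {
  std::vector<std::string> Result;
  for (const std::string &P : Preds)
    if (std::find(Result.begin(), Result.end(), P) == Result.end())
      Result.push_back(P);
  return Result;
}
```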
This caused compiler assertions, see comment on
https://reviews.llvm.org/D150107.
This also reverts the dependent follow-up change:
> [X86] Remove patterns for ADD/AND/OR/SUB/XOR/CMP with immediate 8 and optimize during MC lowering, NFCI
>
> This is follow-up of D150107.
>
> In addition, the function `X86::optimizeToFixedRegisterOrShortImmediateForm` can be
> shared with project bolt and eliminates the code in X86InstrRelaxTables.cpp.
>
> Differential Revision: https://reviews.llvm.org/D150949
This reverts commit 2ef8ae1348 and
5586bc539a.
This is follow-up of D150107.
In addition, the function `X86::optimizeToFixedRegisterOrShortImmediateForm` can be
shared with project bolt and eliminates the code in X86InstrRelaxTables.cpp.
Differential Revision: https://reviews.llvm.org/D150949
Previously, `CCState::AllocateStack` always allocated stack space by increasing
offsets. For targets with stack growing up (away from zero) it is more
convenient to allocate arguments by decreasing offsets, so that the first
argument is at the top of the stack. This is important when calling a function
with variable number of arguments: the callee does not know the size of the
stack, but must be able to access "fixed" arguments. For that to work, the
"fixed" arguments should have fixed offsets relative to the stack top, i.e. the
variadic arguments area should be at the stack bottom (at lowest addresses).
The in-tree target with stack growing up is AMDGPU, but it allocates
arguments by increasing addresses. It does not support variadic arguments.
A drive-by change is to promote stack size/offset to 64-bit integer.
This is what MachineFrameInfo expects.
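The two allocation directions can be sketched as below. This is a simplified illustration under stated assumptions, not the `CCState` code: the `StackAllocatorSketch` class, its `NegativeOffsets` flag, and the `alignUp` helper are hypothetical, and alignments are assumed to be non-zero.

```cpp
#include <cstdint>

// Hypothetical allocator that hands out argument offsets in either
// increasing (from 0) or decreasing (from the stack top) order.
class StackAllocatorSketch {
  uint64_t StackSize = 0;
  bool NegativeOffsets;

  static uint64_t alignUp(uint64_t Value, uint64_t Alignment) {
    return (Value + Alignment - 1) / Alignment * Alignment;
  }

public:
  explicit StackAllocatorSketch(bool NegativeOffsets)
      : NegativeOffsets(NegativeOffsets) {}

  // Returns a 64-bit signed offset for a new argument of the given size.
  int64_t allocate(uint64_t Size, uint64_t Alignment) {
    int64_t Offset;
    if (NegativeOffsets) {
      // Grow the area first, then place the argument at its far end.
      StackSize = alignUp(StackSize + Size, Alignment);
      Offset = -static_cast<int64_t>(StackSize);
    } else {
      Offset = static_cast<int64_t>(alignUp(StackSize, Alignment));
      StackSize = static_cast<uint64_t>(Offset) + Size;
    }
    return Offset;
  }

  uint64_t size() const { return StackSize; }
};
```

In the decreasing mode the first argument always lands at the same offset relative to the starting point, no matter how many arguments follow it.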
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D149575
Add an expensive check that Uses and Defs are the same for entries in the memory folding table.
Memory folding must not change the Uses/Defs.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D150633
llvm-clang-x86_64-expensive-checks-debian fails after D150436 was merged.
The failure occurs in X86. D150436 changed the sort rule in AsmMatcher, so X86 code now reaches line 633 first (other targets are not affected).
The logic here wants to use the order in which records are written in the source file so that AsmMatcher prefers AVX instructions; it uses the field HasPositionOrder.
But the condition only ensures that one of the compared records is a subclass of Instruction with HasPositionOrder set to true, and does not check the other.
(Committing on behalf of @XinWang10 to unblock broken expensive-checks builds)
Differential Revision: https://reviews.llvm.org/D150651
The logic from line 633 to 640 is specific to ARM, as the comments say, yet it makes all targets prefer instructions with more predicates when the compiler does AsmMatching.
In the code from line 642 to 649, X86 wants to use the order in which records are written in the source file to sort the instructions, so X86 could be affected by this logic. (This code can be reached only by X86.)
After this change, AVX instructions do not seem to be affected, but it exposed some other errors for the push and call instructions.
CALLpcrel16 cannot be used in 64-bit mode, so we need to add a Predicate for it. For the push instruction, pushi32 previously preceded pushi16 because it has predicates = [Not64bitmode], which is incorrect here; we should get pushw, which also aligns with gcc.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D150436
Conditions that need to be met:
1. count(StartAtCycles) == count(ReservedCycles);
2. For each i: StartAtCycles[i] < ReservedCycles[i];
3. For each i: StartAtCycles[i] >= 0;
4. If left unspecified, the elements are set to 0.
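The conditions above can be sketched as a small validation routine. This is an illustration only, not the TableGen emitter code: `validateCycles` is a hypothetical helper, and treating an empty `StartAtCycles` vector as "left unspecified" is an assumption of the sketch.

```cpp
#include <cstddef>
#include <vector>

// Checks the conditions listed above. An empty StartAtCycles is treated as
// unspecified and filled with zeros (condition 4) before checking.
inline bool validateCycles(std::vector<int> &StartAtCycles,
                           const std::vector<int> &ReservedCycles) {
  if (StartAtCycles.empty())
    StartAtCycles.assign(ReservedCycles.size(), 0);

  // Condition 1: both lists have the same number of elements.
  if (StartAtCycles.size() != ReservedCycles.size())
    return false;

  for (std::size_t I = 0; I < StartAtCycles.size(); ++I) {
    // Condition 3: start cycles are non-negative.
    if (StartAtCycles[I] < 0)
      return false;
    // Condition 2: each start cycle is strictly below the reserved cycle.
    if (StartAtCycles[I] >= ReservedCycles[I])
      return false;
  }
  return true;
}
```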
Differential Revision: https://reviews.llvm.org/D150310