Commit Graph

710 Commits

Author SHA1 Message Date
Freddy Ye
f4509cf284 [X86][MC] Support enc/dec for SETZUCC and promoted SETCC. (#86473)
apx-spec: https://cdrdv2.intel.com/v1/dl/getContent/784266
apx-syntax-recommendation:
https://cdrdv2.intel.com/v1/dl/getContent/817241
2024-04-11 10:18:29 +08:00
Simon Pilgrim
ecb34599bd [X86] Add missing immediate qualifier to the (V)ROUND instructions (#87636)
Makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-04-04 15:20:16 +01:00
Wang Pengcheng
c9bcb2b7dd [TableGen] Fix MacroFusion.td
We are missing `[[maybe_unused]]`.
2024-04-01 18:32:55 +08:00
superZWT123
da1d3d8fb9 [TableGen] Introduce a less aggressive suppression for HwMode Decoder… (#86060)
1. Remove 'AllModes' and 'DefaultMode' suffixes for DecoderTables under
default HwMode.
2. Introduce a less aggressive suppression for HwMode DecoderTable that
removes only the necessary table duplication. This allows encodings under
different HwModes to retain the original DecoderNamespace.
3. Change 'suppress-per-hwmode-duplicates' command option from bool type
to enum type, allowing users to choose what level of suppression to use.
2024-04-01 17:19:46 +08:00
Shilei Tian
360f7f5674 [GlobalISel] Call setInstrAndDebugLoc before tryCombineAll (#86993)
This makes the calls in each combiner redundant, so they can all be removed.
2024-03-29 15:27:28 -04:00
Freddy Ye
db7d243978 [X86][MC] Support enc/dec for IMULZU. (#86653)
apx-spec: https://cdrdv2.intel.com/v1/dl/getContent/784266
apx-syntax-recommendation:
https://cdrdv2.intel.com/v1/dl/getContent/817241
2024-03-29 15:52:41 +08:00
Craig Topper
baf66ec061 [Target][RISCV] Add HwMode support to subregister index size/offset. (#86368)
This is needed to provide proper size and offset for the GPRPair subreg
indices on RISC-V. The size of a GPR already uses HwMode. Previously we
said the subreg indices have unknown size and offset, but this stops
DwarfExpression::addMachineReg from being able to find the registers
that make up the pair.

I believe this fixes https://github.com/llvm/llvm-project/issues/85864
but need to verify.
2024-03-27 12:19:28 -07:00
Jason Eckhardt
f676e84bba [TableGen] Fix operand constraint checking problem. (#85859)
Currently operand constraint checks on "$dest = $src" are inadvertently
accepting any token that contains "=". This has surprising results, e.g.,
"$dest != $src" is accepted as a constraint but then treated as "=".

This patch ensures that only exactly the token "=" is accepted.
2024-03-20 13:32:38 -05:00
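The exact-token check described in f676e84bba can be illustrated with a minimal Python sketch. This is not TableGen's actual implementation: the function name `parse_tied_constraint` and the whitespace-based tokenization are assumptions made purely for illustration.

```python
def parse_tied_constraint(text):
    """Parse a "$dest = $src" constraint and return (dest, src).

    Hypothetical helper: the operator token must be exactly "=",
    not merely a token that contains "=".
    """
    tokens = text.split()
    if len(tokens) != 3:
        raise ValueError(f"expected '$dest = $src', got {text!r}")
    dest, op, src = tokens
    # A substring test such as '"=" in op' would wrongly accept "!=".
    if op != "=":
        raise ValueError(f"unsupported constraint operator {op!r}")
    if not (dest.startswith("$") and src.startswith("$")):
        raise ValueError(f"operands must start with '$' in {text!r}")
    return dest[1:], src[1:]
```

With this check, `"$dest = $src"` parses to a pair of operand names, while `"$dest != $src"` is rejected instead of being silently treated as a tie.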
Wang Pengcheng
4a6bc9fd14 [MacroFusion] Add SingleFusion that accepts a single instruction pair
We add a common class `SingleFusion` that accepts a single instruction
pair to simplify fusion definitions.

Pull Request: https://github.com/llvm/llvm-project/pull/85750
2024-03-19 20:07:17 +08:00
Wang Pengcheng
eb5623d101 [MacroFusion] Complete tests and fix indents
2024-03-19 15:41:18 +08:00
XinWang10
7b766a6f50 [X86] Support APX CMOV/CFCMOV instructions (#82592)
This patch supports ND CMOV instructions and CFCMOV instructions.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-03-17 20:18:56 +08:00
Wang Pengcheng
b890a48a12 [MacroFusion] Support commutable instructions (#82751)
If the second instruction is commutable, we should be able to check
its commutable operands.

A simple RISCV fusion is contained in this PR to show the functionality
is correct; I may remove it when landing.

Fixes #82738
2024-03-15 18:44:49 +08:00
Simon Pilgrim
0858c906db [X86] Add missing register qualifier to the VBLENDVPD/VBLENDVPS/VPBLENDVB instruction names
Matches the SSE variants (which have a 0 qualifier to indicate the xmm0 explicit dependency)
2024-03-11 15:48:07 +00:00
Simon Pilgrim
1ec5b1f483 [X86] Add missing immediate qualifier to the (V)PCLMULQDQ instruction names
2024-03-11 13:39:25 +00:00
Simon Pilgrim
2b8f1daf78 [X86] Add missing immediate qualifier to the SSE42 (V)PCMPEST/PCMPIST string instruction names
2024-03-11 13:02:48 +00:00
Simon Pilgrim
92d7aca441 [X86] Add missing immediate qualifier to the (V)CMPSS/D instructions (#84496)
Matches (V)CMPPS/D and makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-03-09 16:21:25 +00:00
Shengchen Kan
1ca8092e87 [X86][MC] Support encoding/decoding for APX CCMP/CTEST (#83863)
APX assembly syntax recommendations:
  https://cdrdv2.intel.com/v1/dl/getContent/817241

NOTE:
The change in llvm/tools/llvm-exegesis/lib/X86/Target.cpp is for test
LLVM ::
tools/llvm-exegesis/X86/latency/latency-SETCCr-cond-codes-sweep.s

For `SETcc`, llvm-exegesis randomly chooses one other instruction to
test with `SETcc`. After selecting the instruction, llvm-exegesis
checks whether each operand is initialized and valid; if not,
`randomizeTargetMCOperand` chooses a value for the invalid operand. It
lacked support for condition code operands, which caused the flaky
failure once `CCMP` was supported.

llvm-exegesis can choose `CCMP` without the ccmp feature being specified
because it uses `MCSubtarget` and only 16/32/64 bit is considered.
llvm-exegesis doesn't choose other instructions because of the
requirement in `hasAliasingRegistersThrough`: the instruction should use
a GPR (defined by `SETcc`) and define `EFLAGS` (used by `SETcc`).
2024-03-08 20:54:33 +08:00
Pierre van Houtryve
4b1910b11d [GlobalISel][AMDGPU] Import patterns with multiple defs (#84171)
Fixes #63216
2024-03-08 09:39:10 +01:00
Krzysztof Parzyszek
064c2e7579 Fix failing TableGen tests
2024-03-06 12:07:41 -06:00
Krzysztof Parzyszek
67c82d6ffb [Frontend] Add leaf constructs and association to OpenMP/ACC directives (#83625)
Add members "leafConstructs" and "association" to .td describing
OpenMP/ACC directives. The naming follows the terminology used in the
OpenMP standard: a "leaf" construct is a construct that is itself not a
composition or a combination of other constructs, and "association" is
the source language construct to which the directive applies (e.g. loop,
block, etc.)

The tblgen-generated output then contains two additional functions
- getLeafConstructs(D), and
- getDirectiveAssociation(D)
plus "enum class Association", all in namespaces "llvm::omp" and
"llvm::acc".

Note: getLeafConstructs returns an empty sequence for a construct that
is itself a leaf construct.

Use the new functions to simplify a few OpenMP-related functions in
clang.
2024-03-06 10:46:26 -06:00
Sameer Sahasrabuddhe
60822637bf Restore "Implement convergence control in MIR using SelectionDAG (#71785)"
This restores commit c7fdd8c11e.
Previously reverted in f010b1bef4.

LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-03-06 12:19:32 +05:30
Wang Pengcheng
de1f33873b [TableGen] Fix wrong codegen of BothFusionPredicateWithMCInstPredicate (#83990)
We should generate the `MCInstPredicate` twice, one with `FirstMI`
and another with `SecondMI`.
2024-03-05 19:54:02 +08:00
Mitch Phillips
f010b1bef4 Revert "Restore "Implement convergence control in MIR using SelectionDAG (#71785)""
This reverts commit c7fdd8c11e.

Reason: Broke the sanitizer buildbots. See the comments at
https://github.com/llvm/llvm-project/pull/71785
for more information.
2024-03-04 17:05:34 +01:00
Sameer Sahasrabuddhe
c7fdd8c11e Restore "Implement convergence control in MIR using SelectionDAG (#71785)"
Original commit 79889734b9.
Previously reverted in commit a2afcd5721.

LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-03-04 13:28:04 +05:30
Bjorn Pettersson
da591d390e [GlobalISel][TableGen] Take first result for multi-output instructions (#81130)
Previously, tblgen would reject patterns where one of its nested
instructions produced more than one result. These arise when the
instruction definition contains 'outs' as well as 'Defs'. This patch
fixes that by always taking the first result, which is how these
situations are handled in SelectionDAG.

Original patch: https://reviews.llvm.org/D86617
Continued as: https://github.com/llvm/llvm-project/pull/81130
2024-03-02 20:10:02 +01:00
Jason Eckhardt
ad43ea3328 [TableGen] Add support for DefaultMode in per-HwMode encode/decode. (#83029)
Currently the decoder and encoder emitters will crash if DefaultMode is
used within an EncodingByHwMode. As can be done today for
RegInfoByHwMode and ValueTypeByHwMode, this patch adds support for this
usage in EncodingByHwMode:
  let EncodingInfos =
    EncodingByHwMode<[ModeA, DefaultMode], [EncA, EncDefault]>;
2024-02-29 01:47:18 +08:00
Visoiu Mistrih Francis
b791a51730 [CodeGenSchedule] Don't allow invalid ReadAdvances to be formed (#82685)
Forming a `ReadAdvance` with an entry in the `ValidWrites` list that is
not used by any instruction results in the entire `ReadAdvance` being
ignored by the scheduler due to an invalid entry.

The `SchedRW` collection code only picks up `SchedWrites` that are
reachable from `Instructions`, `InstRW`, `ItinRW` and `SchedAlias`,
leaving the unreachable ones with an invalid entry (0) in
`SubtargetEmitter::GenSchedClassTables` when going through the list of
`ReadAdvances`.
2024-02-26 18:25:21 -08:00
FruitClover
d99b148177 [TableGen] Fix __CLAUSE_NO_CLASS macro leak in directive emitter (#82912)
`__CLAUSE_NO_CLASS` was not undefined inside the
`GEN_CLANG_CLAUSE_CLASS` block, resulting in macro redefinition warnings
when several generated directives are used simultaneously.
2024-02-25 20:34:38 +03:00
Jason Eckhardt
05af9c83f3 [TableGen] Suppress per-HwMode duplicate instructions/tables. (#82567)
Currently, for per-HwMode encoding/decoding, those instructions that do
not have a HwMode override are duplicated into the decoder tables for
all HwModes. This includes inducing multiple tables for instructions
that are otherwise unrelated (e.g., different namespace with no
overrides at all).

This patch adds support to suppress instruction and table duplicates.
TableGen option "-gen-disassembler --suppress-per-hwmode-duplicates"
enables the suppression (off by default).

For one downstream backend with a complicated ISA and major
cross-generation encoding differences, this eliminates ~32000 duplicate
table entries at the time of this patch.

There are legitimate reasons to suppress or not suppress duplicates. If
there are relatively few non-overridden related instructions, it can be
convenient to pull them into the per-mode tables (only need to decode
the per-mode tables, slightly simpler decode function in disassembler).
On the other hand, in some backends, the opposite is true or the size is
too large to tolerate any duplication in the first place. We let the
user decide which makes sense.

This is currently off by default, though there is no reason it couldn't
be enabled by default. Any existing backends downstream using the
per-HwMode feature will function as before. Turning on the feature
requires minor modifications to their disassembler due to more/less
tables and naming.
2024-02-22 11:36:10 +08:00
Sameer Sahasrabuddhe
a2afcd5721 Revert "Implement convergence control in MIR using SelectionDAG (#71785)"
This reverts commit 79889734b9.

Encountered multiple buildbot failures.
2024-02-21 11:07:02 +05:30
Sameer Sahasrabuddhe
79889734b9 Implement convergence control in MIR using SelectionDAG (#71785)
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-02-21 10:06:37 +05:30
Jason Eckhardt
2ed0aacf97 [TableGen] Fixes for per-HwMode decoding problem (#82201)
Today, if any instruction uses EncodingInfos/EncodingByHwMode to
override the default encoding, the opcode field of the decoder table is
generated incorrectly. This causes failed disassemblies and other
problems.

Specifically, the main correctness issue is that the EncodingID is
inadvertently stored in the table rather than the actual opcode. This is
caused by having set up the IndexOfInstruction map incorrectly during
the loop to populate NumberedEncodings-- which is then propagated around
when OpcMap is set up with a bad EncodingIDAndOpcode.

Instead, do away with IndexOfInstruction altogether and use opcode value
queried from CodeGenTarget::getInstrIntValue to set up OpcMap. This
itself exposed another problem where emitTable was using the decoded
opcode to index into NumberedEncodings. Instead pass in the
EncodingIDAndOpcode vector, and create the reverse mapping from Opcode
to EncodingID, which is then used to index NumberedEncodings.

This problem is not currently exposed upstream since no in-tree targets
yet use the per-HwMode feature. It does show up in at least two
downstream targets.
2024-02-19 13:14:22 +08:00
Sergei Barannikov
1e4c76cdc9 [MC][AsmParser] Make MatchRegisterName return MCRegister (NFC) (#81408)
`MCRegister` is preferred over `unsigned` nowadays.
2024-02-18 13:59:49 +03:00
Joseph Huber
11fcae69db [LLVM] Add __builtin_readsteadycounter intrinsic and builtin for realtime clocks (#81331)
Summary:
This patch adds a new intrinsic and builtin function mirroring the
existing `__builtin_readcyclecounter`. The difference is that this
implementation targets a separate counter that some targets have which
returns a fixed frequency clock that can be used to determine elapsed
time. This differs from the cycle counter, which often has a
variable frequency.

This patch only adds support for the NVPTX and AMDGPU targets.

This is done as a new and separate builtin rather than an argument to
`readcyclecounter` to avoid needing to change existing code and to make
the separation more explicit.
2024-02-13 10:06:25 -06:00
Jason Eckhardt
8ae0485070 [TableGen] Extend direct lookup to instruction values in generic tables. (#80486)
Currently, for some tables involving a single primary key field which is
integral and densely numbered, a direct lookup is generated rather than
a binary search. This patch extends the direct lookup function
generation to instructions, where the integral value corresponds to the
instruction's enum value.

While this isn't as common as for other tables, it does occur in at
least one downstream backend and one in-tree backend.

Added a unit test and minimally updated the documentation.
2024-02-07 12:49:39 +08:00
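The difference between a direct lookup and a binary search, as discussed in 8ae0485070, can be sketched in Python. This is only an illustration under stated assumptions: `make_lookup` and the `(key, value)` row layout are hypothetical and do not reflect the C++ code that TableGen generates.

```python
from bisect import bisect_left


def make_lookup(rows):
    """Build a lookup function over (key, value) rows sorted by integer key.

    Hypothetical sketch: when the keys are dense (consecutive integers),
    index the table directly; otherwise fall back to a binary search.
    """
    keys = [key for key, _ in rows]
    first = keys[0] if keys else 0
    dense = all(key == first + i for i, key in enumerate(keys))

    def lookup(key):
        if dense:
            # Direct lookup: the key's offset from the first key is the row index.
            idx = key - first
            return rows[idx][1] if 0 <= idx < len(rows) else None
        # Binary search over the sorted keys.
        idx = bisect_left(keys, key)
        if idx < len(keys) and keys[idx] == key:
            return rows[idx][1]
        return None

    return lookup
```

For a table keyed by densely numbered values, the direct path avoids the search entirely and needs only a bounds check.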
Wang Pengcheng
acf6811d0f [TableGen] Support type aliases via new keyword deftype
We can use `deftype` (not using `typedef` here to be consistent
with `def`, `defm`, `defset`, `defvar`, etc) to define type aliases.

Currently, only primitive types and type aliases are supported as
the source type, and `deftype` statements can only appear at the top
level.

Reviewers: fpetrogalli, Artem-B, nhaehnle, jroelofs

Reviewed By: jroelofs, nhaehnle, Artem-B

Pull Request: https://github.com/llvm/llvm-project/pull/79570
2024-02-02 17:41:47 +08:00
Pierre van Houtryve
7ec996d4c5 [GlobalISel][TableGen] Support Intrinsics in MIR Patterns (#79278)
2024-02-01 08:53:32 +01:00
XinWang10
d9e875dcc1 [X86][MC] Support encoding/decoding for APX variant LZCNT/TZCNT/POPCNT instructions (#79954)
Two variants: promoted legacy, NF (no flags update).

The syntax of NF instructions is aligned with GNU binutils.
https://sourceware.org/pipermail/binutils/2023-September/129545.html
2024-01-31 21:10:02 +08:00
Jason Eckhardt
d93f850c6f [TableGen] Extend OPC_ExtractField/OPC_CheckField start value widths. (#79723)
Both OPC_ExtractField and OPC_CheckField are currently defined to take
an unsigned 8-bit start value. On some architectures with long
instruction words, this value can silently overflow, resulting in a bad
decoder table.

This patch changes each to take a ULEB128-encoded start value instead.
Additionally, a range assertion is added for the 8-bit length to
prominently notify a user in case that field ever overflows.

This problem isn't currently exposed upstream since all in-tree targets
use small instruction words (i.e., bitwidth <= 64 bits). It does show up
in at least one downstream target with instructions > 64 bits long.

Co-authored-by: Jason Eckhardt <jeckhardt@nvidia.com>
2024-01-29 09:22:22 -05:00
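To illustrate why a variable-length (ULEB128) start value avoids the overflow described in d93f850c6f, here is a minimal Python sketch of the standard ULEB128 encoding. The function names are hypothetical and this is not the decoder emitter's code; it only shows that a value above 255 survives the round trip, whereas truncation to 8 bits would silently change it.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as ULEB128 bytes (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(data, offset=0):
    """Decode a ULEB128 value from data; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7
```

For example, a start bit of 300 encodes to the two bytes `0xAC 0x02` and decodes back to 300, while `300 & 0xFF` would yield 44.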
Shengchen Kan
7c3ee7cbe6 [X86][tablgen] Fix the broadcast tables (#79675)
2024-01-28 09:06:27 +08:00
XinWang10
02d56801ee [X86] Support APX promoted RAO-INT and MOVBE instructions (#77431)
R16-R31 were added to the GPRs in
https://github.com/llvm/llvm-project/pull/70958.
This patch supports the promoted RAO-INT and MOVBE instructions in EVEX
space.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-01-26 14:33:45 +08:00
Wang Pengcheng
41fe98a6e7 [TableGen] Use MapVector to remove non-determinism
This fixes the non-determinism found when the `LLVM_REVERSE_ITERATION`
option is `ON`.

Fixes #79420.

Reviewers: ilovepi, MaskRay

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/79411
2024-01-25 16:16:19 +08:00
XinWang10
816cc9d24b [X86][MC] Support Enc/Dec for NF BMI instructions (#76709)
Promoted BMI instructions were supported in #73899
2024-01-25 10:33:14 +08:00
ostannard
56602a48c7 [TableGen] Include source location in JSON dump (#79028)
This adds a '!loc' field to each record containing the file name and
line number of the record declaration.
2024-01-24 17:07:20 +00:00
Shengchen Kan
5c68c6d70f [X86] Support encoding/decoding and lowering for APX variant SHL/SHR/SAR/ROL/ROR/RCL/RCR/SHLD/SHRD (#78853)
Four variants: promoted legacy, ND (new data destination), NF (no flags
update) and NF_ND (NF + ND).

The syntax of NF instructions is aligned with GNU binutils.
https://sourceware.org/pipermail/binutils/2023-September/129545.html
2024-01-23 10:23:27 +08:00
Wang Pengcheng
3d90e1fa94 [TableGen] Integrate TableGen-based macro fusion (#73115)
`Fusion` is inherited from `SubtargetFeature` now. Each definition
of `Fusion` will define a `SubtargetFeature` accordingly.

Method `getMacroFusions` is added to `TargetSubtargetInfo`, which
returns a list of `MacroFusionPredTy` that will be evaluated by
MacroFusionMutation.

`getMacroFusions` will be auto-generated if the target has `Fusion`
definitions.
2024-01-19 18:08:09 +08:00
Sergei Barannikov
8e8c954a17 [GISel] Erase the root instruction after emitting all its potential uses (#77494)
This tries to fix a bug by resolving a few FIXMEs. The bug is that
`EraseInstAction` is emitted after emitting the _first_ `BuildMIAction`,
which is too early because the erased instruction may still be used by
subsequent `BuildMIAction`s (in particular, by `CopyRenderer`).

An example of the bug (from `match-table-operand-types.td`):
```
def InstTest0 : GICombineRule<
  (defs root:$a),
  (match  (G_MUL i32:$x, i32:$b, i32:$c),
          (G_MUL $a, i32:$b, i32:$x)),
  (apply  (G_ADD i64:$tmp, $b, i32:$c),
          (G_ADD i8:$a, $b, i64:$tmp))>;

GIR_EraseFromParent, /*InsnID*/0,
GIR_BuildMI, /*InsnID*/1, /*Opcode*/GIMT_Encode2(TargetOpcode::G_ADD),
GIR_Copy, /*NewInsnID*/1, /*OldInsnID*/0, /*OpIdx*/0, // a
GIR_Copy, /*NewInsnID*/1, /*OldInsnID*/0, /*OpIdx*/1, // b
GIR_AddSimpleTempRegister, /*InsnID*/1, /*TempRegID*/0,
```

Here, the root instruction is destroyed before copying its operands ('a'
and 'b') to the new instruction.

The solution is to emit `EraseInstAction` for the root instruction as
the last action in the emission pipeline.
2024-01-13 11:17:41 +03:00
Wang Pengcheng
a2af374284 [SelectionDAG] Add space-optimized forms of OPC_CheckPredicate (#77763)
We record the usage of each `Predicate` and sort them by usage.

For the top 8 `Predicate`s, we will emit an `OPC_CheckPredicateN` to
save one byte.

Overall this reduces the llc binary size with all in-tree targets by
about 61K.

This is a recommit of 1a57927, which was reverted in bc98c31.

The CI failures occurred when doing expensive checks (with option
`LLVM_ENABLE_EXPENSIVE_CHECKS` being ON).

The key point here is that the test needs a stable sorting result, but
expensive checks uncovered the non-determinism of `llvm::sort`. So
`llvm::sort` is changed to `llvm::stable_sort` in this revised patch.

And we use `llvm::MapVector` to keep insertion order.
2024-01-12 11:38:05 +08:00
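The ranking scheme from a2af374284 (count predicate usage, sort stably, give the most-used entries a compact form) can be sketched in Python. Everything here is a hypothetical illustration, not the DAG matcher emitter's code: the function name, the plan layout, and the assumed sizes (1 byte for a compact form, 2 bytes for opcode plus index) are assumptions. Python's `dict` preserves insertion order and `sorted` is stable, which stand in for `llvm::MapVector` and `llvm::stable_sort`.

```python
def assign_predicate_opcodes(usage, compact_slots=8):
    """Rank predicates by use count and mark the top ones as compact.

    usage: dict mapping predicate name -> use count, in insertion order.
    Returns a dict mapping name -> {"rank", "compact", "bytes_per_use"}.
    """
    # Stable sort by descending count: ties keep their insertion order,
    # so the result is deterministic across runs.
    ranked = sorted(usage.items(), key=lambda item: -item[1])
    plan = {}
    for rank, (name, _count) in enumerate(ranked):
        compact = rank < compact_slots
        plan[name] = {
            "rank": rank,
            "compact": compact,
            # Assumed sizes: a compact form drops the separate index byte.
            "bytes_per_use": 1 if compact else 2,
        }
    return plan
```

Because ties are broken by insertion order rather than by an unspecified sort order, two predicates with equal usage always receive the same ranks, which is what a test comparing emitted tables needs.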
Mikhail Goncharov
bc98c3103a Revert "[SelectionDAG] Add space-optimized forms of OPC_CheckPredicate (#73488)"
This reverts commit 1a5792735a.

Test address-space-patfrags.td.test is failing

https://lab.llvm.org/buildbot/#/builders/104/builds/15012
2024-01-11 12:25:00 +01:00
Wang Pengcheng
1a5792735a [SelectionDAG] Add space-optimized forms of OPC_CheckPredicate (#73488)
We record the usage of each `Predicate` and sort them by usage.

For the top 8 `Predicate`s, we will emit an `OPC_CheckPredicateN` to
save one byte.

Overall this reduces the llc binary size with all in-tree targets by
about 61K.
2024-01-11 15:43:40 +08:00