Commit Graph

7550 Commits

Author SHA1 Message Date
Matt Arsenault
925aa514ed AMDGPU: Use DataExtractor for printf string extraction
Attempt 2 to fix big endian bot failures.
2023-01-09 14:03:42 -05:00
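Big-endian bot failures like these usually come from reinterpreting raw host memory as multi-byte values. A reader that decodes each value with an explicitly chosen byte order avoids that problem. The following is a standalone sketch of that idea. It does not use LLVM's DataExtractor API, and the ByteReader class and its methods are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical minimal reader: decodes values with an explicit byte order,
// so the result does not depend on the endianness of the host machine.
class ByteReader {
public:
  ByteReader(const std::vector<uint8_t> &Bytes, bool IsLittleEndian)
      : Bytes(Bytes), IsLittleEndian(IsLittleEndian) {}

  // Reads a 32-bit unsigned integer at Offset and advances Offset past it.
  std::optional<uint32_t> getU32(size_t &Offset) const {
    if (Offset + 4 > Bytes.size())
      return std::nullopt;
    uint32_t Value = 0;
    for (size_t I = 0; I < 4; ++I) {
      // Most significant byte first: it is the last byte in little endian
      // order and the first byte in big endian order.
      size_t Idx = IsLittleEndian ? Offset + 3 - I : Offset + I;
      Value = (Value << 8) | Bytes[Idx];
    }
    Offset += 4;
    return Value;
  }

  // Reads a NUL-terminated string at Offset and advances past the NUL.
  std::optional<std::string> getCStr(size_t &Offset) const {
    std::string Result;
    for (size_t I = Offset; I < Bytes.size(); ++I) {
      if (Bytes[I] == 0) {
        Offset = I + 1;
        return Result;
      }
      Result.push_back(static_cast<char>(Bytes[I]));
    }
    return std::nullopt; // No terminator: malformed input.
  }

private:
  const std::vector<uint8_t> &Bytes;
  bool IsLittleEndian;
};
```

The same buffer decodes to the same values on every host. Only the byte order requested by the caller changes the result.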
Ivan Kosarev
2d945ef864 [AMDGPU][NFC] Rename GFX10A16 operands.
They do not seem to be GFX10-specific anymore. Also renames the
corresponding feature.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D141069
2023-01-09 17:18:46 +00:00
Jay Foad
6a6f62a719 [AMDGPU] S_MULK_I32 does not define SCC. NFCI.
Differential Revision: https://reviews.llvm.org/D141281
2023-01-09 15:44:56 +00:00
Benjamin Kramer
b6942a2880 [NFC] Hide implementation details in anonymous namespaces 2023-01-08 17:37:02 +01:00
Matt Arsenault
270e96f435 Revert "AMDGPU: Invert handling of enqueued block detection"
This reverts commit 47288cc977.

The runtime is having trouble with this at -O0 when the inputs are
always enabled.
2023-01-07 21:48:07 -05:00
Matt Arsenault
68b6cabd9e AMDGPU: Use getTypeAllocSize 2023-01-06 21:33:19 -05:00
Matt Arsenault
47554a0c73 AMDGPU: Use more accurate IR type for block handle
The device library uses this as a struct with a pointer sized integer
and 2 ints.
2023-01-06 21:23:28 -05:00
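The handle layout described above can be pictured as a C++ struct. This is a hypothetical mirror for illustration only. The field names are invented, and the sketch assumes the two ints are 32 bits wide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the block handle layout: one pointer-sized
// integer followed by two 32-bit integers. Field names are illustrative.
struct BlockHandle {
  uintptr_t Handle; // pointer-sized integer
  int32_t Field0;
  int32_t Field1;
};

// The two ints immediately follow the pointer-sized field.
static_assert(offsetof(BlockHandle, Field0) == sizeof(uintptr_t),
              "ints follow the pointer-sized field");
```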
Matt Arsenault
47288cc977 AMDGPU: Invert handling of enqueued block detection
Invert the sense of the attribute and let the attributor figure this
out like everything else. If needed we can have the not-OpenCL
languages set amdgpu-no-default-queue and amdgpu-no-completion-action
up front so they never have to pay the cost.

There are also so many of these now, the offset use API should
probably consider all of them at once. Maybe they should merge into
one attribute with used fields. Having separate functions for each
field in AMDGPUBaseInfo is also not the greatest API (might as well
fix this when the patch to get the object version from the module
lands).
2023-01-06 21:16:08 -05:00
Matt Arsenault
0416883dc1 AMDGPU: Fix enqueue block lowering for opaque pointers
This was looking for a specific constant cast of the function, when
the type doesn't matter. Doesn't bother trying to handle typed
pointers; it will just assert.

Things probably don't work completely correctly if the block kernel
address is captured somewhere else, but that wouldn't work before
either. The uses should really be loads out of the handle, and the
handle initializer should contain the kernel address.
2023-01-06 21:15:39 -05:00
Matt Arsenault
0995a31e5d AMDGPU: Try to fix 32-bit build bot 2023-01-06 17:34:25 -05:00
Matt Arsenault
40078a6b71 AMDGPU: Use BinaryByteStream in printf expansion
Attempt to fix test failures on big endian bots. This pass definitely
needs more test coverage.
2023-01-06 17:22:13 -05:00
Joe Nash
1b12d7d15b [AMDGPU] Combine redundant Asm64 and AsmVOP3DPPBase. NFC
Reduce duplication in the codebase by combining these fields in
VOPProfile.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D141088
2023-01-06 14:09:42 -05:00
Alexey Bataev
9b5f62685a [SLP]Fix cost of the broadcast buildvector/gather.
Need to include the cost of the initial insertelement in the cost of the
broadcasts. Also, need to adjust the cost of the gather/buildvector if
the element is inserted into a poison/undef vector.

Differential Revision: https://reviews.llvm.org/D140498
2023-01-06 09:25:05 -08:00
Guillaume Chatelet
87b6b347fc Revert D141134 "[NFC] Only expose getXXXSize functions in TypeSize"
The patch should be discussed further.

This reverts commit dd56e1c92b.
2023-01-06 15:27:50 +00:00
Guillaume Chatelet
dd56e1c92b [NFC] Only expose getXXXSize functions in TypeSize
Currently 'TypeSize' exposes two functions that serve the same purpose:
 - getFixedSize / getFixedValue
 - getKnownMinSize / getKnownMinValue

Source: bf82070ea4/llvm/include/llvm/Support/TypeSize.h (L337-L338)

This patch proposes removing one of the two so that the code base uses a single function.

Differential Revision: https://reviews.llvm.org/D141134
2023-01-06 15:24:52 +00:00
Jay Foad
b7ef63af56 [AMDGPU] Add a feature for VALUTransUseHazard
NFCI. This just allows us to experiment with enabling/disabling the
workaround on different subtargets.

Differential Revision: https://reviews.llvm.org/D141121
2023-01-06 13:50:17 +00:00
Juan Manuel MARTINEZ CAAMAÑO
543db09b97 [CodeGen][AMDGPU] EXTRACT_VECTOR_ELT: input vector element type can differ from output type
In SITargetLowering::performExtractVectorElt, the output type was not
considered, which could lead to type mismatches
later.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D139943
2023-01-06 09:46:02 +01:00
Vang Thao
25d72330ff [AMDGPU] Add .uniform_work_group_size metadata to v5
An AMDGPU kernel with the function attribute "uniform-work-group-size"="true" requires
a uniform workgroup size (i.e. each dimension of the global size is a multiple of
the corresponding dimension of the workgroup size). hipExtModuleLaunchKernel allows
launching a HIP kernel with a non-uniform workgroup size, which makes it necessary for
the runtime to check and enforce a uniform workgroup size if the kernel requires it. To
let the runtime enforce that, this metadata is needed to indicate that
the kernel requires a uniform workgroup size.

Reviewed By: kzhuravl, arsenm

Differential Revision: https://reviews.llvm.org/D141012
2023-01-05 21:29:56 +00:00
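The uniformity condition stated above (each global size dimension is a multiple of the corresponding workgroup dimension) is easy to check on the runtime side. The following is a minimal sketch. The function names are hypothetical, and it assumes the runtime has already read the `.uniform_work_group_size` flag from the kernel metadata into a bool.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical check: a launch is uniform when every dimension of the
// global size is an exact multiple of the corresponding workgroup size.
bool isUniformLaunch(const std::array<uint32_t, 3> &GlobalSize,
                     const std::array<uint32_t, 3> &WorkGroupSize) {
  for (size_t Dim = 0; Dim < 3; ++Dim) {
    assert(WorkGroupSize[Dim] != 0 && "workgroup size must be non-zero");
    if (GlobalSize[Dim] % WorkGroupSize[Dim] != 0)
      return false;
  }
  return true;
}

// Hypothetical launch gate: only kernels whose metadata requires a uniform
// workgroup size reject non-uniform launches.
bool mayLaunch(bool RequiresUniformWorkGroupSize,
               const std::array<uint32_t, 3> &GlobalSize,
               const std::array<uint32_t, 3> &WorkGroupSize) {
  return !RequiresUniformWorkGroupSize ||
         isUniformLaunch(GlobalSize, WorkGroupSize);
}
```

For example, a global size of 250 with a workgroup size of 64 is non-uniform. It would be rejected only for a kernel that carries the flag.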
Alexander Timofeev
6daa983c9d [AMDGPU] MachineScheduler: schedule execution metric added for the UnclusteredHighRPStage
Since the divergence-driven ISel was fully enabled, we have more VGPRs available.
The MachineScheduler, trying to take advantage of that, bumps up the occupancy at the
expense of hiding memory access latency. This really spoils the initially good schedule.
A new metric that reflects the latency-hiding quality of the schedule has been created
to balance occupancy against latency. The metric is based on a latency model that
computes the ratio of bubble cycles to working cycles. This ratio is then used to
decide whether the higher-occupancy schedule is profitable, as follows:

             Profit = NewOccupancy/OldOccupancy * OldMetric/NewMetric

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D139710
2023-01-05 21:10:56 +01:00
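The profit formula above can be worked through numerically. The following is a minimal sketch, not the scheduler's actual code. It assumes the metric is exactly bubble cycles divided by working cycles, so lower is better, and it assumes a decision threshold of Profit > 1. The struct and function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical per-schedule summary produced by some latency model.
struct ScheduleSummary {
  unsigned Occupancy;   // waves achieved by the schedule
  double BubbleCycles;  // cycles stalled waiting on latency
  double WorkingCycles; // cycles spent issuing instructions
};

// Latency-hiding metric: bubble cycles per working cycle (lower is better).
double latencyMetric(const ScheduleSummary &S) {
  assert(S.WorkingCycles > 0 && "schedule must do some work");
  return S.BubbleCycles / S.WorkingCycles;
}

// Profit = NewOccupancy/OldOccupancy * OldMetric/NewMetric
double profit(const ScheduleSummary &Old, const ScheduleSummary &New) {
  assert(Old.Occupancy > 0 && "old occupancy must be non-zero");
  double OccupancyGain = static_cast<double>(New.Occupancy) / Old.Occupancy;
  double NewMetric = latencyMetric(New);
  assert(NewMetric > 0 && "zero-bubble schedule must be handled separately");
  return OccupancyGain * (latencyMetric(Old) / NewMetric);
}

// Assumed rule: keep the higher-occupancy schedule only when the occupancy
// gain outweighs the loss in latency hiding.
bool keepNewSchedule(const ScheduleSummary &Old, const ScheduleSummary &New) {
  return profit(Old, New) > 1.0;
}
```

For example, going from occupancy 4 (metric 0.1) to occupancy 5 (metric 0.2) gives 1.25 * 0.5 = 0.625, so the new schedule is rejected. Going to occupancy 8 with metric 0.15 gives 2 * (0.1 / 0.15), which is about 1.33, so the new schedule is kept.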
Matt Arsenault
7c327c2fbb AMDGPU: Fix broken opaque pointer handling in printf pass
This was directly considering the pointee type, and also applying
special semantics to constant address space.
2023-01-05 13:48:32 -05:00
Matt Arsenault
7b922fc0c3 AMDGPU: Fix broken and permissive handling of printf format strings
This was completely broken with opaque pointers because it was
specifically looking for a constant expression with the global
variable as the first operand. Strip casts like normal, and properly
validate all of the restrictions rather than silently ignoring any
unhandled cases. Also be stricter that we aren't calling into some
unresolved or non-constant format string.

Also converts the test to opaque pointers and generated tests. There's
more broken initializer handling for strings inside the format string
processing too, but there's just no test coverage for this at all.
2023-01-05 09:18:00 -05:00
serge-sans-paille
38818b60c5 Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.

The only non-automatic changes have been:

1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).

Per reviewers' comment, some useless makeArrayRef have been removed in the process.

This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.

Differential Revision: https://reviews.llvm.org/D140955
2023-01-05 14:11:08 +01:00
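The switch from helper functions to deduction guides can be shown with a toy non-owning view. This sketch uses a hypothetical ArrayView class, not llvm::ArrayRef itself. It illustrates two of the points listed above. First, a literal 0 is also a null pointer constant, so a call that names the element type explicitly (ArrayView<uint8_t>(P, 0)) can match both two-argument constructors. Spelling the length as (size_t)0 avoids that. Second, braces are used when passing a deduced view to a constructor, because the parenthesized form would be parsed as a function declaration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy non-owning array view used only for illustration (not llvm::ArrayRef).
template <typename T> class ArrayView {
public:
  ArrayView(const T *Ptr, size_t Len) : Data(Ptr), Length(Len) {}
  ArrayView(const T *Begin, const T *End)
      : Data(Begin), Length(static_cast<size_t>(End - Begin)) {}
  ArrayView(const std::vector<T> &Vec) : Data(Vec.data()), Length(Vec.size()) {}
  template <size_t N>
  ArrayView(const T (&Arr)[N]) : Data(Arr), Length(N) {}

  size_t size() const { return Length; }
  const T &operator[](size_t I) const { return Data[I]; }

private:
  const T *Data;
  size_t Length;
};

// Deduction guides: callers write ArrayView(X) and the element type is
// deduced, replacing a makeArrayView(X)-style helper function.
template <typename T> ArrayView(const T *, size_t) -> ArrayView<T>;
template <typename T> ArrayView(const T *, const T *) -> ArrayView<T>;
template <typename T> ArrayView(const std::vector<T> &) -> ArrayView<T>;
template <typename T, size_t N> ArrayView(const T (&)[N]) -> ArrayView<T>;

// A type constructed from a view. Write Record R{ArrayView(Storage)};
// because Record R(ArrayView(Storage)); declares a function instead.
struct Record {
  explicit Record(ArrayView<uint8_t> Bytes) : Size(Bytes.size()) {}
  size_t Size;
};
```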
Matt Arsenault
8dfe60c356 AMDGPU: Set scratch_en if there is dynamic stack but no fixed stack 2023-01-04 20:51:18 -05:00
Anshil Gandhi
4bbcbdaee5 [AMDGPU] Unify divergent nodes if the PostDom tree has one root
This patch allows AMDGPUUnifyDivergenceExitNodes pass
to transform a function whose PDT has exactly one root
and ends in a branch instruction. Fixes
https://github.com/llvm/llvm-project/issues/58861.

Reviewed By: ruiling, arsenm

Differential Revision: https://reviews.llvm.org/D139780
2023-01-04 10:45:03 -07:00
Jay Foad
6f7ff9b933 [MC] Consistently use MCInstrDesc::getImplicitUses and getImplicitDefs. NFC. 2023-01-04 13:16:12 +00:00
James Y Knight
7ff64d44b9 [AMDGPU] Fix useDeprecatedPositionallyEncodedOperands errors.
This is a follow-on to https://reviews.llvm.org/D134073.

The errors in the R600 half were fixed previously in
https://reviews.llvm.org/D134078. Originally, I thought that the fixes
to the AMDGPU half would be tricky, but upon taking another look,
there were only a couple minor issues that needed fixing:

1. Previously, buffer load instructions (`BUFFER_LOAD_*_LDS_*`) were
populating the `vdata` field in the instruction from the `swz`
operand. This was incorrect, but harmless, as when the LDS option is
set, the instruction does not use the vdata field.

2. The `BUFFER_STORE_LDS_DWORD_gfx90a` instruction was populating
`acc` from the `swz` operand, because `acc` was set to `?`. (I believe
that the intent here was to leave the instruction bit as an "unknown
value", but you can't do that except by setting the bits on `Inst`
directly). Also harmless, for the same reason.

Differential Revision: https://reviews.llvm.org/D140918
2023-01-03 17:52:10 -05:00
Ron Lieberman
750e1c8dbd Revert "[libomptarget][plugin-nextgen] fix for [TypePromotion] NewPM support."
This reverts commit 135f6a1ee8.
2023-01-03 12:26:39 -06:00
Ron Lieberman
135f6a1ee8 [libomptarget][plugin-nextgen] fix for [TypePromotion] NewPM support. 2023-01-03 11:04:13 -06:00
Matt Arsenault
687e0e205e AMDGPU: Create alloca wide load/store with explicit alignment
This was introducing transient UB by using the default alignment of a
larger vector type.
2023-01-03 11:29:18 -05:00
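The transient UB described above arises when an access is annotated with a larger alignment than the storage actually provides. The following standalone sketch computes the alignment that is actually guaranteed at a byte offset into an alloca. That guaranteed alignment is the largest power of two dividing both the alloca's alignment and the offset. The function names are hypothetical, and the sketch assumes alignments are non-zero powers of two.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two dividing both the base alignment and the byte
// offset: the alignment an access at Offset is actually guaranteed.
uint64_t commonAlignment(uint64_t BaseAlign, uint64_t Offset) {
  assert(BaseAlign != 0 && (BaseAlign & (BaseAlign - 1)) == 0);
  uint64_t Combined = BaseAlign | Offset;
  return Combined & (~Combined + 1); // isolate the lowest set bit
}

// A wide access may use the vector type's default (ABI) alignment only if
// the underlying storage really provides at least that much.
bool canUseDefaultAlignment(uint64_t VectorABIAlign, uint64_t AllocaAlign,
                            uint64_t Offset) {
  return VectorABIAlign <= commonAlignment(AllocaAlign, Offset);
}
```

For example, if the wide vector type's default alignment were 16 bytes but the alloca were only 4-byte aligned, relying on the default would overstate the alignment. Passing the computed alignment explicitly avoids this.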
Matt Arsenault
49caf70121 AMDGPU: Use cast instead of unchecked dyn_cast 2023-01-03 10:32:10 -05:00
Matt Arsenault
6fed2c90d3 AMDGPU: Diagnose which LDS global failed to lower
Also lowercase the message to start since that seems to be the
prevailing convention for error messages.
2023-01-03 09:31:07 -05:00
Thomas Symalla
9aa0ee36fe [NFC][AMDGPU] Make method declarations in SIInstrInfo equivalent to their definitions.
Some functions from SIInstrInfo have their operands named differently in
their declarations vs. their definitions. This was caught by cppcheck.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D140778
2022-12-30 19:18:41 +01:00
Ivan Kosarev
0a6dc9a816 [AMDGPU][AsmParser] Refine parsing cache policy modifiers.
Reviewed By: dp, arsenm

Differential Revision: https://reviews.llvm.org/D140108
2022-12-30 15:44:34 +00:00
Dmitry Preobrazhensky
e7a306310b [AMDGPU][GFX11] Correct tied src2 of v_fmac_f16_e64
src2 was incorrectly defined as VSrc_f16 but it is tied to dst which is VGPR_32. As a result, the disassembler failed to decode src2.

Differential Revision: https://reviews.llvm.org/D140299
2022-12-30 16:42:15 +03:00
Dmitry Preobrazhensky
9f40d9ffd1 [AMDGPU][MC][GFX11] Correct encoding of neg modifier for v_dot2_f32_bf16
Fix a bug with neg_lo:[0,1,0] and neg_hi:[0,1,0] modifiers - they are accepted but not encoded.

Differential Revision: https://reviews.llvm.org/D140470
2022-12-30 16:25:22 +03:00
Matt Arsenault
888228f2b0 AMDGPU: Use early continue to reduce indentation 2022-12-22 12:38:59 -05:00
Jay Foad
7e1e993816 [AMDGPU] Remove permlane discard vdst_in optimization from isel
D72845 implemented the equivalent IR optimization in InstCombine so it
seems that there's no advantage to doing it during isel too.

This partially reverts D72844.

Differential Revision: https://reviews.llvm.org/D140546
2022-12-22 15:49:26 +00:00
Jay Foad
821c7be8e6 [AMDGPU] Simplify simplifyAMDGCNMemoryIntrinsicDemanded. NFC. 2022-12-22 11:50:04 +00:00
Matt Arsenault
4463badf46 AMDGPU: Use DenormalMode type in FP mode tracking
This simplifies a future patch. The MIR handling should be fixed. We're
still printing these in custom MachineFunctionInfo as bools (plus the
inverted meaning is hard to follow).
2022-12-21 20:35:48 -05:00
Matt Arsenault
69e75ae695 CodeGen: Don't lazily construct MachineFunctionInfo
This fixes what I consider to be an API flaw I've tripped over
multiple times. The point this is constructed isn't well defined, so
depending on where this is first called, you can conclude different
information based on the MachineFunction. For example, the AMDGPU
implementation inspected the MachineFrameInfo on construction for the
stack objects and if the frame has calls. This kind of worked in
SelectionDAG which visited all allocas up front, but broke in
GlobalISel which hasn't visited any of the IR when arguments are
lowered.

I've run into similar problems before with the MIR parser and trying
to make use of other MachineFunction fields, so I think it's best to
just categorically disallow dependency on the MachineFunction state in
the constructor and to always construct this at the same time as the
MachineFunction itself.

A missing feature I still could use is a way to access a custom
analysis pass on the IR here.
2022-12-21 10:49:32 -05:00
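The API flaw described above can be reproduced with toy types. A lazily constructed info object snapshots whatever state exists at the moment of the first query, so two identical functions can disagree depending on call order. An eagerly constructed object whose constructor ignores mutable state does not have this problem. All types here (FrameInfo, SnapshotInfo, LazyOwner, EagerOwner, EagerInfo) are hypothetical stand-ins, not LLVM classes.

```cpp
#include <cassert>
#include <memory>

// Hypothetical mutable per-function state (e.g. stack objects).
struct FrameInfo {
  unsigned NumStackObjects = 0;
};

// Info that (problematically) inspects frame state in its constructor.
struct SnapshotInfo {
  explicit SnapshotInfo(const FrameInfo &F)
      : HasStack(F.NumStackObjects > 0) {}
  bool HasStack;
};

// Lazy pattern: constructed on the first getInfo() call, so HasStack
// depends on when the first caller happens to run.
class LazyOwner {
public:
  FrameInfo Frame;
  SnapshotInfo &getInfo() {
    if (!Info)
      Info = std::make_unique<SnapshotInfo>(Frame);
    return *Info;
  }

private:
  std::unique_ptr<SnapshotInfo> Info;
};

// Eager pattern: constructed together with the owner. Its constructor must
// not depend on frame state; clients query Frame directly when needed.
struct EagerInfo {};

class EagerOwner {
public:
  FrameInfo Frame;
  EagerOwner() : Info(std::make_unique<EagerInfo>()) {}
  EagerInfo &getInfo() { return *Info; }

private:
  std::unique_ptr<EagerInfo> Info;
};
```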
Mirko Brkusanin
a80edb7fc9 [AMDGPU][GlobalISel] Fix mapping G_FREEZE
Differential Revision: https://reviews.llvm.org/D140416
2022-12-21 15:25:04 +01:00
Christudasan Devadasan
a3028239a7 Revert "[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs"
This reverts commit 40ba0942e2.
2022-12-21 16:17:42 +05:30
Piotr Sobczak
cce3cd203e [AMDGPU][MC][NFC] MUBUF/MTBUF code cleanup
Refactor code to reduce code duplication and improve maintainability.

- Extract BUF_Pseudo common base class
- Refactor getMUBUFInsDA
- Refactor getMUBUFAtomicInsDA
- Refactor getMTBUFInsDA
- Refactor getMUBUFAsmOps
- Refactor getMTBUFAsmOps

Differential Revision: https://reviews.llvm.org/D140410
2022-12-21 10:09:05 +01:00
Nick Desaulniers
ad99774a5f [llvm][PassSupport] don't require passes to be default constructible
Quite a few passes are not default constructible. In order to properly
support -{start|stop}-{before|after}= for these passes, we would like to
continue to use INITIALIZE_PASS, but not necessarily provide a default
constructor.

Delete the default constructors of classes derived from
SelectionDAGISel.

Link: https://github.com/llvm/llvm-project/issues/59538

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D140349
2022-12-20 14:07:29 -08:00
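One way to let registration work for passes without a default constructor is to make the default-construction factory conditional on the pass type. The following is a minimal sketch of that idea using `if constexpr`. It is not the actual INITIALIZE_PASS machinery, and Pass, createDefaultPass, SimplePass and ConfiguredPass are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

struct Pass {
  virtual ~Pass() = default;
};

// Hypothetical registry factory: returns a new instance when the pass type
// is default constructible and nullptr otherwise, so that registering a
// pass never requires it to have a default constructor.
template <typename PassT> std::unique_ptr<Pass> createDefaultPass() {
  if constexpr (std::is_default_constructible_v<PassT>)
    return std::make_unique<PassT>();
  else
    return nullptr;
}

struct SimplePass : Pass {};

// Like a SelectionDAGISel subclass: requires constructor arguments.
struct ConfiguredPass : Pass {
  explicit ConfiguredPass(int OptLevel) : OptLevel(OptLevel) {}
  int OptLevel;
};
```

Because the discarded `if constexpr` branch is never instantiated, `createDefaultPass<ConfiguredPass>()` compiles even though `ConfiguredPass` has no default constructor.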
Leon Clark
daa022ca57 Enable roundeven.
Add support for roundeven and implement appropriate tests.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D137954
2022-12-20 15:40:20 +00:00
Archibald Elliott
f09cf34d00 [Support] Move TargetParsers to new component
This is a fairly large changeset, but it can be broken into a few
pieces:
- `llvm/Support/*TargetParser*` are all moved from the LLVM Support
  component into a new LLVM Component called "TargetParser". This
  potentially enables using tablegen to maintain this information, as
  is shown in https://reviews.llvm.org/D137517. This cannot currently
  be done, as llvm-tblgen relies on LLVM's Support component.
- This also moves two files from Support which use and depend on
  information in the TargetParser:
  - `llvm/Support/Host.{h,cpp}` which contains functions for inspecting
    the current Host machine for info about it, primarily to support
    getting the host triple, but also for `-mcpu=native` support in e.g.
    Clang. This is fairly tightly intertwined with the information in
    `X86TargetParser.h`, so keeping them in the same component makes
    sense.
  - `llvm/ADT/Triple.h` and `llvm/Support/Triple.cpp`, which contain
    the target triple parser and representation. This is very intertwined
    with the Arm target parser, because the arm architecture version
    appears in canonical triples on arm platforms.
- I moved the relevant unittests to their own directory.

And so, we end up with a single component that has all the information
about the following, which to me seems like a unified component:
- Triples that LLVM knows about
- Architecture names and CPUs that LLVM knows about
- CPU detection logic for LLVM

Given this, I have also moved `RISCVISAInfo.h` into this component, as
it seems to me to be part of that same set of functionality.

If you get link errors in your components after this patch, you likely
need to add TargetParser into LLVM_LINK_COMPONENTS in CMake.

Differential Revision: https://reviews.llvm.org/D137838
2022-12-20 11:05:50 +00:00
Carl Ritson
5bc703f755 [AMDGPU] Replace getPhysRegClass with getPhysRegBaseClass
Accelerate finding the base class for a physical register by
building a statically mapping table from physical registers
to base classes using TableGen.

Replace uses of SIRegisterInfo::getPhysRegClass with
TargetRegisterInfo::getPhysRegBaseClass in order to use
the computed table.

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D139422
2022-12-20 16:22:14 +09:00
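The speed-up comes from replacing a search over register classes with a precomputed table indexed by register number. The following is a minimal sketch of such a lookup. The register numbering, the class list and the free function (which borrows the getPhysRegBaseClass name for readability) are hypothetical. A constexpr array stands in for the table that TableGen generates.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical register classes (not LLVM's real enumeration).
enum class RegClass : uint8_t { None, SGPR_32, VGPR_32, VReg_64 };

// Hypothetical physical register numbering.
enum PhysReg : unsigned { NoReg, SGPR0, SGPR1, VGPR0, VGPR1, VGPR0_VGPR1, NumRegs };

// Table computed ahead of time (by TableGen in LLVM), indexed by register.
constexpr std::array<RegClass, NumRegs> BaseClassTable = {
    RegClass::None,    // NoReg
    RegClass::SGPR_32, // SGPR0
    RegClass::SGPR_32, // SGPR1
    RegClass::VGPR_32, // VGPR0
    RegClass::VGPR_32, // VGPR1
    RegClass::VReg_64, // VGPR0_VGPR1
};

// O(1) lookup instead of scanning register classes for membership.
constexpr RegClass getPhysRegBaseClass(unsigned Reg) {
  return Reg < BaseClassTable.size() ? BaseClassTable[Reg] : RegClass::None;
}

static_assert(getPhysRegBaseClass(VGPR1) == RegClass::VGPR_32);
```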
Sameer Sahasrabuddhe
475ce4c200 RFC: Uniformity Analysis for Irreducible Control Flow
Uniformity analysis is a generalization of divergence analysis to
include irreducible control flow:

  1. The proposed spec presents a notion of "maximal convergence" that
     captures the existing convention of converging threads at the
     headers of natural loops.

  2. Maximal convergence is then extended to irreducible cycles. The
     identity of irreducible cycles is determined by the choices made
     in a depth-first traversal of the control flow graph. Uniformity
     analysis uses criteria that depend only on closed paths and not
     cycles, to determine maximal convergence. This makes it a
     conservative analysis that is independent of the effect of DFS on
     CycleInfo.

  3. The analysis is implemented as a template that can be
     instantiated for both LLVM IR and Machine IR.

Validation:
  - passes existing tests for divergence analysis
  - passes new tests with irreducible control flow
  - passes equivalent tests in MIR and GMIR

Based on concepts originally outlined by
Nicolai Haehnle <nicolai.haehnle@amd.com>

With contributions from Ruiling Song <ruiling.song@amd.com> and
Jay Foad <jay.foad@amd.com>.

Support for GMIR and lit tests for GMIR/MIR added by
Yashwant Singh <yashwant.singh@amd.com>.

Differential Revision: https://reviews.llvm.org/D130746
2022-12-20 07:22:24 +05:30
Ivan Kosarev
85dada81e3 [AMDGPU][CodeGen] Support raw format TFE buffer loads other than byte, short and d16 ones.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D138215
2022-12-19 11:39:08 +00:00
Sergei Barannikov
4d48ccfc88 [MC] Use MCRegister instead of unsigned in MCTargetAsmParser
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D140273
2022-12-18 12:12:05 -08:00