Commit Graph

4140 Commits

Author SHA1 Message Date
Bangtian Liu
511cfe9441 Revert "Ensure SplitEdge to return the new block between the two given blocks"
This reverts commit d20e0c3444.
2020-12-17 21:00:37 +00:00
Bangtian Liu
d20e0c3444 Ensure SplitEdge to return the new block between the two given blocks
This PR implements the function splitBasicBlockBefore to address an issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when Succ has a single predecessor and the
edge between BB and Succ is not critical, which used to produce the result
‘BB->Succ->New’. The new function splitBasicBlockBefore is called from
splitBlockBefore to handle this case and now produces the correct result
‘BB->New->Succ’.

Below is an example of splitting the block bb1 at its first instruction.

/// Original IR
bb0:
    br label %bb1
bb1:
    %0 = mul i32 1, 2
    br label %bb2
bb2:

/// IR after SplitEdge(bb0, bb1) using splitBasicBlock
bb0:
    br label %bb1
bb1:
    br label %bb1.split
bb1.split:
    %0 = mul i32 1, 2
    br label %bb2
bb2:

/// IR after SplitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
    br label %bb1.split
bb1.split:
    br label %bb1
bb1:
    %0 = mul i32 1, 2
    br label %bb2
bb2:
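
For illustration, a minimal C++ sketch of a caller relying on the new guarantee (assuming the usual SplitEdge signature from BasicBlockUtils.h; the surrounding setup is assumed and not part of the patch):

    #include "llvm/Analysis/LoopInfo.h"
    #include "llvm/IR/Dominators.h"
    #include "llvm/Transforms/Utils/BasicBlockUtils.h"
    #include <cassert>
    using namespace llvm;

    // Split the edge BB -> Succ. With this patch the returned block is
    // guaranteed to sit between the two given blocks: BB -> New -> Succ.
    BasicBlock *splitAndGetMiddle(BasicBlock *BB, BasicBlock *Succ,
                                  DominatorTree &DT, LoopInfo &LI) {
      BasicBlock *New = SplitEdge(BB, Succ, &DT, &LI);
      assert(New->getSinglePredecessor() == BB &&
             New->getSingleSuccessor() == Succ &&
             "new block must lie between the two given blocks");
      return New;
    }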

Differential Revision: https://reviews.llvm.org/D92200
2020-12-17 16:00:15 +00:00
Matt Arsenault
f333736757 AMDGPU: Remove SGPRSpillVGPRDefinedSet hack
These VGPRs should be reserved and therefore do not need "correct"
liveness. They should not have undef uses, which can still cause
issues.
2020-12-16 21:33:35 -05:00
Bangtian Liu
c10757200d Revert "Ensure SplitEdge to return the new block between the two given blocks"
This reverts commit cf638d793c.
2020-12-16 11:52:30 +00:00
Stanislav Mekhanoshin
eb66bf0802 [AMDGPU] Print SCRATCH_EN field after the kernel
Differential Revision: https://reviews.llvm.org/D93353
2020-12-15 22:44:30 -08:00
Bangtian Liu
cf638d793c Ensure SplitEdge to return the new block between the two given blocks
This PR implements the function splitBasicBlockBefore to address an issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when Succ has a single predecessor and the
edge between BB and Succ is not critical, which used to produce the result
‘BB->Succ->New’. The new function splitBasicBlockBefore is called from
splitBlockBefore to handle this case and now produces the correct result
‘BB->New->Succ’.

Below is an example of splitting the block bb1 at its first instruction.

/// Original IR
bb0:
    br label %bb1
bb1:
    %0 = mul i32 1, 2
    br label %bb2
bb2:

/// IR after SplitEdge(bb0, bb1) using splitBasicBlock
bb0:
    br label %bb1
bb1:
    br label %bb1.split
bb1.split:
    %0 = mul i32 1, 2
    br label %bb2
bb2:

/// IR after SplitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
    br label %bb1.split
bb1.split:
    br label %bb1
bb1:
    %0 = mul i32 1, 2
    br label %bb2
bb2:

Differential Revision: https://reviews.llvm.org/D92200
2020-12-15 23:32:29 +00:00
Matt Arsenault
60eba8161b RegisterCoalescer: Remove phi-only subranges when erasing identity copies
Undef subranges are not present in the live range values, except when
they cross block boundaries. In this situation, an identity copy is
inside a loop, and one of the lanes is undefined. It only appears
alive inside the loop due to the copy. Once the copy was erased, it
would leave behind a segment inside the loop body with no
corresponding def anywhere in the program.

When RenameIndependentSubregs processed this dummy interval, it would
introduce a "Multiple connected components in live interval" verifier
error when IMPLICIT_DEFs were added to the other two blocks. I believe
there is a missing verifier check for this type of dummy interval.

I have found additional cases from the same fundamental problem in
other areas I haven't managed to fix yet (e.g. the commented out
prune_subrange_phi_value_* cases).
2020-12-15 17:36:32 -05:00
Changpeng Fang
ce0c0013d8 AMDGPU: If a store defines (alias) a load, it clobbers the load.
Summary: If a store defines (must alias) a load, it clobbers the load.

Fixes: SWDEV-258915

Reviewers: arsenm

Differential Revision: https://reviews.llvm.org/D92951
2020-12-14 16:34:32 -08:00
Stanislav Mekhanoshin
cf5845d6c4 [AMDGPU] Use multi-dword flat scratch for spilling
Differential Revision: https://reviews.llvm.org/D93067
2020-12-14 14:19:29 -08:00
Michael Liao
1fd1f638b6 [amdgpu] Fix a crash case when V_CNDMASK could be simplified.
- Once an instruction is simplified, foldable candidates from it should
  be invalidated or skipped as the operand index is no longer valid.

Differential Revision: https://reviews.llvm.org/D93174
2020-12-14 13:08:13 -05:00
Sebastian Neubauer
5733167f54 [AMDGPU] Mark amdgpu_gfx functions as module entry function
- Allows lds allocations
- Writes resource usage into COMPUTE_PGM_RSRC1 registers in PAL metadata

Differential Revision: https://reviews.llvm.org/D92946
2020-12-14 10:43:39 +01:00
Georgii Rymar
98a4289810 [llvm-readobj] - For SHT_REL relocations, don't display an addend.
This is https://bugs.llvm.org/show_bug.cgi?id=44257.

In LLVM style we always print `0` as the addend when dumping
SHT_REL relocations. This is confusing, so this patch stops
printing it, as the first comment on the bug page suggests.

Differential revision: https://reviews.llvm.org/D93033
2020-12-14 12:03:00 +03:00
Mirko Brkusanin
0c7cce54eb [AMDGPU] Resolve issues when picking between ds_read/write and ds_read2/write2
Both ds_read_b128 and ds_read2_b64 are valid for 128-bit, 16-byte aligned
loads, but the one that will be selected is determined either by the order in
tablegen or by the AddedComplexity attribute. Currently ds_read_b128 has
priority.

While ds_read2_b64 has lower alignment requirements, we cannot always
restrict ds_read_b128 to 16-byte alignment because of the unaligned-access-mode
option. This was causing ds_read_b128 to be selected for 8-byte aligned
loads regardless of the chosen access mode.

To resolve this we use two patterns for selecting ds_read_b128. One
requires 16-byte alignment and the other requires the
unaligned-access-mode option.

Same goes for ds_write2_b64 and ds_write_b128.
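
As a rough illustration only (plain C++ with assumed names, not the actual TableGen patterns), the selection condition described above boils down to:

    #include <cstdint>

    // ds_read_b128 / ds_write_b128 may be picked for a 16-byte access either
    // when the access is 16-byte aligned or when unaligned-access-mode is on;
    // otherwise the ds_read2_b64 / ds_write2_b64 forms are used instead.
    bool mayUseDS128(uint64_t alignInBytes, bool unalignedAccessMode) {
      return alignInBytes >= 16 || unalignedAccessMode;
    }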

Differential Revision: https://reviews.llvm.org/D92767
2020-12-10 12:40:49 +01:00
Stanislav Mekhanoshin
4617cc68f6 [AMDGPU] Fix expansion of 192 bit spills in PEI
Differential Revision: https://reviews.llvm.org/D92979
2020-12-09 16:36:29 -08:00
Austin Kerbow
4aa842a800 [AMDGPU] Add new pseudos for indirect addressing with VGPR Indexing
It is possible for copies or spills to be inserted in the middle of indirect
addressing sequences which use VGPR indexing. Spills to accvgprs could be
affected by the indexing mode.

Add new pseudo instructions that are expanded after register allocation to avoid
the problematic spill or copy placement.

Differential Revision: https://reviews.llvm.org/D91048
2020-12-08 12:24:12 -08:00
Jay Foad
03663e4130 [AMDGPU] Add occupancy level tests for GFX10.3. NFC.
getMaxWavesPerEU and getVGPRAllocGranule both changed in GFX10.3 and
they both affect the occupancy calculation.

Differential Revision: https://reviews.llvm.org/D92839
2020-12-08 14:15:01 +00:00
Stanislav Mekhanoshin
dd89249498 [AMDGPU] Annotate vgpr<->agpr spills in asm
Differential Revision: https://reviews.llvm.org/D92125
2020-12-07 11:25:25 -08:00
Scott Linder
f6b9afae00 [AMDGPU] Extend and reorganize memory legalizer tests
* Rename some tests to establish a convention (where all components
  are optional) of:

    <addrspace>_<syncscope>_<memory-orders>_<operation>

* Split up at a level of granularity appropriate for the different RUN
  lines (i.e. split on addrspace so GFX6 can avoid FLAT) and one that makes
  running a specific test reasonable in terms of wall time taken. This
  also means that, when run as part of the test suite, the testing is not
  one serial bottleneck.

* Auto-generate check lines with `update_llc_test_checks.py` to make
  future maintenance more tractable.

Reviewed By: rampitec, t-tye

Differential Revision: https://reviews.llvm.org/D91545
2020-12-03 19:36:33 +00:00
Jay Foad
d28624a209 [AMDGPU] Stop adding an implicit def of vcc_hi for wave32
This doesn't seem to be needed for anything.

Differential Revision: https://reviews.llvm.org/D92400
2020-12-02 10:11:42 +00:00
Juneyoung Lee
53040a968d [ConstantFold] Fold more operations to poison
This patch folds more operations to poison.

Alive2 proof: https://alive2.llvm.org/ce/z/mxcb9G (it does not contain tests about div/rem because they fold to poison when raising UB)

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D92270
2020-11-29 21:19:48 +09:00
Nikita Popov
4df8efce80 [AA] Split up LocationSize::unknown()
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.
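
A minimal sketch of what callers now express with the two factory functions introduced by this patch (the surrounding MemoryLocation usage is illustrative, not taken from the patch):

    #include "llvm/Analysis/MemoryLocation.h"
    using namespace llvm;

    // Access of unknown size that starts at Ptr and only extends after it.
    MemoryLocation accessAfterOnly(const Value *Ptr) {
      return MemoryLocation(Ptr, LocationSize::afterPointer());
    }

    // Access of unknown size that may touch anything in the underlying
    // object, both before and after Ptr.
    MemoryLocation accessAnywhereInObject(const Value *Ptr) {
      return MemoryLocation(Ptr, LocationSize::beforeOrAfterPointer());
    }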

The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.

Differential Revision: https://reviews.llvm.org/D91649
2020-11-26 18:39:55 +01:00
Roman Lebedev
ba74fa244f [AMDGPU] Actually fully update opt-pipeline.ll test to account for -loop-idiom vs -indvars switch 2020-11-25 19:39:32 +03:00
Roman Lebedev
a8d74517dc [PassManager] Run Induction Variable Simplification pass *after* Recognize loop idioms pass, not before
Currently, `-indvars` runs first, and `-loop-idiom` runs immediately after it.
I'm not really sure whether `-loop-idiom` requires `-indvars` to run beforehand,
but I'm *very* sure that `-indvars` requires `-loop-idiom` to run afterwards,
as can be seen in the phase-ordering test.

LoopIdiom runs on two types of loops: countable ones and uncountable ones.
For uncountable ones, IndVars obviously didn't make any change to them,
since they are uncountable, so for them the order should be irrelevant.
For countable ones, they must already have been countable before IndVars
for IndVars to make any change to them, and since SCEV is used on them,
it shouldn't matter whether IndVars has already canonicalized them.
So I don't really see why we'd want the current ordering.

Should this cause issues, it will give us a reproducer test case
that shows flaws in this logic, and we can then adjust accordingly.

While this is quite likely beneficial in the wild already,
it's a required part of the full motivational pattern
behind the `left-shift-until-bittest` loop idiom (D91038).

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91800
2020-11-25 19:20:07 +03:00
Sebastian Neubauer
edd675643d [AMDGPU] Emit stack frame size in metadata
Add .shader_functions to pal metadata, which contains the stack frame
size for all non-entry-point functions.

Differential Revision: https://reviews.llvm.org/D90036
2020-11-25 16:30:02 +01:00
Matt Arsenault
79f75468b4 AMDGPU: Fix counting kernel arguments towards register usage
Also use DataLayout to get type size. Relying on the IR type size is
also pretty broken here, since this won't perfectly capture how types
are legalized.
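
A hedged sketch of the size query the message refers to (function and variable names here are illustrative, not the in-tree code):

    #include "llvm/IR/DataLayout.h"
    #include "llvm/IR/Function.h"
    #include <cstdint>
    using namespace llvm;

    // Count kernel argument bytes using DataLayout rather than the raw IR
    // type size, so allocation size and ABI effects are reflected.
    uint64_t kernelArgBytes(const Function &F, const DataLayout &DL) {
      uint64_t Bytes = 0;
      for (const Argument &A : F.args())
        Bytes += DL.getTypeAllocSize(A.getType());
      return Bytes;
    }
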
2020-11-20 21:23:33 -05:00
Matt Arsenault
1d1234b2a4 OpaquePtr: Update more tests to use typed sret 2020-11-20 20:08:43 -05:00
Matt Arsenault
20c43d6bd5 OpaquePtr: Bulk update tests to use typed sret 2020-11-20 17:58:26 -05:00
Matt Arsenault
06c192d454 OpaquePtr: Bulk update tests to use typed byval
Upgrading the IR text tests should be the only thing blocking making
typed byval mandatory. This was done partially through regex and partially
by hand.
2020-11-20 14:00:46 -05:00
Sebastian Neubauer
7a18bdb350 [AMDGPU] Implement flat scratch init for pal
Extract the scratch offset from the scratch buffer descriptor that is
stored in the global table.

Differential Revision: https://reviews.llvm.org/D91701
2020-11-20 11:14:30 +01:00
Scott Linder
0fe4b8e4b5 [NFC][AMDGPU] Remove some generic pointers in memory-legalizer tests
These tests implicitly depend on the target supporting generic pointers,
so to prepare for testing them on GFX6 (which lacks FLAT) remove the
dependency where possible.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D91666
2020-11-18 20:52:18 +00:00
Sebastian Neubauer
72ccec1bbc [AMDGPU] Fix v3f16 interaction with image store workaround
In some cases, the wrong number of registers was reserved.

Also enable more v3f16 tests.

Differential Revision: https://reviews.llvm.org/D90847
2020-11-18 18:21:04 +01:00
Jay Foad
7ecf19697e [AMDGPU] Fix and extend vccz workarounds
We have workarounds for two different cases where vccz can get out of
sync with the value in vcc. This fixes them in two ways:

1. Fix the case where the def of vcc was in a previous basic block, by
pessimistically assuming that vccz might be incorrect at a basic block
boundary.

2. Fix the handling of pre-existing waitcnt instructions by calling
generateWaitcntInstBefore before examining ScoreBrackets to determine
whether there's an outstanding smem read operation.

Differential Revision: https://reviews.llvm.org/D91636
2020-11-18 15:26:06 +00:00
Jay Foad
e67d8859f2 [AMDGPU] Precommit more vccz workaround tests 2020-11-17 15:55:40 +00:00
Michael Liao
f375885ab8 [InferAddrSpace] Teach to handle assumed address space.
- In certain cases, a generic pointer could be assumed to be a pointer to
  the global memory space or another specific space. With a dedicated target
  hook to query that address space for a given value, the infer-address-space
  pass can infer and propagate it to all of the value's users.

Differential Revision: https://reviews.llvm.org/D91121
2020-11-16 17:06:33 -05:00
Matt Arsenault
d2e52eec51 AMDGPU: Select global saddr mode from SGPR pointer
Use the 64-bit SGPR base with a 0 offset, since it's 1 fewer
instruction to materialize the 0 vs. the 64-bit copy.
2020-11-16 11:51:06 -05:00
Mirko Brkusanin
4cf6dd518e [AMDGPU][GlobalISel] Fix lowerShlSat
RegBankSelect would crash on G_SELECT when type is not s1.

Differential Revision: https://reviews.llvm.org/D91437
2020-11-16 17:43:31 +01:00
Matt Arsenault
a6e353b1d0 AMDGPU: Split large offsets when selecting global saddr mode
When the offset doesn't fit in the immediate field, move some to
voffset.
2020-11-16 11:36:01 -05:00
Florian Hahn
8dbe44cb29 Add pass to add !annotate metadata from @llvm.global.annotations.
This patch adds a new pass to add !annotation metadata for entries in
@llvm.global.annotations, which is generated using
__attribute__((annotate("_name"))) on functions in Clang.
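
For context, a small example of the Clang source that produces such an entry (the annotation string and function name are arbitrary):

    // Compiling this with Clang adds an entry to @llvm.global.annotations
    // recording the function, the string "_name", and the source location.
    __attribute__((annotate("_name")))
    int annotated_function(int x) {
      return x + 1;
    }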

This has been discussed on llvm-dev as part of
    RFC: Combining Annotation Metadata and Remarks
    http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D91195
2020-11-16 14:57:11 +00:00
Matt Arsenault
e7eb2ac53f AMDGPU/GlobalISel: Regenerate some checks
Fixes indentation that made the diff in a future patch confusing.
2020-11-13 11:29:15 -05:00
Florian Hahn
8bb6347939 Add !annotation metadata and remarks pass.
This patch adds a new !annotation metadata kind which can be used to
attach annotation strings to instructions.

It also adds a new pass that emits summary remarks per function with the
counts for each annotation kind.

The intended use cases for this new metadata are annotating
'interesting' instructions; the remarks should provide additional
insight into the transformations applied to a program.
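
As a rough sketch (assuming the new metadata kind is exposed as LLVMContext::MD_annotation and that the annotation node is a tuple of strings), attaching such an annotation from C++ might look like:

    #include "llvm/ADT/StringRef.h"
    #include "llvm/IR/Instruction.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Metadata.h"
    using namespace llvm;

    // Tag an instruction with a single !annotation string such as "auto-init".
    // Note: this simple sketch replaces any !annotation node already present.
    void addAnnotation(Instruction &I, StringRef Note) {
      LLVMContext &Ctx = I.getContext();
      Metadata *Ops[] = {MDString::get(Ctx, Note)};
      I.setMetadata(LLVMContext::MD_annotation, MDTuple::get(Ctx, Ops));
    }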

To motivate this, consider these specific questions we would like to get answered:

* How many stores added for automatic variable initialization remain after optimizations? Where are they?
* How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated?

Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)

Reviewed By: thegameg, jdoerfert

Differential Revision: https://reviews.llvm.org/D91188
2020-11-13 13:24:10 +00:00
Stanislav Mekhanoshin
5ab1702129 [AMDGPU] Remove scratch rsrc from spill pseudos
Differential Revision: https://reviews.llvm.org/D91110
2020-11-12 15:23:37 -08:00
Stanislav Mekhanoshin
cf6565f6d0 [AMDGPU] Enable multi-dword flat scratch load/stores
Differential Revision: https://reviews.llvm.org/D91384
2020-11-12 13:38:56 -08:00
Jay Foad
6881a82e8c [AMDGPU] Fix scheduling of exp pos4
Also fix a similar issue in SIInsertWaitcnts, but I don't think that fix
has any effect in practice.

Differential Revision: https://reviews.llvm.org/D91290
2020-11-12 19:57:14 +00:00
Scott Linder
d5f2c3e7c0 [NFC][AMDGPU] Clean up some lit test prefixes
Replace some instances of "ALL" with "GCN" where it applies. Committed
as obvious.
2020-11-11 17:12:37 +00:00
Jay Foad
830ed64ccd Revert "Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access""
This reverts commit 8b08fa0103.

The underlying problems were fixed by D90607.
2020-11-11 14:40:14 +00:00
Mirko Brkusanin
a75d6178b8 [GlobalISel] Add combine for (x | mask) -> x when (x | mask) == x
If we have a mask, and a value x, where (x | mask) == x, we can drop the OR
and just use x.

Differential Revision: https://reviews.llvm.org/D90952
2020-11-10 11:32:13 +01:00
Mirko Brkusanin
fb36ab0a42 [GlobalISel] Expand combine for (x & mask) -> x when (x & mask) == x
We can use KnownBitsAnalysis to cover cases where the mask is not trivial. It can
also help with cases where the mask is not constant but can still be folded into
one. Since 'and' is commutative we should treat both operands as possible
replacements.
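
A minimal sketch of the check described above (plain C++ with made-up names, not the in-tree GlobalISel combine): (x & mask) can be replaced by x whenever every bit that might be set in x is also set in mask.

    #include <cstdint>

    // knownZeroOfX: bits that known-bits analysis proves are 0 in x.
    bool canDropAnd(uint32_t knownZeroOfX, uint32_t mask) {
      uint32_t possiblySetInX = ~knownZeroOfX;
      // The fold is valid when mask covers every possibly-set bit of x.
      return (possiblySetInX & ~mask) == 0;
    }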

Differential Revision: https://reviews.llvm.org/D90674
2020-11-10 11:32:13 +01:00
Mirko Brkusanin
53ae95c946 [AMDGPU][GlobalISel] Combine shift + logic + shift with constant operands
This sequence of instructions can be simplified if they have a single use and
some operands are constants. Additional combines may be applied afterwards.

Differential Revision: https://reviews.llvm.org/D90223
2020-11-10 11:32:13 +01:00
Mirko Brkusanin
de719586a8 [AMDGPU][GlobalISel] Fold a chain of two shift instructions with constant operands
A sequence of identical shift instructions with constant operands can be combined
into a single shift instruction.
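
A tiny worked example of the arithmetic behind the fold for left shifts (plain C++, constants only, names illustrative):

    #include <cstdint>

    // (x << c1) << c2 equals x << (c1 + c2) while c1 + c2 stays below the bit
    // width; once the combined shift amount reaches the width, a shl result
    // is known to be 0.
    uint32_t foldedShl(uint32_t x, unsigned c1, unsigned c2) {
      unsigned total = c1 + c2;
      return total < 32 ? x << total : 0;
    }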

Differential Revision: https://reviews.llvm.org/D90217
2020-11-10 11:32:12 +01:00
Carl Ritson
fde8351743 [AMDGPU] Fix lowering of S_MOV_{B32,B64}_term
If the source of S_MOV_{B32,B64}_term is an immediate then it
cannot be lowered to a COPY.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D90451
2020-11-10 12:16:31 +09:00