Commit Graph

4140 Commits

Author SHA1 Message Date
Bangtian Liu
511cfe9441 Revert "Ensure SplitEdge to return the new block between the two given blocks"
This reverts commit d20e0c3444.
2020-12-17 21:00:37 +00:00
Bangtian Liu
d20e0c3444 Ensure SplitEdge to return the new block between the two given blocks
This PR implements the function splitBasicBlockBefore to address an issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when Succ has a single predecessor and the
edge between BB and Succ is not critical, which used to produce the result
‘BB->Succ->New’. The new function splitBasicBlockBefore is called from
splitBlockBefore to handle this case and now produces the correct result
‘BB->New->Succ’.

Below is an example of splitting the block bb1 at its first instruction.

/// Original IR
bb0:
    br label %bb1
bb1:
    %0 = mul i32 1, 2
    br label %bb2
bb2:

/// IR after SplitEdge(bb0, bb1) using splitBasicBlock
bb0:
    br label %bb1
bb1:
    br label %bb1.split
bb1.split:
    %0 = mul i32 1, 2
    br label %bb2
bb2:

/// IR after SplitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
    br label %bb1.split
bb1.split:
    br label %bb1
bb1:
    %0 = mul i32 1, 2
    br label %bb2
bb2:
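
For illustration, a minimal C++ sketch of a caller relying on the new guarantee (assuming the usual SplitEdge signature from BasicBlockUtils.h; the surrounding setup is assumed and not part of the patch):

    #include "llvm/Analysis/LoopInfo.h"
    #include "llvm/IR/Dominators.h"
    #include "llvm/Transforms/Utils/BasicBlockUtils.h"
    #include <cassert>
    using namespace llvm;

    // Split the edge BB -> Succ. With this patch the returned block is
    // guaranteed to sit between the two given blocks: BB -> New -> Succ.
    BasicBlock *splitAndGetMiddle(BasicBlock *BB, BasicBlock *Succ,
                                  DominatorTree &DT, LoopInfo &LI) {
      BasicBlock *New = SplitEdge(BB, Succ, &DT, &LI);
      assert(New->getSinglePredecessor() == BB &&
             New->getSingleSuccessor() == Succ &&
             "new block must lie between the two given blocks");
      return New;
    }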

Differential Revision: https://reviews.llvm.org/D92200
2020-12-17 16:00:15 +00:00
Matt Arsenault
f333736757 AMDGPU: Remove SGPRSpillVGPRDefinedSet hack
These VGPRs should be reserved and therefore do not need "correct"
liveness. They should not have undef uses, which can still cause
issues.
2020-12-16 21:33:35 -05:00
Bangtian Liu
c10757200d Revert "Ensure SplitEdge to return the new block between the two given blocks"
This reverts commit cf638d793c.
2020-12-16 11:52:30 +00:00
Stanislav Mekhanoshin
eb66bf0802 [AMDGPU] Print SCRATCH_EN field after the kernel
Differential Revision: https://reviews.llvm.org/D93353
2020-12-15 22:44:30 -08:00
Bangtian Liu
cf638d793c Ensure SplitEdge to return the new block between the two given blocks
This PR implements the function splitBasicBlockBefore to address an issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when Succ has a single predecessor and the
edge between BB and Succ is not critical, which used to produce the result
‘BB->Succ->New’. The new function splitBasicBlockBefore is called from
splitBlockBefore to handle this case and now produces the correct result
‘BB->New->Succ’.

Below is an example of splitting the block bb1 at its first instruction.

/// Original IR
bb0:
    br label %bb1
bb1:
    %0 = mul i32 1, 2
    br label %bb2
bb2:

/// IR after SplitEdge(bb0, bb1) using splitBasicBlock
bb0:
    br label %bb1
bb1:
    br label %bb1.split
bb1.split:
    %0 = mul i32 1, 2
    br label %bb2
bb2:

/// IR after SplitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
    br label %bb1.split
bb1.split:
    br label %bb1
bb1:
    %0 = mul i32 1, 2
    br label %bb2
bb2:

Differential Revision: https://reviews.llvm.org/D92200
2020-12-15 23:32:29 +00:00
Matt Arsenault
60eba8161b RegisterCoalescer: Remove phi-only subranges when erasing identity copies
Undef subranges are not present in the live range values, except when
they cross block boundaries. In this situation, an identity copy is
inside a loop, and one of the lanes is undefined. It only appears
alive inside the loop due to the copy. Once the copy was erased, it
would leave behind a segment inside the loop body with no
corresponding def anywhere in the program.

When RenameIndependentSubregs processed this dummy interval, it would
introduce a "Multiple connected components in live interval" verifier
error when IMPLICIT_DEFs were added to the other two blocks. I believe
there is a missing verifier check for this type of dummy interval.

I have found additional cases from the same fundamental problem in
other areas I haven't managed to fix yet (e.g. the commented out
prune_subrange_phi_value_* cases).
2020-12-15 17:36:32 -05:00
Changpeng Fang
ce0c0013d8 AMDGPU: If a store defines (alias) a load, it clobbers the load.
Summary: If a store defines (must alias) a load, it clobbers the load.

Fixes: SWDEV-258915

Reviewers: arsenm

Differential Revision: https://reviews.llvm.org/D92951
2020-12-14 16:34:32 -08:00
Stanislav Mekhanoshin
cf5845d6c4 [AMDGPU] Use multi-dword flat scratch for spilling
Differential Revision: https://reviews.llvm.org/D93067
2020-12-14 14:19:29 -08:00
Michael Liao
1fd1f638b6 [amdgpu] Fix a crash case when V_CNDMASK could be simplified.
- Once an instruction is simplified, foldable candidates from it should
  be invalidated or skipped as the operand index is no longer valid.

Differential Revision: https://reviews.llvm.org/D93174
2020-12-14 13:08:13 -05:00
Sebastian Neubauer
5733167f54 [AMDGPU] Mark amdgpu_gfx functions as module entry function
- Allows lds allocations
- Writes resource usage into COMPUTE_PGM_RSRC1 registers in PAL metadata

Differential Revision: https://reviews.llvm.org/D92946
2020-12-14 10:43:39 +01:00
Georgii Rymar
98a4289810 [llvm-readobj] - For SHT_REL relocations, don't display an addend.
This is https://bugs.llvm.org/show_bug.cgi?id=44257.

In LLVM style we always print `0` as the addend when dumping
SHT_REL relocations. This is confusing, so this patch stops
printing it, as the first comment on the bug page suggests.

Differential revision: https://reviews.llvm.org/D93033
2020-12-14 12:03:00 +03:00
Mirko Brkusanin
0c7cce54eb [AMDGPU] Resolve issues when picking between ds_read/write and ds_read2/write2
Both ds_read_b128 and ds_read2_b64 are valid for 128-bit, 16-byte aligned
loads, but the one that will be selected is determined either by the order in
tablegen or by the AddedComplexity attribute. Currently ds_read_b128 has
priority.

While ds_read2_b64 has lower alignment requirements, we cannot always
restrict ds_read_b128 to 16-byte alignment because of the unaligned-access-mode
option. This was causing ds_read_b128 to be selected for 8-byte aligned
loads regardless of the chosen access mode.

To resolve this we use two patterns for selecting ds_read_b128. One
requires 16-byte alignment and the other requires the
unaligned-access-mode option.

Same goes for ds_write2_b64 and ds_write_b128.
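
As a rough illustration only (plain C++ with assumed names, not the actual TableGen patterns), the selection condition described above boils down to:

    #include <cstdint>

    // ds_read_b128 / ds_write_b128 may be picked for a 16-byte access either
    // when the access is 16-byte aligned or when unaligned-access-mode is on;
    // otherwise the ds_read2_b64 / ds_write2_b64 forms are used instead.
    bool mayUseDS128(uint64_t alignInBytes, bool unalignedAccessMode) {
      return alignInBytes >= 16 || unalignedAccessMode;
    }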

Differential Revision: https://reviews.llvm.org/D92767
2020-12-10 12:40:49 +01:00
Stanislav Mekhanoshin
4617cc68f6 [AMDGPU] Fix expansion of 192 bit spills in PEI
Differential Revision: https://reviews.llvm.org/D92979
2020-12-09 16:36:29 -08:00
Austin Kerbow
4aa842a800 [AMDGPU] Add new pseudos for indirect addressing with VGPR Indexing
It is possible for copies or spills to be inserted in the middle of indirect
addressing sequences which use VGPR indexing. Spills to accvgprs could be
affected by the indexing mode.

Add new pseudo instructions that are expanded after register allocation to avoid
the problematic spill or copy placement.

Differential Revision: https://reviews.llvm.org/D91048
2020-12-08 12:24:12 -08:00
Jay Foad
03663e4130 [AMDGPU] Add occupancy level tests for GFX10.3. NFC.
getMaxWavesPerEU and getVGPRAllocGranule both changed in GFX10.3 and
they both affect the occupancy calculation.

Differential Revision: https://reviews.llvm.org/D92839
2020-12-08 14:15:01 +00:00
Stanislav Mekhanoshin
dd89249498 [AMDGPU] Annotate vgpr<->agpr spills in asm
Differential Revision: https://reviews.llvm.org/D92125
2020-12-07 11:25:25 -08:00
Scott Linder
f6b9afae00 [AMDGPU] Extend and reorganize memory legalizer tests
* Rename some tests to establish a convention (where all components
  are optional) of:

    <addrspace>_<syncscope>_<memory-orders>_<operation>

* Split up at a level of granularity appropriate for the different RUN
  lines (i.e. split on addrspace so GFX6 can avoid FLAT) and one that makes
  running a specific test reasonable in terms of wall time taken. This
  also means that, when run as part of the test suite, the testing is not
  one serial bottleneck.

* Auto-generate check lines with `update_llc_test_checks.py` to make
  future maintenance more tractable.

Reviewed By: rampitec, t-tye

Differential Revision: https://reviews.llvm.org/D91545
2020-12-03 19:36:33 +00:00
Jay Foad
d28624a209 [AMDGPU] Stop adding an implicit def of vcc_hi for wave32
This doesn't seem to be needed for anything.

Differential Revision: https://reviews.llvm.org/D92400
2020-12-02 10:11:42 +00:00
Juneyoung Lee
53040a968d [ConstantFold] Fold more operations to poison
This patch folds more operations to poison.

Alive2 proof: https://alive2.llvm.org/ce/z/mxcb9G (it does not contain tests about div/rem because they fold to poison when raising UB)

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D92270
2020-11-29 21:19:48 +09:00
Nikita Popov
4df8efce80 [AA] Split up LocationSize::unknown()
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.
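
A minimal sketch of what callers now express with the two factory functions introduced by this patch (the surrounding MemoryLocation usage is illustrative, not taken from the patch):

    #include "llvm/Analysis/MemoryLocation.h"
    using namespace llvm;

    // Access of unknown size that starts at Ptr and only extends after it.
    MemoryLocation accessAfterOnly(const Value *Ptr) {
      return MemoryLocation(Ptr, LocationSize::afterPointer());
    }

    // Access of unknown size that may touch anything in the underlying
    // object, both before and after Ptr.
    MemoryLocation accessAnywhereInObject(const Value *Ptr) {
      return MemoryLocation(Ptr, LocationSize::beforeOrAfterPointer());
    }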

The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.

Differential Revision: https://reviews.llvm.org/D91649
2020-11-26 18:39:55 +01:00
Roman Lebedev
ba74fa244f [AMDGPU] Actually fully update opt-pipeline.ll test to account for -loop-idiom vs -indvars switch 2020-11-25 19:39:32 +03:00
Roman Lebedev
a8d74517dc [PassManager] Run Induction Variable Simplification pass *after* Recognize loop idioms pass, not before
Currently, `-indvars` runs first, and `-loop-idiom` runs immediately after it.
I'm not really sure whether `-loop-idiom` requires `-indvars` to run beforehand,
but I'm *very* sure that `-indvars` requires `-loop-idiom` to run afterwards,
as can be seen in the phase-ordering test.

LoopIdiom runs on two types of loops: countable ones and uncountable ones.
For uncountable ones, IndVars obviously didn't make any change to them,
since they are uncountable, so for them the order should be irrelevant.
For countable ones, they must already have been countable before IndVars
for IndVars to make any change to them, and since SCEV is used on them,
it shouldn't matter whether IndVars has already canonicalized them.
So I don't really see why we'd want the current ordering.

Should this cause issues, it will give us a reproducer test case
that shows flaws in this logic, and we can then adjust accordingly.

While this is quite likely beneficial in the wild already,
it's a required part of the full motivational pattern
behind the `left-shift-until-bittest` loop idiom (D91038).

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91800
2020-11-25 19:20:07 +03:00
Sebastian Neubauer
edd675643d [AMDGPU] Emit stack frame size in metadata
Add .shader_functions to pal metadata, which contains the stack frame
size for all non-entry-point functions.

Differential Revision: https://reviews.llvm.org/D90036
2020-11-25 16:30:02 +01:00
Matt Arsenault
79f75468b4 AMDGPU: Fix counting kernel arguments towards register usage
Also use DataLayout to get type size. Relying on the IR type size is
also pretty broken here, since this won't perfectly capture how types
are legalized.
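
A hedged sketch of the size query the message refers to (function and variable names here are illustrative, not the in-tree code):

    #include "llvm/IR/DataLayout.h"
    #include "llvm/IR/Function.h"
    #include <cstdint>
    using namespace llvm;

    // Count kernel argument bytes using DataLayout rather than the raw IR
    // type size, so allocation size and ABI effects are reflected.
    uint64_t kernelArgBytes(const Function &F, const DataLayout &DL) {
      uint64_t Bytes = 0;
      for (const Argument &A : F.args())
        Bytes += DL.getTypeAllocSize(A.getType());
      return Bytes;
    }
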
2020-11-20 21:23:33 -05:00
Matt Arsenault
1d1234b2a4 OpaquePtr: Update more tests to use typed sret 2020-11-20 20:08:43 -05:00
Matt Arsenault
20c43d6bd5 OpaquePtr: Bulk update tests to use typed sret 2020-11-20 17:58:26 -05:00
Matt Arsenault
06c192d454 OpaquePtr: Bulk update tests to use typed byval
Upgrading the IR text tests should be the only thing blocking making
typed byval mandatory. This was done partially through regex and partially
by hand.
2020-11-20 14:00:46 -05:00
Sebastian Neubauer
7a18bdb350 [AMDGPU] Implement flat scratch init for pal
Extract the scratch offset from the scratch buffer descriptor that is
stored in the global table.

Differential Revision: https://reviews.llvm.org/D91701
2020-11-20 11:14:30 +01:00
Scott Linder
0fe4b8e4b5 [NFC][AMDGPU] Remove some generic pointers in memory-legalizer tests
These tests implicitly depend on the target supporting generic pointers,
so to prepare for testing them on GFX6 (which lacks FLAT) remove the
dependency where possible.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D91666
2020-11-18 20:52:18 +00:00
Sebastian Neubauer
72ccec1bbc [AMDGPU] Fix v3f16 interaction with image store workaround
In some cases, the wrong number of registers was reserved.

Also enable more v3f16 tests.

Differential Revision: https://reviews.llvm.org/D90847
2020-11-18 18:21:04 +01:00
Jay Foad
7ecf19697e [AMDGPU] Fix and extend vccz workarounds
We have workarounds for two different cases where vccz can get out of
sync with the value in vcc. This fixes them in two ways:

1. Fix the case where the def of vcc was in a previous basic block, by
pessimistically assuming that vccz might be incorrect at a basic block
boundary.

2. Fix the handling of pre-existing waitcnt instructions by calling
generateWaitcntInstBefore before examining ScoreBrackets to determine
whether there's an outstanding smem read operation.

Differential Revision: https://reviews.llvm.org/D91636
2020-11-18 15:26:06 +00:00
Jay Foad
e67d8859f2 [AMDGPU] Precommit more vccz workaround tests 2020-11-17 15:55:40 +00:00
Michael Liao
f375885ab8 [InferAddrSpace] Teach to handle assumed address space.
- In certain cases, a generic pointer could be assumed to be a pointer to
  the global memory space or another specific space. With a dedicated target
  hook to query that address space for a given value, the infer-address-space
  pass can infer and propagate it to all of the value's users.

Differential Revision: https://reviews.llvm.org/D91121
2020-11-16 17:06:33 -05:00
Matt Arsenault
d2e52eec51 AMDGPU: Select global saddr mode from SGPR pointer
Use the 64-bit SGPR base with a 0 offset, since it's 1 fewer
instruction to materialize the 0 vs. the 64-bit copy.
2020-11-16 11:51:06 -05:00
Mirko Brkusanin
4cf6dd518e [AMDGPU][GlobalISel] Fix lowerShlSat
RegBankSelect would crash on G_SELECT when type is not s1.

Differential Revision: https://reviews.llvm.org/D91437
2020-11-16 17:43:31 +01:00
Matt Arsenault
a6e353b1d0 AMDGPU: Split large offsets when selecting global saddr mode
When the offset doesn't fit in the immediate field, move some to
voffset.
2020-11-16 11:36:01 -05:00
Florian Hahn
8dbe44cb29 Add pass to add !annotate metadata from @llvm.global.annotations.
This patch adds a new pass to add !annotation metadata for entries in
@llvm.global.annotations, which is generated using
__attribute__((annotate("_name"))) on functions in Clang.
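
For context, a small example of the Clang source that produces such an entry (the annotation string and function name are arbitrary):

    // Compiling this with Clang adds an entry to @llvm.global.annotations
    // recording the function, the string "_name", and the source location.
    __attribute__((annotate("_name")))
    int annotated_function(int x) {
      return x + 1;
    }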

This has been discussed on llvm-dev as part of
    RFC: Combining Annotation Metadata and Remarks
    http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D91195
2020-11-16 14:57:11 +00:00
Matt Arsenault
e7eb2ac53f AMDGPU/GlobalISel: Regenerate some checks
Fixes indentation that made the diff in a future patch confusing.
2020-11-13 11:29:15 -05:00
Florian Hahn
8bb6347939 Add !annotation metadata and remarks pass.
This patch adds a new !annotation metadata kind which can be used to
attach annotation strings to instructions.

It also adds a new pass that emits summary remarks per function with the
counts for each annotation kind.

The intended use cases for this new metadata are annotating
'interesting' instructions; the remarks should provide additional
insight into the transformations applied to a program.
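
As a rough sketch (assuming the new metadata kind is exposed as LLVMContext::MD_annotation and that the annotation node is a tuple of strings), attaching such an annotation from C++ might look like:

    #include "llvm/ADT/StringRef.h"
    #include "llvm/IR/Instruction.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Metadata.h"
    using namespace llvm;

    // Tag an instruction with a single !annotation string such as "auto-init".
    // Note: this simple sketch replaces any !annotation node already present.
    void addAnnotation(Instruction &I, StringRef Note) {
      LLVMContext &Ctx = I.getContext();
      Metadata *Ops[] = {MDString::get(Ctx, Note)};
      I.setMetadata(LLVMContext::MD_annotation, MDTuple::get(Ctx, Ops));
    }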

To motivate this, consider these specific questions we would like to get answered:

* How many stores added for automatic variable initialization remain after optimizations? Where are they?
* How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated?

Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)

Reviewed By: thegameg, jdoerfert

Differential Revision: https://reviews.llvm.org/D91188
2020-11-13 13:24:10 +00:00
Stanislav Mekhanoshin
5ab1702129 [AMDGPU] Remove scratch rsrc from spill pseudos
Differential Revision: https://reviews.llvm.org/D91110
2020-11-12 15:23:37 -08:00
Stanislav Mekhanoshin
cf6565f6d0 [AMDGPU] Enable multi-dword flat scratch load/stores
Differential Revision: https://reviews.llvm.org/D91384
2020-11-12 13:38:56 -08:00
Jay Foad
6881a82e8c [AMDGPU] Fix scheduling of exp pos4
Also fix a similar issue in SIInsertWaitcnts, but I don't think that fix
has any effect in practice.

Differential Revision: https://reviews.llvm.org/D91290
2020-11-12 19:57:14 +00:00
Scott Linder
d5f2c3e7c0 [NFC][AMDGPU] Clean up some lit test prefixes
Replace some instances of "ALL" with "GCN" where it applies. Committed
as obvious.
2020-11-11 17:12:37 +00:00
Jay Foad
830ed64ccd Revert "Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access""
This reverts commit 8b08fa0103.

The underlying problems were fixed by D90607.
2020-11-11 14:40:14 +00:00
Mirko Brkusanin
a75d6178b8 [GlobalISel] Add combine for (x | mask) -> x when (x | mask) == x
If we have a mask, and a value x, where (x | mask) == x, we can drop the OR
and just use x.

Differential Revision: https://reviews.llvm.org/D90952
2020-11-10 11:32:13 +01:00
Mirko Brkusanin
fb36ab0a42 [GlobalISel] Expand combine for (x & mask) -> x when (x & mask) == x
We can use KnownBitsAnalysis to cover cases where the mask is not trivial. It can
also help with cases where the mask is not constant but can still be folded into
one. Since 'and' is commutative we should treat both operands as possible
replacements.
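
A minimal sketch of the check described above (plain C++ with made-up names, not the in-tree GlobalISel combine): (x & mask) can be replaced by x whenever every bit that might be set in x is also set in mask.

    #include <cstdint>

    // knownZeroOfX: bits that known-bits analysis proves are 0 in x.
    bool canDropAnd(uint32_t knownZeroOfX, uint32_t mask) {
      uint32_t possiblySetInX = ~knownZeroOfX;
      // The fold is valid when mask covers every possibly-set bit of x.
      return (possiblySetInX & ~mask) == 0;
    }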

Differential Revision: https://reviews.llvm.org/D90674
2020-11-10 11:32:13 +01:00
Mirko Brkusanin
53ae95c946 [AMDGPU][GlobalISel] Combine shift + logic + shift with constant operands
This sequence of instructions can be simplified if they have a single use and
some operands are constants. Additional combines may be applied afterwards.

Differential Revision: https://reviews.llvm.org/D90223
2020-11-10 11:32:13 +01:00
Mirko Brkusanin
de719586a8 [AMDGPU][GlobalISel] Fold a chain of two shift instructions with constant operands
A sequence of identical shift instructions with constant operands can be combined
into a single shift instruction.
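
A tiny worked example of the arithmetic behind the fold for left shifts (plain C++, constants only, names illustrative):

    #include <cstdint>

    // (x << c1) << c2 equals x << (c1 + c2) while c1 + c2 stays below the bit
    // width; once the combined shift amount reaches the width, a shl result
    // is known to be 0.
    uint32_t foldedShl(uint32_t x, unsigned c1, unsigned c2) {
      unsigned total = c1 + c2;
      return total < 32 ? x << total : 0;
    }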

Differential Revision: https://reviews.llvm.org/D90217
2020-11-10 11:32:12 +01:00
Carl Ritson
fde8351743 [AMDGPU] Fix lowering of S_MOV_{B32,B64}_term
If the source of S_MOV_{B32,B64}_term is an immediate then it
cannot be lowered to a COPY.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D90451
2020-11-10 12:16:31 +09:00