clang-p2996

Author	SHA1	Message	Date
Jon Chesterfield	30b29db7c7	[amdgpu] Don't crash on empty global ctor/dtor Global ctor/dtor can be an empty array, which is a Constant not a ConstantArray. The cast<ConstantArray> therefore asserts / crashes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D113800	2021-11-16 14:36:08 +00:00
Matt Arsenault	659887b405	AMDGPU: Mark prolog/epilog SCC defs as dead A future change will add SCC liveness checks. Since we are still relying on forward register scavenging, add dead flags to avoid spuriously detecting SCC as live.	2021-11-15 21:35:06 -05:00
Matt Arsenault	e6bfbd7e0d	AMDGPU: Regenerate test checks	2021-11-15 21:35:06 -05:00
Dmitry Preobrazhensky	91f4650ebb	[AMDGPU][MC][GFX10] Corrected global_atomic_fcmpswap* Corrected src data size of global_atomic_fcmpswap and global_atomic_fcmpswap_x2 opcodes. Differential Revision: https://reviews.llvm.org/D113746	2021-11-15 12:51:12 +03:00
Matt Arsenault	54172326e0	AMDGPU: Regenerate test checks Regenerate with -NEXT checks to make a future diff clearer.	2021-11-13 11:35:35 -05:00
Simon Pilgrim	3170670541	[AMDGPU] Regenerate udiv.ll tests	2021-11-12 17:57:40 +00:00
Jay Foad	a70bbb5f7a	[AMDGPU] Simplify 64-bit division/remainder expansion The old expansion open-coded a 64-bit addition in a strange way, by adding the high parts without carry-in from the low part, and then adding the carry back in later on. Fixing this saves a couple of instructions and makes the code much easier to understand. Differential Revision: https://reviews.llvm.org/D113679	2021-11-12 15:48:41 +00:00
Jay Foad	8313b47a58	[AMDGPU] Regenerate some div/rem test checks	2021-11-11 15:26:22 +00:00
Jay Foad	9ba73b6099	[AMDGPU] Fix line endings	2021-11-11 15:18:22 +00:00
Jay Foad	491beae71d	[TwoAddressInstruction] Update LiveIntervals after rewriting INSERT_SUBREG to COPY Also add subranges to an existing live interval when introducing a new subreg def. Differential Revision: https://reviews.llvm.org/D113044	2021-11-11 12:24:59 +00:00
Jay Foad	6abbc3a420	[LiveIntervals] Update subranges in processTiedPairs In TwoAddressInstructionPass::processTiedPairs when updating live intervals after moving the last use of RegB back to the newly inserted copy, update any affected subranges as well as the main range. Differential Revision: https://reviews.llvm.org/D110411	2021-11-11 12:24:59 +00:00
kpyzhov	c9690092c8	[AMDGPU] Small correction in SITargetLowering::performOrCombine(). Differential Revision: https://reviews.llvm.org/D113203	2021-11-10 21:07:27 -05:00
Stanislav Mekhanoshin	476ab0f809	[AMDGPU] Fixed stack pointer init with architected flat scratch Even if wave offset is not present we still need to do the rest of the initialization. The mov into s32 was missing in the kernels. Fixes: SWDEV-310935 Differential Revision: https://reviews.llvm.org/D113628	2021-11-10 17:18:38 -08:00
Matt Arsenault	c7a0c2d0f7	AMDGPU: Report large stack usage for recursive calls We were previously setting an ignored bit in the kernel headers. The current behavior is to add the large amount on top of the statically known size of a single stack frame. I'm not sure if we should just use the large size as the entire reported size instead.	2021-11-10 20:02:01 -05:00
Yaxun (Sam) Liu	4b3881e9f3	Emit hidden hostcall argument for sanitized kernels The patch https://reviews.llvm.org/D110337 changed how the hidden hostcall argument is emitted for printf, but sanitized kernels also use the hostcall buffer to report an error for invalid memory access. That case is not handled by the above patch and leads to a vdi runtime error: Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_FAULT: Agent attempted to access an inaccessible address. code: 0x2b Patch by: Praveen Velliengiri Reviewed by: Yaxun Liu, Matt Arsenault Differential Revision: https://reviews.llvm.org/D112820	2021-11-10 17:05:57 -05:00
Matt Arsenault	90ff148719	AMDGPU: Account for implicit argument alignment for kernarg segment If a kernel had no formal arguments but did have the implicit arguments, we were reporting a required kernarg alignment of 4. For some reason we require an 8-byte alignment for this, even though there's no real advantage and I don't see where this is documented in the ABI. The code object header code also claims the minimum alignment is 16, which is what I thought you always got at runtime anyway so I don't know why this matters.	2021-11-09 17:48:37 -05:00
Matt Arsenault	62ffcc5f37	AMDGPU: Regenerate test checks Update these to include -NEXT to avoid spurious changes in a future commit.	2021-11-09 15:28:59 -05:00
Thomas Symalla	76cbe62262	[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values. This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unnecessary s_mov instructions for arguments the caller would otherwise save and restore in these registers. Reviewed By: sebastian-ne Differential Revision: https://reviews.llvm.org/D111637	2021-11-04 21:50:18 +01:00
Simon Pilgrim	53becf5df2	[AMDGPU] Regenerate shift-and-i128-ubfe.ll test checks	2021-11-04 14:27:30 +00:00
RamNalamothu	539f500e78	[AMDGPU] Do not add debug locations to the code inside prologue There is no real source location for code inside prologue as it is generated by compiler but source locations are being added to code inside prologue as a side effect of https://reviews.llvm.org/D99269 because buildSpillLoadStore() is using source location of the real instruction in the basic block if any. Fixes: SWDEV-307590 Reviewed By: scott.linder, sebastian-ne Differential Revision: https://reviews.llvm.org/D113100	2021-11-04 08:02:41 +05:30
alex-t	0a3d755ee9	[AMDGPU] Enable divergence-driven BFE selection Detailed description: This change enables the bit field extract patterns selection to s_bfe_u32 or v_bfe_u32 dependent on the pattern root node divergence. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D110950	2021-11-03 23:26:59 +03:00
Abinav Puthan Purayil	fbe61fb0aa	[AMDGPU] Fix SGPR checks in S_MOV_B64_IMM_PSEUDO generation. The function to generate S_MOV_B64_IMM_PSEUDO was recently modified to optimize AGPR to AGPR copy but it missed checking for the SGPR clobbering for the S_MOV_B64_IMM_PSEUDO generation. Differential Revision: https://reviews.llvm.org/D113005	2021-11-03 09:09:24 +05:30
Jay Foad	fce5a567c6	[AMDGPU] More robust checks in extract_vector_dynelt.ll	2021-11-02 13:26:31 +00:00
hsmahesha	e9ea992496	[IR] Replace all uses of a constant expression by corresponding instruction When a constant expression CE is being converted into a corresponding instruction I, CE is supposed to be replaced by I. However, it is possible that CE is being used multiple times within a parent instruction PI. Make sure that all the uses of CE within PI are replaced by I. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D112717	2021-11-02 10:01:46 +05:30
Jay Foad	b85995f6c4	[AMDGPU] Add tests for legacy multiply-add with immediate	2021-11-01 14:24:13 +00:00
Jay Foad	2b548b18c1	[AMDGPU] Shrink v_mac_legacy_f32 and v_fmac_legacy_f32 Differential Revision: https://reviews.llvm.org/D112917	2021-11-01 13:55:53 +00:00
Christudasan Devadasan	aa2d3b59ce	GlobalISel/Utils: Use incoming regbank while constraining the superclasses Register operands with superclasses can possibly have multiple regBanks if they have different register types. The regBank ambiguity resolved during regbankselect should be used to constrain the operand regclass instead of obtaining one from the MCInstrDesc. This is a prerequisite patch for D109300 that introduces allocatable AV_* Superclasses for AMDGPU by combining both VGPRs and AGPRs and we want to restrain the regclass to either A or V based on the incoming regbank. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112323	2021-10-30 07:20:45 -04:00
Stanislav Mekhanoshin	e5340ed30c	[AMDGPU] Fix global isel for kernels using agprs on gfx90a With Global ISel getReservedRegs() is called before function is regbank selected for the first time. Defer caching of usesAGPRs() in this case. Differential Revision: https://reviews.llvm.org/D112644	2021-10-29 14:23:14 -07:00
Matt Arsenault	52fc2edb53	AMDGPU: Check kernarg alignments in test Strangely the kernel code object header clamps the value to a minimum of 16, but the emitted metadata only clamps to a minimum of 4.	2021-10-29 12:42:36 -04:00
Neubauer, Sebastian	c78640ee6a	[TailDuplicator] Fix merging block with terminator The TailDuplicator merged two blocks, even if the first one ended with a terminator, resulting in invalid MIR, where a terminator is in the middle of a block. Abort merging if the first block ends with a terminator. Differential Revision: https://reviews.llvm.org/D112226	2021-10-29 10:52:46 +02:00
Vang Thao	52b43d1549	[AMDGPU] Fix cvt_f32_ubyte combine with shl Shift node is still needed to check if the shift is shr or shl to increment/decrement offset. Do not override the node. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112733	2021-10-28 21:43:06 -07:00
Abinav Puthan Purayil	2da6ef3664	[AMDGPU] Add 24-bit mulhi intrinsics in INTRINSIC_WO_CHAIN combine. mul24 intrinsic's operands are simplified by AMDGPUTargetLowering::performIntrinsicWOChainCombine(). This change adds the mul24hi intrinsics in the combine since its operands can be simplified like that of the mul24 intrinsics. Differential Revision: https://reviews.llvm.org/D112702	2021-10-28 16:57:48 +05:30
Abinav Puthan Purayil	9f8e779b42	[AMDGPU] Fix rhs of the tests in amdgpu-codegenprepare-mul24.ll. Differential Revision: https://reviews.llvm.org/D112685	2021-10-28 16:57:48 +05:30
Jay Foad	c6b4fb87c0	[AMDGPU] Add gfx10 uaddsat test coverage. NFC.	2021-10-28 10:24:12 +01:00
Sebastian Neubauer	fd1cfc9094	[AMDGPU][GlobalISel] Fix waterfall loops - Move the `s_and exec` to its correct position before the content of the waterfall loop - Use the SI_WATERFALL pseudo instruction, like for sdag, to benefit from optimizations - Add support for indirect function calls To support indirect calls, add a G_SI_CALL instruction without register class restrictions and insert a waterfall loop when applying register banks. Differential Revision: https://reviews.llvm.org/D109052	2021-10-28 10:30:55 +02:00
Abinav Puthan Purayil	fa592180b3	[AMDGPU] Add more llc tests for 48-bit mul generation. Differential Revision: https://reviews.llvm.org/D112554	2021-10-28 08:10:04 +05:30
Michael Liao	e6a4ba3aa6	[amdgpu] Handle the case where there is no scavenged register. - When an unconditional branch is expanded into an indirect branch, if there is no scavenged register, an SGPR pair needs spilling to enable the destination PC calculation. In addition, before jumping into the destination, that clobbered SGPR pair needs restoring. - As an SGPR cannot be spilled to or restored from memory directly, the spilling/restoring of that SGPR pair reuses the regular SGPR spilling support but without spilling it into memory. As those spilling and restoring points are fully controlled, we only need to spill that SGPR into the temporary VGPR, which needs spilling into its emergency slot. - The target-specific hook is revised to take an additional restore block, where the restoring code is filled. After that, the relaxation will place that restore block directly before the destination block and insert an unconditional branch in any fall-through block into the destination block. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D106449	2021-10-27 18:37:27 -04:00
Austin Kerbow	02e60f2e77	[AMDGPU] Use max waves for scheduler's initial occupancy target The scheduler should set critical/excess register usage thresholds that are guided by the maximum possible occupancy for the function. This change is focused on setting proper lower bounds on register usage which we would typically only see when a specific number of maximum waves is requested with the "waves-per-eu" attribute, or by setting "amdgpu-num-vgpr|sgpr" directly. This was broken previously. I have a follow-on patch that will address issues with the scheduler not targeting correct upper bounds on register usage which is typical with launch bounds and min "waves-per-eu". Changes by this patch: Set the initial critical register usage thresholds to minimum values that are determined by the maximum possible occupancy for the function, or the number of allocatable registers, whichever is lower. Avoid unsigned overflow if register limits are lower than the register tracking "ErrorMargin", i.e. when using stress-regalloc=2. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112373	2021-10-26 15:30:26 -07:00
Abinav Puthan Purayil	61e3b9fefe	[AMDGPU] Add constrained shift pattern matches. The motivation for this is due to clang's conformance to https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#operators-shift which makes clang emit (<shift> a, (and b, <width> - 1)) for `a <shift> b` in OpenCL where a is an int of bit width <width>. Differential revision: https://reviews.llvm.org/D110231	2021-10-26 19:07:19 +05:30
Abinav Puthan Purayil	781dd39b7b	[AMDGPU] Enable 48-bit mul in AMDGPUCodeGenPrepare. We were bailing out of creating 24-bit muls for results wider than 32 bits in AMDGPUCodeGenPrepare. With the 24-bit mulhi intrinsic, this change teaches AMDGPUCodeGenPrepare to generate the 48-bit mul correctly. Differential Revision: https://reviews.llvm.org/D112395	2021-10-26 18:53:07 +05:30
Abinav Puthan Purayil	9bd5cfeb1f	[AMDGPU] Implement llvm.amdgcn.mulhi.[i,u]24 intrinsics. These intrinsics map to the 24-bit v_mul_hi instructions. This change also fixes an incorrect assumption on the associativity of 24-bit mulhi in its SDNode record in tblgen. Differential Revision: https://reviews.llvm.org/D112394	2021-10-26 18:53:07 +05:30
Neubauer, Sebastian	487f15603e	[AMDGPU] Fix setcc combine for i128 The combine asserted if constants could not be represented as uint64_t. Use APInts to fix this. Differential Revision: https://reviews.llvm.org/D112416	2021-10-26 13:39:50 +02:00
Sanjay Patel	6e46b66e2a	[DAGCombiner] make matching bit-hack form of usubsat more flexible (i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128 As suggested in D112085, we can substitute 'xor' with 'add' in this pattern, and it is logically equivalent: https://alive2.llvm.org/ce/z/eJtWWC We canonicalize to 'xor' in IR, but SDAG does not do that (and it probably should not - https://llvm.org/PR52267 ), so it is possible to see either pattern in codegen. Note that 'sub' is another potential pattern, but that is canonicalized to 'add' in DAGCombiner, so we don't need to worry about that variation. Differential Revision: https://reviews.llvm.org/D112377	2021-10-25 09:01:52 -04:00
Thomas Symalla	f0331100f7	[AMDGPU] Regenerate some tests with the current version of update_mir_test_checks.py	2021-10-25 14:42:13 +02:00
Sanjay Patel	d34cad3196	[AMDGPU] add tests for alternate form of usubsat; NFC	2021-10-24 07:52:07 -04:00
Matt Arsenault	ec57b37551	AMDGPU: Use attributor to propagate amdgpu-flat-work-group-size This can merge the acceptable ranges based on the call graph, rather than the simple application of the attribute. Remove the handling from the old pass.	2021-10-22 16:23:50 -04:00
Matt Arsenault	8d4b74ac3f	AMDGPU: Don't consider whether amdgpu-flat-work-group-size was set It should be semantically identical if it was set to the same value as the default. Also improve the documentation.	2021-10-22 16:23:50 -04:00
Matt Arsenault	7d962f9ca3	AMDGPU: Regenerate MIR test checks Recently this started using -NEXT checks, so regenerate these to avoid extra test churn in a future change.	2021-10-22 15:36:50 -04:00
Matt Arsenault	ae698f89b8	AMDGPU: Fix hardcoded registers in tests	2021-10-22 15:36:50 -04:00
Jay Foad	58e7ec471c	[AMDGPU] Run SIShrinkInstructions before post-RA scheduling Run post-RA SIShrinkInstructions just before post-RA scheduling, instead of afterwards. After the fixes in D112305 and D112317 this seems to make no difference, but it paves the way for scheduler tweaks that are sensitive to the e32 vs e64 encoding of VALU instructions. Differential Revision: https://reviews.llvm.org/D112341	2021-10-22 20:24:03 +01:00
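
The bit-hack identity quoted in the DAGCombiner usubsat entry (6e46b66e2a) can be checked exhaustively for i8. The following is a minimal sketch in Python, not LLVM code; the helper names `usubsat8`, `ashr8`, `xor_form`, `add_form`, and `check_all` are made up for illustration, and i8 values are modelled as integers in 0..255.

```python
MASK = 0xFF  # model i8 values as unsigned integers in 0..255 (assumption of this sketch)


def usubsat8(x, y):
    """Unsigned saturating subtract on 8-bit values: clamps at 0."""
    return max(x - y, 0)


def ashr8(x, n):
    """Arithmetic (sign-propagating) shift right of an 8-bit value."""
    signed = x - 256 if x & 0x80 else x  # reinterpret as signed i8
    return (signed >> n) & MASK          # Python's >> on ints is arithmetic


def xor_form(x):
    """(X ^ 128) & (X s>> 7), the pattern named in the commit message."""
    return (x ^ 128) & ashr8(x, 7)


def add_form(x):
    """Same pattern with 'add' in place of 'xor', wrapped to 8 bits."""
    return ((x + 128) & MASK) & ashr8(x, 7)


def check_all():
    """True if both forms equal usubsat(X, 128) for every i8 value."""
    return all(
        xor_form(x) == add_form(x) == usubsat8(x, 128)
        for x in range(256)
    )
```

The two forms agree because adding 128 modulo 256 only flips the top bit, which is exactly what the xor with 128 does; the arithmetic shift then produces an all-ones or all-zeros mask from the sign bit.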

4971 Commits