clang-p2996

Author	SHA1	Message	Date
Matt Arsenault	74a148ad39	GlobalISel: Verify G_BITCAST changes the type Updated the AArch64 tests the best I could with my vague, inferred understanding of AArch64 register banks. As far as I can tell, there is only one 32-bit/64-bit type which will use the gpr register bank, so we have to use the fpr bank for the other operand.	2020-07-08 17:16:27 -04:00
Jay Foad	47788b97a9	SILoadStoreOptimizer: add support for GFX10 image instructions GFX10 image instructions use one or more address operands starting at vaddr0, instead of a single vaddr operand, to allow for NSA forms. Differential Revision: https://reviews.llvm.org/D81675	2020-07-08 19:15:46 +01:00
Jay Foad	a8816ebee0	[AMDGPU] Fix and simplify AMDGPULegalizerInfo::legalizeUDIV_UREM32Impl Use the algorithm from AMDGPUCodeGenPrepare::expandDivRem32. Differential Revision: https://reviews.llvm.org/D83383	2020-07-08 19:14:49 +01:00
Jay Foad	ecac951be9	[AMDGPU] Fix and simplify AMDGPUTargetLowering::LowerUDIVREM Use the algorithm from AMDGPUCodeGenPrepare::expandDivRem32. Differential Revision: https://reviews.llvm.org/D83382	2020-07-08 19:14:49 +01:00
Jay Foad	f4bd01c191	[AMDGPU] Fix and simplify AMDGPUCodeGenPrepare::expandDivRem32 Fix the division/remainder algorithm by adding a second quotient refinement step, which is required in some cases like 0xFFFFFFFFu / 0x11111111u (https://bugs.llvm.org/show_bug.cgi?id=46212). Also document, rewrite and simplify it by ensuring that we always have a lower bound on inv(y), which simplifies the UNR step and the quotient refinement steps. Differential Revision: https://reviews.llvm.org/D83381	2020-07-08 19:14:48 +01:00
Matt Arsenault	23157f3bdb	GlobalISel: Handle EVT argument lowering correctly handleAssignments was assuming every argument type is an MVT, and assignArg would always fail. This fixes one of the hacks in the current AMDGPU calling convention code that pre-processes the arguments.	2020-07-07 16:36:14 -04:00
Matt Arsenault	42bb481442	AMDGPU/GlobalISel: Fix skipping unused kernel arguments The tests in `a5b9ad7e9a` actually failed the verifier, which for some reason is not the default. Also add tests for 0-sized function arguments, which do not add entries to the expected register lists.	2020-07-07 16:36:13 -04:00
Matt Arsenault	c19c153e74	AMDGPU: Don't ignore carry out user when expanding add_co_pseudo This was resulting in a missing vreg def in the use select instruction. The output of the pseudo doesn't make sense, since it really shouldn't have the vreg output in the first place, and instead an implicit scc def to match the real scalar behavior. We could have easier to understand tests if we selected scalar versions of the [us]{add|sub}.with.overflow intrinsics. This does still end up producing vector code in the end, since it gets moved later.	2020-07-06 14:28:01 -04:00
Matt Arsenault	a5b9ad7e9a	AMDGPU/GlobalISel: Don't emit code for unused kernel arguments	2020-07-06 09:04:06 -04:00
Matt Arsenault	581f1823cd	AMDGPU/GlobalISel: Fix hardcoded register number checks in test	2020-07-06 09:01:59 -04:00
Matt Arsenault	7b76a5c8a2	AMDGPU: Fix fixed ABI SGPR arguments The default constructor wasn't setting isSet on the ArgDescriptor, so while these had the value set, they were treated as missing. This only ended up mattering in the indirect call case (and for regular calls in GlobalISel, which currently doesn't have a way to support the variable ABI).	2020-07-06 09:01:18 -04:00
Matt Arsenault	bcff3deaa1	AMDGPU/GlobalISel: Add some missing return tests	2020-07-06 09:01:18 -04:00
Simon Pilgrim	c37400f6e7	Regenerate subreg liverange tests. NFC. To simplify the diffs in a patch in development.	2020-07-06 13:58:25 +01:00
vpykhtin	bb69ca822a	[AMDGPU] Don't combine DPP if DPP register is used more than once per instruction Reviewers: arsenm, rampitec, foad Reviewed By: rampitec, foad Subscribers: wuzish, kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82551	2020-07-03 15:08:26 +03:00
Carl Ritson	42ca2070d7	[AMDGPU] Insert PS early exit at end of control flow Exit early if the exec mask is zero at the end of control flow. Mark the ends of control flow during control flow lowering and convert these to exits during the insert skips pass. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D82737	2020-07-03 14:04:34 +09:00
Carl Ritson	7ec6927bad	Revert "[AMDGPU] Insert PS early exit at end of control flow" This reverts commit `2bfcacf0ad`. There appears to be an issue with analysis preservation.	2020-07-03 13:03:33 +09:00
Carl Ritson	2bfcacf0ad	[AMDGPU] Insert PS early exit at end of control flow Exit early if the exec mask is zero at the end of control flow. Mark the ends of control flow during control flow lowering and convert these to exits during the insert skips pass. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D82737	2020-07-03 12:26:28 +09:00
Carl Ritson	a3daa3f75a	[AMDGPU] Unify early PS termination blocks Generate a single early exit block out-of-line and branch to this if all lanes are killed. This avoids branching if lanes are active. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D82641	2020-07-03 09:58:05 +09:00
Dmitry Preobrazhensky	1c9d681092	[AMDGPU][CODEGEN] Added support of new inline assembler constraints Added support for constraints 'I', 'J', 'B', 'C', 'DA', 'DB'. See https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D81651	2020-07-02 17:20:15 +03:00
Jay Foad	6f1694759c	[AMDGPU] Fix formatting in MIR tests	2020-07-02 10:27:34 +01:00
Pushpinder Singh	e1a31f52cd	[AMDGPU] Control num waves per EU for implicit work-group size Summary: If amdgpu-flat-work-group-size is not specified in LLVM IR, the backend uses default value of 1024. For this, minimum waves per EU should be 4. However, backend is still setting minimum value to 1 instead of calculated value. This is not observed normally as frontend always provide amdgpu-flat-work-group-size attribute. Reviewers: rampitec, b-sumner, sameerds, msearles Reviewed By: rampitec Subscribers: qcolombet, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81991	2020-07-01 22:53:52 -04:00
Matt Arsenault	d2e74fad20	AMDGPU: Set more mov flags on V_ACCVGPR_{READ|WRITE}_B32 This fixes extra copies when materializing constants in AGPRs. This made it a lot harder to trigger the spilling in spill-agpr.ll	2020-07-01 18:58:59 -04:00
Matt Arsenault	a230f1db3f	AMDGPU: Fix missing tracksRegLiveness in tests I have no idea why this is considered optional, or why it's not the default. Also add uses of the copied registers for more useful liveness testing.	2020-07-01 18:58:59 -04:00
Stanislav Mekhanoshin	54e2dc7537	[AMDGPU] Limit promote alloca to vector with VGPR budget Allow only up to 1/4 of available VGPRs for the vectorization of any given alloca. Differential Revision: https://reviews.llvm.org/D82990	2020-07-01 15:57:24 -07:00
Matt Arsenault	ba3bafe46a	AMDGPU: Convert AGPR copy test to generated checks	2020-07-01 13:59:13 -04:00
Matt Arsenault	14fe4607f1	AMDGPU: Support commuting register and global operand	2020-07-01 13:59:13 -04:00
Matt Arsenault	a21544ad11	AMDGPU: Fix handling of target flags when commuting instruction If the original register operand had a subregister, it wasn't getting cleared. This resulted in reinterpreting the subreg index as unrecognized target flags, which produced unparseable MIR.	2020-07-01 13:59:13 -04:00
Matt Arsenault	16ea23ff78	AMDGPU: Clear subreg when folding immediate copies This was getting reinterpreted as operand target flags, and appearing as <unknown target flag>, resulting in unparseable MIR.	2020-07-01 13:59:13 -04:00
Petar Avramovic	4b9ae1b7e5	AMDGPU/GlobalISel: Select init_exec intrinsic Change imm with timm in pattern for SI_INIT_EXEC_LO and remove regbank mappings for non register operands. Differential Revision: https://reviews.llvm.org/D82885	2020-07-01 11:50:59 +02:00
Saiyedul Islam	9182316395	[AMDGPU] Spill more than wavesize CSR SGPRs In case of more than wavesize CSR SGPR spills, lanes of reserved VGPR were getting overwritten due to wrap around. Reserve a VGPR (when NumVGPRSpillLanes = 0, WaveSize, 2*WaveSize, ..) and when one of the two conditions is true: 1. One reserved VGPR being tracked by VGPRReservedForSGPRSpill is not yet reserved. 2. All spill lanes of reserved VGPR(s) are full and another spill lane is required. Reviewed By: arsenm, kerbowa Differential Revision: https://reviews.llvm.org/D82463	2020-07-01 07:40:47 +00:00
Matt Arsenault	291ece0efa	AMDGPU/GlobalISel: Remove some selection tests which should be invalid These use undef generic virtual register operands, which should be rejected by the verifier.	2020-06-30 19:18:01 -04:00
Petar Avramovic	d717382633	AMDGPU/GlobalISel: Select icmp intrinsic Select into corresponding V_CMP instruction based on CmpInst predicate, stored as immediate, in last operand. Differential Revision: https://reviews.llvm.org/D82652	2020-06-30 10:57:41 +02:00
Petar Avramovic	4b980cc9ca	[GlobalISel][InlineAsm] Add support for matching input constraints Find def operand that corresponds to matching constraint and tie input to that operand. Differential Revision: https://reviews.llvm.org/D82651	2020-06-30 10:49:05 +02:00
Christudasan Devadasan	226cda58d5	[AMDGPU] Moving SI_RETURN_TO_EPILOG handling out of SIInsertSkips. For now, moving it to SIPreEmitPeephole. Should find a right place to have this code. Reviewed By: nhaehnle Differential revision: https://reviews.llvm.org/D77544	2020-06-29 20:41:53 +05:30
Matt Arsenault	d0b0b252e1	AMDGPU: Use IsSSA property check instead of asserting on isSSA Also fix an SSA violation in a test the MIRParser/verifier fails to catch. It's illegal to define a subregister in SSA. For the purpose of the test, it just needs to define the super-register to use the subregister in the use operand.	2020-06-29 10:05:23 -04:00
Fangrui Song	4cd19a6e15	[BasicAA] Rename -disable-basicaa to -disable-basic-aa to be consistent with the canonical name "basic-aa"	2020-06-26 20:55:44 -07:00
Fangrui Song	f31811f2dc	[BasicAA] Rename deprecated -basicaa to -basic-aa Follow-up to D82607 Revert an accidental change (empty.ll) of D82683	2020-06-26 20:41:37 -07:00
Matt Arsenault	443556c18f	AMDGPU/GlobalISel: Fix some legalization of < dword vector stores This avoids many instances of failing to legalize a vector truncstore of <4 x s8> to 2 bytes. We don't perfectly handle every truncstore yet, largely because the given set of legalization actions can't actually differentiate between changing the result type and changing the memory type.	2020-06-26 18:07:39 -04:00
Matt Arsenault	9e03bdebc1	AMDGPU: Add llvm.amdgcn.sqrt intrinsic I spread the GlobalISel test into the regular one, which I've been avoiding so far.	2020-06-26 15:07:07 -04:00
Matt Arsenault	431daedee4	AMDGPU/GlobalISel: Fix legacy clover kernel argument ABI This had an extra attempt to align the pointer, which only did anything with a base kernel argument offset which only clover used to use.	2020-06-26 10:03:05 -04:00
Matt Arsenault	54573528ae	AMDGPU/GlobalISel: Add baseline checks for legacy clover kernel ABI I'm not sure we actually need to support this now, since I think clover always explicitly uses amdgcn-mesa-mesa3d now, not the ill-defined amdgcn-- behavior.	2020-06-26 10:03:05 -04:00
Matt Arsenault	b1cfa64cb1	AMDGPU/GlobalISel: Uncomment some fixed tests	2020-06-26 10:03:05 -04:00
Piotr Sobczak	0045786f14	[AMDGPU] Select s_cselect Summary: Add patterns to select s_cselect in the isel. Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits Tags: #llvm Re-commit D81925 with a bugfix D82370. Differential Revision: https://reviews.llvm.org/D81925 Differential Revision: https://reviews.llvm.org/D82370	2020-06-25 10:38:23 +02:00
dstuttar	e8775c8d81	[AMDGPU] Make sure to fix implicit operands on insertBranch Summary: Without fixImplicitOperands we may end up creating default implicit operands that are the wrong wave size Includes simple test that provokes insertBranch in the correct way to expose the issue being fixed. Change-Id: I92bdcdee9fcb7b4d91529b84e76a48ac8218483e Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82459	2020-06-24 16:50:48 +01:00
Matt Arsenault	a448670752	AMDGPU/GlobalISel: Legalize 64-bit G_SDIV/G_SREM Now all the divisions should be complete, although we should fix emitting the entire common part for div/rem when you use both.	2020-06-24 11:39:45 -04:00
Matt Arsenault	778351df77	Revert "[AMDGPU] Enable compare operations to be selected by divergence" This reverts commit `521ac0b5ce`. Reported to break thousands of piglit tests.	2020-06-24 11:21:30 -04:00
Tim Corringham	c3b3b999ec	[AMDGPU] Avoid redundant mode register writes Summary: The SIModeRegister pass attempts to generate the minimal number of writes to the mode register. However it was failing to correctly deal with some loops, resulting in some redundant setreg instructions being inserted. This change amends the pass to avoid generating these redundant instructions. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82215	2020-06-24 14:11:29 +01:00
alex-t	521ac0b5ce	[AMDGPU] Enable compare operations to be selected by divergence Summary: Details: This patch enables SETCC to be selected to S_CMP_* if uniform and V_CMP_* if divergent. Reviewers: rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82194	2020-06-24 11:50:40 +03:00
Matt Arsenault	a162048a47	AMDGPU/GlobalISel: Fix fixed ABI special VGPR function arguments I forgot to copy the new fixed function ABI into GlobalISel, so this was mismatched with the DAG compiled calling function. This was allocating part of the argument list to v31, which was supposed to be reserved for the workitem IDs.	2020-06-23 21:21:35 -04:00
Your Name	cc9d693856	[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size Summary: Make use of both the - (1) clustered bytes and (2) cluster length, to decide on the max number of mem ops that can be clustered. On an average, when loads are dword or smaller, consider `5` as max threshold, otherwise `4`. This heuristic is purely based on different experimentation conducted, and there is no analytical logic here. Reviewers: foad, rampitec, arsenm, vpykhtin Reviewed By: rampitec Subscribers: llvm-commits, kerbowa, hiraditya, t-tye, Anastasia, tpr, dstuttard, yaxunl, nhaehnle, wdng, jvesely, kzhuravl, thakis Tags: #llvm Differential Revision: https://reviews.llvm.org/D82393	2020-06-24 00:39:41 +05:30

3672 Commits