clang-p2996

Author	SHA1	Message	Date
Mirko Brkusanin	cf40db21af	[AMDGPU][GlobalISel] Fix G_AMDGPU_TBUFFER_STORE_FORMAT mapping Add missing mappings and tablegen definitions for TBUFFER_STORE_FORMAT. Differential Revision: https://reviews.llvm.org/D83240	2020-07-10 11:32:32 +02:00
Matt Arsenault	fdde69aac9	AMDGPU/GlobalISel: Work around verifier error in test The unfortunate split between finalizeLowering and the selector pass means there's a point where the verifier fails. The DAG selector pass skips the verifier, but this seems to not work when using the GlobalISel fallback.	2020-07-09 10:24:16 -04:00
Matt Arsenault	74a148ad39	GlobalISel: Verify G_BITCAST changes the type Updated the AArch64 tests the best I could with my vague, inferred understanding of AArch64 register banks. As far as I can tell, there is only one 32-bit/64-bit type which will use the gpr register bank, so we have to use the fpr bank for the other operand.	2020-07-08 17:16:27 -04:00
Jay Foad	a8816ebee0	[AMDGPU] Fix and simplify AMDGPULegalizerInfo::legalizeUDIV_UREM32Impl Use the algorithm from AMDGPUCodeGenPrepare::expandDivRem32. Differential Revision: https://reviews.llvm.org/D83383	2020-07-08 19:14:49 +01:00
Jay Foad	f4bd01c191	[AMDGPU] Fix and simplify AMDGPUCodeGenPrepare::expandDivRem32 Fix the division/remainder algorithm by adding a second quotient refinement step, which is required in some cases like 0xFFFFFFFFu / 0x11111111u (https://bugs.llvm.org/show_bug.cgi?id=46212). Also document, rewrite and simplify it by ensuring that we always have a lower bound on inv(y), which simplifies the UNR step and the quotient refinement steps. Differential Revision: https://reviews.llvm.org/D83381	2020-07-08 19:14:48 +01:00
Matt Arsenault	23157f3bdb	GlobalISel: Handle EVT argument lowering correctly handleAssignments was assuming every argument type is an MVT, and assignArg would always fail. This fixes one of the hacks in the current AMDGPU calling convention code that pre-processes the arguments.	2020-07-07 16:36:14 -04:00
Matt Arsenault	42bb481442	AMDGPU/GlobalISel: Fix skipping unused kernel arguments The tests in `a5b9ad7e9a` actually failed the verifier, which for some reason is not the default. Also add tests for 0-sized function arguments, which do not add entries to the expected register lists.	2020-07-07 16:36:13 -04:00
Matt Arsenault	a5b9ad7e9a	AMDGPU/GlobalISel: Don't emit code for unused kernel arguments	2020-07-06 09:04:06 -04:00
Matt Arsenault	581f1823cd	AMDGPU/GlobalISel: Fix hardcoded register number checks in test	2020-07-06 09:01:59 -04:00
Matt Arsenault	bcff3deaa1	AMDGPU/GlobalISel: Add some missing return tests	2020-07-06 09:01:18 -04:00
Jay Foad	6f1694759c	[AMDGPU] Fix formatting in MIR tests	2020-07-02 10:27:34 +01:00
Petar Avramovic	4b9ae1b7e5	AMDGPU/GlobalISel: Select init_exec intrinsic Change imm with timm in pattern for SI_INIT_EXEC_LO and remove regbank mappings for non register operands. Differential Revision: https://reviews.llvm.org/D82885	2020-07-01 11:50:59 +02:00
Matt Arsenault	291ece0efa	AMDGPU/GlobalISel: Remove some selection tests which should be invalid These use undef generic virtual register operands, which should be rejected by the verifier.	2020-06-30 19:18:01 -04:00
Petar Avramovic	d717382633	AMDGPU/GlobalISel: Select icmp intrinsic Select into corresponding V_CMP instruction based on CmpInst predicate, stored as immediate, in last operand. Differential Revision: https://reviews.llvm.org/D82652	2020-06-30 10:57:41 +02:00
Petar Avramovic	4b980cc9ca	[GlobalISel][InlineAsm] Add support for matching input constraints Find def operand that corresponds to matching constraint and tie input to that operand. Differential Revision: https://reviews.llvm.org/D82651	2020-06-30 10:49:05 +02:00
Matt Arsenault	443556c18f	AMDGPU/GlobalISel: Fix some legalization of < dword vector stores This avoids many instances of failing to legalize a vector truncstore of <4 x s8> to 2 bytes. We don't perfectly handle every truncstore yet, largely because the given set of legalization actions can't actually differentiate between changing the result type and changing the memory type.	2020-06-26 18:07:39 -04:00
Matt Arsenault	431daedee4	AMDGPU/GlobalISel: Fix legacy clover kernel argument ABI This had an extra attempt to align the pointer, which only did anything with a base kernel argument offset which only clover used to use.	2020-06-26 10:03:05 -04:00
Matt Arsenault	54573528ae	AMDGPU/GlobalISel: Add baseline checks for legacy clover kernel ABI I'm not sure we actually need to support this now, since I think clover always explicitly uses amdgcn-mesa-mesa3d now, not the ill-defined amdgcn-- behavior.	2020-06-26 10:03:05 -04:00
Matt Arsenault	b1cfa64cb1	AMDGPU/GlobalISel: Uncomment some fixed tests	2020-06-26 10:03:05 -04:00
Matt Arsenault	a448670752	AMDGPU/GlobalISel: Legalize 64-bit G_SDIV/G_SREM Now all the divisions should be complete, although we should fix emitting the entire common part for div/rem when you use both.	2020-06-24 11:39:45 -04:00
Matt Arsenault	a162048a47	AMDGPU/GlobalISel: Fix fixed ABI special VGPR function arguments I forgot to copy the new fixed function ABI into GlobalISel, so this was mismatched with the DAG compiled calling function. This was allocating part of the argument list to v31, which was supposed to be reserved for the workitem IDs.	2020-06-23 21:21:35 -04:00
Your Name	cc9d693856	[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size Summary: Make use of both the - (1) clustered bytes and (2) cluster length, to decide on the max number of mem ops that can be clustered. On an average, when loads are dword or smaller, consider `5` as max threshold, otherwise `4`. This heuristic is purely based on different experimentation conducted, and there is no analytical logic here. Reviewers: foad, rampitec, arsenm, vpykhtin Reviewed By: rampitec Subscribers: llvm-commits, kerbowa, hiraditya, t-tye, Anastasia, tpr, dstuttard, yaxunl, nhaehnle, wdng, jvesely, kzhuravl, thakis Tags: #llvm Differential Revision: https://reviews.llvm.org/D82393	2020-06-24 00:39:41 +05:30
Matt Arsenault	db777eaea3	AMDGPU/GlobalISel: Fix asserts on non-s32 sitofp/uitofp sources The combine to form cvt_f32_ubyte0 was assuming the source type was always 32-bit, but this needs to tolerate any legal source type.	2020-06-23 10:00:35 -04:00
Carl Ritson	8f3b2c8aa3	AMDGPU/GlobalISel: Remove selection of MAD/MAC when not available Add code to respect mad-mac-f32-insts target feature. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D81990	2020-06-19 10:30:19 +09:00
Matt Arsenault	3b34f3fcca	AMDGPU/GlobalISel: Fix obvious bug in ported 32-bit udiv/urem This was hidden by the IR expansion in AMDGPUCodeGenPrepare, which I forgot to turn off.	2020-06-16 22:46:35 -04:00
Matt Arsenault	c5c58fd6b5	AMDGPU: Remove intermediate DAG node for trig_preop intrinsic We weren't doing anything with this, and keeping it would just add more boilerplate for GlobalISel.	2020-06-16 21:06:25 -04:00
Stanislav Mekhanoshin	9ee272f13d	[AMDGPU] Add gfx1030 target Differential Revision: https://reviews.llvm.org/D81886	2020-06-15 16:18:05 -07:00
Matt Arsenault	1a7f115dce	AMDGPU/GlobalISel: Extend load/store workaround to i128 vectors	2020-06-15 14:55:11 -04:00
Matt Arsenault	362eedcbb4	AMDGPU/GlobalISel: Correct memory size in test	2020-06-15 14:12:28 -04:00
Matt Arsenault	2ca552322c	AMDGPU/GlobalISel: Fix 8-byte aligned, 96-bit scalar loads These are legal since we can do a 96-bit load on some subtargets, but this is only for vector loads. If we can't widen the load, it needs to be broken down once known scalar. For 16-byte alignment, widen to a 128-bit load.	2020-06-15 11:33:16 -04:00
Matt Arsenault	dae9554b2b	AMDGPU/GlobalISel: Workaround some load/store type selection patterns The logic is written for what loads/stores should be selectable. There are a set of cases that should be selectable, but due to missing MVTs and/or selection patterns, will fail to select. I think eventually load/store select patterns should ignore the type and only look at the value size, but until that happens, bitcast these to equivalent i32 vectors.	2020-06-15 07:42:20 -04:00
Matt Arsenault	96229606f9	AMDGPU/GlobalISel: Use less artifical example to avoid abort=0 These were failing due to an unlegalizable G_CONCAT_VECTORS due to registers with types that are naturally illegal.	2020-06-15 07:37:15 -04:00
Matt Arsenault	33e9086501	GlobalISel: Support lowering vector->vector G_BITCAST Extract subvectors and cast to the result element type before remerging.	2020-06-15 07:36:30 -04:00
Matt Arsenault	fb51d508ee	AMDGPU/GlobalISel: Select general case for G_PTRMASK	2020-06-14 13:12:29 -04:00
Matt Arsenault	350ee7fb3f	GlobalISel: Fix not erasing old instruction in sitofp/uitofp lowering	2020-06-12 10:33:23 -04:00
Sebastian Neubauer	29a6ad94fd	[AMDGPU] Add G16 support to image instructions Add G16 feature for GFX10 and support A16 and G16 in GlobalISel. Differential Revision: https://reviews.llvm.org/D76836	2020-06-12 11:26:31 +02:00
Matt Arsenault	7d913becfc	AMDGPU/GlobalISel: Fix select of private <2 x s16> load	2020-06-11 19:25:25 -04:00
Matt Arsenault	27f8bd94cb	AMDGPU/GlobalISel: Fix select of <8 x s64> scalar load	2020-06-11 19:09:43 -04:00
Matt Arsenault	2247072b65	AMDGPU/GlobalISel: Set insert point when emitting control flow pseudos This was implicitly assuming the branch instruction was the next after the pseudo. It's possible for another non-terminator instruction to be inserted between the intrinsic and the branch, so adjust the insertion point. Fixes a non-terminator after terminator verifier error (which without the verifier, manifested itself as an infinite loop in analyzeBranch much later on).	2020-06-11 18:53:26 -04:00
Petar Avramovic	bd3d951b8b	AMDGPU/GlobalISel: Fix lower for f64->f16 G_FPTRUNC Put AND before ADD in LegalizerHelper::lowerFPTRUNC_F64_TO_F16 in order to match algorithm from AMDGPUTargetLowering::LowerFP_TO_FP16. Differential Revision: https://reviews.llvm.org/D81666	2020-06-11 18:19:27 +02:00
Matt Arsenault	19b3b886b7	AMDGPU/GlobalISel: Fix porting error in 32-bit division The baffling thing is this passed the OpenCL conformance test for 32-bit integer divisions, but only failed in the 32-bit path of BypassSlowDivisions for the 64-bit tests.	2020-06-10 21:48:58 -04:00
Stanislav Mekhanoshin	09d325b20c	AMDGPU/GlobalISel: cmp/select method for insert element Differential Revision: https://reviews.llvm.org/D80754	2020-06-10 13:12:54 -07:00
Matt Arsenault	ea1bd95411	AMDGPU/GlobalISel: Make G_IMPLICIT_DEF legality more consistent Makes <6 x s16> legal, <4 x s8> illegal, and clamps the maximum size to 1024.	2020-06-10 11:05:59 -04:00
Matt Arsenault	44b355f34b	AMDGPU/GlobalISel: Add new baseline tests for bitcast legalization	2020-06-09 15:46:53 -04:00
hsmahesha	7410571ce9	Revert "[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size" This reverts commit `40a632a335`.	2020-06-09 19:27:17 +05:30
hsmahesha	40a632a335	[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size Summary: Make use of both the - (1) clustered bytes and (2) cluster length, to decide on the max number of mem ops that can be clustered. On an average, when loads are dword or smaller, consider `5` as max threshold, otherwise `4`. This heuristic is purely based on different experimentation conducted, and there is no analytical logic here. Reviewers: foad, rampitec, arsenm, vpykhtin Reviewed By: foad, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, Anastasia, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81085	2020-06-09 14:09:14 +05:30
Matt Arsenault	67b700480b	AMDGPU/GlobalISel: Precommit regenerated check lines The update_*test_checks scripts miss new stuff added at the end of lines. Regenerate checks so the new mode register operands don't show up in the diff of a future patch.	2020-06-08 12:47:45 -04:00
Matt Arsenault	38fb446fc7	AMDGPU/GlobalISel: Fix test failure in release build The annoying behavior where the output is different due to the legality check struck again, plus the subtarget predicate wasn't really correctly set for DS FP atomics. Some of the FP min/max instructions seem to be in the gfx6/gfx7 manuals, but IIRC this might have been one of the cases where the manual got ahead of the actual hardware support, but I've left these as-is for now since the assembler tests seem to expect them.	2020-06-06 11:01:18 -04:00
Matt Arsenault	bc20bdb9f9	AMDGPU/GlobalISel: Start rewriting load/store legality rules The current set is an incomprehensible mess riddled with ordering hacks for various limitations in the legalizer at the time of writing, many of which have been fixed. This takes a very small step in correcting this. The core first change is to start checking for fully legal cases first, rather than trying to figure out all of the actions that could need to be performed. It's recommended to check the legal cases first for faster legality checks in the common case. This still has a table listing some common cases, but it needs measuring whether this really helps or not. More significantly, stop trying to allow any arbitrary type with a legal bitwidth as a legal memory type, and start using the bitcast legalize action for them. Allowing loads of these weird vector types produced new burdens we don't need for handling all of the legalization artifacts. Unlike the SelectionDAG handling, this is still not casting 64 or 16-bit element vectors to 32-bit vectors. These cases should still be handled by increasing/decreasing the number of 16-bit elements. This is primarily to fix 8-bit element vectors. Another change is to stop trying to handle the load-widening based on a higher alignment. We should still do this, but the way it was handled wasn't really correct. We really need to modify the MMO's size at the same time, and not just increase the result type. The LegalizerHelper does not do this, and I think this would really require a separate WidenMemory action (or to add a memory action payload to the LegalizeMutation). These will now fail to legalize. The structure of the legalizer rules makes writing concise rules here difficult. It would be easier if the same function could answer the query the query, and report the action to perform at the same time. Instead these two are split into distinct predicate and action functions. This is mostly tolerable for other cases, but the load/store rules get pretty complicated so it's difficult to keep two versions of these functions in sync.	2020-06-06 09:59:46 -04:00
Stanislav Mekhanoshin	5d62606f90	AMDGPU/GlobalISel: cmp/select method for extract element Differential Revision: https://reviews.llvm.org/D80749	2020-06-05 12:57:40 -07:00

953 Commits