clang-p2996

Author	SHA1	Message	Date
QingShan Zhang	2b59e9f1bd	[DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression We have getNegatibleCost/getNegatedExpression to evaluate the cost and negate the expression. However, while negating the expression, the cost might change because we are changing the DAG, and we then hit the assertion if we negated the wrong expression, as the cost is no longer trustworthy. This patch aims to remove getNegatibleCost to avoid it getting out of sync with getNegatedExpression, and to check the cost while negating the expression. It also reduces the duplicated code between getNegatibleCost and getNegatedExpression, and fixes the crash for the test in D76638. Reviewed By: RKSimon, spatel Differential Revision: https://reviews.llvm.org/D77319	2020-05-20 02:12:16 +00:00
Stanislav Mekhanoshin	50f3bb1329	[AMDGPU] Fixed selection error for 64 bit extract_subvector Differential Revision: https://reviews.llvm.org/D80155	2020-05-18 14:17:59 -07:00
Stanislav Mekhanoshin	9d4cf5bd42	[AMDGPU] Make v16f64/v16i64 legal This allows indirect VGPR addressing to work. Differential Revision: https://reviews.llvm.org/D79960	2020-05-14 14:46:55 -07:00
Stanislav Mekhanoshin	184b383457	Add v16f64 value type We need to use it to handle <16 x double> indirect indexes in the AMDGPU BE. The only visible change from adding it is in ARM cost model. To me it looks reasonable. With doubling a vector size it quadruples the cost up to the size 8 and then it did only double it. Now it also quadruples, which seems a logical progression to me. Actual AMDGPU code is to follow, this is a common part, plus load/store legalization in the AMDGPU BE not to break what works now. Differential Revision: https://reviews.llvm.org/D79952	2020-05-14 14:28:00 -07:00
Matt Arsenault	704b539f65	AMDGPU: Use Register	2020-05-13 15:31:54 -04:00
Stanislav Mekhanoshin	71ed66d97f	[AMDGPU] Make v4i64/v4f64/v8i64/v8f64 legal We can produce such vectors in the Promote Alloca pass, but we are unable to use movrel to operate it and lower via scratch. Making it legal makes SI_INDIRECT patterns work. There is more work to do in subsequent changes: 1. We initialize m0 twice to access each dword. It shall be possible to only do it once and increment base register number instead. 2. We also need v16i64/v16f64 but these first need to be added to tablegen. Differential Revision: https://reviews.llvm.org/D79808	2020-05-12 16:05:12 -07:00
Sam McCall	728cf6d86b	Revert "[DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression" This reverts commit `3c44c441db`. Causes infloops on some inputs, see https://reviews.llvm.org/D77319 for repro	2020-05-11 16:44:01 +02:00
QingShan Zhang	3c44c441db	[DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression We have getNegatibleCost/getNegatedExpression to evaluate the cost and negate the expression. However, while negating the expression, the cost might change because we are changing the DAG, and we then hit the assertion if we negated the wrong expression, as the cost is no longer trustworthy. This patch aims to remove getNegatibleCost to avoid it getting out of sync with getNegatedExpression, and to check the cost while negating the expression. It also reduces the duplicated code between getNegatibleCost and getNegatedExpression, and fixes the crash for the test in D76638. Reviewed By: RKSimon, spatel Differential Revision: https://reviews.llvm.org/D77319	2020-05-11 02:41:10 +00:00
Matt Arsenault	f463792506	AMDGPU: Remove custom node for RSQ_LEGACY Directly select from the intrinsic. This wasn't getting much value from the custom node.	2020-04-17 19:50:36 -04:00
Matt Arsenault	5660bb6bc9	AMDGPU: Remove denormal subtarget features Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.	2020-04-02 17:17:12 -04:00
Matt Arsenault	2ad5fc1d91	AMDGPU/GlobalISel: Implement computeNumSignBitsForTargetInstr	2020-03-23 15:02:30 -04:00
Matt Arsenault	84386b2d8a	AMDGPU: Drop special case f64 fround lowering The result is better if ftrunc is emitted and separately legalized when unavailable.	2020-03-16 12:09:30 -04:00
Simon Pilgrim	e91feeed21	[AMDGPU] Add ISD::FSHR -> ALIGNBIT support This patch allows ISD::FSHR(i32) patterns to lower to ALIGNBIT instructions. This improves test coverage of ISD::FSHR matching - x86 has both FSHL/FSHR instructions and we prefer FSHL by default. Differential Revision: https://reviews.llvm.org/D76070	2020-03-12 20:16:57 +00:00
Matt Arsenault	1e0c540360	AMDGPU: Don't hard error on LDS globals in functions Instead, emit a trap and a warning. We force inlining of this situation, so any function where this happens should be dead as indirect or external calls are not yet supported. This should avoid erroring on dead code.	2020-03-11 15:34:11 -04:00
Matt Arsenault	156a1b59df	AMDGPU: Make signext/zeroext behave more sensibly over > i32 Interpret these as extending to the next multiple of 32-bits. This had no effect with i48 for example, which is really split into {i32, i16}, which should extend the high part.	2020-03-09 12:56:10 -07:00
Simon Pilgrim	6085593c12	[AMDGPU] simplifyI24 - replace GetDemandedBits with SimplifyMultipleUseDemandedBits GetDemandedBits mostly just calls SimplifyMultipleUseDemandedBits now, but it does a very blunt constant simplification that SimplifyMultipleUseDemandedBits avoids. If we need to demand bits from constants we should handle this through ShrinkDemandedConstant/targetShrinkDemandedConstant. @arsenm confirmed that the sign extended immediates are better for code size. Differential Revision: https://reviews.llvm.org/D74857	2020-02-20 12:03:08 +00:00
Matt Arsenault	4bb0c8f91c	AMDGPU: Enable integer division bypass We probably want this, and I've meant to turn this on for a long time. SC actually emits a special case to early-out for a 1 denominator, which perhaps should also be considered.	2020-02-19 17:50:19 -05:00
Simon Pilgrim	9eb426c88c	[TargetLowering] Add NegatibleCost enum for isNegatibleForFree return codes The isNegatibleForFree/getNegatedExpression methods currently rely on a raw char value to indicate whether a negation is beneficial or not. This patch replaces the char return value with a NegatibleCost enum to more clearly demonstrate what is implied. It also renames isNegatibleForFree to getNegatibleCost to more accurately reflect what's going on. Differential Revision: https://reviews.llvm.org/D74221	2020-02-12 11:51:42 +00:00
Matt Arsenault	f734ce0488	AMDGPU: Fix crash on v3i15 kernel arguments This was split into 3 i15 arguments. The i15 piece needs to be rounded to a simple MVT for the memory type.	2020-02-11 18:11:39 -05:00
Stanislav Mekhanoshin	453a8f3af7	[AMDGPU] Remove AMDGPURegisterInfo R600 and GCN do not have anything in common in terms of register file organization anymore. Differential Revision: https://reviews.llvm.org/D74426	2020-02-11 11:13:38 -08:00
Matt Arsenault	00115d767f	AMDGPU: Remove dead kill handling At one point a custom node was used for kill handling, but now the intrinsic is directly selected. Remove leftover pattern machinery.	2020-02-09 17:59:24 -05:00
Austin Kerbow	0f116fd9d8	[AMDGPU] Fix infinite loop with fma combines https://reviews.llvm.org/D72312 introduced an infinite loop which involves DAGCombiner::visitFMA and AMDGPUTargetLowering::performFNegCombine. fma( a, fneg(b), fneg(c) ) => fneg( fma (a, b, c) ) => fma( a, fneg(b), fneg(c) ) ... This only breaks with types where 'isFNegFree' returns false, e.g. v4f32. Reproducing the issue also needs the attribute 'no-signed-zeros-fp-math', and no source mods allowed on one of the users of the Op. This fix makes changes to indicate that it is not free to negate a fma if it has users with source mods. Differential Revision: https://reviews.llvm.org/D73939	2020-02-04 13:11:09 -08:00
Matt Arsenault	1024b73ef5	AMDGPU: Split denormal mode tracking bits Prepare to accurately track the future denormal-fp-math attribute changes. The way to actually set these separately is not wired in yet. This is just a mechanical change, and mostly still assumes the input and output mode match. This should be refined for some cases. For example, fcanonicalize lowering should use the flushing variant if either input or output flushing is enabled	2020-02-04 10:44:21 -08:00
Matt Arsenault	68b102b97a	AMDGPU: Directly select 16-bank LDS case of llvm.amdgcn.interp.p1.f16 Manually select this as a tablegen workaround. Both SelectionDAG and GlobalISel end up misplacing the copy to m0 when both instructions in the output need it. Neither considers that both output instructions depend on m0. I don't know of any other pattern we need to handle this case, so it's less effort to just work around this for now.	2020-01-29 08:24:31 -08:00
Stanislav Mekhanoshin	44b865fa7f	[AMDGPU] Allow narrowing multi-dword loads Currently BE allows only a little load narrowing because of the fear it will produce sub-dword ext loads. However, we can always allow narrowing if we are shrinking one multi-dword load to another multi-dword load. In particular we were unable to reduce s_load_dwordx8 into s_load_dwordx4 if identity shuffle was used to extract low 4 dwords. Differential Revision: https://reviews.llvm.org/D73133	2020-01-24 11:03:41 -08:00
Michael Liao	6d0d86a64d	[DAG] Add helper for creating constant vector index with correct type. NFC.	2020-01-18 01:23:36 -05:00
Matt Arsenault	eef92f25cc	AMDGPU: Remove custom node for exports I'm mildly worried about potentially reordering exp/exp_done with IntrWriteMem on the intrinsic. Requires hacking out the illegal type on SI, so manually select that case during lowering.	2020-01-15 18:33:15 -05:00
Matt Arsenault	68e70fb098	AMDGPU: Fix not using v_cvt_f16_[iu]16 We weren't treating i16->f16 casts as legal on targets with these instructions, and always using a pair of casts through i32.	2020-01-07 15:10:07 -05:00
Craig Topper	787e078f3e	[TargetLowering][AMDGPU] Make scalarizeVectorLoad return a pair of SDValues instead of creating a MERGE_VALUES node. NFCI This allows us to clean up some places that were peeking through the MERGE_VALUES node after the call. By returning the SDValues directly, we can clean that up. Unfortunately, there are several call sites in AMDGPU that wanted the MERGE_VALUES and now need to create their own.	2019-12-30 19:36:04 -08:00
Matt Arsenault	9e1a2a668b	AMDGPU: Improve llvm.round.f64 lowering for CI+ The path already used for f16/f32 works a lot better when v_trunc_f64 is available.	2019-12-30 09:55:46 -05:00
Jay Foad	f8495017f0	Fix whitespace.	2019-12-16 10:42:34 +00:00
Jay Foad	4f17b1784e	Fix for AMDGPU MUL_I24 known bits calculation Summary: At present, the code calculating known bits of AMDGPU MUL_I24 confuses the concepts of "non-negative number" and "positive number". In some situations, it results in incorrect code. I have a case where the optimizer replaces the result of calculating MUL_I24(-5, 0) with -8. Reviewers: foad, arsenm Reviewed By: arsenm Subscribers: foad, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Patch by Eugene Kuznetsov. Differential Revision: https://reviews.llvm.org/D70367	2019-12-16 10:25:57 +00:00
Alex Richardson	be15dfa88f	[NFC] Use EVT instead of bool for getSetCCInverse() Summary: The use of a boolean isInteger flag (generally initialized using VT.isInteger()) caused errors in our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). In our backend, pointers use a separate ValueType (iFATPTR) and therefore .isInteger() returns false. This meant that getSetCCInverse() was using the floating-point variant and generated incorrect code for us: `(void *)0x12033091e < (void *)0xffffffffffffffff` would return false. Committing this change will significantly reduce our merge conflicts for each upstream merge. Reviewers: spatel, bogner Reviewed By: bogner Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70917	2019-12-13 12:22:03 +00:00
Matt Arsenault	db0ed3e429	AMDGPU: Refactor treatment of denormal mode Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute. This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.	2019-11-19 19:55:43 +05:30
Matt Arsenault	31479d868e	AMDGPU: Change boolean content type to 0 or 1 The usage of target boolean checks is overly inflexible, since sext and zext of a compare are equally cheap. The choice is arbitrary, but using 0/1 to some degree is the choice of lower resistance since that's what most targets use. This enables a few combines that don't bother to support ZeroOrNegativeOneBooleanContent.	2019-11-15 13:43:47 +05:30
Matt Arsenault	e16a71382d	AMDGPU: Select global atomicrmw fadd This only works if there is no use of the return value.	2019-11-06 16:06:38 -08:00
Michael Liao	4531aee2ac	[amdgpu] Fix known bits computation on `MUL_I24`/`MUL_U24`. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D69735	2019-11-01 17:06:17 -04:00
Matt Arsenault	ef9a0278f0	AMDGPU: Select basic interp directly from intrinsics llvm-svn: 375457	2019-10-21 21:49:44 +00:00
Guillaume Chatelet	b65fa48305	[Alignment] Migrate Attribute::getWith(Stack)Alignment Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jdoerfert Reviewed By: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68792 llvm-svn: 374884	2019-10-15 12:56:24 +00:00
Matt Arsenault	4227c62bc7	AMDGPU: Move SelectFlatOffset back into AMDGPUISelDAGToDAG llvm-svn: 374495	2019-10-11 01:28:27 +00:00
Evandro Menezes	c57a9dc487	[AMDGPU] Use math constants defined in MathExtras (NFC) Use the new math constants in `MathExtras.h`. Differential revision: https://reviews.llvm.org/D68285 llvm-svn: 374208	2019-10-09 20:00:43 +00:00
Thomas Raoux	3c8c667235	[TargetLowering] Make allowsMemoryAccess method virtual. Rename old function to explicitly show that it cares only about alignment. The new allowsMemoryAccess calls the function related to alignment by default and can be overridden by target to inform whether the memory access is legal or not. Differential Revision: https://reviews.llvm.org/D67121 llvm-svn: 372935	2019-09-26 00:16:01 +00:00
Simon Pilgrim	557cee337b	[AMDGPU] isSDNodeAlwaysUniform - silence static analyzer dyn_cast<LoadSDNode> null dereference warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<LoadSDNode> directly and if not assert will fire for us. llvm-svn: 372528	2019-09-22 21:01:13 +00:00
Graham Hunter	1a9195d817	[SVE][MVT] Fixed-length vector MVT ranges * Reordered MVT simple types to group scalable vector types together. * New range functions in MachineValueType.h to only iterate over the fixed-length int/fp vector types. * Stopped backends which don't support scalable vector types from iterating over scalable types. Reviewers: sdesmalen, greened Reviewed By: greened Differential Revision: https://reviews.llvm.org/D66339 llvm-svn: 372099	2019-09-17 10:19:23 +00:00
Matt Arsenault	64ecca90d4	AMDGPU/GlobalISel: Implement LDS G_GLOBAL_VALUE Handle the simple case that lowers to a constant. llvm-svn: 371424	2019-09-09 17:13:44 +00:00
Matt Arsenault	acc9571406	AMDGPU: Remove pointless wrapper nodes for init.exec intrinsics llvm-svn: 371364	2019-09-09 05:49:52 +00:00
Matt Arsenault	59ff77ee38	AMDGPU: Fix emitting multiple stack loads for stack passed workitems The same stack is loaded for each workitem ID, and each use. Nothing prevents you from creating multiple fixed stack objects with the same offsets, so this was creating a load for each unique frame index, despite them being the same offset. Re-use the same frame index so the loads are CSEable. llvm-svn: 371148	2019-09-05 23:40:14 +00:00
Matt Arsenault	ede9a5293d	AMDGPU: Remove unused custom node definition llvm-svn: 370603	2019-09-01 02:00:08 +00:00
Matt Arsenault	0a6564980b	AMDGPU: Combine directly on mul24 intrinsics The problem these are supposed to work around can occur before the intrinsics are lowered into the nodes. Try to directly simplify them so they are matched before the bit assert operations can be optimized out. llvm-svn: 369994	2019-08-27 00:18:09 +00:00
Craig Topper	3f59bfd5be	[MVT] Add v16f16 and v32f16 vectors. I might look at improving PR43065 which will require being able to mark a 256 and 512 bit vector of f16 as Legal. Differential Revision: https://reviews.llvm.org/D66515 llvm-svn: 369565	2019-08-21 19:14:48 +00:00

388 Commits