clang-p2996

Author	SHA1	Message	Date
Tim Northover	d276d85309	MIR: update test for noVRegs removal. I think I hadn't git pulled recently enough to bring it in. llvm-svn: 304250	2017-05-30 22:02:19 +00:00
Tim Northover	fb26d9a286	MIR: remove explicit "noVRegs" property. We can infer this from the incoming MIR, so there's no reason to represent it with a special flag. llvm-svn: 304246	2017-05-30 21:28:57 +00:00
Stanislav Mekhanoshin	56ea488d8b	[AMDGPU] Allow SDWA in instructions with immediates and SGPRs An encoding does not allow to use SDWA in an instruction with scalar operands, either literals or SGPRs. That is however possible to copy these operands into a VGPR first. Several copies of the value are produced if multiple SDWA conversions were done. To cleanup MachineLICM (to hoist copies out of loops), MachineCSE (to remove duplicate copies) and SIFoldOperands (to replace SGPR to VGPR copy with immediate copy right to the VGPR) runs are added after the SDWA pass. Differential Revision: https://reviews.llvm.org/D33583 llvm-svn: 304219	2017-05-30 16:49:24 +00:00
Mark Searles	00ce96f6ee	[AMDGPU] Require waitcnt before barrier for all targets; adjust tests. Differential Revision: https://reviews.llvm.org/D33576 llvm-svn: 304217	2017-05-30 16:22:43 +00:00
Konstantin Zhuravlyov	b2ff8dfea0	Resubmit r303859 with test fixed. [AMDGPU] add intrinsic for s_getpc Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc. Patch by Tim Corringham llvm-svn: 304031	2017-05-26 20:38:26 +00:00
Tom Stellard	dde28a8c92	AMDGPU/GlobalISel: Mark 32-bit float constants as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33212 llvm-svn: 304003	2017-05-26 16:40:03 +00:00
Matthias Braun	1527baab0c	CodeGen: Rename DEBUG_TYPE to match passnames Rename the DEBUG_TYPE to match the names of corresponding passes where it makes sense. Also establish the pattern of simply referencing DEBUG_TYPE instead of repeating the passname where possible. llvm-svn: 303921	2017-05-25 21:26:32 +00:00
Nico Weber	b3d83a092a	Revert r303859, CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll fails on bots. llvm-svn: 303902	2017-05-25 19:19:29 +00:00
Tim Corringham	32d0d38679	[AMDGPU] add intrinsic for s_getpc Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D32862 llvm-svn: 303859	2017-05-25 14:04:14 +00:00
Simon Pilgrim	c910a70b21	[AMDGPU] Add INDIRECT_BASE_ADDR to R600_Reg32 class (PR33045) This fixes 17 of the 41 -verify-machineinstrs test failures identified in PR33045 Differential Revision: https://reviews.llvm.org/D33451 llvm-svn: 303691	2017-05-23 21:27:15 +00:00
Changpeng Fang	1dbace195d	AMDGPU/SI: Move the local memory usage related checking after calling convention checking in PromoteAlloca Summary: Promoting Alloca to Vector and Promoting Alloca to LDS are two independent handling of Alloca and should not affect each other. As a result, we should not give up promoting to vector if there is not enough LDS. This patch factors out the local memory usage related checking out and replace it after the calling convention checking. Reviewer: arsenm Differential Revision: http://reviews.llvm.org/D33139 llvm-svn: 303684	2017-05-23 20:25:41 +00:00
Stanislav Mekhanoshin	53a21292f8	[AMDGPU] Combine and (srl) into shl (bfe) Perform DAG combine: and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb Where nb is a number of trailing zeroes in mask. It replaces two instructions with two and BFE is generally a more expensive one. However this is only done if we are selecting a byte or word at an aligned boundary which results in a proper SDWA operand pattern. It is only done if SDWA is supported. TODO: improve SDWA pass to actually convert this pattern. It is not done now because we have an immediate in the instruction, which has be moved into a VGPR. Differential Revision: https://reviews.llvm.org/D33455 llvm-svn: 303681	2017-05-23 19:54:48 +00:00
Stanislav Mekhanoshin	a96ec3f360	[AMDGPU] Convert shl (add) into add (shl) shl (or\|add x, c2), c1 => or\|add (shl x, c1), (c2 << c1) This allows to fold a constant into an address in some cases as well as to eliminate second shift if the expression is used as an address and second shift is a result of a GEP. Differential Revision: https://reviews.llvm.org/D33432 llvm-svn: 303641	2017-05-23 15:59:58 +00:00
Stanislav Mekhanoshin	5fa289f0d8	[AMDGPU] Narrow lshl from 64 to 32 bit if possible Turn expensive 64 bit shift into 32 bit if shift does not overflow int: shl (ext x) => zext (shl x) Differential Revision: https://reviews.llvm.org/D33367 llvm-svn: 303569	2017-05-22 16:58:10 +00:00
Matthias Braun	420713c40b	Fix typo in test llvm-svn: 303436	2017-05-19 17:25:20 +00:00
Matthias Braun	d6e75ed93e	LiveIntervalAnalysis: Fix missing case in pruneSubRegValues() pruneSubRegValues() needs to remove subregister ranges starting at instructions that later get removed by eraseInstrs(). It missed to check one case in which eraseInstrs() would remove an instruction. Fixes http://llvm.org/PR32688 llvm-svn: 303396	2017-05-19 00:18:03 +00:00
Sam Kolton	ebfdaf7394	[AMDGPU] SDWA operands should not intersect with potential MIs Summary: There should be no intesection between SDWA operands and potential MIs. E.g.: ``` v_and_b32 v0, 0xff, v1 -> src:v1 sel:BYTE_0 v_and_b32 v2, 0xff, v0 -> src:v0 sel:BYTE_0 v_add_u32 v3, v4, v2 ``` In that example it is possible that we would fold 2nd instruction into 3rd (v_add_u32_sdwa) and then try to fold 1st instruction into 2nd (that was already destroyed). So if SDWAOperand is also a potential MI then do not apply it. Reviewers: vpykhtin, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D32804 llvm-svn: 303347	2017-05-18 12:12:03 +00:00
Matt Arsenault	2b1f9aa577	AMDGPU: Start defining a calling convention Partially implement callee-side for arguments and return values. byval doesn't work properly, and most likely sret or other on-stack return values most as well. llvm-svn: 303308	2017-05-17 21:56:25 +00:00
Matt Arsenault	a53292779a	AMDGPU: Remove old intrinsic uses llvm-svn: 303305	2017-05-17 21:38:21 +00:00
Matt Arsenault	98f2946ab3	AMDGPU: Make better use of op_sel with high components Handle more general swizzles. llvm-svn: 303296	2017-05-17 20:30:58 +00:00
Matt Arsenault	786eeea23e	AMDGPU: Try to use op_sel when selecting packed instructions Avoids instructions to pack a vector when the source is really a scalar being broadcast. Also be smarter and look for per-component fneg. Doesn't yet handle scalar from upper half of register or other swizzles. llvm-svn: 303291	2017-05-17 20:00:00 +00:00
Matt Arsenault	ee324ffc1f	AMDGPU: Fix min3/max3 combines for f16/i16 Fix missing instruction definitions for min3/max3. llvm-svn: 303284	2017-05-17 19:25:06 +00:00
Nirav Dave	da8f221273	Elide stores which are overwritten without being observed. Summary: In SelectionDAG, when a store is immediately chained to another store to the same address, elide the first store as it has no observable effects. This is causes small improvements dealing with intrinsics lowered to stores. Test notes: * Many testcases overwrite store addresses multiple times and needed minor changes, mainly making stores volatile to prevent the optimization from optimizing the test away. * Many X86 test cases optimized out instructions associated with associated with va_start. * Note that test_splat in CodeGen/AArch64/misched-stp.ll no longer has dependencies to check and can probably be removed and potentially replaced with another test. Reviewers: rnk, john.brawn Subscribers: aemerson, rengolin, qcolombet, jyknight, nemanjai, nhaehnle, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33206 llvm-svn: 303198	2017-05-16 19:43:56 +00:00
Dmitry Preobrazhensky	167f8b69e3	[AMDGPU][MC] Corrected several VI opcodes to avoid printing _e64 See bug 32936: https://bugs.llvm.org//show_bug.cgi?id=32936 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D33123 llvm-svn: 303070	2017-05-15 14:28:23 +00:00
Changpeng Fang	161e8c39af	AMDGPU/SI: Don't promote to vector if the load/store is volatile. Summary: We should not change volatile loads/stores in promoting alloca to vector. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D33107 llvm-svn: 302943	2017-05-12 20:31:12 +00:00
Tom Stellard	fab6b1af6e	AMDGPU: Add lit.local.cfg to disable global-isel tests when global-isel is disabled This should fix bots broken by r302919. llvm-svn: 302928	2017-05-12 17:59:30 +00:00
Tom Stellard	a0d67c748a	AMDGPU/GlobalISel: Mark 32-bit integer constants as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33115 llvm-svn: 302919	2017-05-12 16:46:46 +00:00
Matt Arsenault	47ccafe787	AMDGPU: Remove tfe bit from flat instruction definitions We don't use it and it was removed in gfx9, and the encoding bit repurposed. Additionally actually using it requires changing the output register class, which wasn't done anyway. llvm-svn: 302814	2017-05-11 17:38:33 +00:00
Matt Arsenault	bf5482e4bb	AMDGPU: Pull fneg out of extract_vector_elt This allows folding source modifiers in more f16 cases. Makes it easier to select per-component packed neg modifiers. llvm-svn: 302813	2017-05-11 17:26:25 +00:00
Dmitry Preobrazhensky	da61a7f9ef	[AMDGPU][MC] Corrected v_madak/madmk to avoid printing "_e32" in disassembler output See bug 32927: https://bugs.llvm.org//show_bug.cgi?id=32927 Reviewers: vpykhtin, artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D32913 llvm-svn: 302648	2017-05-10 13:00:28 +00:00
Kannan Narayanan	5e73b04b84	[AMDGPU] In the new waitcnt insertion pass, use getHeader instead of getTopBlock to find the loop header. Differential Revision: https://reviews.llvm.org/D32831 llvm-svn: 302290	2017-05-05 21:10:17 +00:00
Matthias Braun	8940114f61	MIParser/MIRPrinter: Compute block successors if not explicitely specified - MIParser: If the successor list is not specified successors will be added based on basic block operands in the block and possible fallthrough. - MIRPrinter: Adds a new `simplify-mir` option, with that option set: Skip printing of block successor lists in cases where the parser is guaranteed to reconstruct it. This means we still print the list if some successor cannot be determined (happens for example for jump tables), if the successor order changes or branch probabilities being unequal. Differential Revision: https://reviews.llvm.org/D31262 llvm-svn: 302289	2017-05-05 21:09:30 +00:00
Konstantin Zhuravlyov	6ccb076aeb	AMDGPU/AMDHSA: Set COMPUTE_PGM_RSRC2:LDS_SIZE to 0 This field is populated by the CP Differential Revision: https://reviews.llvm.org/D32619 llvm-svn: 302277	2017-05-05 20:13:55 +00:00
Marek Olsak	584d2c05d4	AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5 Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D32645 llvm-svn: 302200	2017-05-04 22:25:20 +00:00
Chad Rosier	84a238dd62	[DAGCombine] Transform (fadd A, (fmul B, -2.0)) -> (fsub A, (fadd B, B)). Differential Revision: http://reviews.llvm.org/D32596 llvm-svn: 302153	2017-05-04 14:14:44 +00:00
Matt Arsenault	5c80618fb7	AMDGPU: Don't promote alloca to LDS for leaf functions LDS use in leaf functions not currently handled. llvm-svn: 301958	2017-05-02 18:33:18 +00:00
Matt Arsenault	7b82b4bddb	AMDGPU: Make intrinsics speculatable llvm-svn: 301937	2017-05-02 16:57:44 +00:00
Matt Arsenault	2a80369ae4	AMDGPU: Fix copies from physical registers in SIFixSGPRCopies This would assert when there were multiple defs of a physical register. We just need to move all of the users of it. llvm-svn: 301730	2017-04-29 01:26:34 +00:00
Marek Olsak	2d82590f64	AMDGPU: Add new amdgcn.init.exec intrinsics v2: More tests, bug fixes, cosmetic changes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D31762 llvm-svn: 301677	2017-04-28 20:21:58 +00:00
Konstantin Zhuravlyov	54ba4312a3	AMDGPU: Fix ValueKind code object metadata for images Differential Revision: https://reviews.llvm.org/D32504 llvm-svn: 301360	2017-04-25 20:38:26 +00:00
Matt Arsenault	4474652c95	Revert "StructurizeCFG: Directly invert cmp instructions" This reverts commit r300732. This breaks a few tests. I think the problem is related to adding more uses of the condition that don't yet exist at this point. llvm-svn: 301242	2017-04-24 20:25:01 +00:00
Matt Arsenault	0774ea267a	AMDGPU: Select scratch mubuf offsets when pointer is a constant In call sequence setups, there may not be a frame index base and the pointer is a constant offset from the frame pointer / scratch wave offset register. llvm-svn: 301230	2017-04-24 19:40:59 +00:00
Stanislav Mekhanoshin	bd5394be3d	[AMDGPU] Merge M0 initializations Merges equivalent initializations of M0 and hoists them into a common dominator block. Technically the same code can be used with any register, physical or virtual. Differential Revision: https://reviews.llvm.org/D32279 llvm-svn: 301228	2017-04-24 19:37:54 +00:00
Yaxun Liu	fd23a0c095	CodeGen: Add a hook for getFenceOperandTy Currently the operand type for ATOMIC_FENCE assumes value type of a pointer in address space 0. This is fine for most targets. However for amdgcn target, the size of pointer in address space 0 depends on triple environment. For amdgiz environment, it is 64 bit but for other environment it is 32 bit. On the other hand, amdgcn target expects 32 bit fence operands independent of the target triple environment. Therefore a hook is need in target lowering for getting the fence operand type. This patch has no effect on targets other than amdgcn. Differential Revision: https://reviews.llvm.org/D32186 llvm-svn: 301215	2017-04-24 18:26:27 +00:00
Matt Arsenault	3e02538a02	AMDGPU: Move trap lowering to DAG Fixes traps in any block besides the entry block, and fixes depending on a live-in physical register by using a virtual register copy. Also happens to stop emitting a nop in the case debug trap is not supported. llvm-svn: 301206	2017-04-24 17:49:13 +00:00
Nicolai Haehnle	5dea645138	AMDGPU: Move v_readlane lane select from VGPR to SGPR Summary: Fix a compiler bug when the lane select happens to end up in a VGPR. Clarify the semantic of the corresponding intrinsic to be that of the corresponding GLSL: the lane select must be uniform across a wave front, otherwise results are undefined. Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D32343 llvm-svn: 301197	2017-04-24 17:17:36 +00:00
Nicolai Haehnle	ef449787d8	AMDGPU: Fix crash when scheduling non-memory SMRD instructions Summary: Fixes piglit spec/arb_shader_clock/execution/* Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D32345 llvm-svn: 301191	2017-04-24 16:53:52 +00:00
David Blaikie	6cce69020c	Fix test from polluting the source tree (though this seems like a "does this not crash" test - which isn't very good. Should be fixed) llvm-svn: 301071	2017-04-22 07:53:40 +00:00
Konstantin Zhuravlyov	3d1cc88c68	AMDGPU: Temporarily disable packed inlinable literals (v2f16, v2i16) Differential Revision: https://reviews.llvm.org/D32361 llvm-svn: 301028	2017-04-21 19:45:22 +00:00
Konstantin Zhuravlyov	88938d4e67	AMDGPU: Fix S_PACK_HH_B32_B16 - We really ought to zero out lower 16 bits Differential Revision: https://reviews.llvm.org/D32356 llvm-svn: 301026	2017-04-21 19:35:05 +00:00

1 2 3 4 5 ...

1032 Commits