clang-p2996

Author	SHA1	Message	Date
Matt Arsenault	5f9707b796	AMDGPU: Fix redundant FP spilling/assert in some functions If a function has stack objects, and a call, we require an FP. If we did not initially have any stack objects, and only introduced them during PrologEpilogInserter for CSR VGPR spills, SILowerSGPRSpills would end up spilling the FP register as if it were a normal register. This would result in an assert in a debug build, or redundant handling of the FP register in a release build. Try to predict that we will have an FP later, although this is ugly.	2021-01-26 13:01:45 -05:00
Mitch Phillips	c9466ede7e	Revert "Revert "[GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method"" This reverts commit `554b3211fe`. Differential Revision: https://reviews.llvm.org/D95035	2021-01-25 16:22:22 -08:00
Stanislav Mekhanoshin	eace81c48f	[AMDGPU] Added -mcpu=tahiti to 3 tests. NFC.	2021-01-25 15:50:59 -08:00
Konstantin Zhuravlyov	2cdb34efda	Revert "[IndirectFunctions] Skip propagating attributes to address taken functions" This reverts commit `dd8ae42674`. This commit causes infinite loop when compiling rocThrust and hipCUB. Differential Revision: https://reviews.llvm.org/D95389	2021-01-25 15:58:06 -05:00
Carl Ritson	a80ebd0179	[AMDGPU] Fix llvm.amdgcn.init.exec and frame materialization Frame-base materialization may insert vector instructions before EXEC is initialised. Fix this by moving lowering of llvm.amdgcn.init.exec later in backend. Also remove SI_INIT_EXEC_LO pseudo as this is not necessary. Reviewed By: ruiling Differential Revision: https://reviews.llvm.org/D94645	2021-01-25 08:31:17 +09:00
Roger Ferrer Ibanez	d4ce062340	[RISCV][PrologEpilogInserter] "Float" emergency spill slots to avoid making them immediately unreachable from the stack pointer In RISC-V there is a single addressing mode of the form imm(reg) where imm is a signed integer of 12-bit with a range of [-2048..2047] bytes from reg. The test MultiSource/UnitTests/C++11/frame_layout of the LLVM test-suite exercises several scenarios with the stack, including function calls where the stack will need to be realigned to to a local variable having a large alignment of 4096 bytes. In situations of large stacks, the RISC-V backend (in RISCVFrameLowering) reserves an extra emergency spill slot which can be used (if no free register is found) by the register scavenger after the frame indexes have been eliminated. PrologEpilogInserter already takes care of keeping the emergency spill slots as close as possible to the stack pointer or frame pointer (depending on what the function will use). However there is a final alignment step to honour the maximum alignment of the stack that, when using the stack pointer to access the emergency spill slots, has the side effect of setting them farther from the stack pointer. In the case of the frame_layout testcase, the net result is that we do have an emergency spill slot but it is so far from the stack pointer (more than 2048 bytes due to the extra alignment of a variable to 4096 bytes) that it becomes unreachable via any immediate offset. During elimination of the frame index, many (regular) offsets of the stack may be immediately unreachable already. Their address needs to be computed using a register. A virtual register is created and later RegisterScavenger should be able to find an unused (physical) register. However if no register is available, RegisterScavenger will pick a physical register and spill it onto an emergency stack slot, while we compute the offset (restoring the chosen register after all this). This assumes that the emergency stack slot is easily reachable (this is, without requiring another register!). This is the assumption we seem to break when we perform the extra alignment in PrologEpilogInserter. We can "float" the emergency spill slots by increasing (in absolute value) their offsets from the incoming stack pointer. This way the emergency spill slots will remain close to the stack pointer (once the function has allocated storage for the stack, including the needed realignment). The new size computed in PrologEpilogInserter is padding so it should be OK to move the emergency spill slots there. Also because we're increasing the alignment, the new location should stay aligned for the purpose of the emergency spill slots. Note that this change also impacts other backends as shown by the tests. Changes are minor adjustments to the emergency stack slot offset. Differential Revision: https://reviews.llvm.org/D89239	2021-01-23 09:10:03 +00:00
Stanislav Mekhanoshin	ca904b81e6	[AMDGPU] Fix FP materialization/resolve with flat scratch Differential Revision: https://reviews.llvm.org/D95266	2021-01-22 16:06:47 -08:00
Mitch Phillips	554b3211fe	Revert "[GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method" This reverts commit `2bb92bf451`. Dependent patch broke UBSan on Android: `3dedad475d`	2021-01-22 14:32:11 -08:00
Cassie Jones	2bb92bf451	[GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method The widenScalar implementation for signed and unsigned overflowing operations were very similar: both are checked by truncating the result and then re-sign/zero-extending it and checking that it matches the computed operation. Using a truncate + zero-extend for the unsigned case instead of manually producing the AND instruction like before leads to an extra copy instruction during legalization, but this should be harmless. Differential Revision: https://reviews.llvm.org/D95035	2021-01-22 14:08:46 -08:00
Sebastian Neubauer	8214982b50	[AMDGPU] Implement mir parseCustomPseudoSourceValue Allow parsing generated mir with custom pseudo source value tokens. Also rename pseudo source values to have more meaningful names. Relands `ba7dcd8542`, which had memory leaks. Differential Revision: https://reviews.llvm.org/D95215	2021-01-22 11:24:08 +01:00
Christudasan Devadasan	ff8a1cae18	[AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses. During instruction selection, there is an inconsistency in choosing the initial soffset value. With certain early passes, this value is getting modified and that brought additional fixup during eliminateFrameIndex to work for all cases. This whole transformation looks trivial and can be handled better. This patch clearly defines the initial value for soffset and keeps it unchanged before eliminateFrameIndex. The initial value must be zero for MUBUF with a frame index. The non-frame index MUBUF forms that use a raw offset from SP will have the stack register for soffset. During frame elimination, the soffset remains zero for entry functions with zero dynamic allocas and no callsites, or else is updated to the appropriate frame/stack register. Also, did some code clean up and made all asserts around soffset stricter to match. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D95071	2021-01-22 14:20:59 +05:30
Christudasan Devadasan	c971bcd210	[AMDGPU] Test clean up (NFC)	2021-01-22 13:38:52 +05:30
Arthur Eubanks	a11bf9a7fb	[AMDGPU][Inliner] Remove amdgpu-inline and add a new TTI inline hook Having a custom inliner doesn't really fit in with the new PM's pipeline. It's also extra technical debt. amdgpu-inline only does a couple of custom things compared to the normal inliner: 1) It disables inlining if the number of BBs in a function would exceed some limit 2) It increases the threshold if there are pointers to private arrays(?) These can all be handled as TTI inliner hooks. There already exists a hook for backends to multiply the inlining threshold. This way we can remove the custom amdgpu-inline pass. This caused inline-hint.ll to fail, and after some investigation, it looks like getInliningThresholdMultiplier() was previously getting applied twice in amdgpu-inline (https://reviews.llvm.org/D62707 fixed it not applying at all, so some later inliner change must have fixed something), so I had to change the threshold in the test. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D94153	2021-01-21 20:29:17 -08:00
RamNalamothu	b6c3a59c3f	[AMDGPU] Test case demonstrating issues with generation of .debug_frame This test case demonstrates that the Call Frame Information generation is totally biased towards whether exceptions are enabled or not. Currently LLVM does not generate CFI i.e. a .debug_frame for debug purpose even if --force-dwarf-frame-section is enabled unless exceptions are enabled. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D94801	2021-01-22 07:39:06 +05:30
Nikita Popov	65fd034b95	[FunctionAttrs] Infer willreturn for functions without loops If a function doesn't contain loops and does not call non-willreturn functions, then it is willreturn. Loops are detected by checking for backedges in the function. We don't attempt to handle finite loops at this point. Differential Revision: https://reviews.llvm.org/D94633	2021-01-21 20:29:33 +01:00
Sebastian Neubauer	4dbdff66fe	Revert "[AMDGPU] Implement mir parseCustomPseudoSourceValue" This reverts commit `ba7dcd8542`. (caused memory leaks)	2021-01-21 18:11:48 +01:00
Jay Foad	c0b3c5a064	[AMDGPU][GlobalISel] Run SIAddImgInit This pass is required to get correct codegen for image instructions with the tfe or lwe bits set. Differential Revision: https://reviews.llvm.org/D95132	2021-01-21 15:54:54 +00:00
Matt Arsenault	94375d1083	AMDGPU: Remove v_rsq_f64 patterns This isn't accurate enough without correction	2021-01-21 10:51:36 -05:00
Matt Arsenault	2a0db8d70e	AMDGPU: Use more accurate fast f64 fdiv A raw v_rcp_f64 isn't accurate enough, so start applying correction.	2021-01-21 10:51:36 -05:00
Sebastian Neubauer	ba7dcd8542	[AMDGPU] Implement mir parseCustomPseudoSourceValue Allow parsing generated mir with custom pseudo source value tokens. Also rename pseudo source values to have more meaningful names. Differential Revision: https://reviews.llvm.org/D94768	2021-01-21 16:32:17 +01:00
Simon Pilgrim	69bc0990a9	[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE (REAPPLIED). Add DemandedElts support inside the TRUNCATE analysis. REAPPLIED - this was reverted by @hans at rGa51226057fc3 due to an issue with vector shift amount types, which was fixed in rG935bacd3a724 and an additional test case added at rG0ca81b90d19d Differential Revision: https://reviews.llvm.org/D56387	2021-01-21 13:01:34 +00:00
madhur13490	dd8ae42674	[IndirectFunctions] Skip propagating attributes to address taken functions In case of indirect calls or address taken functions, skip propagating any attributes to them. We just propagate features to such functions. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D94585	2021-01-21 07:04:28 +00:00
Hans Wennborg	a51226057f	Revert "[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE" It caused "Vector shift amounts must be in the same as their first arg" asserts in Chromium builds. See the code review for repro instructions. > Add DemandedElts support inside the TRUNCATE analysis. > > Differential Revision: https://reviews.llvm.org/D56387 This reverts commit `cad4275d69`.	2021-01-20 20:06:55 +01:00
Simon Pilgrim	cad4275d69	[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE Add DemandedElts support inside the TRUNCATE analysis. Differential Revision: https://reviews.llvm.org/D56387	2021-01-20 15:39:58 +00:00
Mirko Brkusanin	a6a72dfdf2	[AMDGPU][GlobalISel] Avoid selecting S_PACK with constants If constants are hidden behind G_ANYEXT we can treat them same way as G_SEXT. For that purpose we extend getConstantVRegValWithLookThrough with option to handle G_ANYEXT same way as G_SEXT. Differential Revision: https://reviews.llvm.org/D92219	2021-01-20 11:54:53 +01:00
Jay Foad	0808c7009a	[AMDGPU] Fix test case for D94010	2021-01-19 16:46:47 +00:00
Jay Foad	de2f942399	[AMDGPU] Simplify test case for D94010	2021-01-19 16:36:43 +00:00
Simon Pilgrim	207f32948b	[DAG] SimplifyDemandedBits - use KnownBits comparisons to remove ISD::UMIN/UMAX ops Use the KnownBits icmp comparisons to determine when a ISD::UMIN/UMAX op is unnecessary should either op be known to be ULT/ULE or UGT/UGE than the other. Differential Revision: https://reviews.llvm.org/D94532	2021-01-18 10:29:23 +00:00
Carl Ritson	790c75c163	[AMDGPU] Add SI_EARLY_TERMINATE_SCC0 for early terminating shader Add pseudo instruction to allow early termination of pixel shader anywhere based on the value of SCC. The intention is to use this when a mask of live lanes is updated, e.g. live lanes in WQM pass. This facilitates early termination of shaders even when EXEC is incomplete, e.g. in non-uniform control flow. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D88777	2021-01-13 13:29:05 +09:00
Joe Nash	314e29ed2b	[AMDGPU] Add _e64 suffix to VOP3 Insts Previously, instructions which could be expressed as VOP3 in addition to another encoding had a _e64 suffix on the tablegen record name, while those only available as VOP3 did not. With this patch, all VOP3s will have the _e64 suffix. The assembly does not change, only the mir. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D94341 Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423	2021-01-12 18:33:18 -05:00
Matt Arsenault	3d39709159	AMDGPU: Remove wrapper only call limitation This seems to only have overridden cold handling, which we probably shouldn't do. As far as I can tell the wrapper library functions are still inlined as appropriate.	2021-01-12 17:12:49 -05:00
Craig Topper	03c8d6a0c4	[LegalizeDAG][RISCV][PowerPC][AMDGPU][WebAssembly] Improve expansion of SETONE/SETUEQ on targets without SETO/SETUO. If SETO/SETUO aren't legal, they'll be expanded and we'll end up with 3 comparisons. SETONE is equivalent to (SETOGT \|\| SETOLT) so if one of those operations is supported use that expansion. We don't need both since we can commute the operands to make the other. SETUEQ can be implemented with !(SETOGT \|\| SETOLT) or (SETULE && SETUGE). I've only implemented the first because it didn't look like most of the affected targets had legal SETULE/SETUGE. Reviewed By: frasercrmck, tlively, nemanjai Differential Revision: https://reviews.llvm.org/D94450	2021-01-12 10:45:03 -08:00
Simon Pilgrim	a4931d4fe3	[AMDGPU] Regenerate umax crash test	2021-01-12 18:02:15 +00:00
Jay Foad	794e3d94d5	[AMDGPU][GlobalISel] Remove some duplicate RUN lines Differential Revision: https://reviews.llvm.org/D86618	2021-01-12 11:02:16 +00:00
Sebastian Neubauer	6a195491b6	[AMDGPU] Fix failing assert with scratch ST mode In ST mode, flat scratch instructions have neither an sgpr nor a vgpr for the address. This lead to an assertion when inserting hard clauses. Differential Revision: https://reviews.llvm.org/D94406	2021-01-12 09:54:02 +01:00
Craig Topper	b1c304c494	[CodeGen] Try to make the print of memory operand alignment a little more user friendly. Memory operands store a base alignment that does not factor in the effect of the offset on the alignment. Previously the printing code only printed the base alignment if it was different than the size. If there is an offset, the reader would need to figure out the effective alignment themselves. This has confused me before and someone else was recently confused on IRC. This patch prints the possibly offset adjusted alignment if it is different than the size. And prints the base alignment if it is different than the alignment. The MIR parser has been updated to read basealign in addition to align. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D94344	2021-01-11 19:58:47 -08:00
Mircea Trofin	05e90cefeb	[NFC] Disallow unused prefixes under llvm/test/CodeGen This patch finishes addressing unused prefixes under CodeGen: 2 remaining tests fixed, and then undo-ing the lit.local.cfg changes under various subdirs and moving the policy under CodeGen. Differential Revision: https://reviews.llvm.org/D94430	2021-01-11 12:32:18 -08:00
Jay Foad	6dcf9207df	[AMDGPU] Fix a urem combine test to test what it was supposed to	2021-01-11 13:32:34 +00:00
QingShan Zhang	7539c75bb4	[DAGCombine] Remove the check for unsafe-fp-math when we are checking the AFN We are checking the unsafe-fp-math for sqrt but not for fpow, which behaves inconsistent. As the direction is to remove this global option, we need to remove the unsafe-fp-math check for sqrt and update the test with afn fast-math flags. Reviewed By: Spatel Differential Revision: https://reviews.llvm.org/D93891	2021-01-11 02:25:53 +00:00
Tony	2f499b9aff	[AMDGPU] Add volatile support to SIMemoryLegalizer Treat a non-atomic volatile load and store as a relaxed atomic at system scope for the address spaces accessed. This will ensure all relevant caches will be bypassed. A volatile atomic is not changed and still only bypasses caches upto the level specified by the SyncScope operand. Differential Revision: https://reviews.llvm.org/D94214	2021-01-09 00:52:33 +00:00
Mircea Trofin	a8bda3df42	[NFC] Disallow unused prefixes in CodeGen/AMDGPU This adds the lit config, and cleans up remaining tests. Differential Revision: https://reviews.llvm.org/D94245	2021-01-08 11:49:23 -08:00
Christudasan Devadasan	ae25a397e9	AMDGPU/GlobalISel: Enable sret demotion	2021-01-08 10:56:35 +05:30
Matt Arsenault	2cbbc6e87c	GlobalISel: Fail legalization on narrowing extload below memory size	2021-01-07 17:40:34 -05:00
Matt Arsenault	1f9b6ef91f	GlobalISel: Add combine for G_UREM by power of 2 Really I want this in the legalizer, but this is a start.	2021-01-07 16:36:35 -05:00
Mircea Trofin	ee57d30f44	[NFC] Removed unused prefixes from CodeGen/AMDGPU Last bulk batch. Differential Revision: https://reviews.llvm.org/D94236	2021-01-07 09:48:14 -08:00
Mircea Trofin	e881a25f1e	[NFC] Removed unused prefixes in CodeGen/AMDGPU This covers tests starting with s. Differential Revision: https://reviews.llvm.org/D94184	2021-01-07 08:00:11 -08:00
Matt Arsenault	6b7d5a928f	AMDGPU/GlobalISel: Start cleaning up calling convention lowering There are various hacks working around limitations in handleAssignments, and the logical split between different parts isn't correct. Start separating the type legalization to satisfy going through the DAG infrastructure from the code required to split into register types. The type splitting should be moved to generic code.	2021-01-07 10:36:45 -05:00
Arthur Eubanks	a515342de9	[test] Pin AMDGPU/opt-pipeline.ll to legacy PM The pipeline being tested is specifically the legacy PM pipeline.	2021-01-06 11:44:16 -08:00
Mircea Trofin	90347ab96f	[NFC] Removed unused prefixes in CodeGen/AMDGPU This covers tests starting with m-r. Differential Revision: https://reviews.llvm.org/D94181	2021-01-06 10:32:44 -08:00
Mircea Trofin	b470630913	[NFC] Removed unused prefixes from CodeGen/AMDGPU All the 'l'-starting tests. Differential Revision: https://reviews.llvm.org/D94151	2021-01-06 09:34:11 -08:00

1 2 3 4 5 ...

4229 Commits