clang-p2996

Author	SHA1	Message	Date
Matt Arsenault	270e96f435	Revert "AMDGPU: Invert handling of enqueued block detection" This reverts commit `47288cc977`. The runtime is having trouble with this at -O0 when the inputs are always enabled.	2023-01-07 21:48:07 -05:00
Matt Arsenault	47554a0c73	AMDGPU: Use more accurate IR type for block handle The device library uses this as a struct with a pointer sized integer and 2 ints.	2023-01-06 21:23:28 -05:00
Matt Arsenault	b7587ca837	AMDGPU: Add more opencl printf tests	2023-01-06 21:23:14 -05:00
Matt Arsenault	47288cc977	AMDGPU: Invert handling of enqueued block detection Invert the sense of the attribute and let the attributor figure this out like everything else. If needed we can have the not-OpenCL languages set amdgpu-no-default-queue and amdgpu-no-completion-action up front so they never have to pay the cost. There are also so many of these now, the offset use API should probably consider all of them at once. Maybe they should merge into one attribute with used fields. Having separate functions for each field in AMDGPUBaseInfo is also not the greatest API (might as well fix this when the patch to get the object version from the module lands).	2023-01-06 21:16:08 -05:00
Matt Arsenault	0416883dc1	AMDGPU: Fix enqueue block lowering for opaque pointers This was looking for a specific constant cast of the function, when the type doesn't matter. Doesn't bother trying to handle typed pointers, it will just assert. Things probably don't work completely correctly if the block kernel address is captured somewhere else, but that wouldn't work before either. The uses should really be loads out of the handle, and the handle initializer should contain the kernel address.	2023-01-06 21:15:39 -05:00
Matt Arsenault	4ce5400a3f	AMDGPU: Convert enqueue-kernel.ll to opaque pointers This demonstrates the pass is broken with them, the follow up change will fix it.	2023-01-06 21:15:39 -05:00
Matt Arsenault	8723836358	AMDGPU: Add additional printf string tests Test various inputs passed to %s.	2023-01-06 17:22:13 -05:00
Matt Arsenault	b4d44322d9	AMDGPU/GlobalISel: Add missing test for implicit_def regbankselect	2023-01-06 08:58:10 -05:00
Matt Arsenault	6fe85933d4	AMDGPU/GlobalISel: Add wave32 checks to bool test	2023-01-06 08:58:10 -05:00
Juan Manuel MARTINEZ CAAMAÑO	543db09b97	[CodeGen][AMDGPU] EXTRACT_VECTOR_ELT: input vector element type can differ from output type In function SITargetLowering::performExtractVectorElt, the output type was not considered which could lead to type mismatches later. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139943	2023-01-06 09:46:02 +01:00
Jeffrey Byrnes	33aba5d0d0	[AMDGPU] Switch to autogenerated checks	2023-01-05 16:27:18 -08:00
Vang Thao	25d72330ff	[AMDGPU] Add .uniform_work_group_size metadata to v5 An AMDGPU kernel with the function attribute "uniform-work-group-size"="true" requires a uniform work group size (i.e. each dimension of the global size is a multiple of the corresponding dimension of the work group size). hipExtModuleLaunchKernel allows launching a HIP kernel with a non-uniform workgroup size, so the runtime must check and enforce a uniform workgroup size if the kernel requires it. This metadata indicates that the kernel requires a uniform workgroup size, so that the runtime can enforce it. Reviewed By: kzhuravl, arsenm Differential Revision: https://reviews.llvm.org/D141012	2023-01-05 21:29:56 +00:00
Alexander Timofeev	6daa983c9d	[AMDGPU] MachineScheduler: schedule execution metric added for the UnclusteredHighRPStage Since the divergence-driven ISel was fully enabled, we have more VGPRs available. MachineScheduler, trying to take advantage of that, bumps up the occupancy at the cost of hiding memory access latency. This spoils the initially good schedule. A new metric that reflects the latency hiding quality of the schedule has been created to balance occupancy against latency. The metric is based on the latency model, which computes the ratio of bubble cycles to working cycles. We then use this ratio to decide if the higher occupancy schedule is profitable as follows: Profit = NewOccupancy/OldOccupancy * OldMetric/NewMetric Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D139710	2023-01-05 21:10:56 +01:00
Matt Arsenault	7c327c2fbb	AMDGPU: Fix broken opaque pointer handling in printf pass This was directly considering the pointee type, and also applying special semantics to constant address space.	2023-01-05 13:48:32 -05:00
Matt Arsenault	1f93517b25	AMDGPU: Switch enqueue kernel test to generated checks	2023-01-05 11:39:23 -05:00
Matt Arsenault	7b922fc0c3	AMDGPU: Fix broken and permissive handling of printf format strings This was completely broken with opaque pointers because it was specifically looking for a constant expression with the global variable as the first operand. Strip casts like normal, and properly validate all of the restrictions rather than silently ignoring any unhandled cases. Also be stricter that we aren't calling into some unresolved or non-constant format string. Also converts the test to opaque pointers and generated tests. There's more broken initializer handling for strings inside the format string processing too, but there's just no test coverage for this at all.	2023-01-05 09:18:00 -05:00
Nikita Popov	60442f0d44	[CodeGen] Convert some tests to opaque pointers (NFC) These are mostly MIR tests, which I did not handle during previous conversions.	2023-01-05 13:21:20 +01:00
Jay Foad	0d518ae50c	[GlobalISel] New combine to commute constant operands to the RHS Differential Revision: https://reviews.llvm.org/D140907	2023-01-05 11:12:40 +00:00
Diana Picus	6ee4f253b2	[GlobalISel] Add G_BUILD_VECTOR[_TRUNC] to CSE Add G_BUILD_VECTOR and G_BUILD_VECTOR_TRUNC to the list of opcodes in `shouldCSEOpc`. This simplifies the code generated for vector splats. Differential Revision: https://reviews.llvm.org/D140965	2023-01-05 10:15:31 +01:00
Diana Picus	61c5775b36	[GlobalISel] Precommit a test for D140965 Add a test for CSE-ing G_BUILD_VECTOR. This will be enabled in D140965.	2023-01-05 09:59:27 +01:00
Matt Arsenault	8dfe60c356	AMDGPU: Set scratch_en if there is dynamic stack but no fixed stack	2023-01-04 20:51:18 -05:00
Anshil Gandhi	4bbcbdaee5	[AMDGPU] Unify divergent nodes if the PostDom tree has one root This patch allows AMDGPUUnifyDivergenceExitNodes pass to transform a function whose PDT has exactly one root and ends in a branch instruction. Fixes https://github.com/llvm/llvm-project/issues/58861. Reviewed By: ruiling, arsenm Differential Revision: https://reviews.llvm.org/D139780	2023-01-04 10:45:03 -07:00
Matt Arsenault	687e0e205e	AMDGPU: Create alloca wide load/store with explicit alignment This was introducing transient UB by using the default alignment of a larger vector type.	2023-01-03 11:29:18 -05:00
Matt Arsenault	6fed2c90d3	AMDGPU: Diagnose which LDS global failed to lower Also lowercase the message to start since that seems to be the prevailing convention for error messages.	2023-01-03 09:31:07 -05:00
Dmitry Preobrazhensky	e7a306310b	[AMDGPU][GFX11] Correct tied src2 of v_fmac_f16_e64 src2 was incorrectly defined as VSrc_f16 but it is tied to dst which is VGPR_32. As a result, disassembler failed to decode src2. Differential Revision: https://reviews.llvm.org/D140299	2022-12-30 16:42:15 +03:00
Matt Arsenault	e630d9b299	AMDGPU/clang: Remove target features from address space test builtins It turns out we can codegen these on targets without flat addressing, although the runtime probably didn't put anything useful there. The proper diagnostic would be to disallow flat pointer uses or languages with them, not this one edge case. Allows removing one of the special cases requiring subtarget support in the device libraries.	2022-12-29 18:46:41 -05:00
Craig Topper	8abd70081f	[TargetLowering] Teach BuildUDIV to take advantage of leading zeros in the dividend. If the dividend has leading zeros, we can use them to reduce the size of the multiplier and avoid the fixup cases. This patch is for scalars only, but we might be able to do this for vectors in a follow up. Differential Revision: https://reviews.llvm.org/D140750	2022-12-29 13:58:46 -08:00
Matt Arsenault	52c44a441c	AMDGPU: Modernize sqrt f64 test Use the readfirstlane hack for the scalar cases as a hack to combine globalisel and sdag tests. gfx6 stores are a bit broken in globalisel, and scalar returns are totally broken in sdag.	2022-12-22 13:01:41 -05:00
Matt Arsenault	5da812461a	AMDGPU: Update constant address spaces used in printf test This was never updated for the address space number shuffle.	2022-12-22 12:38:59 -05:00
Jay Foad	7e1e993816	[AMDGPU] Remove permlane discard vdst_in optimization from isel D72845 implemented the equivalent IR optimization in InstCombine so it seems that there's no advantage to doing it during isel too. This partially reverts D72844. Differential Revision: https://reviews.llvm.org/D140546	2022-12-22 15:49:26 +00:00
Yashwant Singh	9e0d8ab822	[AMDGPU][Test] Update perfhint test to use opaque pointers Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D140452	2022-12-22 09:49:03 +05:30
Mirko Brkusanin	a80edb7fc9	[AMDGPU][GlobalISel] Fix mapping G_FREEZE Differential Revision: https://reviews.llvm.org/D140416	2022-12-21 15:25:04 +01:00
Christudasan Devadasan	a3028239a7	Revert "[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs" This reverts commit `40ba0942e2`.	2022-12-21 16:17:42 +05:30
Jay Foad	e73b35699b	[SelectionDAG] Fix EmitCopyFromReg for cloned nodes Change EmitCopyFromReg to check all users of cloned nodes (as well as non-cloned nodes) instead of assuming that they all copy the defined value back to the same physical register. This partially reverts `968e2e7b3d` (svn r62356) which claimed: CreateVirtualRegisters does trivial copy coalescing. If a node def is used by a single CopyToReg, it reuses the virtual register assigned to the CopyToReg. This won't work for SDNode that is a clone or is itself cloned. Disable this optimization for those nodes or it can end up with non-SSA machine instructions. This is true for CreateVirtualRegisters but r62356 also updated EmitCopyFromReg where it is not true. Firstly EmitCopyFromReg only coalesces physical register copies, so the concern about SSA form does not apply. Secondly making the loop over users in EmitCopyFromReg conditional on `!IsClone && !IsCloned` breaks the handling of cloned nodes, because it leaves MatchReg set to true by default, so it assumes that all users will copy the defined value back to the same physical register instead of actually checking. Differential Revision: https://reviews.llvm.org/D140417	2022-12-21 10:44:45 +00:00
Jay Foad	087cd5e5d1	[SelectionDAG] Precommit EmitCopyFromReg test for D140417	2022-12-21 10:44:45 +00:00
Leon Clark	daa022ca57	Enable roundeven. Add support for roundeven and implement appropriate tests. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D137954	2022-12-20 15:40:20 +00:00
Jessica Del	5ee13e6c65	[AMDGPU] Wide multiplies tests for D140208 These tests show suboptimal code generation that will be improved by the changes in D140208	2022-12-20 12:08:36 +01:00
Matt Arsenault	0dc4bdd888	GlobalISel: Enable CSE of G_SELECT Stop trying to delete a select in one combine since it would be deleting the CSE'd instruction if that happened.	2022-12-19 21:26:47 -05:00
Matt Arsenault	a20503caa1	AMDGPU: Add regression tests for fmin/fmax legacy matching	2022-12-19 11:36:13 -05:00
Matt Arsenault	c60e67b1f9	AMDGPU: Add more fneg combine tests	2022-12-19 10:34:55 -05:00
Matt Arsenault	7a3682f666	AMDGPU: Convert a few more special case tests to opaque pointers lower-kernargs.ll needed a switch to use update_test_checks metadata matching.	2022-12-19 09:42:42 -05:00
Matt Arsenault	262c2c0fd2	AMDGPU: Update some tests to use opaque pointers vectorize-buffer-fat-pointer.ll required a manual check line fix. vector-alloca-addrspacecast.ll required a manual fixup of a check line. partial-regcopy-and-spill-missed-at-regalloc.ll required re-running update_mir_test_checks. The HSA metadata tests required avoiding the script touching the type name in the metadata. annotate-noclobber.ll ran into one update script bug. It deleted a check line with a 0 offset GEP, moving the following -NEXT check logically up one line.	2022-12-19 09:28:58 -05:00
Matt Arsenault	04bd576f89	AMDGPU: Convert some amdgpu-codegenprepare tests to opaque pointers amdgpu-late-codegenprepare.ll required running update_test_checks after converting.	2022-12-19 09:28:58 -05:00
Matt Arsenault	ce096b2207	AMDGPU: Convert some tests to opaque pointers These required update_mir_test_checks.	2022-12-19 09:04:17 -05:00
Nikita Popov	bdf2fbba9c	[AMDGPU] Convert some tests to opaque pointers (NFC)	2022-12-19 12:41:13 +01:00
Matt Arsenault	012a85296b	AMDGPU/GlobalISel: Use ptrtoint to legalize constant 32-bit addrspacecast This was trying to merge 2 32-bit pointers into a 64-bit pointer. The artifact combiner was assuming merges to pointers use scalar sources, and ended up inserting invalid bitcast from a pointer to a scalar. It should probably be a verifier error to have pointer merge sources with a pointer result. Fixes verifier errors with EXPENSIVE_CHECKS.	2022-12-18 13:15:58 -05:00
Matt Arsenault	1706960894	AMDGPU/R600: Special case addrspacecast lowering for null Due to poor support for non-0 null pointers, clang always emits addrspacecast from a null flat constant for private/local null. We can trivially handle this case for old hardware. Should fix issue 55679.	2022-12-18 08:02:45 -05:00
Matt Arsenault	9d6003c764	AMDGPU: Lower addrspacecast on gfx6 Fixes inconsistent handling of constant-32bit case. Turns out we can lower all the casts just fine, it's just accessing the flat results that's a problem.	2022-12-18 08:02:45 -05:00
Sameer Sahasrabuddhe	9c1b82599d	[AAPointerInfo] handle multiple offsets in PHI Previously reverted in `8b446ea2ba` Reapplying because this commit is NOT DEPENDENT on the reverted commit `fc21f2d7ba`, which broke the ASAN buildbot. See https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for more information. The arguments to a PHI may represent a recurrence by eventually using the output of the PHI itself. This is now handled by checking for cycles in the control flow. If a PHI is not in a recurrence, it is now able to report multiple offsets instead of conservatively reporting unknown. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D138991	2022-12-18 10:51:20 +05:30
Christudasan Devadasan	40ba0942e2	[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector regclass allocation. This imposes many restrictions: SGPR spilling fails when there are not enough VGPRs, and we are forced to spill the leftover into memory during PEI. The custom spill handling during PEI has many edge cases and breaks the compiler from time to time. This patch implements spilling SGPRs into virtual VGPR lanes. Since we now split the register allocation for SGPRs and VGPRs, the virtual registers introduced for the spill lanes would get allocated automatically in the subsequent regalloc invocation for VGPRs. Spill to virtual registers will always be successful, even in high-pressure situations, and hence it avoids most of the edge cases during PEI. We are now left with only the custom SGPR spills during PEI for special registers like the frame pointer, which is an unproblematic case. This patch also implements the whole wave spills which might occur if RA spills any live range of virtual registers involved in the whole wave operations. Earlier, we had been hand-picking registers for such machine operands. But now with SGPR spills into virtual VGPR lanes, we are exposing them to the allocator. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D124196	2022-12-17 11:56:32 +05:30
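
The profit formula quoted in the message of commit 6daa983c9d can be illustrated with a small sketch. This is not the code from that patch: the function names (`bubbleRatio`, `scheduleProfit`, `isProfitable`), the use of floating point, and the 1.0 acceptance threshold are all assumptions made for illustration. Only the formula itself and the description of the metric as a bubble-to-working-cycles ratio come from the commit message.

```cpp
#include <cassert>

// Hypothetical helper: the schedule metric described in the commit message,
// i.e. the ratio of bubble (stall) cycles to working cycles.
// A lower value means latency is hidden better.
double bubbleRatio(unsigned bubbleCycles, unsigned workingCycles) {
  assert(workingCycles > 0 && "a schedule must contain some working cycles");
  return static_cast<double>(bubbleCycles) / workingCycles;
}

// Profit = NewOccupancy/OldOccupancy * OldMetric/NewMetric
// (formula taken verbatim from the commit message).
double scheduleProfit(unsigned oldOccupancy, unsigned newOccupancy,
                      double oldMetric, double newMetric) {
  assert(oldOccupancy > 0 && newMetric > 0.0 && "division by zero");
  return (static_cast<double>(newOccupancy) / oldOccupancy) *
         (oldMetric / newMetric);
}

// Assumption: the higher-occupancy schedule is kept when the profit is at
// least 1.0. The actual threshold used by the scheduler is not stated in
// the commit message.
bool isProfitable(unsigned oldOccupancy, unsigned newOccupancy,
                  double oldMetric, double newMetric) {
  return scheduleProfit(oldOccupancy, newOccupancy, oldMetric, newMetric) >= 1.0;
}
```

Under these assumptions, raising occupancy from 4 to 5 while the bubble ratio doubles from 0.25 to 0.5 gives a profit of 1.25 * 0.5 = 0.625, so the original schedule would be preferred. Doubling occupancy with an unchanged ratio gives a profit of 2.0.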

6072 Commits