clang-p2996

Author	SHA1	Message	Date
Jay Foad	92542f2a40	[AMDGPU] Add targets gfx1150 and gfx1151 This is the target definition only. Currently they are treated the same as GFX 11.0.x. Differential Revision: https://reviews.llvm.org/D155429	2023-07-17 13:06:12 +01:00
Jakub Chlanda	3cd3f11c17	[NFC][AMDGPU] Default initialize the Subtarget This is to address a static analyzer warning: The pointer field will point to an arbitrary memory location, any attempt to write may cause corruption. In <unnamed> R600DAGToDAGISel::R600DAGToDAGISel (llvm::TargetMachine &, llvm::CodeGenOpt::Level): A pointer field is not initialized in the constructor (CWE-457) Differential Revision: https://reviews.llvm.org/D154414	2023-07-17 11:39:29 +02:00
Jon Chesterfield	6043d4dfec	[amdgpu] Accept an optional max to amdgpu-lds-size attribute for use in PromoteAlloca	2023-07-15 21:37:21 +01:00
Jon Chesterfield	a222951148	[amdgpu][nfc] Use unsigned for getIntegerPairAttribute to match the only call sites	2023-07-15 20:42:13 +01:00
pvanhout	e5296c52e5	[AMDGPU] Relax restrictions on unbreakable PHI users in BreakLargePHIs The previous heuristic rejected a PHI if one of its users was an unbreakable PHI, no matter what the other users were. This worked well in most cases, but there's one case in rocRAND where it doesn't work. In that case, a PHI node has 2 PHI users where one is breakable but not the other. When that PHI node isn't broken, performance falls by 35%. Relaxing the restriction to "require that half of the PHI node users are breakable" fixes the issue, and seems like a sensible change. Solves SWDEV-409648, SWDEV-398393 Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D155184	2023-07-14 09:02:51 +02:00
Jon Chesterfield	d3316bc111	[amdgpu] Delete elide-module-lds attribute Requires D155190 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155238	2023-07-14 00:36:33 +01:00
Jon Chesterfield	74e928a081	[amdgpu][lds] Remove recalculation of LDS frame from backend Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend. Prior to this patch: The IR lowering pass sets up a per-kernel LDS frame and annotates the variables with absolute_symbol metadata so that the assembler can build lookup tables out of it. There is a fragile association between kernel functions and named structs which is used to recompute the frame layout in the backend, with fatal_errors catching inconsistencies in the second calculation. After this patch: The IR lowering pass additionally sets a frame size attribute on kernels. The backend uses the same absolute_symbol metadata that the assembler uses to place objects within that frame size. Deleted the now dead allocation code from the backend. Left for a later cleanup: - enabling lowering for anonymous functions - removing the elide-module-lds attribute (test churn, it's not used by llc any more) - adjusting the dynamic alignment check to not use symbol names Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155190	2023-07-13 23:54:38 +01:00
Stanislav Mekhanoshin	7972b9c829	[AMDGPU] Move SIEncodingFamily into SIDefines.h. NFC. I need this for future patch in the MC, while TII is not available in the llvm-mc. Besides this is not a first time I want it there. Differential Revision: https://reviews.llvm.org/D155228	2023-07-13 12:42:28 -07:00
Jeffrey Byrnes	6b7805fcb1	[AMDGPU][IGLP] Add iglp_opt(1) strategy for single wave gemms This adds the IGLP strategy for single-wave gemms. The SchedGroup pipeline is laid out in multiple phases, with each phase corresponding to a distinct pattern present in gemm kernels. The resilience of the optimization is dependent upon IR (as seen by pre-RA scheduling) continuing to have these patterns (as defined by instruction class and dependencies) in their current relative ordering. The kernels of interest have these specific phases: NT: 1, 2a, 2c NN: 1, 2a, 2b TT: 1, 2b, 2c TN: 1, 2b The general approach taken was to have a long SchedGroup pipeline. In this way the scheduler will have less capability of doing the wrong thing. In order to resolve the challenge of correctly fitting these long pipelines, we leverage the rules infrastructure to help the solver. Differential Revision: https://reviews.llvm.org/D149773 Change-Id: I1a35962a95b4bdf740602b8f110d3297c6fb9d96	2023-07-13 12:03:04 -07:00
Ivan Kosarev	7b6e606dac	[AMDGPU][AsmParser][NFC] Translate parsed MIMG instructions to MCInsts automatically. Part of <https://github.com/llvm/llvm-project/issues/62629>. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D155061	2023-07-13 19:47:31 +01:00
Ivan Kosarev	289ae6525d	[AMDGPU][MC] Fix handling of A16 operands in intersect_ray instructions. The patch adds the support for 'noa16' operands in non-A16 variants of the instructions, fixes validation of A16 operands and eliminates the custom conversion to MCInst. Part of <https://github.com/llvm/llvm-project/issues/62629>. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155057	2023-07-13 19:46:03 +01:00
Mateja Marjanovic	fa46feb314	[AMDGPU] Use V_FMA_MIX* more often Combine mul (f32) + fptrunc (f32->f16) to "v_fma_mixlo_f16 mulSrc1, mulSrc2, 0". Differential Revision: https://reviews.llvm.org/D153544 Reviewers: arsenm, foad	2023-07-13 16:56:16 +02:00
pvanhout	07c5920487	Reland "[AMDGPU] Wave32 CodeGen for amdgcn.ballot.i64" This time without the extra `->dump()` A recent addition to the device libs, `__ockl_dm_trim`, caused a series of failures at O0 due to a i64 ballot intrinsic being inlined into a wave32 function. The quick fix for this is to support codegen for this rare case. A proper long-term fix for this type of issue is still being discussed. Fixes SWDEV-408929, SWDEV-408957, SWDEV-409885, SWDEV-410193 Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D155050	2023-07-13 15:58:48 +02:00
pvanhout	aec971adec	Revert "[AMDGPU] Wave32 CodeGen for amdgcn.ballot.i64" This reverts commit `cfa2d0a3aa`.	2023-07-13 15:52:27 +02:00
Mateja Marjanovic	701c4adcea	Check for denormal flushing when selecting V_FMA/MAD_MIX*	2023-07-13 15:26:20 +02:00
pvanhout	cfa2d0a3aa	[AMDGPU] Wave32 CodeGen for amdgcn.ballot.i64 A recent addition to the device libs, `__ockl_dm_trim`, caused a series of failures at O0 due to a i64 ballot intrinsic being inlined into a wave32 function. The quick fix for this is to support codegen for this rare case. A proper long-term fix for this type of issue is still being discussed. Fixes SWDEV-408929, SWDEV-408957, SWDEV-409885, SWDEV-410193 Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D155050	2023-07-13 15:20:58 +02:00
Jon Chesterfield	9418c40af7	[amdgpu][lds] Raise an explicit unimplemented error on absolute address LDS variables These aren't implemented. They could be at moderate implementation complexity. Raising an error is better than silently miscompiling. Patching now because the patch at D155125 is a step towards using this metadata more extensively as part of the lowering path and that will interact badly with input variables with this annotation. Lowering user defined variables at specific addresses would drop this error, put them at the requested position in the frame during this pass, and then use the same codegen that will be used for the kernel specific struct shortly. Reviewed By: jmmartinez Differential Revision: https://reviews.llvm.org/D155132	2023-07-13 11:32:03 +01:00
pvanhout	361e9eec51	[AMDGPU] Correctly emit AGPR copies in tryFoldPhiAGPR - Don't create COPY instructions between PHI nodes. - Don't create V_ACCVGPR_WRITE with operands that aren't AGPR_32 Solves SWDEV-410408 Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D155080	2023-07-13 08:55:22 +02:00
pvanhout	3c30179e98	[GlobalISel] Rename KnownBits field of InstructionSelector `KnownBits` is also a type name. Having a field with this name prevents derived classes from using the `KnownBits` type unless they use `struct KnownBits`. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D155082	2023-07-12 15:28:11 +02:00
Juan Manuel MARTINEZ CAAMAÑO	367b1f28db	[NFC][AMDGPULowerModuleLDSPass] Fix buildbot sanitizer failed to compile It seems that the sanitizer-x86_64-linux-android wasn't able to deduce the template argument: AMDGPULowerModuleLDSPass.cpp:1192:53: error: no viable constructor or deduction guide for deduction of template arguments of 'vector' auto TableLookupVariablesOrdered = sortByName(std::vector( This patch makes the template argument explicit.	2023-07-12 11:08:16 +02:00
Juan Manuel MARTINEZ CAAMAÑO	3a75551e85	Reland "[NFC][AMDGPULowerModuleLDSPass] Factorize repeated sort code" Fixed compilation error and redundant copy warning Differential Revision: https://reviews.llvm.org/D154977	2023-07-12 09:27:20 +02:00
Jay Foad	f7684d8510	[DAG] Use legal shift amount type in DAGTypeLegalizer::JoinIntegers Documentation for TargetLowering::getShiftAmountTy says that LegalTypes should generally be true during type legalization, so this patch does that. On AMDGPU the effect is that we use i32 (a sane type) instead of i64 (pointer sized type) for more shift amounts, which in turn allows more formation of rotates and funnel shifts pre-legalization. Differential Revision: https://reviews.llvm.org/D154960	2023-07-12 08:12:09 +01:00
Jon Chesterfield	e75ce77cd7	[amdgpu][lds] Fix missing markUsedByKernel calls and undef lookup table elements More robust association between the kernels and lds struct. Use poison instead of value() for lookup table elements introduced by dynamic lds lowering. Extracted from D154946, new test from there verbatim. Segv fixed. Fixes issues/63338 Fixes SWDEV-404491 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154972	2023-07-12 00:37:21 +01:00
Matt Arsenault	fbe4ff8149	AMDGPU: Partially fix not respecting dynamic denormal mode The most notable issue was producing v_mad_f32 in functions with the dynamic mode, since it just ignores the mode. fdiv lowering is still somewhat broken because it involves a mode switch and we need to query the original mode.	2023-07-11 15:14:52 -04:00
Juan Manuel MARTINEZ CAAMAÑO	ebdd610ad4	Revert "[NFC][AMDGPULowerModuleLDSPass] Factorize repeated sort code" This reverts commit `125b90749a`.	2023-07-11 17:08:59 +02:00
Juan Manuel MARTINEZ CAAMAÑO	125b90749a	[NFC][AMDGPULowerModuleLDSPass] Factorize repeated sort code Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D154970	2023-07-11 17:03:58 +02:00
Juan Manuel MARTINEZ CAAMAÑO	70bb5d2b9d	[NFC][AMDGPULowerModuleLDSPass] Add const to some variables/parameters Moving out some changes not related to the bugfix in https://reviews.llvm.org/D154946 Reviewed By: JonChesterfield, arsenm Differential Revision: https://reviews.llvm.org/D154959	2023-07-11 15:51:57 +02:00
Juan Manuel MARTINEZ CAAMAÑO	abf081975e	[NFC][AMDGPULowerModuleLDSPass] Remove dead variable	2023-07-11 12:35:21 +02:00
pvanhout	8444038d16	[AMDGPU] Use GlobalISel MatchTable Combiner Backend Use the new matchtable-based combiner backend for all AMDGPU combiners. This is a drop-in change from the user's perspective; there are no test changes, the new combiner behaves exactly like the old one. Depends on D153757 NOTE: This would land iff D153757 (RFC) lands too. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D153758	2023-07-11 11:27:13 +02:00
pvanhout	1fe7d9c799	[GlobalISel] Generalize `InstructionSelector` Match Tables Makes `InstructionSelector.h`/`InstructionSelectorImpl.h` generic so the match tables can also be used for the combiner. Some notes: - Coverage was made an optional parameter of `executeMatchTable`, combines won't use it for now. - `GIPFP_` -> `GICXXPred_` so it's more generic. Those are just C++ predicates and aren't PatFrag-specific. - Pass the MatcherState directly to testMIPredicate_MI, the combiner will need it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D153755	2023-07-11 09:42:30 +02:00
Amara Emerson	3a80bdb316	[GlobalISel] Remove an erroneous oneuse check in the G_ADD reassociation combine. This check was unnecessary/incorrect, it was already being done by the target hook default implementation, and the one in the matcher was checking for a completely different thing. This change: 1) Removes the check and updates affected tests which now do some more reassociations. 2) Modifies the AMDGPU hooks which were stubbed with "return true" to also do the oneuse check. Not sure why I didn't do this the first time.	2023-07-10 01:03:12 -07:00
Johannes Doerfert	02a4fcec6b	[Attributor] Port AANonNull to the isImpliedByIR interface AANonNull is now the first AA that is always queried via the new APIs and not created manually. Others will follow shortly to avoid trivial AAs whenever possible. This commit introduced some helper logic that will make it simpler to port the next one. It also untangles AADereferenceable and AANonNull such that the former does not keep a handle on the latter. Finally, we stop deducing `nonnull` for `undef`, which was incorrect.	2023-07-09 16:04:19 -07:00
Matt Arsenault	64d325454b	AMDGPU: Delete custom combine on class intrinsic This is no longer necessary as class-with-constant will always be transformed to the generic class intrinsic. https://reviews.llvm.org/D153901	2023-07-07 15:28:21 -04:00
Christudasan Devadasan	7a98f084c4	[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector regclass allocation. This imposes many restrictions, and we end up with unsuccessful SGPR spilling when there aren't enough VGPRs, forcing us to spill the leftover into memory during PEI. The custom spill handling during PEI has many edge cases and breaks the compiler from time to time. This patch implements spilling SGPRs into virtual VGPR lanes. Since we now split the register allocation for SGPRs and VGPRs, the virtual registers introduced for the spill lanes would get allocated automatically in the subsequent regalloc invocation for VGPRs. Spill to virtual registers will always be successful, even in high-pressure situations, and hence it avoids most of the edge cases during PEI. We are now left with only the custom SGPR spills during PEI for special registers like the frame pointer, which is an unproblematic case. Differential Revision: https://reviews.llvm.org/D124196	2023-07-07 23:14:32 +05:30
Christudasan Devadasan	b4a62b1fa5	[AMDGPU] Enable whole wave register copy So far, we haven't exposed the allocation of whole-wave registers to regalloc. We hand-picked them for various whole wave mode operations. With a future patch, we want the allocator to efficiently allocate them rather than using the custom pre-allocation pass. Any liverange split of virtual registers involved in whole-wave operations requires the resulting COPY introduced with the split to be performed for all lanes. It isn't implemented in the compiler yet. This patch identifies all such copies and manipulates the exec mask around them to enable all lanes without affecting the value of exec mask elsewhere. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D143762	2023-07-07 22:58:55 +05:30
Christudasan Devadasan	b78b36e1a2	[AMDGPU] Implement whole wave register spill To reduce the register pressure during allocation, when the allocator spills a virtual register that corresponds to a whole wave mode operation, the spill loads and restores should be activated for all lanes by temporarily flipping all bits in exec register to one just before the spills. It is not implemented in the compiler as of today and this patch enables the necessary support. This is a pre-patch before the SGPR spill to virtual VGPR lanes that would eventually cause the whole wave register spills during allocation. Reviewed By: arsenm, cdevadas Differential Revision: https://reviews.llvm.org/D143759	2023-07-07 22:51:45 +05:30
Matt Arsenault	94e24624c2	AMDGPU: Remove attempt at simplifying the format string in printf lowering This avoids computing the dominator tree by removing the simplifyInstruction use. This was applying simplification with some kind of questionable load-store forwarding and looking for the global. This had to have been an ancient hack copied from previous backends. In the OpenCL case, this is always emitted as required the direct global reference anyway.	2023-07-07 09:26:07 -04:00
Scott Linder	986001c827	[AMDGPU] Improve assembler + disassembler handling of kernel descriptors * Relax the AsmParser to accept `.amdhsa_wavefront_size32 0` when the `.amdhsa_shared_vgpr_count` directive is present. * Teach the KD disassembler to respect the setting of KERNEL_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32 when calculating the value of `.amdhsa_next_free_vgpr`. * Teach the KD disassembler to disassemble COMPUTE_PGM_RSRC3 for gfx90a and gfx10+. * Include "pseudo directive" comments for gfx10 fields which are not controlled by any assembler directive. * Fix disassembleObject failure diagnostic in llvm-objdump to not hard-code a comment string, and to follow the convention of not capitalizing the first sentence. Reviewed By: rochauha Differential Revision: https://reviews.llvm.org/D128014	2023-07-06 21:20:51 +00:00
Tom Stellard	4b36b2c23c	[Support] Use C++11 attribute syntax for visibility attributes The gnu extension __attribute syntax cannot be mixed with the C++11 alignas specifier, so in order to use visibility attributes on classes that also use alignas, we need to use the C++11 standard syntax. Also fix a few warnings introduced by this change. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D152043	2023-07-06 10:30:56 -07:00
Matt Arsenault	9df70e4a4d	AMDGPU: Fix not applying the correct default memcpy expansion threshold Fixes `3c848194f2`. The TTI hook name got renamed at some point in the process and the target implementation was left behind. Fixes: SWDEV-407329	2023-07-06 12:14:14 -04:00
Matt Arsenault	c70cae6315	AMDGPU: Make SIFixVGPRCopies preserve everything All this does is add uses of reserved registers, which aren't tracked by anything. Saves a loop info computation.	2023-07-06 10:26:21 -04:00
Matt Arsenault	8ee1cc82c9	AMDGPU: Fold out sign bit ops on frexp_exp The sign bit has no impact on the exponent, so strip these away. Saves on the source modifier encoding cost. I left the GlobalISel handling until there's a resolution to issue #62628. We should do this in instcombine too, but legalization should be introducing more frexps than it currently is where this would occur.	2023-07-06 10:26:21 -04:00
Valery Pykhtin	98aa8439f5	[AMDGPU] Fix register class for a subreg in GCNRewritePartialRegUses. 1. Improved code that deduces register class from instruction definitions. Previously if some instruction didn't contain a reg class for an operand it was considered as no information on register class even if other instructions specified the class. 2. Added check on required size of resulting register because in some cases classes with smaller registers had been selected (for example VReg_1). Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D152832	2023-07-06 08:48:45 +02:00
Tom Stellard	62748e934c	AMDGPU: Remove add_dependencies calls from CMakeLists.txt These are redundant. The same dependencies are being added as part of the add_llvm_component_library() call. I confirmed this by diff'ing the build.ninja files before and after the change and saw no change. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D153166	2023-07-05 20:03:11 -07:00
Matt Arsenault	5491666248	AMDGPU: Correctly lower llvm.exp.f32 The library expansion has too many paths for all the permutations of DAZ, unsafe and the 3 exp functions. It's easier to expand it in the backend when we know all of these things. The library currently misses the no-infinity check on the overflow, which this handles optimizing out. Some of the <3 x half> fast tests regress due to vector widening dropping flags which will be fixed separately. Apparently there is no exp10 intrinsic, but there should be. Adds some deadish code in preparation for adding one while I'm following along with the current library expansion.	2023-07-05 17:23:49 -04:00
Matt Arsenault	ed556a1ad5	AMDGPU: Correctly lower llvm.exp2.f32 Previously this did a fast math expansion only.	2023-07-05 17:23:48 -04:00
Matt Arsenault	9c82dc6a6b	AMDGPU: Always use v_rcp_f16 and v_rsq_f16 These inherited the fast math checks from f32, but the manual suggests these should be accurate enough for unconditional use. The definition of correctly rounded is 0.5ulp, but the manual says "0.51ulp". I've been a bit nervous about changing this as the OpenCL conformance test does not cover half. Brute force produces identical values compared to a reference host implementation for all values.	2023-07-05 16:53:01 -04:00
Matt Arsenault	4e15f378ee	AMDGPU: Correctly lower llvm.log.f32 and llvm.log10.f32 Previously we expanded these in a fast-math way and the device libraries were relying on this behavior. The libraries have a pending change to switch to the new target intrinsic. Unlike the library version, this takes advantage of no-infinities on the result overflow check.	2023-07-05 15:30:35 -04:00
Ivan Kosarev	7208fde09e	[AMDGPU][AsmParser][NFC] Generate printers for named-bit operands automatically. Part of <https://github.com/llvm/llvm-project/issues/62629>. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D154433	2023-07-05 10:53:33 +01:00
Ivan Kosarev	12460cf90f	[AMDGPU][AsmParser] Simplify the implementation of SWZ operands. Those are implicit helper operands and therefore don't need any parsers or printers. Part of <https://github.com/llvm/llvm-project/issues/62629>. Reviewed By: piotr, foad Differential Revision: https://reviews.llvm.org/D154432	2023-07-05 10:45:12 +01:00
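
The name clash described in commit `3c30179e98` can be reproduced in a few lines. The following is a minimal sketch, not the LLVM code: the `KnownBits` struct, the `SelectorBefore`/`SelectorAfter` classes, the replacement field name `KB` and the `onesOf` helper are all hypothetical stand-ins chosen for illustration. It shows why a derived class has to write `struct KnownBits` while the base class has a member of the same name, and why the plain type name works once the member is renamed.

```cpp
#include <cassert>

// Hypothetical stand-in for a KnownBits type (not the real llvm::KnownBits).
struct KnownBits {
  unsigned Zero = 0;
  unsigned One = 0;
};

// Hypothetical base class whose data member shares the type's name.
struct SelectorBefore {
  int KnownBits = 0; // hides the type name inside this class and its subclasses
};

struct DerivedBefore : SelectorBefore {
  // Writing `KnownBits KB` here would fail: unqualified lookup finds the
  // inherited member first. An elaborated type specifier ignores non-type
  // names, so `struct KnownBits` still resolves to the global struct.
  unsigned onesOf(struct KnownBits KB) const { return KB.One; }
};

// With the member renamed (hypothetical name), nothing hides the type.
struct SelectorAfter {
  int KB = 0;
};

struct DerivedAfter : SelectorAfter {
  unsigned onesOf(KnownBits Bits) const { return Bits.One; }
};
```

Both derived classes behave identically; the only difference is whether the `struct` keyword is needed to name the type.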

8091 Commits
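
The reasoning behind commit `8ee1cc82c9` ("the sign bit has no impact on the exponent") can be checked on the host with the standard library. This is only an illustrative sketch for finite, non-zero inputs using `std::frexp`; the helper names `frexpExponent` and `signOpsAreFoldable` are hypothetical, and the hardware `frexp_exp` operation may treat special values such as infinities and NaNs differently from `std::frexp`.

```cpp
#include <cassert>
#include <cmath>

// Returns only the exponent that std::frexp computes for X,
// discarding the mantissa.
int frexpExponent(float X) {
  int Exp = 0;
  (void)std::frexp(X, &Exp);
  return Exp;
}

// True when negating X or taking its absolute value leaves the
// exponent unchanged, i.e. the sign operation could be dropped.
bool signOpsAreFoldable(float X) {
  const int E = frexpExponent(X);
  return frexpExponent(-X) == E && frexpExponent(std::fabs(X)) == E;
}
```

For example, `frexpExponent(8.0f)` and `frexpExponent(-8.0f)` both yield 4, since 8 = 0.5 × 2^4.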