clang-p2996

Author	SHA1	Message	Date
Matt Arsenault	3d0350b762	AMDGPU: Add MF independent version of getImplicitParameterOffset	2023-06-07 08:26:31 -04:00
Matt Arsenault	bc61bc8d6a	AMDGPU: Use available subtarget member	2023-06-07 08:26:31 -04:00
Matt Arsenault	eece6ba283	IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics AMDGPU has native instructions and target intrinsics for this, but these really should be subject to legalization and generic optimizations. This will enable legalization of f16->f32 on targets without f16 support. Implement a somewhat horrible inline expansion for targets without libcall support. This could be better if we could introduce control flow (GlobalISel version not yet implemented). Support for strictfp legalization is less complete but works for the simple cases.	2023-06-06 17:07:18 -04:00
Jay Foad	a4a3ac10cb	[AMDGPU] Remove extract_subvector patterns Removing them seems to slightly increase code quality as well as simplifying both the tablegen and C++ parts of the code. Differential Revision: https://reviews.llvm.org/D149853	2023-06-06 14:04:50 +01:00
Krzysztof Drewniak	faa2c678aa	[AMDGPU] Add buffer intrinsics that take resources as pointers In order to enable the LLVM frontend to better analyze buffer operations (and to potentially enable more precise analyses on the backend), define versions of the raw and structured buffer intrinsics that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their rsrc arguments. The new intrinsics are named by replacing `buffer.` with `buffer.ptr`. One advantage to these intrinsic definitions is that, instead of specifying that a buffer load/store will read/write some memory, we can indicate that the memory read or written will be based on the pointer argument. This means that, for example, a read from a `noalias` buffer can be pulled out of a loop that is modifying a distinct buffer. In the future, we will define custom PseudoSourceValues that will allow us to package up the (buffer, index, offset) triples that buffer intrinsics contain and allow for more precise backend analysis. This work also enables creating address space 7, which represents manipulation of raw buffers using native LLVM load and store instructions. Where tests simply used a buffer intrinsic while testing some other code path (such as the tests for VGPR spills), they have been updated to use the new intrinsic form. Tests that are "about" buffer intrinsics (for instance, those that ensure that they codegen as expected) have been duplicated, either within existing files or into new ones. Depends on D145441 Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D147547	2023-06-05 16:59:07 +00:00
Elliot Goodrich	ac73c48e09	[llvm] Reduce ComplexDeinterleavingPass.h includes Remove the unnecessary `"llvm/IR/PatternMatch.h"` include directive from `ComplexDeinterleavingPass.h` and move it to the corresponding source file. Add missing includes that were transitively included by this header to 3 other source files. This reduces the total number of preprocessing tokens across the LLVM source files in `lib` from (roughly) 1,964,876,961 to 1,935,091,611 - a reduction of ~1.52%. This should result in a small improvement in compilation time.	2023-05-20 17:49:18 +01:00
Thomas Symalla	91a7aa4c9b	[AMDGPU] Improve abs modifier usage If a call to the llvm.fabs intrinsic has users in another reachable BB, SelectionDAG will not apply the abs modifier to these users and instead generate a v_and ..., 0x7fffffff instruction. For fneg instructions, the issue is similar. This patch implements `AMDGPUIselLowering::shouldSinkOperands`, which allows CodegenPrepare to call `tryToSinkFreeOperands`. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D150347	2023-05-19 12:02:21 +02:00
Philip Reames	0dc0c27989	[TLI] Add IsZero parameter to storeOfVectorConstantIsCheap [nfc] Make the decision to consider zero constant stores cheap target specific. Will be used in an upcoming change for RISCV.	2023-05-17 09:19:01 -07:00
Nicolai Hähnle	ef13308b26	AMDGPU/SDAG: Improve {extract,insert}_subvector lowering for 16-bit vectors v2: - simplify the escape to TableGen patterns Differential Revision: https://reviews.llvm.org/D149841	2023-05-05 10:55:18 +02:00
Sergei Barannikov	e744e51b12	[SelectionDAG] Rename ADDCARRY/SUBCARRY to UADDO_CARRY/USUBO_CARRY (NFC) This will make them consistent with other overflow-aware nodes. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D148196	2023-04-29 21:59:58 +03:00
Changpeng Fang	1ab8b9ae15	AMDGPU: Define sub-class of SGPR_64 for tail call return Summary: Registers for tail call return should not be clobbered by callee. So we need a sub-class of SGPR_64 (excluding callee saved registers (CSR)) to hold the tail call return address. Because GFX and C calling conventions have different CSR, we need to define the sub-class separately. This work is an extension of D147096 with the consideration of GFX calling convention. Based on the calling conventions, different instructions will be selected with different sub-class of SGPR_64 as the input. Reviewers: arsenm, cdevadas and sebastian-ne Differential Revision: https://reviews.llvm.org/D148824	2023-04-27 10:45:11 -07:00
Jay Foad	14a7b2bfff	[AMDGPU] Move GCN-specific stuff out of AMDGPUISelLowering. NFC. Differential Revision: https://reviews.llvm.org/D149145	2023-04-25 13:55:50 +01:00
Matt Arsenault	2fce50e8f5	AMDGPU: Fix assertion with multiple uses of f64 fneg of select A bitcast needs to be inserted back to the original type. Just skip the multiple use case for a safer quick fix. Handling the multiple use case seems to be beneficial in some but not all cases.	2023-04-20 10:15:18 -04:00
Matt Arsenault	f608ac6286	AMDGPU: Push fneg into bitcast of integer select Avoids some regressions in the math libraries in a future patch.	2023-04-12 06:48:58 -04:00
Matt Arsenault	0f59720e1c	AMDGPU: Fold fneg into bitcast of build_vector The math libraries have a lot of code that performs manual sign bit operations by bitcasting doubles to int2 and doing bithacking on them. This is a bad canonical form we should rewrite to use high level sign operations directly on double. To avoid codegen regressions, we need to do a better job moving fnegs to operate only on the high 32-bits. This is only halfway to fixing the real case.	2023-04-11 07:12:01 -04:00
Craig Topper	219ff07f72	[Targets] Rename Flag->Glue. NFC Long long ago Glue was called Flag, and it was never completely renamed.	2023-04-02 19:28:51 -07:00
Simon Pilgrim	8153b92d9b	[DAG] Add SelectionDAG::SplitScalar helper Similar to the existing SelectionDAG::SplitVector helper, this helper creates the EXTRACT_ELEMENT nodes for the LO/HI halves of the scalar source. Differential Revision: https://reviews.llvm.org/D147264	2023-03-31 18:35:40 +01:00
Kazu Hirata	1a8668cf0c	[Target] Use isAllOnesConstant (NFC)	2023-03-26 22:57:39 -07:00
Simon Pilgrim	9041682d2c	[DAG] Remove redundant isZExtFree(SDValue,VT) overrides. NFC. These implementations both match the TargetLoweringBase.isZExtFree implementation	2023-03-12 15:56:04 +00:00
Jon Chesterfield	d3dda422bf	[amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD Post ISel, LDS variables are absolute values. Representing them as such is simpler than the frame recalculation currently used to build assembler tables from their addresses. This is a precursor to lowering dynamic/external LDS accesses from non-kernel functions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144221	2023-03-12 13:47:48 +00:00
Jon Chesterfield	bf579a7049	[amdgpu] Change LDS lowering default to hybrid Postponed from D139433 until the bug fixed by D139874 could be resolved. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D141852	2023-02-24 15:20:12 +00:00
Matt Arsenault	c177051f60	AMDGPU: Restrict foldFreeOpFromSelect combine based on legal source mods Provides a small code size savings for some f32 cases.	2023-02-19 22:05:54 -04:00
Matt Arsenault	28d8889d27	AMDGPU: Teach fneg combines that select has source modifiers We do match source modifiers for f32 typed selects already, but the combiner code was never informed of this. A long time ago the documentation lied and stated that source modifiers don't work for v_cndmask_b32 when they in fact do. We had a bunch of code operating under the assumption that they don't support source modifiers, so we tried to move fnegs around to work around this. Gets a few small improvements here and there. The main hazard to watch out for is infinite loops in the combiner since we try to move fnegs up and down the DAG. For now, don't fold fneg directly into select. The generic combiner does this for a restricted set of cases when getNegatedExpression obviously shows an improvement for both operands. It turns out to be trickier to avoid infinite looping the combiner in conjunction with pulling out source modifiers, so leave this for a later commit.	2023-02-19 20:13:38 -04:00
Jay Foad	8e5a41e827	Revert "AMDGPU: Override getNegatedExpression constant handling" This reverts commit `11c3cead23`. It was causing infinite loops in the DAG combiner.	2023-02-16 17:11:32 +00:00
Matt Arsenault	9ccc58893b	AMDGPU: Fix not adding to depth in getNegatedExpression	2023-02-15 08:32:58 -04:00
Matt Arsenault	11c3cead23	AMDGPU: Override getNegatedExpression constant handling Ignore the multiple use heuristics of the default implementation, and report cost based on inline immediates. This is mostly interesting for -0 vs. 0. Gets a few small improvements. fneg_fadd_0_f16 is a small regression. We could probably avoid this if we handled folding fneg into div_fixup.	2023-02-15 05:21:00 -04:00
Matt Arsenault	a4e8347b36	AMDGPU: Refactor isConstantCostlierToNegate	2023-02-15 05:21:00 -04:00
Kazu Hirata	64dad4ba9a	Use llvm::bit_cast (NFC)	2023-02-14 01:22:12 -08:00
Matt Arsenault	4f0eb57222	AMDGPU: Teach getNegatedExpression about rcp	2023-02-14 04:02:39 -04:00
Matt Arsenault	149e8abbd9	AMDGPU: Factor out fneg fold predicate function	2023-02-02 22:50:23 -04:00
Matt Arsenault	36cfe26a52	AMDGPU: Try to unfold fneg source when matching legacy fmin/fmax This is NFC as it stands, since other combines will effectively prevent this from being reachable. This will avoid regressions in a future change which tries to make better use of select source modifiers. Didn't bother with the GlobalISel part for now, since the baseline combine doesn't seem to work on the existing test.	2023-02-02 22:50:23 -04:00
Kazu Hirata	e078201835	[Target] Use llvm::count{l,r}_{zero,one} (NFC)	2023-01-28 09:23:07 -08:00
Matt Arsenault	93ec3fa402	AMDGPU: Support atomicrmw uinc_wrap/udec_wrap For now keep the existing intrinsics working.	2023-01-27 22:17:16 -04:00
Stanislav Mekhanoshin	c8ed36281a	[AMDGPU] Cast sub-dword elements to i32 in concat_vectors This produces better code by avoiding repacking in some cases. Fixes: SWDEV-373436 Differential Revision: https://reviews.llvm.org/D141329	2023-01-09 15:35:49 -08:00
Leon Clark	daa022ca57	Enable roundeven. Add support for roundeven and implement appropriate tests. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D137954	2022-12-20 15:40:20 +00:00
Jay Foad	6443c0ee02	[AMDGPU] Stop using make_pair and make_tuple. NFC. C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828	2022-12-14 13:22:26 +00:00
Pierre van Houtryve	678d8946ba	[AMDGPU] Add bf16 storage support - [Clang] Declare AMDGPU target as supporting BF16 for storage-only purposes on amdgcn - Add Sema & CodeGen tests cases. - Also add cases that D138651 would have covered as this patch replaces it. - [AMDGPU] Add BF16 storage-only support - Support legalization/dealing with bf16 operations in DAGIsel. - bf16 as a type remains illegal and is represented as i16 for storage purposes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139398	2022-12-13 10:34:26 -05:00
Nico Weber	a862d09a92	Revert "[amdgpu] Reimplement LDS lowering" This reverts commit `982017240d`. Breaks check-llvm, see https://reviews.llvm.org/D139433#3974862	2022-12-06 12:01:36 -05:00
Jon Chesterfield	982017240d	[amdgpu] Reimplement LDS lowering Renames the current lowering scheme to "module" and introduces two new ones, "kernel" and "table", plus a "hybrid" that chooses between those three on a per-variable basis. Unit tests are set up to pass with the default lowering of "module" or "hybrid" with this patch defaulting to "module", which will be a less dramatic codegen change relative to the current. This reflects the sparsity of test coverage for the table lowering method. Hybrid is better than module in every respect and will be default in a subsequent patch. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139433	2022-12-06 16:28:15 +00:00
Kazu Hirata	20cde15415	[Target] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 20:36:06 -08:00
Mateja Marjanovic	595a08847a	[AMDGPU] Add support for new LLVM vector types Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384. Differential Revision: https://reviews.llvm.org/D138205	2022-11-29 17:02:04 +01:00
Janek van Oirschot	322966f8f8	[AMDGPU] Add llvm.is.fpclass intrinsic to existing SelectionDAG fp class support and introduce GlobalISel implementation for AMDGPU Uses existing SelectionDAG lowering of the llvm.amdgcn.class intrinsic for llvm.is.fpclass	2022-11-28 16:00:36 -05:00
Ivan Kosarev	ec8ede8177	[AMDGPU][CodeGen] Support raw format TFE buffer loads other than byte, short and d16 ones. Differential Revision: https://reviews.llvm.org/D138215	2022-11-24 10:50:26 +00:00
Stanislav Mekhanoshin	bcaf31ec3f	[AMDGPU] Allow finer grain control of an unaligned access speed A target can return if a misaligned access is 'fast' as defined by the target or not. In reality there can be different levels of 'fast' and 'slow'. This patch changes the boolean 'Fast' argument of the allowsMisalignedMemoryAccesses family of functions to an unsigned representing its speed. A target can still define it as it wants and the direct translation of the current code uses 0 and 1 for current false and true. This makes the change an NFC. A subsequent patch will start using an actual value of speed in the load/store vectorizer to check whether a vectorized access is going to be not just fast, but not slower than before. Differential Revision: https://reviews.llvm.org/D124217	2022-11-17 09:23:53 -08:00
Simon Pilgrim	78739fdb4d	[DAG] Enable combineShiftOfShiftedLogic folds after type legalization This was disabled to prevent regressions, which appear to be just occurring on AMDGPU (at least in our current lit tests), which I've addressed by adding AMDGPUTargetLowering::isDesirableToCommuteWithShift overrides. Fixes #57872 Differential Revision: https://reviews.llvm.org/D136042	2022-10-29 12:30:04 +01:00
Pierre van Houtryve	824dd811be	[AMDGPU][DAG] Fix trunc/shift combine condition The condition needs to be different for right-shifts, else we may lose information in some cases. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D136059	2022-10-21 06:36:07 +00:00
Leon Clark	6370bc2435	Add f16 nearbyint support. Enable lowering of FNEARBYINT for f16 and extend existing tests. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D135124	2022-10-14 08:05:24 +01:00
Stanislav Mekhanoshin	5a3fe9a039	[AMDGPU] Move SIModeRegisterDefaults to SI MFI It does not belong to a general AMDGPU MFI. Differential Revision: https://reviews.llvm.org/D134666	2022-09-28 13:13:40 -07:00
Vitaly Buka	20a80d60a8	Revert "[AMDGPU] Move SIModeRegisterDefaults to SI MFI" Break msan bots. Details in D134666. This reverts commit `0ce96e06ee`.	2022-09-26 22:22:09 -07:00
Stanislav Mekhanoshin	0ce96e06ee	[AMDGPU] Move SIModeRegisterDefaults to SI MFI It does not belong to a general AMDGPU MFI. Differential Revision: https://reviews.llvm.org/D134666	2022-09-26 13:20:24 -07:00
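Several commits in this history manipulate floating-point sign bits directly: the abs-modifier change notes that an unfolded fabs becomes a `v_and ..., 0x7fffffff`, and the fneg/build_vector fold moves f64 negation onto only the high 32 bits. The following is a minimal sketch, assuming the IEEE-754 binary64 layout, of why that works: the sign of a double is bit 31 of its high 32-bit word, so fneg is an XOR and fabs is an AND on that word alone. The helper names are hypothetical and are not LLVM APIs.

```python
import struct

SIGN_BIT_HI = 0x80000000  # sign bit of a double, as seen in its high 32-bit word
ABS_MASK_HI = 0x7FFFFFFF  # mask that clears only that sign bit


def split_f64(x):
    """Reinterpret a double as (lo, hi) 32-bit words, like a bitcast to int2."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits & 0xFFFFFFFF, bits >> 32


def join_f64(lo, hi):
    """Reassemble two 32-bit words into a double."""
    bits = (hi << 32) | lo
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def fneg_via_hi(x):
    """Negate by flipping the sign bit in the high word; the low word is untouched."""
    lo, hi = split_f64(x)
    return join_f64(lo, hi ^ SIGN_BIT_HI)


def fabs_via_hi(x):
    """Absolute value by clearing the sign bit in the high word."""
    lo, hi = split_f64(x)
    return join_f64(lo, hi & ABS_MASK_HI)
```

Because only the high word changes, a backend is free to keep the low 32 bits in place and apply a single 32-bit operation (or a source modifier) to the other half.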


530 Commits