clang-p2996

Author	SHA1	Message	Date
Carl Ritson	e5b0b434f6	[AMDGPU] Refactor MIMG tables to better handle hardware variants Add mimgopc object to represent the opcode allowing different opcodes for different hardware variants. This enables image_atomic_fcmpswap, image_atomic_fmin, and image_atomic_fmax on GFX10 Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D96309	2021-02-11 13:22:41 +09:00
Jay Foad	2114b458b0	[AMDGPU] Fix comments in SILoadStoreOptimizer::offsetsCanBeCombined	2021-02-10 14:49:33 +00:00
Matt Arsenault	f4ca6d8289	AMDGPU: Fix verifier error with argument passed in CSR SGPR We need to avoid setting the kill flag on the CSR spill if there's an additional use of the register after the spill. This does rely on consistency between the entry block liveins and the MRI's function live ins, which is not something the verifier checks now.	2021-02-09 13:49:44 -05:00
Matt Arsenault	b72a23650f	GlobalISel: Fix using wrong calling convention for callees This was taking the calling convention from the parent function, instead of the callee. Avoids regressions in a future patch when the caller and callee have different type breakdowns. For some reason AArch64's lowerFormalArguments seems to intentionally ignore the parent isVarArg.	2021-02-09 13:48:56 -05:00
Jinsong Ji	9202806241	Revert "[CostModel] Remove VF from IntrinsicCostAttributes" This reverts commit `502a67dd7f`. This expose a failure in test-suite build on PowerPC, revert to unblock buildbot first, Dave will re-commit in https://reviews.llvm.org/D96287. Thanks Dave.	2021-02-09 02:14:14 +00:00
Matt Arsenault	bcf723b2fd	AMDGPU: Stop adding stack passed wide arguments to call conv handler The generated calling convention code shouldn't see these types since we split large types into 32-bit chunks before the calling convention code is triggered. GlobalISel ends up directly calls the generated CC code before checking for the register count breakdown. Arguably this difference is a bug, but this was dead code for the DAG anyway.	2021-02-08 17:09:28 -05:00
Jay Foad	a4b1df8af3	[AMDGPU] Use named unified buffer format constant. NFC.	2021-02-08 17:34:36 +00:00
Thomas Symalla	f89f6d1e5d	[AMDGPU]: Fixes an invalid clamp selection pattern. When running the tests on PowerPC and x86, the lit test GlobalISel/trunc.ll fails at the memory sanitize step. This seems to be due to wrong invalid logic (which matches even if it shouldn't) and likely missing variable initialisation." Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D95878	2021-02-08 13:06:30 +01:00
Dmitry Preobrazhensky	05433a8d03	[AMDGPU][MC] Corrected error position for invalid dim modifiers Fixed bug 49054. Differential Revision: https://reviews.llvm.org/D96117	2021-02-08 14:32:28 +03:00
Dmitry Preobrazhensky	168ccc8ecb	[AMDGPU][MC][GFX10] Improved errors reporting for invalid MIMG NSA operands Differential Revision: https://reviews.llvm.org/D96118	2021-02-08 14:04:28 +03:00
Kazu Hirata	7725b81822	[AMDGPU] Drop unnecessary const from a return type (NFC) Identified with const-return-type.	2021-02-05 21:02:04 -08:00
Guillaume Chatelet	79b3ab725d	[NFC] Simplify expression	2021-02-05 10:17:02 +00:00
David Green	502a67dd7f	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF used called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. Most backends look like this will be an improvement (or were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-05 09:34:24 +00:00
Craig Topper	11ef356d9e	[TargetLowering] Use Align in allowsMisalignedMemoryAccesses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097	2021-02-04 19:22:06 -08:00
Dan Gohman	698c6b0a09	[WebAssembly] Support single-floating-point immediate value As mentioned in TODO comment, casting double to float causes NaNs to change bits. To avoid the change, this patch adds support for single-floating-point immediate value on MachineCode. Patch by Yuta Saito. Differential Revision: https://reviews.llvm.org/D77384	2021-02-04 18:05:06 -08:00
Wen-Heng (Jack) Chung	50578cf339	[AMDGPU] Add f16 to i1 CodeGen patterns. Follow patterns used for f32 and f64 types. Differential Revision: https://reviews.llvm.org/D95964	2021-02-04 11:44:18 -06:00
Jay Foad	d84e5fdac1	[AMDGPU][GlobalISel] Fix v2s16 right shifts When widening, each half of the v2s16 operands needs to be sign extended for G_ASHR or zero extended for G_LSHR. Differential Revision: https://reviews.llvm.org/D96048	2021-02-04 17:04:32 +00:00
Jay Foad	b3bb5c3efc	[AMDGPU][GlobalISel] Use scalar min/max instructions SALU min/max s32 instructions exist so use them. This means that regbankselect can handle min/max much like add/sub/mul/shifts. Differential Revision: https://reviews.llvm.org/D96047	2021-02-04 17:04:32 +00:00
Konstantin Zhuravlyov	6054a456da	AMDGPU: Add support for amdgpu-unsafe-fp-atomics attribute If amdgpu-unsafe-fp-atomics is specified, allow {flat|global}_atomic_add_f32 even if atomic modes don't match. Differential Revision: https://reviews.llvm.org/D95391	2021-02-04 08:09:34 -05:00
Sebastian Neubauer	6c59dc474d	[AMDGPU] Save all lanes for reserved VGPRs When SGPRs are spilled to VGPRs, they can overwrite any lane. We need to preserve the value of inactive lanes in function calls, so we save the register even if it is marked as caller saved. Also, teach buildPrologSpill to work when no registers are free like in CodeGen/AMDGPU/pei-scavenge-vgpr-spill.mir and update the comment on findScratchNonCalleeSaveRegister as it is not used anymore to realign the stack pointer since D95865. Differential Revision: https://reviews.llvm.org/D95946	2021-02-04 09:56:36 +01:00
Matt Arsenault	477e3fe4f8	Revert "AMDGPU: Don't consider global pressure when bundling soft clauses" This reverts commit `1e377a273f`. A regression was reported.	2021-02-03 13:25:05 -05:00
Jay Foad	26ec785386	[AMDGPU] Fix multiclass template parameter types. NFC. This fixes TableGen parser errors that will be reported when D95874 is applied. Differential Revision: https://reviews.llvm.org/D95955	2021-02-03 16:21:51 +00:00
Matt Arsenault	9719f17011	AMDGPU: Move handling of allocation of fixed ABI inputs For the fixed ABI, set this in the initial argument constructor, rather than relying on the allocation logic to set the values. Also stop passing them for amdgpu_gfx, since the DAG path seems to skip these. I'm unclear on what amdgpu_gfx's expectations are. This will allow moving the special input registers out of the normal argument range.	2021-02-03 09:27:59 -05:00
Sebastian Neubauer	d49efdc969	Revert "[AMDGPU] Add a new Clamp Pattern to the GlobalISel Path." This reverts commits 62af0305b7cc..677a3529d3e6 from D93708. They cause failures in the sanitizer builds because of uninitialized values. A fix is in D95878, but it might take some time until this is pushed, so reverting the changes for now.	2021-02-03 11:03:34 +01:00
Matt Arsenault	af2cbe8eff	AMDGPU: Fix adding extra operands for i128 asm constraints We don't register i128 as a legal type with addRegisterClass, but it appears in the list of legal register types. This inconsistency resulted in the asm constraint lowering trying to use 2 128-bit registers for these operands. This would leave behind a dead def that would waste registers. Regresses GlobalISel tests for i128 load/store, but these aren't very important right now. Ideally these would not depend on the list of register types.	2021-02-02 19:01:04 -05:00
Matt Arsenault	1e377a273f	AMDGPU: Don't consider global pressure when bundling soft clauses This should only consider whether the pressure impact of the bundle at the given point in the program will decrease the occupancy. High VGPR pressure was incorrectly blocking the formation of scalar bundles, and vice versa. This was also blocking bundling from high pressure situations at other points in the program.	2021-02-02 19:00:14 -05:00
Sebastian Neubauer	8b898b19a8	[AMDGPU] Remove unused tmp register The temporary register is only used to compute the frame pointer. The frame pointer is overwritten and not used in between, so we can reuse the frame pointer for the computation, saving one register. Differential Revision: https://reviews.llvm.org/D95865	2021-02-02 17:17:54 +01:00
Sebastian Neubauer	6b6ae583cf	[AMDGPU] Save fp/bp after csr saves Saving callee-save registers happens in whole wave mode. Exec is saved to a free register, which can be reused to save the frame pointer. Therefore, saving the fp needs to happen after saving csrs. Differential Revision: https://reviews.llvm.org/D95861	2021-02-02 17:17:54 +01:00
Dmitry Preobrazhensky	586df38478	[AMDGPU][MC] Corrected parsing of optional modifiers Fixed bugs in parsing of "no*" modifiers and improved errors handling. See https://bugs.llvm.org/show_bug.cgi?id=41282. Differential Revision: https://reviews.llvm.org/D95675	2021-02-02 14:52:29 +03:00
Benjamin Kramer	679ef22f2e	Fold one-use variable into assert. NFCI. Avoids a warning in Release builds.	2021-02-02 10:50:48 +01:00
Sebastian Neubauer	b91afa474e	[AMDGPU] Mark epilog restores as frame-destroy I guess instructions were marked as frame-setup by accident, they are restores as part of the epilog. Differential Revision: https://reviews.llvm.org/D95783	2021-02-02 10:24:37 +01:00
Thomas Symalla	faeed774d1	Fixed includes. Differential Revision: https://reviews.llvm.org/D93708	2021-02-02 09:14:54 +01:00
Thomas Symalla	6c85e98f06	Fixed includes.	2021-02-02 09:14:54 +01:00
Thomas Symalla	09508d2849	Reverted whitespace changes. Differential Revision: https://reviews.llvm.org/D90968	2021-02-02 09:14:54 +01:00
Thomas Symalla	e630dd476c	Added missing includes.	2021-02-02 09:14:54 +01:00
Thomas Symalla	602896b9d2	Renamed med3 opcode, removed superfluous copy.	2021-02-02 09:14:54 +01:00
Thomas Symalla	fa3e840d3d	Removed the generic virtual register creations. Reworked the tests.	2021-02-02 09:14:54 +01:00
Thomas Symalla	c781c25412	Implemented a MED3_S32 GIR opcode.	2021-02-02 09:14:53 +01:00
Thomas Symalla	6604d81e1b	Added and used new target pseudo for v_cvt_pk_i16_i32, changes due to code review.	2021-02-02 09:14:53 +01:00
Thomas Symalla	52bfb50145	Formatting changes	2021-02-02 09:14:53 +01:00
Thomas Symalla	7d24026ed2	Formatting changes.	2021-02-02 09:14:53 +01:00
Thomas Symalla	bcd6c2d203	Updating formatting changes.	2021-02-02 09:14:53 +01:00
Thomas Symalla	ecbed4e0ab	Resolve formatting changes.	2021-02-02 09:14:53 +01:00
Thomas Symalla	7b2e701906	Code changes yielded from review.	2021-02-02 09:14:53 +01:00
Thomas Symalla	3a46502264	Move step to PreLegalizer	2021-02-02 09:14:53 +01:00
Thomas Symalla	cdfd9b3bf5	Move Combiner to PreLegalize step	2021-02-02 09:14:53 +01:00
Thomas Symalla	9a8da909f1	Reverted unintended git-format change.	2021-02-02 09:14:52 +01:00
Thomas Symalla	dae85e4671	Fixed the lit tests and a bug in the implementation.	2021-02-02 09:14:52 +01:00
Thomas Symalla	88a832aef1	Refactored the pattern matching.	2021-02-02 09:14:52 +01:00
Thomas Symalla	fce3230be2	Added early exit.	2021-02-02 09:14:52 +01:00

5702 Commits