clang-p2996

Author	SHA1	Message	Date
Craig Topper	75620fadf5	[RISCV] Change how we encode AVL operands in vector pseudoinstructions to use GPRNoX0. This patch changes the register class to avoid accidentally setting the AVL operand to X0 through MachineIR optimizations. There are cases where we really want to use X0, but we can't get that past the MachineVerifier with the register class as GPRNoX0. So I've used a 64-bit -1 as a sentinel for X0. All other immediate values should be uimm5. I convert it to X0 at the earliest possible point in the VSETVLI insertion pass to avoid touching the rest of the algorithm. In SelectionDAG lowering I'm using a -1 TargetConstant to hide it from instruction selection and treat it differently than if the user used -1. A user -1 should be selected to a register since it doesn't fit in uimm5. This is the rest of the changes started in D109110. As mentioned there, I don't have a failing test from MachineIR optimizations anymore. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D109116	2021-09-03 09:19:25 -07:00
Evandro Menezes	cd6064bb9e	[RISCV] Improve shrink wrap test (NFC) Restore test for shrink wrapping disabled.	2021-09-02 12:14:04 -05:00
Craig Topper	498e8ae412	[RISCV] Add Zba command line to rv64i-exhaustive-w-insts.ll Zba adds a zext.w pseudoinstruction using ADDUW. This can simplify the generated code for many of these tests. There are at least 2 suboptimal cases in this config that I've marked with TODOs.	2021-09-02 08:36:27 -07:00
Craig Topper	eaa560582a	[RISCV] Remove stale TODOs from test. NFC These were fixed by D106230.	2021-09-02 08:36:27 -07:00
Craig Topper	b5fd6b46f5	[RISCV] Teach instruction selection to elide sext.w in some cases. If a sext_inreg is up for isel, and all its users are W instructions, we can skip emitting the sext_inreg. This is helpful if the producing instruction can't become a W instruction. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D108966	2021-09-02 07:54:34 -07:00
Evandro Menezes	5ebdb07e7e	[RISCV] Enable shrink wrap by default Differential Revision: https://reviews.llvm.org/D109037	2021-09-02 09:47:58 -05:00
Craig Topper	e4e69ba4d1	[RISCV] Split PseudoVSETVLI into 2 instructions to allow different register classes for rs1. X0 has special meaning for vsetvli, so we need to make sure we never create a vsetvli that uses it by accident. This could happen if the register coalescer coalesces a copy from X0 into this instruction. This patch splits the instruction so that we can have GPRNoX0 register class to use for the cases where we don't want the source to be X0. The verifier won't let us explicitly use X0 on a GPRNoX0 operand so we need a separate pseudo for those cases. I don't currently have a failing example for this. There was a failure in D107957, but the coalescable copy from that example should have been optimized away much earlier so I've fixed that. This is not a complete fix. We still need to prevent the same possible issue on the AVL operand of all of the vector instruction pseudos. I don't want to make two versions of all of those so we need to find a different solution for those. I have an idea I'm going to try. Differential Revision: https://reviews.llvm.org/D109110	2021-09-02 07:45:31 -07:00
Ben Shi	9621bbdf62	[RISCV][test] Add more tests for (mul (add x, c1), c2) Reviewed By: asb Differential Revision: https://reviews.llvm.org/D108606	2021-09-02 17:30:03 +08:00
Ben Shi	e47ab56398	[RISCV][test] Add tests for optimization with SH*ADD in the zba extension Reviewed By: asb Differential Revision: https://reviews.llvm.org/D108915	2021-09-02 17:30:03 +08:00
Fraser Cormack	ef78f2106c	[LegalizeTypes][VP] Add splitting support for binary VP ops This patch extends D107904's introduction of vector-predicated (VP) operation legalization to include vector splitting. When the result of a binary VP operation needs splitting, all of its operands are split in kind. The two operands and the mask are split as usual, and the vector-length parameter EVL is "split" such that the low and high halves each execute the correct number of elements. Tests have been added to the RISC-V target to show splitting several scenarios for fixed- and scalable-vector types. Without support for `umax` (e.g. in the `B` extension) the generated code starts to branch. Ideally a cost model would prevent their insertion in the first place. Through these tests many opportunities for better codegen can be seen: combining known-undef VP operations and for constant-folding operations on `ISD::VSCALE`, to name but a few. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D107957	2021-09-02 10:15:53 +01:00
Craig Topper	ccbb4c8b4f	[RISCV] Fold (RISCVISD::SELECT_CC X, Y, CC, Z, Z) -> Z. If the true and false values are the same, we don't need a SELECT_CC. This would normally be folded before a select is legalized to select_cc. The test case exploits the late legalization of vscale to trigger a case where they become identical after legalization. This works around an issue found on a test case in D107957. In that case the true/false values were both eventually 0 and the select was used by a vector AVL operand. The select_cc got expanded to control flow and a phi, but the phi inputs were both copies from X0. MachineIR optimizations simplified this to a single copy from X0 going into the vector instruction. This became the input of a vsetvli after vsetvli insertion. Then register coalescing folded the copy into the vsetvli. X0 as the source of a vsetvli is a special encoding and should not be created by coalescing. We need to fix our vsetvli handling to make sure this can never happen any other way, but removing the unneeded select is still a worthwhile optimization.	2021-09-01 12:37:52 -07:00
Craig Topper	af1ca4353e	[RISCV] Add a test case showing an extra sext.w near a sh2add with multiple uses. NFC See description in test. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D108965	2021-09-01 11:01:05 -07:00
Nick Desaulniers	e9b3f25730	[RISCVISelLowering] avoid emitting libcalls to __mulodi4() and __multi3() Similar to D108842, D108844, D108926, D108928, and D108936. __has_builtin(builtin_mul_overflow) returns true for 32b RISCV targets, but Clang is deferring to compiler RT when encountering long long types. If the semantics of __has_builtin mean "the compiler resolves these, always" then we shouldn't conditionally emit a libcall. Link: https://bugs.llvm.org/show_bug.cgi?id=28629 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D108939	2021-08-31 11:23:56 -07:00
Craig Topper	0560a4adb3	[RISCV] Enable CONCAT_VECTORS for fixed FP vectors. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D108487	2021-08-30 08:47:45 -07:00
Craig Topper	705d005781	[DAGCombiner][RISCV] Don't use vector types in DAGCombiner::tryStoreMergeOfLoads if we need a rotate. The check for whether a rotate is possible occurs before the memory legality checks for the integer type. So it's possible we decide we can use a rotate, but then fail the legality checks. If that happens we should not fall back to a vector type. This triggers an assertion in the rotate handling when it finds a vector type instead of an integer type. In theory we could use a shufflevector in place of the rotate, but right now I'd just like to fix the crash. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D108839	2021-08-30 08:47:15 -07:00
Craig Topper	0eeab8b282	[RISCV] Add -riscv-v-fixed-length-vector-elen-max to limit the ELEN used for fixed length vectorization. This adds an ELEN limit for fixed length vectors. This will scalarize any elements larger than this. It will also disable some fractional LMULs. For example, if ELEN=32 then mf8 becomes illegal, i32/f32 vectors can't use any fractional LMULs, i16/f16 can only use mf2, and i8 can use mf2 and mf4. We may also need something for the scalable vectors, but that has interactions with the intrinsics and we can't scalarize a scalable vector. Longer term this should come from one of the Zve* features	2021-08-27 10:17:35 -07:00
Craig Topper	1b9417454e	[RISCV] Insert a sext_inreg when type legalizing i32 shl by constant on RV64. Similar to what we do for add/sub/mul. This can help remove some sext.w. There are some regressions on some bswap tests, but I have an idea how to fix that for a follow up. A new PACKW pattern is added to handle the new sext_inreg placement. Differential Revision: https://reviews.llvm.org/D108663	2021-08-26 10:20:19 -07:00
Craig Topper	8bb24289f3	[SelectionDAG] Optimize bitreverse expansion to minimize the number of mask constants. We can halve the number of mask constants by masking before shl and after srl. This can reduce the number of mov immediate or constant materializations. Or reduce the number of constant pool loads for X86 vectors. I think we might be able to do something similar for bswap. I'll look at it next. Differential Revision: https://reviews.llvm.org/D108738	2021-08-26 09:33:24 -07:00
Craig Topper	ccd364286b	[RISCV] Fix the check prefixes in some B extension tests. NFC Looks like a bad merge happened after these were renamed in D107992.	2021-08-25 14:26:51 -07:00
Stanislav Mekhanoshin	92c1fd19ab	Allow rematerialization of virtual reg uses Currently isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that it is not a trivial rematerialization and that we do not want to extend liveranges. It appears that LRE logic does not attempt to extend a liverange of a source register for rematerialization so that is not an issue. That is checked in the LiveRangeEdit::allUsesAvailableAt(). The only non-trivial aspect of it is accounting for tied-defs which normally represent a read-modify-write operation and are not rematerializable. The test for a tied-def situation already exists in the /CodeGen/AMDGPU/remat-vop.mir, test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve. The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets where I more or less understand the asm it seems to reduce spilling (as expected) or be neutral. However, it needs a review by all targets' specialists. Differential Revision: https://reviews.llvm.org/D106408	2021-08-24 11:09:02 -07:00
Ben Shi	f69fb7ac72	[DAGCombiner] Add target hook function to decide folding (mul (add x, c1), c2) Reviewed by: lebedev.ri, spatel, craig.topper, luismarques, jrtc27 Differential Revision: https://reviews.llvm.org/D107711	2021-08-22 16:53:32 +08:00
Ben Shi	5b6c9a5ab0	[RISCV] Optimize add in the zba extension with SHADD Optimize (add x, c) to (SHADD (c>>b), x) if c is not simm12 while (c>>b) is simm12 and c has b trailing zeros. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D108193	2021-08-20 22:41:49 +08:00
Fraser Cormack	5b06cbac11	[RISCV] Fix reporting of incorrect commutable operand indices This patch fixes an issue where RISCV's `findCommutedOpIndices` would incorrectly return the pseudo `CommuteAnyOperandIndex` as a commutable operand index, rather than fixing a specific index. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D108206	2021-08-20 10:27:15 +01:00
David Green	d10f23a25d	[ISel] Expand saddsat and ssubsat via asr and xor This changes the lowering of saddsat and ssubsat so that instead of using: r,o = saddo x, y; c = setcc r < 0; s = c ? INTMAX : INTMIN; ret o ? s : r into using asr and xor to materialize the INTMAX/INTMIN constants: r,o = saddo x, y; s = ashr r, BW-1; x = xor s, INTMIN; ret o ? x : r (see https://alive2.llvm.org/ce/z/TYufgD). This seems to reduce the instruction count in most testcases across most architectures. X86 has some custom lowering added to compensate for cases where it can increase instruction count. Differential Revision: https://reviews.llvm.org/D105853	2021-08-19 16:08:07 +01:00
Ben Shi	b10e74389e	[RISCV][test] Improve tests for (add (mul x, c1), c2) Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D107710	2021-08-19 21:04:35 +08:00
Fraser Cormack	e6b1ac8546	[LegalizeTypes][VP] Add widening support for binary VP ops This patch adds the beginnings of more thorough support in the legalizers for vector-predicated (VP) operations. The first step is the ability to widen illegal vectors. The more complicated scenario in which the result/operands need widening but the mask doesn't has not been handled here. That would require a lot of code without an in-tree target on which to test it. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D107904	2021-08-19 13:08:47 +01:00
Ben Shi	9e40a32620	[RISCV][test] Add new tests for add optimization in the zba extension Reviewed By: asb Differential Revision: https://reviews.llvm.org/D108188	2021-08-19 19:59:23 +08:00
Craig Topper	3f9b37ccb1	[RISCV] Remove sext_inreg+add/sub/mul/shl isel patterns. Let the sext_inreg be selected to sext.w. Remove unneeded sext.w during PostProcessISelDAG. This gives opportunities for some other isel patterns to match like the ADDIPair or matching mul with immediate to shXadd. This becomes possible after D107658 started selecting W instructions based on users. The sext.w will be considered a W user so isel will often select a W instruction for the sext.w input and we can just remove the sext.w. Otherwise we can combine the sext.w with an ADD/SUB/MUL/SLLI to create a new W instruction in parallel to the original instruction. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D107708	2021-08-18 11:07:11 -07:00
Craig Topper	6d7ea597ef	[RISCV] Insert sext_inreg when type legalizing add/sub/mul with constant LHS. We already do this for non-constant RHS. This just removes the special case. I believe the special case may have been needed because the ANY_EXTEND of a constant used to create zero extended constants, but we recently changed that to produce sign extended constants. D107658 is needed to prevent some regressions. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D107697	2021-08-18 10:44:25 -07:00
Craig Topper	20e6265873	[RISCV] Improve constant materialization for stores of i16 or i32 negative constants. DAGCombiner::visitStore can clear the upper bits of constants used by stores. This prevents them from being recognized as sign extended negative values, making them more expensive to materialize. This patch uses the hasAllNBitUsers method from D107658 to make a negative constant if none of the users care about the upper bits. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D108052	2021-08-18 10:25:12 -07:00
Craig Topper	d9ba1a9c5c	[RISCV] Teach isel to select ADDW/SUBW/MULW/SLLIW when only the lower 32-bits are used. We normally select these when the root node is a sext_inreg, but SimplifyDemandedBits can sometimes bypass the sext_inreg for some users. This can create a situation where sext_inreg+add/sub/mul/shl is selected to a W instruction, and then the add/sub/mul/shl is separately selected to a non-W instruction with the same inputs. This patch tries to detect when it would still be ok to use a W instruction without the sext_inreg by checking the direct users. This can allow the W instruction to CSE with one created for a sext_inreg+add/sub/mul/shl. To minimize complexity and cost of checking, we make no attempt to determine if the CSE will happen and just always use a W instruction when we can. Differential Revision: https://reviews.llvm.org/D107658	2021-08-18 10:22:00 -07:00
Petr Hosek	2d4470ab89	Revert "Allow rematerialization of virtual reg uses" This reverts commit `877572cc19` which introduced PR51516.	2021-08-18 00:12:41 -07:00
jacquesguan	a7ebc4d145	[DAGCombiner] Teach isKnownToBeAPowerOfTwo to handle SPLAT_VECTOR Make DAGCombine turn mul by power of 2 into shl for scalable vector. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D107883	2021-08-18 10:10:40 +08:00
Stanislav Mekhanoshin	877572cc19	Allow rematerialization of virtual reg uses Currently isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that it is not a trivial rematerialization and that we do not want to extend liveranges. It appears that LRE logic does not attempt to extend a liverange of a source register for rematerialization so that is not an issue. That is checked in the LiveRangeEdit::allUsesAvailableAt(). The only non-trivial aspect of it is accounting for tied-defs which normally represent a read-modify-write operation and are not rematerializable. The test for a tied-def situation already exists in the /CodeGen/AMDGPU/remat-vop.mir, test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve. The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets where I more or less understand the asm it seems to reduce spilling (as expected) or be neutral. However, it needs a review by all targets' specialists. Differential Revision: https://reviews.llvm.org/D106408	2021-08-16 12:42:42 -07:00
Simon Pilgrim	d6fe8d37c6	[DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) -> concat_vectors(x,y,a,b) Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal. This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands. Differential Revision: https://reviews.llvm.org/D107597	2021-08-16 16:06:54 +01:00
Craig Topper	d63f117210	[RISCV] Support RISCVISD::SELECT_CC in ComputeNumSignBitsForTargetNode.	2021-08-13 18:00:09 -07:00
Craig Topper	a2556bf44c	[RISCV] Improve check prefixes in B extension tests. NFC -Add Z for the B extension subextensions. -Don't mention I along with B or its sub extensions. This is based on comments in D107817. Differential Revision: https://reviews.llvm.org/D107992	2021-08-12 12:41:40 -07:00
Craig Topper	e25665f52e	[RISCV] Add test cases showing inefficient materialization for stores of immediates. NFC DAGCombiner::visitStore can call GetDemandedBits which will remove upper bits from immediates. The upper bits are important for good materialization of negative constants on RISCV. GetDemandedBits is a different mechanism than SimplifyDemandedBits so TargetShrinkDemandedConstant can't block it. As far as I know this behavior is unique to stores. I think we can fix this in isel using a concept similar to D107658. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D107860	2021-08-12 10:14:07 -07:00
Craig Topper	79fbddbea0	[RISCV] Teach vsetvli insertion pass that it doesn't need to insert vsetvli for unit-stride or strided loads/stores in some cases. For unit-stride and strided load/stores we set the SEW operand of the pseudo instruction equal to the EEW in the opcode. The LMUL of the pseudo instruction is the LMUL we want. These instructions calculate EMUL=(EEW/SEW) * LMUL. We can use this to avoid changing vtype if the SEW/LMUL of the previous vtype matches the EEW/EMUL ratio we need for the instruction. Due to how the global analysis works, we can only do this optimization when the previous vsetvli was produced in the block containing the store. We need to know in the first phase if the vsetvli will be inserted so we can propagate information to the successors in the second phase correctly. This means we can't depend on predecessors. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D106601	2021-08-12 10:05:27 -07:00
Ben Shi	0247403910	[RISCV][test] Add new tests for mul optimization in the zba extension with SH*ADD Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D107817	2021-08-11 09:56:45 +08:00
Craig Topper	6d5b14d854	[RISCV] Remove stale TODO from test. NFC	2021-08-10 13:13:48 -07:00
Craig Topper	6f5edc3487	[RISCV] Fold (add (select lhs, rhs, cc, 0, y), x) -> (select lhs, rhs, cc, x, (add x, y)) Similar for sub except sub isn't commutative. Modify the existing and/or/xor folds to also work on ISD::SELECT and not just RISCVISD::SELECT_CC. This is needed to make sure we do this transform before type legalization turns i32 add/sub into add/sub+sign_extend_inreg on RV64. If we don't do this before that, the sign_extend_inreg will still be after the select. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D107603	2021-08-10 09:02:56 -07:00
Fraser Cormack	2b4a1d4b86	[RISCV] Improve codegen for shuffles with LHS/RHS splats Shuffles which are broken into separate halves reveal splats in which a half is accessed via one index; such operations can be optimized to use "vrgather.vi". This optimization could be achieved by adding extra patterns to match `vrgather_vv_vl` which uses a splat as an index operand, but this patch instead identifies splat earlier. This way, future optimizations can build on top of the data gathered here, e.g., to splat-gather dominant indices and insert any leftovers. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D107449	2021-08-09 10:31:40 +01:00
Fraser Cormack	b5c608c377	[RISCV] Add tests covering shuffles which can be optimized These shuffles all take the form of a "splat" of the LHS and/or RHS to some degree, with one or two elements needing patched up afterwards. We currently lower all of these to full LHS/RHS vector-index shuffles with vrgather.vv. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D107447	2021-08-09 10:20:42 +01:00
Craig Topper	2f3b738960	[RISCV] Add optimizations for FMV_X_ANYEXTH similar to FMV_X_ANYEXTW_RV64. This enables the fneg and fabs combines we have for FMV_X_ANYEXTW_RV64.	2021-08-08 18:30:48 -07:00
Craig Topper	6606936322	[RISCV] Remove -target-abi from half-bitmanip-dagcombines.ll. This should be testing the custom ISD nodes we use for passing half values in GPRs. We should optimize these to integer operations, but we currently don't.	2021-08-08 18:19:35 -07:00
Craig Topper	88bc29f5f2	[RISCV] Introduce a RISCV CondCode enum instead of using ISD::SET* in MIR. NFC Previously we converted ISD condition codes to integers and stored them directly in our MIR instructions. The ISD enum kind of belongs to SelectionDAG so that seems like incorrect layering. This patch instead uses a CondCode node on RISCV::SELECT_CC until isel and then converts it from ISD encoding to a RISCV specific value. This value can be converted to/from the RISCV branch opcodes in the RISCV namespace. My larger motivation is to possibly support a microarchitectural feature of some CPUs where a short forward branch over a single instruction can be predicated internally. This will require a new pseudo instruction for select that needs to carry a branch condition and live probably until RISCVExpandPseudos. At that point it can be expanded to control flow without other instructions ending up in the predicated basic block. Using an ISD encoding in RISCVExpandPseudos doesn't seem like correct layering. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D107400	2021-08-08 17:25:37 -07:00
Craig Topper	5894134c6e	[RISCV] Autogenerate test. NFC	2021-08-07 17:11:11 -07:00
Craig Topper	d4ee84ceee	[RISCV] Support FP_TO_S/UINT_SAT for i32 and i64. The fcvt fp to integer instructions saturate if their input is infinity or out of range, but the instructions produce a maximum integer for nan instead of 0 required for the ISD opcodes. This means we can use the instructions to do the saturating conversion, but we'll need to fix up the nan case at the end. We can probably improve the i8 and i16 default codegen as well, but I'll leave that for a follow up. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D107230	2021-08-07 16:06:00 -07:00
Craig Topper	f7076cfd3a	[DAGCombiner][RISCV][AMDGPU] Call SimplifyDemandedBits at the end of visitMULHU to enable known bits constant folding. We don't have real demanded bits support for MULHU, but we can still use the known bits based constant folding support at the end of SimplifyDemandedBits to simplify a MULHU. This helps with cases where we know the LHS and RHS have enough leading zeros so that the high multiply result is always 0. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D106471	2021-08-05 08:31:26 -07:00
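
Note on d10f23a25d (saddsat via asr and xor): the two lowerings quoted in that commit message can be compared with a small model. This is only a sketch in Python, not LLVM code; it assumes a 32-bit width, and the helper names (`wrap`, `saddo`, `saddsat_select`, `saddsat_asr_xor`) are made up for illustration.

```python
BW = 32                          # assumed bit width for this sketch
INT_MIN = -(1 << (BW - 1))
INT_MAX = (1 << (BW - 1)) - 1

def wrap(v):
    """Wrap a Python int to a signed BW-bit two's complement value."""
    v &= (1 << BW) - 1
    return v - (1 << BW) if v >> (BW - 1) else v

def saddo(x, y):
    """Model of saddo: wrapped sum plus a signed-overflow flag."""
    exact = x + y
    r = wrap(exact)
    return r, r != exact

def saddsat_select(x, y):
    """Old form: s = (r < 0) ? INTMAX : INTMIN; ret o ? s : r."""
    r, o = saddo(x, y)
    s = INT_MAX if r < 0 else INT_MIN
    return s if o else r

def saddsat_asr_xor(x, y):
    """New form: s = ashr r, BW-1; x = xor s, INTMIN; ret o ? x : r."""
    r, o = saddo(x, y)
    s = r >> (BW - 1)            # arithmetic shift: -1 if r < 0, else 0
    sat = wrap(s ^ INT_MIN)      # -1 ^ INTMIN == INTMAX, 0 ^ INTMIN == INTMIN
    return sat if o else r
```

Both functions should agree on every pair of in-range inputs; the second simply derives the saturation constant from the sign of the wrapped result instead of selecting between two materialized constants.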

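Note on 8bb24289f3 (bitreverse mask constants): the idea of "masking before shl and after srl" can be illustrated with a 32-bit model. This is a hedged Python sketch, not the SelectionDAG expansion itself; it uses swap stages all the way down rather than a bswap followed by three stages, and the names `STAGES`, `bitreverse32_two_masks` and `bitreverse32_one_mask` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# (shift amount, mask selecting the low half of each swapped pair)
STAGES = [
    (16, 0x0000FFFF),
    (8,  0x00FF00FF),
    (4,  0x0F0F0F0F),
    (2,  0x33333333),
    (1,  0x55555555),
]

def bitreverse32_two_masks(x):
    """Each stage masks both halves before shifting: two constants per stage."""
    for shift, lo in STAGES:
        hi = (lo << shift) & MASK32
        x = ((x & lo) << shift) | ((x & hi) >> shift)
    return x

def bitreverse32_one_mask(x):
    """Mask before the left shift and after the right shift: one constant per stage."""
    for shift, lo in STAGES:
        x = ((x & lo) << shift) | ((x >> shift) & lo)
    return x
```

For inputs in the range 0 to 2^32 - 1 the two functions should return the same value; the second needs only half as many distinct mask constants, which is the saving the commit message describes.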
1038 Commits