clang-p2996

Author	SHA1	Message	Date
Craig Topper	f1fd5c9b36	[RISCV] Remove pseudos for whole register load, store, and move. The MC layer instructions have the correct register classes, and the pseudos don't have any additional operands. So there doesn't seem to be any reason for them to exist. The pseudos were incorrectly going through code in RISCVMCInstLower that converted LMUL>1 register classes to LMUL1 register class. This makes the MCInst technically malformed, and prevented the vl2r.v, vl4r.v, and vl8r.v InstAliases from matching. This accounts for all of the .ll test diffs. Differential Revision: https://reviews.llvm.org/D139511	2022-12-07 10:19:58 -08:00
Craig Topper	938d0d6d7b	[RISCV] Replace uses of hasStdExtC with COrZca. Except MakeCompressible which will need more work. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D139504	2022-12-07 09:34:01 -08:00
Anton Sidorenko	f8ed709345	[MachineCombiner] Extend reassociation logic to handle inverse instructions Machine combiner supports generic reassociation only of associative and commutative instructions, for example (A + X) + Y => (X + Y) + A. However, we can extend this generic support to handle patterns like (X + A) - Y => (X - Y) + A), where `-` is the inverse of `+`. This patch adds interface functions to process reassociation patterns of associative/commutative instructions and their inverse variants with minimal changes in backends. Differential Revision: https://reviews.llvm.org/D136754	2022-12-07 13:50:28 +03:00
Yeting Kuo	0f8c761c48	[VP][RISCV] Recommit "Add vp.fshl/fshr and RISC-V support." This reverts commit `7883e5b061`. The original commit was reverted that it didn't update test files after D136263 landed. The recommit fixed those. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D139509	2022-12-07 15:58:12 +08:00
Kazu Hirata	7883e5b061	Revert "[VP][RISCV] Add vp.fshl/fshr and RISC-V support." This reverts commit `70de0e0140`. I'm seeing: Failed Tests (2): LLVM :: CodeGen/RISCV/rvv/fixed-vectors-fshr-fshl-vp.ll LLVM :: CodeGen/RISCV/rvv/fshr-fshl-vp.ll Also reported at: https://lab.llvm.org/buildbot/#/builders/123/builds/14531	2022-12-06 22:27:43 -08:00
Monk Chiang	7b50c18360	[RISCV] Codegen support for Zfhmin. The Zfhmin subset only has FLH, FSH, FMV.X.H, FMV.H.X, FCVT.S.H, and FCVT.H.S. If the D extension is present, the FCVT.D.H and FCVT.H.D instructions are also included. Since most instructions are not included for Zfhmin, so most operations are promoted. The patch primarily about making f16 a legal type. RISC-V ISA info: https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D139391	2022-12-06 22:14:15 -08:00
Yeting Kuo	70de0e0140	[VP][RISCV] Add vp.fshl/fshr and RISC-V support. The patch made VectorLegalizer expand ISD::VP_FSHL and ISD::VP_FSHR to achieve the codegen. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D138379	2022-12-07 12:16:36 +08:00
Craig Topper	8d30b9e64f	[RISCV] Move VSPILL/VRELOAD expansion for vector tuples to eliminateFrameIndex. We need a scratch GPR to increment the base pointer for each subsequent register. We currently reuse the input GPR for the base pointer without declaring it as a Def of the pseudo. We can't add it as a Def of the pseudo at creation time because it doesn't get register allocated. This was tried in D109405. Seems the only choice we have is to scavenge the GPR. This patch moves the expansion to eliminateFrameIndex where we can create virtual registers that will be scavenged. This also eliminates the extra operand for passing vlenb from frame lowering to expand pseudos. I need to do more testing on real world code, but wanted to get this up for early review. I hope this will fix the issue reported in D123394, but I haven't checked yet. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D139169	2022-12-06 15:42:00 -08:00
Craig Topper	1806ce9097	[RISCV] Teach RISCVMatInt to prefer li+slli over lui+addi(w) for compressibility. With C extension, li with a 6 bit immediate followed by slli is 4 bytes. The lui+addi(w) sequence is at least 6 bytes. The two sequences probably have similar execution latency. The exception being if the target supports lui+addi(w) macrofusion. Since the execution latency is probably the same I didn't restrict this to C extension. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D139135	2022-12-06 10:31:17 -08:00
Sanjay Patel	772c2f461b	[AArch64][RISCV][x86] add tests for masked val equality with 0; NFC	2022-12-06 11:34:48 -05:00
Sergey Kachkov	132dc442ba	[RISCV] Generate .cfi_def_cfa_expression for RVV stack adjustment Cannonical frame address after RVV stack adjustment is sp + StackSize + RVVStackSize * vlenb, and since vlenb is unknown at compile-time (but it is a constant for particular HW implementation), emit .cfi_def_cfa_expression so libunwind can read VLENB CSR register at run-time and obtain correct frame address. Fixes https://github.com/llvm/llvm-project/issues/58356 (but additional run-time support for reading CSR may be required) Differential Revision: https://reviews.llvm.org/D136263	2022-12-06 12:45:59 +03:00
jacquesguan	f7a46aa8fb	[RISCV] Fold vector binary operatrion into select with identity constant. This patch implements shouldFoldSelectWithIdentityConstant for RISCV. It would try to generate vmerge after the binary instruction and let them folded to maksed instruction later. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D131551	2022-12-06 11:19:31 +08:00
jacquesguan	6392cf331a	[RISCV][test] Add pre-commit test for D131551. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D131950	2022-12-06 10:38:25 +08:00
ChunyuLiao	85834d8685	[RISCV]Keep (select c, 0/-1, X) during PerformDAGCombine D135833, lowerSelect: (select C, -1/0, X) -> or/and Keep (select c, 0/-1, X), thus making better use of lowerSelect to eliminate branch instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D139272	2022-12-06 09:26:29 +08:00
Dmitry Vyukov	dbe8c2c316	Use-after-return sanitizer binary metadata Currently per-function metadata consists of: (start-pc, size, features) This adds a new UAR feature and if it's set an additional element: (start-pc, size, features, stack-args-size) Reviewed By: melver Differential Revision: https://reviews.llvm.org/D136078	2022-12-05 14:40:31 +01:00
Philip Reames	b775333068	[RISCV] Fold low 12 bits into instruction during frame index elimination Fold the low 12 bits of an immediate offset into the offset field of the using instruction. That using instruction will be a load, store, or addi which performs an add of a signed 12-bit immediate as part of it's operation. Splitting out the low bits allows the high bits to be generated via a single LUI instead of needing an LUI/ADDI pair. The codegen effect of this is mostly converting cases where "split addi" kicks in to using LUI + a folded offset. There are a couple of straight dynamic instruction count wins, and using a canonical LUI is probably better than a chain of SP adds if the dynamic instruction count is equal. Differential Revision: https://reviews.llvm.org/D139037	2022-12-02 11:54:06 -08:00
Craig Topper	e00e20a055	[RISCV] Add ADDW/AND/OR/XOR/SUB/SUBW to getRegAllocHints. These instructions requires both register operands to be compressible so I've only applied the hint if we already have a GPRC physical register assigned for the other register operand. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D139079	2022-12-01 11:09:38 -08:00
ZHU Zijia	010a8f7a90	[CodeGen] Fix restore blocks' BasicBlock information in branch relaxation In branch relaxation pass, restore blocks are created and placed before the jump destination if indirect branches are required. For example: foo sd s11, 0(sp) jump .restore, s11 bar bar bar j .dest .restore: ld s11, 0(sp) .dest: baz The BasicBlock information of the restore MachineBasicBlock should be identical to the dest MachineBasicBlock. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D131863	2022-12-02 02:42:22 +08:00
ZHU Zijia	3adf828a0e	[CodeGen][test] Pre-commit test for D131863 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D131862	2022-12-02 02:39:14 +08:00
Jonas Paulsson	8ef4632681	Revert "[CodeGen] Add new pass for late cleanup of redundant definitions." Temporarily revert and fix buildbot failure. This reverts commit `6d12599fd4`.	2022-12-01 13:29:24 -05:00
Jonas Paulsson	6d12599fd4	[CodeGen] Add new pass for late cleanup of redundant definitions. A new pass MachineLateInstrsCleanup is added to be run after PEI. This is a simple pass that removes redundant and identical instructions whenever found by scanning the MF once while keeping track of register definitions in a map. These instructions are typically immediate loads resulting from rematerialization, and address loads emitted by target in eliminateFrameInde(). This is enabled by default, but a target could easily disable it by means of 'disablePass(&MachineLateInstrsCleanupID);'. This late cleanup is naturally not "optimal" in removing instructions as it is done by looking at phys-regs, but still quite effective. It would be desirable to improve other parts of CodeGen and avoid these redundant instructions in the first place, but there are no ideas for this yet. Differential Revision: https://reviews.llvm.org/D123394 Reviewed By: RKSimon, foad, craig.topper, arsenm, asb	2022-12-01 13:21:35 -05:00
Freddy Ye	89f36dd8f3	[X86] Add ExpandLargeFpConvert Pass and enable for X86 As stated in https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528, this implementation is very similar to ExpandLargeDivRem, which expands ‘fptoui .. to’, ‘fptosi .. to’, ‘uitofp .. to’, ‘sitofp .. to’ instructions with a bitwidth above a threshold into auto-generated functions. This is useful for targets like x86_64 that cannot lower fp convertions with more than 128 bits. The expanded nodes are referring from the IR generated by `compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`, and etc. Corner cases: 1. For fp16: as there is no related builtins added in compliler-rt. So I mainly utilized the fp32 <-> fp16 lib calls to implement. 2. For fp80: as this pass is soft fp emulation and no fp80 instructions can help in this problem. I recommend users to deprecate this usage. For now, the implementation uses fp128 as the temporary conversion type and inserts fptrunc/ext at top/end of the function. 3. For bf16: as clang FE currently doesn't support bf16 algorithm operations (convert to int, float, +, -, *, ...), this patch doesn't consider bf16 for now. 4. For unsigned FPToI: since both default hardware behaviors and libgcc are ignoring "returns 0 for negative input" spec. This pass follows this old way to ignore unsigned FPToI. See this example: https://gcc.godbolt.org/z/bnv3jqW1M The end-to-end tests are uploaded at https://reviews.llvm.org/D138261 Reviewed By: LuoYuanke, mgehre-amd Differential Revision: https://reviews.llvm.org/D137241	2022-12-01 13:47:43 +08:00
Craig Topper	df7ab6a52e	[RISCV] Add ANDI to getRegAllocationHints.	2022-11-30 20:59:02 -08:00
Marco Elver	b95646fe70	Revert "Use-after-return sanitizer binary metadata" This reverts commit `d3c851d3fc`. Some bots broke: - https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8796062278266465473/overview - https://lab.llvm.org/buildbot/#/builders/124/builds/5759/steps/7/logs/stdio	2022-11-30 23:35:50 +01:00
Craig Topper	a8c79121bf	[RISCV] Teach getRegAllocationHints about compressible SRAI/SRLI. Similar to previous patches for ADDI/ADDIW/SLLI/ADD, but restricted to only cases where the register is x8-x15(GPRC reg class). I've restricted it so that we can be precise about whether the resulting instruction would be compressible. Changing the register allocation may make some other instruction not compressible so we should try to be accurate. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D138740	2022-11-30 10:28:57 -08:00
Philip Reames	ac1ec9e290	[RISCV] Share code for fixed offsets adjustRegs (thus materializing fewer constants) This reuses the existing optimized implementation of adjustReg, and commons up code. This has the effect of enabling two code changes for the new caller. First, we enable the "split andi" lowering (with no alignment requirement), and second we use a sub with smaller constant in register instead of a add with negative constant in register. Differential Revision: https://reviews.llvm.org/D132839	2022-11-30 09:28:29 -08:00
Dmitry Vyukov	d3c851d3fc	Use-after-return sanitizer binary metadata Currently per-function metadata consists of: (start-pc, size, features) This adds a new UAR feature and if it's set an additional element: (start-pc, size, features, stack-args-size) Reviewed By: melver Differential Revision: https://reviews.llvm.org/D136078	2022-11-30 14:50:22 +01:00
Philip Reames	fc0efb7e78	[SDAG] Allow scalable vectors in ComputeNumSignBits (try 2) I had reverted this before the holiday week because a problem was reported with a related change (D137140 - scalable vector known bits in DAG). I had initially confused the two patches, and then decided to leave this reverted out an abundance of caution. Now that we're through the holiday week, reapplying. I also roled in fixes for several post commit review comments that hadn't landed with the original change. Original commit message This is a continuation of the series of patches adding lane wise support for scalable vectors in various knownbit-esq routines. The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations. Differential Revision: https://reviews.llvm.org/D137141	2022-11-29 08:25:05 -08:00
Philip Reames	5583972fe1	[RISCV] Simplify eliminateFrameIndex in advance of reuse [nfc-ish] The prior code intermixed several concerns - the actual materialization of the offset, the choice of destination register, and whether to prune the ADDI. This version factors the first part out, and then reasons only about the later two. My intention is to merge the adjustReg routine with the one from frame lowering, and then explore using the merged result to simplify frame setup and tear down. This change is conceptually NFC, but since it results in slightly different vreg usage, the end result can change register allocation in minor ways. Differential Revision: https://reviews.llvm.org/D138502	2022-11-28 10:09:37 -08:00
Craig Topper	64612f5d8e	[RISCV] Add ADD to getRegAllocationHints to improve to improve use of c.add. add can always be compressed to c.add if one of the sources is the same as the destination. The same is not true for c.addw where the registers need to be x8-x15.	2022-11-25 08:59:27 -08:00
Craig Topper	a2b5b584a5	[RISCV] Use register allocation hints to improve use of compressed instructions. Compressed instructions usually require one of the source registers to also be the source register. The register allocator doesn't have that bias on its own. This patch adds register allocation hints to introduce this bias. I've started with ADDI, ADDIW, and SLLI. These all have a 5-bit field for the register. If the source and dest register are the same they are guaranteed to compress as long as the immediate is also 6 bits. This code was inspired by similar code from the SystemZ target. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D138242	2022-11-25 08:39:44 -08:00
LiaoChunyu	aa14f002d5	[RISCV] Branchless lowering for (select (x < 0), TrueConstant, FalseConstant) and (select (x >= 0), TrueConstant, FalseConstant) This patch reduces the number of unpredictable branches (select (x < 0), y, z) -> x >> (XLEN - 1) & (y - z) + z (select (x >= 0), y, z) -> x >> (XLEN - 1) & (z - y) + y Reviewed By: craig.topper, reames Differential Revision: https://reviews.llvm.org/D137949	2022-11-25 20:18:30 +08:00
wangpc	241accea2a	[RISCV] Lower unmasked zero-stride vector load to (scalar load + splat) So we have the opportunity to fold splat into .vx instruction as what D101138 has done. If failed, we can select zero-stride vector load again. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D138101	2022-11-24 11:09:45 +08:00
wangpc	25b51b7924	[RISCV] Precommit test for D138101 Add a test that splat can't be folded. Reviewed By: pcwang-thead Differential Revision: https://reviews.llvm.org/D138543	2022-11-24 11:07:19 +08:00
WuXinlong	219417b2c6	[RISCV] Add CodeGen support and MC testcase of RISCV Zca Extension This patch add the support of RISCV Zca ext `Zca` is a subset of C extension instructions that are compatible with the Zc extension. So this patch implements Zca code generation with reference to the C extension and sets the 2-byte alignment for the Zca extension, just like C extension does. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130483	2022-11-22 17:22:26 +08:00
Craig Topper	24810acb62	[RISCV] Add isel patterns to select slli+shXadd.uw. This matches what we get for something like. %0 = shl i32 %x, C %1 = zext i32 %0 to i64 %2 = getelementptr i32, ptr %y, %1 The shift before the zext and the shift implied by the GEP get combined with an AND after them. We need to split it back into 2 shifts so we can fold one into shXadd.uw. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D137886	2022-11-21 09:32:51 -08:00
Philip Reames	06e2b44c46	[RISCV] Optimize scalable frame setup when VLEN is precisely known If we know the exact value of VLEN, the frame offset adjustment for scalable stack slots becomes a fixed constant. This avoids the need to read vlenb, and may allow the offset to be folded into the immediate field of an add/sub. We could go further here, and fold the offset into a single larger frame adjustment - instead of having a separate scalable adjustment step - but that requires a bit more code reorganization. I may (or may not) return to that in a future patch. Differential Revision: https://reviews.llvm.org/D137593	2022-11-18 15:30:39 -08:00
Philip Reames	102f05bd34	Revert "[SDAG] Allow scalable vectors in ComputeNumSignBits" and follow up This reverts commits `3fb08d14a6` and `f8c63a7fbf`. There was a "timeout for a Halide Hexagon test" reported. Revert until investigation complete.	2022-11-18 15:25:59 -08:00
Philip Reames	f8c63a7fbf	[SDAG] Allow scalable vectors in ComputeNumSignBits This is a continuation of the series of patches adding lane wise support for scalable vectors in various knownbit-esq routines. The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations. Differential Revision: https://reviews.llvm.org/D137141	2022-11-18 10:50:06 -08:00
Philip Reames	18fda867f4	[RISCV] Optimize scalable frame offset calculation when VLEN is precisely known When we have a precisely known VLEN, we can replace runtime usage of VLENB with compile time constants. This converts offsets involving both fixed and scalable components into fixed offsets. The result is that we avoid the csr read of vlenb, and can often fold the multiply as well. Differential Revision: https://reviews.llvm.org/D137591	2022-11-18 09:56:55 -08:00
luxufan	18c5f3c35d	[RegisterScavenger][RISCV] Don't search for FrameSetup instrs if we were searching from Non-FrameSetup instrs Otherwise, the spill position may point to position where before FrameSetup instructions. In which case, the spill instruction may store to caller's frame since the stack pointer has not been adjustted. Fixes https://github.com/llvm/llvm-project/issues/58286 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D135693	2022-11-18 15:13:52 +08:00
Han-Kuan Chen	7e6dbfcd9d	[RISCV] Make lowerVECTOR_SHUFFLEAsVSlidedown follow source until not EXTRACT_SUBVECTOR. Current lowerVECTOR_SHUFFLEAsVSlidedown only seeks whether input are EXTRACT_SUBVECTOR and their source are same. The commit will make the function seek input and their source until they are not EXTRACT_SUBVECTOR. Differential Revision: https://reviews.llvm.org/D138025	2022-11-17 22:32:53 -08:00
Han-Kuan Chen	2e58d4bc4b	[RISCV] Pre-commit test. Differential Revision: https://reviews.llvm.org/D138024	2022-11-17 22:32:53 -08:00
YingChi Long	7a715bf317	[VP] Add support for vp.inttoptr & vp.ptrtoint Add vp.inttoptr & vp.ptrtoint support by lowering them into vp.zext / vp.truncate with in SelectionDAGBuilder. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D137169	2022-11-18 10:42:24 +08:00
Anton Sidorenko	b6c790736e	[MachineCombiner][RISCV] Add fmadd/fmsub/fnmsub instructions patterns This patch adds tranformation of fmul+fadd/fsub chains to fused multiply instructions: * fmul+fadd->fmadd * fmul+fsub->fmsub/fnmsub We also will try to combine these instructions if the fmul has more than one use and cannot be deleted. However, removing the dependence between fmul and fadd can still be profitable, and we rely on machine combiner approximations of scheduling. Differential Revision: https://reviews.llvm.org/D136764	2022-11-17 13:24:04 +03:00
Anton Sidorenko	374d076563	[MachineCombiner][RISCV] Precommit tests for D136764	2022-11-17 12:12:46 +03:00
Craig Topper	7e15ea102f	[RISCV] Add a DAG combine to pre-promote (i1 (truncate (i32 (srl X, Y)))) with Zbs on RV64. Type legalization will want to turn (srl X, Y) into RISCVISD::SRLW, which will prevent us from using a BEXT instruction. This is similar to what we do for (i32 (and (srl X, Y), 1)).	2022-11-16 19:07:33 -08:00
Yeting Kuo	ed9638c44b	[VP][RISCV] Add vp.nearbyint and RISC-V support. nearbyint has the property to execute without exception. For not modifying fflags, the patch added new machine opcode PseudoVFROUND_NOEXCEPT_V that expands vfcvt.x.f.v and vfcvt.f.x.v between a pair of frflags and fsflags. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D137685	2022-11-16 14:05:35 +08:00
Yeting Kuo	5c3ca10b09	[VP][RISCV] Add vp.bswap and RISC-V support. The patch also added function expandVPBSWAP to expand ISD::VP_BSWAP nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D137928	2022-11-16 11:36:38 +08:00
wangpc	a214c521f8	[RISCV] Don't use zero-stride vector load for gather if not optimized We may form a zero-stride vector load when lowering gather to strided load. As what D137699 has done, we use `load+splat` for this form if there is no optimized implementation. We restrict this to unmasked loads currently in consideration of the complexity of hanlding all falses masks. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D137931	2022-11-16 10:43:10 +08:00

1 2 3 4 5 ...

2138 Commits