clang-p2996

Author	SHA1	Message	Date
Carl Ritson	e5b0b434f6	[AMDGPU] Refactor MIMG tables to better handle hardware variants Add mimgopc object to represent the opcode allowing different opcodes for different hardware variants. This enables image_atomic_fcmpswap, image_atomic_fmin, and image_atomic_fmax on GFX10 Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D96309	2021-02-11 13:22:41 +09:00
Craig Topper	5189c5b940	[X86] Simplify patterns for avx512 vpcmp. NFC This removes the commuted PatFrags that only existed to carry an SDNodeXForm in its OperandTransform field. We know all the places that need to use the commuted SDNodeXForm and there is one transform shared by signed and unsigned compares. So just hardcode the the SDNodeXForm where it is needed and use the non commuted PatFrag in the pattern. I think when I wrote this I thought the SDNodeXForm name had to match what is in the PatFrag that is being used. But that's not true. The OperandTransform is only used when the PatFrag is used in an instruction pattern and not a separate Pat pattern. All the commuted cases are Pat patterns.	2021-02-10 19:24:27 -08:00
Jessica Clarke	ca606dc988	[RISCV] More whitespace and comment typo fixes in RISCVInstrInfoC.td	2021-02-11 02:32:36 +00:00
Jessica Clarke	0973ce8596	[RISCV] Fix whitespace in RISCVInstrInfoC.td	2021-02-11 02:23:09 +00:00
Craig Topper	350ab4e617	[RISCV] Use OperandTransform field of ImmLeaf to slightly simplify a couple bitmanip patterns. NFC This binds the SDNodeXForm to the ImmLeaf so we only need to mention the ImmLeaf in both the input and output pattern.	2021-02-10 17:52:07 -08:00
Jessica Paquette	1514f3b2c8	[AArch64][GlobalISel] Don't perform the mul const combine with G_PTR_ADD A G_MUL + G_PTR_ADD can also be folded into a madd. So, conservatively, we shouldn't combine when the G_MUL is used by a G_PTR_ADD either. Differential Revision: https://reviews.llvm.org/D96457	2021-02-10 15:30:45 -08:00
Jessica Paquette	5f7a4d8d05	[AArch64][GlobalISel] Perform load/store extended reg folding with optsize GlobalISel was only doing this with minsize. SDAG does this with optsize. (See: `SelectionDAG::shouldOptForSize()`) This is a 0.3% code size improvement for CTMark at -Os. (Best: 1.1% improvements on lencod + pairlocalalign) Differential Revision: https://reviews.llvm.org/D96451	2021-02-10 14:42:25 -08:00
Jessica Paquette	9283058abb	[AArch64][GlobalISel] Fold G_ADD into the cset for G_ICMP When we have a G_ADD which is fed by a G_ICMP on one side, we can fold it into the cset for the G_ICMP. e.g. Given ``` %cmp = G_ICMP ... %x, %y %add = G_ADD %cmp, %z ``` We would normally emit a cmp, cset, and add. However, `%add` is either `%z` or `%z + 1`. So, we can just use `%z` as the source of the cset rather than wzr, saving an instruction. This would probably be cleaner in AArch64PostLegalizerLowering, but we'd need to change the way we represent G_ICMP to do that, I think. For now, it's easiest to implement in selection. This is a 0.1% code size improvement on CTMark/pairlocalalign at -Os. Example: https://godbolt.org/z/7KdrP8 Differential Revision: https://reviews.llvm.org/D96388	2021-02-10 13:28:01 -08:00
Craig Topper	fc4d780eaf	[RISCV] Remove superfluous semicolon. NFC	2021-02-10 11:20:29 -08:00
Nick Desaulniers	68945a8686	[Thumb2] support `movs pc, lr` alias for `subs pc, lr, #0`/`eret` This is used by the Linux kernel built with CONFIG_THUMB2_KERNEL. Because different operands are not permitted to `movs`, the diagnostics now provide multiple suggestions along the lines of using a non-pc destination operand or lr source operand. Forked from D95586. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D96304	2021-02-10 11:00:42 -08:00
Craig Topper	cb161b3a88	[RISCV] Add support for matching .vf forms of fadd/fsub/fmul/fdiv/fma for fixed vectors. fma+neg will come in a different patch since I haven't done it for .vv yet either. Differential Revision: https://reviews.llvm.org/D96375	2021-02-10 10:16:27 -08:00
Craig Topper	0c254b4a69	[RISCV] Add support for selecting vrgather.vx/vi for fixed vector splat shuffles. The test cases extract a fixed element from a vector and splat it into a vector. This gets DAG combined into a splat shuffle. I've used some very wide vectors in the test to make sure we have at least a couple tests where the element doesn't fit into the uimm5 immediate of vrgather.vi so we fall back to vrgather.vx. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96186	2021-02-10 10:01:56 -08:00
Jay Foad	2114b458b0	[AMDGPU] Fix comments in SILoadStoreOptimizer::offsetsCanBeCombined	2021-02-10 14:49:33 +00:00
Daniel Cederman	ad3b023c88	[Sparc] Support relocatable expressions in the assembler Allow assembler expressions to start with an identifier. This allows for expressions such as ``` b symbol + 4 ``` and ``` mov symEnd - symStart, %g1 ``` The patch builds upon https://reviews.llvm.org/D47136. Reviewed By: joerg Differential Revision: https://reviews.llvm.org/D47458	2021-02-10 14:52:44 +01:00
Fraser Cormack	a3c74d6d53	[RISCV] Add support for selecting vid.v from build_vector This patch optimizes a build_vector "index sequence" and lowers it to the existing custom RISCVISD::VID node. This pattern is common in autovectorized code. The custom node was updated to allow it to be used by both scalable and fixed-length vectors, thus avoiding pattern duplication. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96332	2021-02-10 10:58:40 +00:00
Simon Pilgrim	eb31c3c5cb	Revert rGe1172959226689a "[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - merge VPERMILPD ops with different low/high masks." Revert this while I investigate a downstream breakage report.	2021-02-10 10:26:44 +00:00
Sam Parker	9d81ccc02f	[WebAssembly] Enable loop unrolling Enable partial and runtime unrolling with a threshold of 30, which was derived from a large number of kernels running on node and wasmtime for amd64 and aarch64. Unrolling is enabled by default at -O2 and -O3 and is disabled at -Oz and -Os. Compiling with -Os is recommended if the wasm binary size is the most important factor. Differential Revision: https://reviews.llvm.org/D95125	2021-02-10 08:25:46 +00:00
Jessica Paquette	7eee858585	[AArch64][GlobalISel] Fold selects fed by G_PTR_ADD Similar to the case for G_ADD. There was a function in CTMark/pairlocalalign which was missing this case, causing GlobalISel to emit a add + csel when a csinc is all that is necessary. https://godbolt.org/z/ax69E9 Minor code size improvements on CTMark at -Os. Differential Revision: https://reviews.llvm.org/D96390	2021-02-10 00:03:13 -08:00
Jessica Paquette	0e85d63486	[AArch64][GlobalISel] Allow vector load legalization into 128-bit-wide types Similar to `3d25fdc5c2` This fixes bad codegen in cases like so: https://godbolt.org/z/hePhz1 Differential Revision: https://reviews.llvm.org/D96296	2021-02-09 13:35:59 -08:00
Artem Belevich	2aa01ccec3	[CUDA, NVPTX] Allow targeting sm_86 GPUs. The patch only plumbs through the option necessary for targeting sm_86 GPUs w/o adding any new functionality. Differential Revision: https://reviews.llvm.org/D95974	2021-02-09 11:01:10 -08:00
Matt Arsenault	f4ca6d8289	AMDGPU: Fix verifier error with argument passed in CSR SGPR We need to avoid setting the kill flag on the CSR spill if there's an additional use of the register after the spill. This does rely on consistency between the entry block liveins and the MRI's function live ins, which is not something the verifier checks now.	2021-02-09 13:49:44 -05:00
Matt Arsenault	b72a23650f	GlobalISel: Fix using wrong calling convention for callees This was taking the calling convention from the parent function, instead of the callee. Avoids regressions in a future patch when the caller and callee have different type breakdowns. For some reason AArch64's lowerFormalArguments seems to intentionally ignore the parent isVarArg.	2021-02-09 13:48:56 -05:00
Craig Topper	18ff7e045a	[RISCV] Make the min and max vector width command line options more consistent and check their relationship to each other.	2021-02-09 10:47:23 -08:00
Craig Topper	fd5adae02c	[RISCV] Remove SRO* and SLO* instructions from bitmanip. As of the current draft these are no longer being considered for the bitmanip spec. It wasn't clear what sub extension they belonged in in the 0.93 spec. So remove them. They can always be added back if something changes. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96157	2021-02-09 09:35:05 -08:00
Nico Weber	de1966e542	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit `4a64d8fe39`. Makes clang crash when buildling trivial iOS programs, see comment after https://reviews.llvm.org/D92808#2551401	2021-02-09 11:06:32 -05:00
Simon Pilgrim	89d9ff8229	[X86][SSE] foldShuffleOfHorizOp - add SHUFPS v4f32 handling Fold shufps(hop(x,y),hop(z,w)) -> permute(hop(x,z)) - this is very similar to the equivalent unpack fold. I did start trying to convert foldShuffleOfHorizOp to handle generic shuffle masks but we're relying on a lot of special cases at the moment.	2021-02-09 14:18:45 +00:00
Nemanja Ivanovic	a5222aa085	[DAGCombine] Do not remove masking argument to FP16_TO_FP for some targets As of commit `284f2bffc9`, the DAG Combiner gets rid of the masking of the input to this node if the mask only keeps the bottom 16 bits. This is because the underlying library function does not use the high order bits. However, on PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits from the register. Therefore, the library implementation of __gnu_h2f_ieee will return an incorrect result if the bits aren't cleared. This combine is desired for ARM (and possibly other targets) so this patch adds a query to Target Lowering to check if this zeroing needs to be kept. Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092 Differential revision: https://reviews.llvm.org/D96283	2021-02-09 06:33:48 -06:00
Nemanja Ivanovic	f6e4b9fc06	[RISCV] Fix shared libs build Commit `a2d19bad07` introduced a dependency in the RISCV disassembler on two additional libraries (MC, RISCVDesc) which wasn't added to the CMakeLists.txt. This causes shared library builds to break. This patch just adds them to fix failures seen on some bots, such as the PPC64LE Multistage.	2021-02-09 06:14:25 -06:00
Dylan McKay	2ccb941740	[AVR] Fix global references to function symbols References to functions are in program memory and need a `pm()` fixup. This should fix trait objects for Rust on AVR. Differential Revision: https://reviews.llvm.org/D87631 Patch by Alex Mikhalev.	2021-02-10 00:40:49 +13:00
Hsiangkai Wang	a2d19bad07	[RISCV] Use whole register load/store for generic load/store. In vector v0.10, there are whole vector register load/store instructions. I suggest to use the whole register load/store instructions for generic load/store for scalable vector types. It could save up vset{i}vl{i} for these load/store. For fractional LMUL, I keep to use vle{eew}.v/vse{eew}.v instructions to load/store partial vector registers. Differential Revision: https://reviews.llvm.org/D95853	2021-02-09 15:52:04 +08:00
Jinsong Ji	9202806241	Revert "[CostModel] Remove VF from IntrinsicCostAttributes" This reverts commit `502a67dd7f`. This expose a failure in test-suite build on PowerPC, revert to unblock buildbot first, Dave will re-commit in https://reviews.llvm.org/D96287. Thanks Dave.	2021-02-09 02:14:14 +00:00
LemonBoy	45e33e8ba9	[SPARC] Recognize and handle the %lm(sym) operator Reviewed By: joerg Differential Revision: https://reviews.llvm.org/D77737	2021-02-08 19:25:33 -05:00
Hsiangkai Wang	a5b07a221a	[RISCV] Initial support of LoopVectorizer for RISC-V Vector. Define an option -riscv-vector-bits-max to specify the maximum vector bits for vectorizer. Loop vectorizer will use the value to check if it is safe to use the whole vector registers to vectorize the loop. It is not the optimum solution for loop vectorizing for scalable vector. It assumed the whole vector registers will be used to vectorize the code. If it is possible, we should configure vl to do vectorize instead of using whole vector registers. We only consider LMUL = 1 in this patch. This patch just an initial work for loop vectorizer for RISC-V Vector. Differential Revision: https://reviews.llvm.org/D95659	2021-02-09 06:32:18 +08:00
Matt Arsenault	bcf723b2fd	AMDGPU: Stop adding stack passed wide arguments to call conv handler The generated calling convention code shouldn't see these types since we split large types into 32-bit chunks before the calling convention code is triggered. GlobalISel ends up directly calls the generated CC code before checking for the register count breakdown. Arguably this difference is a bug, but this was dead code for the DAG anyway.	2021-02-08 17:09:28 -05:00
Arthur Eubanks	e84a4650eb	[NVPTX][NewPM] Re-enable NVVMReflectPass Disabled alongside NVVMIntrRangePass in https://reviews.llvm.org/D96166, but turns out NVVMIntrRangePass was the issue. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D96291	2021-02-08 13:58:17 -08:00
David Green	0c7e044a7f	[ARM] One-off identity shuffle A One-Off Identity mask is a shuffle that is mostly an identity mask from as single source but contains a single element out-of-place, either from a different vector or from another position in the same vector. As opposed to lowering this via a ARMISD::BUILD_VECTOR we can generate an extract/insert pair directly. Under ARM with individually accessible lane elements this often becomes a simple lane move. This also alters the LowerVECTOR_SHUFFLEUsingMovs code to use v4f32 (not v4i32), a more natural type for lane moves. Differential Revision: https://reviews.llvm.org/D95551	2021-02-08 21:24:32 +00:00
Amara Emerson	ec41ed5b1b	[AArch64][GlobalISel] Support the 'returned' parameter attribute. On AArch64 (which seems to be the only target that supports it), this attribute allows codegen to avoid saving/restoring the value in x0 across a call. Gives a 0.1% geomean -Os code size improvement on CTMark. Differential Revision: https://reviews.llvm.org/D96099	2021-02-08 12:47:39 -08:00
Martin Storsjö	71c29b4cf3	[AArch64] Use '//' as comment string for MSVC assembly As the actual MSVC toolset doesn't use the GAS-style assembly that Clang/LLVM produces and consumes, there's no reference for what string to use for e.g. comments when building with a MSVC triple. This frees up the use of semicolon as separator string, just like was done for GNU targets in `2341319564`. (Previously, both the separator and comment strings were set to the same, a semicolon.) Compiler-rt extensively uses separator chars in its assembly, and that assembly should be buildable with clang-cl for MSVC too. Differential Revision: https://reviews.llvm.org/D96259	2021-02-08 22:30:14 +02:00
Craig Topper	b49aaed8c7	[RISCV] Use _COMMUTABLE fma pseudos for fixed vectors. This matches what we do in the VLMAX SDNode patterns.	2021-02-08 11:27:23 -08:00
Craig Topper	8d8cafa32e	[RISCV] Add support for splat fixed length build_vectors using RVV. Building on the fixed vector support from D95705 I've added ISD nodes for vmv.v.x and vfmv.v.f and switched to lowering the intrinsics to it. This allows us to share the same isel patterns for both. This doesn't handle splats of i64 on RV32 yet. The build_vector gets converted to a vXi32 build_vector+bitcast during type legalization. Not sure the best way to handle this at the moment. Differential Revision: https://reviews.llvm.org/D96108	2021-02-08 11:12:56 -08:00
Craig Topper	b8d719fbe8	[RISCV] Add support for fixed vector FMA. Follow up to D95705. Does not include the commuting support from D95800. Differential Revision: https://reviews.llvm.org/D96103	2021-02-08 11:12:56 -08:00
Craig Topper	a719b667a9	[RISCV] Add initial support for converting fixed vectors to scalable vectors during lowering to use RVV instructions. This is an alternative to D95563. This is modeled after a similar feature for AArch64's SVE that uses predicated scalable vector instructions.a Rather than use predication, this patch uses an explicit VL operand. I've limited it to always use LMUL=1 for now, but we can improve this in the future. This requires a bunch of new ISD opcodes to carry the VL operand. I think we can probably lower intrinsics to these ISD opcodes to cut down on the size of the isel table. Which is why I've added patterns for all integer/float types and not just LMUL=1. I'm only testing one vector width right now, but the width is programmable via the command line. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D95705	2021-02-08 10:41:30 -08:00
Craig Topper	b7b4f4cbc3	[RISCV] Make scalable vector FMA commutable for register allocation. This adds support for commuting operands and converting between vfmadd and vfmacc to avoid register copies. To avoid messing up intrinsic behavior, I've added new pseudo instructions that have the isCommutable flag set. These pseudos also force a tail agnostic policy. The intrinsic version still use the tail undisturbed policy. For best results it looks like we need to start with fmadd and only pick fmacc if its beneficial. MachineCSE commutes without contraining the operands and then commutes back if it didn't help with CSE. So I've made sure that when the operand choice isn't constrained, we will keep fmadd for MachineCSE and when it does the second commute, we get back the original instruction. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D95800	2021-02-08 10:05:33 -08:00
Craig Topper	cc2c45dc54	[RISCV] Use SplatPat/SplatPat_simm5 to handle PseudoVMV_V_X_/PseudoVMV_V_I_ selection as well. This ensures that we'll match immediates consistently regardless of whether we match them as a standalone splat or as part of another operation. While I was there I added complexities to the simm5/uimm5 patterns so we didn't have to assume that the 1 on the non-immediate was lower than what tablegen inferred. I had to make a minor tweak to tablegen to fix one place that didn't expect to see a ComplexPattern that wasn't a "leaf". Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96199	2021-02-08 09:48:27 -08:00
Jay Foad	a4b1df8af3	[AMDGPU] Use named unified buffer format constant. NFC.	2021-02-08 17:34:36 +00:00
Sander de Smalen	981a38baf4	[AArch64AsmParser] Fix type-limits warning for VectorIndex. Making VectorIndex an `int` instead of `unsigned`, silences the warning: comparison of unsigned expression in ‘>= 0’ is always true in: template <int Min, int Max> DiagnosticPredicate isVectorIndex() const { ... if (VectorIndex.Val >= Min && VectorIndex.Val <= Max) return DiagnosticPredicateTy::Match; ... } when Min is 0.	2021-02-08 15:35:30 +00:00
Tim Northover	c93d50dd71	AArch64: use a constpool for blockaddress(...) on MachO More MachO madness for everyone. MachO relocations are only 32-bits, which means the ARM64_RELOC_ADDEND one only actually has 24 (signed) bits for the actual addend. This is a problem when calculating the address of a basic block; because it has no symbol of its own, the sequence adrp x0, Ltmp0@PAGE add x0, x0, x0 Ltmp0@PAGEOFF is represented by relocation with an addend that contains the offset from the function start to Ltmp, and so the largest function where this is guaranteed to work is 8MB. That's not quite big enough that we can call it user error (IMO). So this patch puts the any blockaddress into a constant-pool, where the addend is instead stored in the (x)word being relocated, which is obviously big enough for any function.	2021-02-08 15:13:29 +00:00
Mikael Holmen	eb8c27c60c	[RISCV] Use std::make_tuple to make some toolchains happy again My toolchain (LLVM 8.0, libstdc++ 5.4.0) complained with: 12:38:19 ../lib/Target/RISCV/RISCVISelLowering.cpp:1717:12: error: chosen constructor is explicit in copy-initialization 12:38:19 return {RISCVISD::VECREDUCE_FADD, Op.getOperand(0), 12:38:19 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 12:38:19 /proj/flexasic/app/llvm/8.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here 12:38:19 constexpr tuple(_UElements&&... __elements) 12:38:19 ^ 12:38:19 ../lib/Target/RISCV/RISCVISelLowering.cpp:1720:12: error: chosen constructor is explicit in copy-initialization 12:38:19 return {RISCVISD::VECREDUCE_SEQ_FADD, Op.getOperand(1), Op.getOperand(0)}; 12:38:19 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 12:38:19 /proj/flexasic/app/llvm/8.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here 12:38:19 constexpr tuple(_UElements&&... __elements) 12:38:19 ^ 12:38:19 2 errors generated. This commit adds explicit calls to std::make_tuple to work around the problem.	2021-02-08 14:37:25 +01:00
Nicholas Guy	cd880442ae	[CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize default threshold Different targets might handle branch performance differently, so this patch allows for targets to specify the TailDuplicateSize threshold. Said threshold defines how small a branch can be and still be duplicated to generate straight-line code instead. This patch also specifies said override values for the AArch64 subtarget. Differential Revision: https://reviews.llvm.org/D95631	2021-02-08 13:28:00 +00:00
Thomas Symalla	f89f6d1e5d	[AMDGPU]: Fixes an invalid clamp selection pattern. When running the tests on PowerPC and x86, the lit test GlobalISel/trunc.ll fails at the memory sanitize step. This seems to be due to wrong invalid logic (which matches even if it shouldn't) and likely missing variable initialisation." Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D95878	2021-02-08 13:06:30 +01:00

1 2 3 4 5 ...

61313 Commits