Commit Graph

52 Commits

Author SHA1 Message Date
Philip Reames
531dd3634d [LV] Restructure isPredicatedInst and isScalarWithPredication (w/a fix for uniform mem ops)
This change reorganizes the code and comments to make the expected semantics of these routines more clear. However, this is *not* an NFC change. The functional change is having isScalarWithPredication return false if the instruction does not need predicated. Specifically, for the case of a uniform memory operation we were previously considering it *not* to be a predicated instruction, but *were* considering it to be scalable with predication.

As can be seen with the test changes, this causes uniform memory ops which should have been lowered as uniform-per-parts values to instead be lowering via naive scalarization or if scalarization is infeasible (i.e. scalable vectors) aborted entirely. I also don't trust the code to bail out correctly 100% of the time, so it's possible we had a crash or miscompile from trying to scalarize something which isn't scalaralizable. I haven't found a concrete example here, but I am suspicious.

Differential Revision: https://reviews.llvm.org/D131093
2022-08-18 07:14:04 -07:00
Philip Reames
33e7a0a33b [RISCV][LV] Add test coverage for upcoming dependence distance handling change 2022-08-15 15:20:36 -07:00
jacquesguan
45bae1be90 [RISCV][test] Add inloop reduction vectorize test. NFC 2022-08-04 15:06:44 +08:00
Philip Reames
0b47615fcf [LV] Recognize store of invariant value to invariant address as uniform
This extends the handling of uniform memory operations to handle the case where a store is storing a loop invariant value. Unlike the general case of a store to an invariant address where we must use the last active lane, in this case we can use any lane since all lanes must produce the same result.

For context, the basic structure of the existing code and how the change fits in:
* First, we select a widening strategy. (The result is irrelevant for this patch.)
* Then we determine if a computation is uniform within all lanes of VF. (Note this is the uniform-per-part definition, not LAI's uniform across all unrolled iterations definition.)
* If it is, we overrule the widening strategy, and unconditionally scalarize.
* VPReplicationRecipe - which is what actually does the scalarization - knows how to handle unform-per-part values including for scalable vectors. However, we do need to know that the expression is safe to execute without predication - e.g. the uniform mem op was unconditional in the original loop. (This part was split off and already landed.)

An obvious question is why not simply implement the generic case? The answer is that I'm going to, but doing so without a canonicalization towards uniform causes regressions due to bad interaction with scalarization/uniformity of values feeding the uniform mem-op. This patch is needed to avoid those regressions.

Differential Revision: https://reviews.llvm.org/D130364
2022-08-02 08:09:49 -07:00
Philip Reames
15c645f7ee [RISCV] Enable (scalable) vectorization by default
This change enables vectorization (using scalable vectorization only, fixed vectors are not yet enabled) for RISCV when vector instructions are available for the target configuration.

At this point, the resulting configuration should be both stable (e.g. no crashes), and profitable (i.e. few cases where scalar loops beat vector ones), but is not going to be particularly well tuned (i.e. we emit the best possible vector loop). The goal of this change is to align testing across organizations and ensure the default configuration matches what downstreams are using as closely as possible.

This exposes a large amount of code which hasn't otherwise been on by default, and thus may not have been fully exercised.  Given that, having issues fall out is not unexpected.  If you find issues, please make sure to include as much information as you can when reverting this change.

Differential Revision: https://reviews.llvm.org/D129013
2022-07-27 12:36:04 -07:00
Philip Reames
e8ceadd0ce [LV][RISCV] Add a test case for a quality problem mixing vector index and data types
The problem here is target independent, but particularly painful on RISCV.  If we chose to vectorize such that vscale x 2 x i32 is our widest type and fits in a register, a naive expansion of i64 comparisons results in comparisons and index types at <scalabe x 2 x i64>.  This requires both an LMUL of 2, and a VSETVLI toggle in the loop.  Note that we could have used <vscale x 2 x i32> for the compairons legally given the range of the trip count.
2022-07-27 11:42:28 -07:00
Philip Reames
ebee4fbb34 [RISCV][LV] Add basic tests for default configuration
All of our other tests are functionality tests constrained to some
specific configuration.  This one is intended to float with the
default configuration so that changes in that default are visible
in reviews.  Note that our current default does not enable
vectorization at all; thus the current output is unvectorized.
2022-07-27 09:16:44 -07:00
Philip Reames
27945f9282 [RISCV][LV] Split coverage of uniform load with outside use
Turns out this has a large effect of tail folding, so split out a single test to cover that case and remove it from the others.
2022-07-21 12:07:26 -07:00
Philip Reames
bb5dc2918f {RISCV][LV] Add tail folding coverage of uniform load store cases 2022-07-21 11:15:36 -07:00
Philip Reames
56a25ed208 {RISCV][LV] Add a test for uniform store of a loop varying value 2022-07-21 11:15:36 -07:00
Philip Reames
0ae46693f0 {RISCV][LV] Split out and expand tests for uniform loads and stores 2022-07-21 10:42:18 -07:00
Philip Reames
523a526a02 [LV] Fix miscompile due to srem/sdiv speculation safety condition
An srem or sdiv has two cases which can cause undefined behavior, not just one. The existing code did not account for this, and as a result, we miscompiled when we encountered e.g. a srem i64 %v, -1 in a conditional block.

Instead of hand rolling the logic, just use the utility function which exists exactly for this purpose.

Differential Revision: https://reviews.llvm.org/D130106
2022-07-20 05:35:23 -07:00
Philip Reames
8353403f08 [LV] Add test for generic predicated sdiv 2022-07-19 12:33:36 -07:00
Philip Reames
2247fe856a [LV] Add test coverage for a bug in srem handling 2022-07-19 11:29:17 -07:00
Philip Reames
b7d3ba4bdb [LV] Add test coverage for scalable div/rem patterns 2022-07-19 11:02:14 -07:00
Mel Chen
bd404fbcc8 [LV][NFC] Fix the condition for printing debug messages
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D128523
2022-07-15 01:47:33 -07:00
Mel Chen
ae15e6a952 [LV] Pre-commit test case for D128523, NFC 2022-07-15 01:22:06 -07:00
Philip Reames
b12930e133 [RISCV] Switch to using get.active.lane.mask when tail folding
The motivation here is to a) bring us closer into alignment with AArch64 under the assumption that codepath is better tested, and b) simplify pattern matching in an upcoming change.

The immediate impact is a significant IR reduction but a fairly minimal change in the generated assembly. Due to a difference in expansion behavior we get a saturating add vs an unsaturating one for the old code, but that's about it. This difference comes down to different handling of overflow, which doesn't seem to be possible here anyways, so the assembly codegen is arguably a minor regression. I don't expect that to matter in practice.

Differential Revision: https://reviews.llvm.org/D129221
2022-07-08 10:24:59 -07:00
Florian Hahn
12f9c7b270 [LV] Update RISCV test missed by bc19b7c3cc. 2022-07-07 08:51:15 -07:00
Philip Reames
b9513a70e1 [RISCV] Autogen a vectorizer test for ease of update 2022-07-06 09:35:02 -07:00
Philip Reames
0f49d9e8d0 [RISCV] Add test coverage for vectorizer tailfolding
As can be seen in the check lines, we have a lot of work to do.
2022-07-06 09:06:20 -07:00
Philip Reames
f239cddbac [RISCV] Pin two tests to fixed length vectorization to preserve test intent 2022-06-28 13:53:31 -07:00
Philip Reames
20dd3297b1 [LV] Allow scalable vectorization with vscale = 1
This change is a bit subtle. If we have a type like <vscale x 1 x i64>, the vectorizer will currently reject vectorization. The reason is that a type like <1 x i64> is likely to get simply rescalarized, and the vectorizer doesn't want to be in the game of simple unrolling.

(I've given the example in terms of 1 x types which use a single register, but the same issue exists for any N x types which use N registers. e.g. RISCV LMULs.)

This change distinguishes scalable types from fixed types under the reasoning that converting to a scalable type isn't unrolling. Because the actual vscale isn't known until runtime, using a vscale type is potentially very profitable.

This makes an important, but unchecked, assumption. Specifically, the scalable type is assumed to only be legal per the cost model if there's actually a scalable register class which is distinct from the scalar domain. This is, to my knowledge, true for all targets which return non-invalid costs for scalable vector ops today, but in theory, we could have a target decide to lower scalable to fixed length vector or even scalar registers. If that ever happens, we'd need to revisit this code.

In practice, this patch unblocks scalable vectorization for ELEN types on RISCV.

Let me sketch one alternate implementation I considered. We could have restricted this to when we know a minimum value for vscale. Specifically, for the default +v extension for RISCV, we actually know that vscale >= 2 for ELEN types. However, doing it this way means we can't generate scalable vectors when using the various embedded vector extensions which have a minimum vscale of 1.

Differential Revision: https://reviews.llvm.org/D128542
2022-06-27 13:38:57 -07:00
Philip Reames
9803b0d1e7 [RISCV] Implement getVScaleForTuning and thus prefer scalable vectorization when enabled
LoopVectorizer uses getVScaleForTuning for deciding how to discount the cost of a potential vector factor by the amount of work performed. Without the callback implemented, the vectorizer was defaulting to an estimated vscale of 1. This results in fixed vectorization looking falsely profitable (since it used the command line VLEN).

The test change is pretty limited since a) we don't have much coverage of the vectorizer with scalable vectors at all, and b) what little coverage we have mostly uses i64 element types. There's a separate issue with <vscale x 1 x i64> which prevents us from getting to this stage of costing, and thus only the one test explicitly written to avoid that is visible in the diff. However, this is actually a very wide impact change as it changes the practical vectorization result when both fixed and scalable is enabled to scalable.

As an aside, I think the vectorizer is at little too strongly biased towards scalable when both are legal, but we can explore that separately. For now, let's just get the cost model working the way it was intended.

Differential Revision: https://reviews.llvm.org/D128547
2022-06-25 11:25:23 -07:00
Philip Reames
ae8fac6f98 [LV][RISCV] Add coverage showing scalable codegen when etype != ELEN
We currently have a costing bug around the etype == ELEN case, so add otherwise duplicate tests to show test diffs as I work on other parts of costing.
2022-06-24 11:38:54 -07:00
Philip Reames
056d63938a [RISCV] Split a vectorizer test runline so that upcoming changes in defaults are visible 2022-06-24 08:48:11 -07:00
Philip Reames
adbe718675 [RISCV] Modify a test line so it exercises the intended configuration once we turn on scalable vectorization 2022-06-24 08:48:11 -07:00
Philip Reames
46ea4b5ea1 [LV] Avoid a crash when costing a uniform store which doesn't correspond to a legal scatter
If we have an unaligned uniform store, then when costing a scalable VF we can't emit code to scalarize it.  (Well, we could, but we haven't implemented that case.)  This change replaces an assert with a cost-model bailout such that we reject vectorization with the scalable VF instead of crashing.
2022-06-23 12:41:09 -07:00
Philip Reames
8ae0664282 LoopVect, tests] Add some basic coverage for scalable costing of scatter/gather patterns on RISCV
This just adds some very basic vectorizer testing with both fixed and scalable vectorization enabled.
2022-06-21 13:54:53 -07:00
Philip Reames
2cf320d41e [LoopVect, tests] Add some basic coverage for scalable costing on RISCV
This just adds some very basic vectorizer testing with both fixed and scalable vectorization enabled.  For context, I just yesterday fixed a crash in costing of the splat_ptr example - see bbf3fd.
2022-06-21 13:35:38 -07:00
Philip Reames
f7bb691d61 [RISCV] Implement isElementTypeLegalForScalableVector TTI hook
This brings us into alignment with AArch64, and in the process fixes a compiler crash bug in uniform store handling in the vectorizer.

Before the recent invalid cost bailout work, this would have also avoided crashes on invalid costs in some cases. I honestly think the vectorizer should gracefully bailout on uniform stores it can't use a scatter for, but it doesn't, so lets take the path of least resistance here. It's also possible that there are other vectorizer bugs AArch64 isn't seeing because of this hook; we don't want to be finding them either.

Differential Revision: https://reviews.llvm.org/D127514
2022-06-10 13:20:58 -07:00
Philip Reames
0e29a80fdc [RISCV] Add cost model for reverse shuffle
The majority of the cost appears to be forming the indices vector.

Differential Revision: https://reviews.llvm.org/D127141
2022-06-09 07:21:40 -07:00
Philip Reames
6071de3db6 [RISCV] Autogen a test for ease of update 2022-06-06 12:44:34 -07:00
yanming
8d9d8f866a [RISCV] Define risc-v's own register class to model FP Register.
The default RegisterClass is not enough to model RISCV Register.
We define risc-v's own register class to model FP Register.
This helps to better estimate the register pressure in the loop-vectorize.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D126854
2022-06-06 14:43:52 +08:00
Liqin.Weng
a84026821b [RISCV] Add test for experimental.vector.reverse
```
void vector_reverse_i64(int *A, int *B, int n) {
  #pragma clang loop vectorize_width(4, scalable)
  for (int i = n-1; i >= 0; i--)
    A[i] = B[i] + 1;
}
```
When option: scalable-vectorization is on (or set #pragma clang loop vectorize_width(elements, scalable)), Reverse Iterators can't loop vectorization as <vscale x elements x elementType>

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125866
2022-05-27 06:30:07 +00:00
David Sherwood
befc952045 [LoopVectorize] Permit tail-folding for low trip counts using scalable vectors
When the loop vectoriser encounters a known low trip count it tries
to create a single predicated loop in order to get the benefit of
vectorisation and eliminate the scalar tail. However, until now the
vectoriser prevented the use of scalable vectors in this case due
to concerns in the past about stability. I believe that tail-folded
loops using scalable vectors are now sufficiently well tested that
we can enable this. For the same reason I've also enabled it when
optimising for code size too.

Tests added here:

  Transforms/LoopVectorize/AArch64/sve-low-trip-count.ll
  Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll
  Transforms/LoopVectorize/RISCV/low-trip-count.ll

Differential Revision: https://reviews.llvm.org/D121595
2022-05-16 09:14:24 +01:00
Dávid Bolvanský
872f7000fc Revert "[NFCI] Regenerate SROA/LoopVectorize test checks"
This reverts commit 14e3450fb5.
2022-04-04 01:15:30 +02:00
Dávid Bolvanský
a113a582b1 [NFCI] Regenerate LoopVectorize test checks 2022-04-03 21:56:24 +02:00
eopXD
3cf15af2da [RISCV] Remove experimental prefix from rvv-related extensions.
Extensions affected: +v, +zve*, +zvl*

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117860
2022-01-22 20:18:40 -08:00
Kito Cheng
cc35161dc7 [RISCV] Add initial support for getRegUsageForType and getNumberOfRegisters
Those two TTI hooks are used during vectorization for calculating
register pressure, the default implementation isn't consider for LMUL,
and that's also definitly wrong value for register number (all register class
are 8 registers).

So in this patch we tried to:

1. Calculate right register usage for vector type and scalar type.
2. Return right number of register for general purpose register and
   vector register.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116890
2022-01-17 15:27:54 +08:00
Craig Topper
042394b69e [RISCV] Add a command line option to control the LMUL used by TTI's getRegisterBitWidth.
By default we return the width of an LMUL=1 register. We can enable
testing with larger LMUL values by returning a larger bit width.

This patch adds a RISCV specific option to provide a LMUL which will be
multiplied by the LMUL=1 bit width.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D116339
2022-01-07 20:02:10 -08:00
Craig Topper
a9486a40f7 [RISCV] Disable interleaving scalar loops in the loop vectorizer.
The loop vectorizer can interleave scalar loops even if it doesn't
vectorize them. I don't believe we intended to enable this when
we enabled interleaving for vector instructions.

Disable interleaving for VF=1 like X86 and AMDGPU already do. Test
lifted from AMDGPU.

Differential Revision: https://reviews.llvm.org/D115975
2021-12-23 08:37:24 -06:00
Florian Hahn
74c4d44d47 [LV] Update test that was missed in e844f05397. 2021-10-18 18:23:00 +01:00
Florian Hahn
8344e215ec [LV] Update more target-specific tests after 23c2f2e6b2. 2021-06-07 12:13:21 +01:00
Douglas Yung
18225d4576 Mark test as requiring asserts. 2021-06-01 02:01:01 -07:00
Luke
c4c3869554 [RISCV] Enable interleaved vectorization for RVV
Enable interleaved vectorization for RVV.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101469
2021-05-29 11:03:27 +08:00
Luke
1595994b28 [RISCV] Add legality check for vectorizing reduction
Check if it is legal to vectorize reduction.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D99509
2021-05-20 17:45:46 +08:00
Sander de Smalen
4f86aa650c [LV] Add -scalable-vectorization=<option> flag.
This patch adds a new option to the LoopVectorizer to control how
scalable vectors can be used.

Initially, this suggests three levels to control scalable
vectorization, although other more aggressive options can be added in
the future.

The possible options are:
- Disabled:   Disables vectorization with scalable vectors.
- Enabled:    Vectorize loops using scalable vectors or fixed-width
              vectors, but favors fixed-width vectors when the cost
              is a tie.
- Preferred:  Like 'Enabled', but favoring scalable vectors when the
              cost-model is inconclusive.

Reviewed By: paulwalker-arm, vkmr

Differential Revision: https://reviews.llvm.org/D101945
2021-05-19 10:40:56 +01:00
Craig Topper
512bae81cc [RISCV] Add basic cost modelling for fixed vector gather/scatter.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D99142
2021-03-24 11:14:14 -07:00
Luke
d28297ff68 [RISCV] Enable fixed-length vectorization of LoopVectorizer for RISC-V Vector
By implementing the method "unsigned RISCVTTIImpl::getRegisterBitWidth(bool Vector)",
fixed-length vectorization is enabled when possible. Without this method, the
"#pragma clang loop" directive is needed to enable vectorization(or the cost model
may inform LLVM that "Vectorization is possible but not beneficial").

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97549
2021-03-05 10:54:51 +08:00