Commit Graph

483 Commits

Author SHA1 Message Date
Karlo Basioli
f3a14217a9 Fix maybe unused errors caused by #131527 (#132944) 2025-03-25 15:26:21 +00:00
Kazu Hirata
fedac3bdb8 [mlir] Fix warnings
This patch fixes:

  mlir/lib/Dialect/Vector/Transforms/VectorEmulateNarrowType.cpp:331:8:
  error: unused variable 'srcVecTy' [-Werror,-Wunused-variable]

  mlir/lib/Dialect/Vector/Transforms/VectorEmulateNarrowType.cpp:332:8:
  error: unused variable 'destVecTy' [-Werror,-Wunused-variable]
2025-03-25 07:39:15 -07:00
Andrzej Warzyński
9768077de6 [mlir][vector] Update helpers in VectorEmulateNarrowType.cpp (nfc) (#131527)
Refactors the following pairs of helper hooks:
  * `dynamicallyInsertSubVector` + `staticallyInsertSubVector`
  * `dynamicallyExtractSubVector` + `staticallyExtractSubVector`

These hooks are very similar, so I have unified the variable names and
various conditions to make the actual differences clearer.
2025-03-25 13:32:32 +00:00
Ivan Butygin
9b022220b7 [mlir][vector] Propagate vector.extract through elementwise ops (#131462)
Propagate `Extract(Elementwise(...))` -> `Elemetwise(Extract...)`.

Currenly limited to the case when extract is the single use of
elementwise to avoid introducing additional elementwise ops.
2025-03-25 14:07:48 +03:00
Kunwar Grover
dc28e0d5d2 [mlir][Vector] Remove more special case uses for extractelement/insertelement (#130166)
A number of places in our codebase special case to use
extractelement/insertelement for 0D vectors, because extract/insert did
not support 0D vectors previously. Since insert/extract support 0D
vectors now, use them instead of special casing.
2025-03-24 13:04:16 +00:00
Kunwar Grover
cf0efb3188 [mlir][vector] Decouple unrolling gather and gather to llvm lowering (#132206)
This patch decouples unrolling vector.gather and lowering vector.gather
to llvm.masked.gather.

This is consistent with how vector.load, vector.store,
vector.maskedload, vector.maskedstore lower to LLVM.

Some interesting test changes from this patch:

- 2D vector.gather lowering to llvm tests are deleted. This is
consistent with other memory load/store ops.
- There are still tests for 2D vector.gather, but the constant mask for
these test is modified. This is because with the updated lowering, one
of the unrolled vector.gather disappears because it is masked off (also
demonstrating why this is a better lowering path)

Overall, this makes vector.gather take the same consistent path for
lowering to LLVM as other load/store ops.

Discourse Discussion:
https://discourse.llvm.org/t/rfc-improving-gather-codegen-for-vector-dialect/85011/13
2025-03-24 12:25:17 +00:00
Andrzej Warzyński
d928a671b8 [mlir][Vector] Refactor VectorEmulateNarrowType.cpp (#123529)
This is PR refactors `alignedConversionPrecondition` from
VectorEmulateNarrowType.cpp and adds new helper hooks.

**Update `alignedConversionPrecondition` (1)**

This method doesn't require the vector type for the "container" argument. The
underlying element type is sufficient. The corresponding argument has been
renamed as `containerTy` - this is meant as the multi-byte container element
type (`i8`, `i16`, `i32`, etc). With this change, the updated invocations of
`alignedConversionPrecondition` (in e.g. `RewriteAlignedSubByteIntExt`) make it
clear that the container element type is assumed to be `i8`.

**Update alignedConversionPrecondition (2):**

The final check in `alignedConversionPrecondition` has been replaced with a new
helper method, `isSubByteVecFittable`. This helper hook is now also re-used in
`ConvertVectorTransferRead` (to improve code re-use).

**Other updates**

Extended + unified comments.

**Implements**: https://github.com/llvm/llvm-project/issues/123630
2025-03-16 12:22:46 +00:00
Ivan Butygin
02fae68a45 [mlir][vector] VectorLinearize: ub.poison support (#128612)
Unify `arith.constant` and `up.poison` using
`OpTraitConversionPattern<OpTrait::ConstantLike>`.
2025-03-13 14:18:21 +03:00
Artemiy Bulavin
fc127ff53d [mlir] Extract RHS rows once when lowering vector.contract to dot (#130130)
The `vector.contract` op on two matrices A and B will be lowered to
individual dot products of each row and column of A and B respectively.
The existing lowering will extract each column of B for each row of A,
which leads to multiple values in the IR representing the same columns
of B.

This PR makes changes to the `ContractOpToDotLowering` to make sure that
the columns of B are only ever extracted once, so then the SSA values
representing the extracted columns are then re-used in the IR for later
dot products.

I have updated the existing vector-contract-to-dot-transforms test.
2025-03-12 17:16:49 +00:00
Artemiy Bulavin
f3dcc0fe22 [mlir] Refactor ConvertVectorToLLVMPass options (#128219)
The `VectorTransformsOptions` on the `ConvertVectorToLLVMPass` is
currently represented as a struct, which makes it not serialisable. This
means a pass pipeline that contains this pass cannot be represented as
textual form, which breaks reproducer generation and options such as
`--dump-pass-pipeline`.

This PR expands the `VectorTransformsOptions` struct into the two
options that are actually used by the Pass' patterns:
`vector-contract-lowering` and `vector-transpose-lowering` . The other
options present in VectorTransformOptions are not used by any patterns
in this pass.

Additionally, I have changed some interfaces to only take these specific
options over the full options struct as, again, the vector contract and
transpose lowering patterns only need one of their respective options.

Finally, I have added a simple lit test that just prints the pass
pipeline using `--dump-pass-pipeline` to ensure the options on this pass
remain serialisable.

Fixes #129046
2025-03-10 10:32:03 +00:00
Prakhar Dixit
037756242f [mlir]Add a check to ensure bailing out when reducing to a scalar (#129694)
Fixes issue #64075
Referencing this comment for more detailed view ->
https://github.com/llvm/llvm-project/issues/64075#issuecomment-2694112594

**Minimal example crashing :** 
```
func.func @multi_reduction(%0: vector<4x2xf32>, %acc1: f32) -> f32 {
  %2 = vector.multi_reduction <add>, %0, %acc1 [0, 1] : vector<4x2xf32> to f32
  return %2 : f32
}
```
2025-03-08 16:34:01 +00:00
Matthias Springer
a21cfca320 [mlir][IR] Deprecate match and rewrite functions (#130031)
Deprecate the `match` and `rewrite` functions. They mainly exist for
historic reasons. This PR also updates all remaining uses of in the MLIR
codebase.

This is addressing a
[comment](https://github.com/llvm/llvm-project/pull/129861#pullrequestreview-2662696084)
on an earlier PR.

Note for LLVM integration: `SplitMatchAndRewrite` will be deleted soon,
update your patterns to use `matchAndRewrite` instead of separate
`match` / `rewrite`.

---------

Co-authored-by: Jakub Kuderski <jakub@nod-labs.com>
2025-03-07 08:43:01 +01:00
Matthias Springer
a6151f4e23 [mlir][IR] Move match and rewrite functions into separate class (#129861)
The vast majority of rewrite / conversion patterns uses a combined
`matchAndRewrite` instead of separate `match` and `rewrite` functions.

This PR optimizes the code base for the most common case where users
implement a combined `matchAndRewrite`. There are no longer any `match`
and `rewrite` functions in `RewritePattern`, `ConversionPattern` and
their derived classes. Instead, there is a `SplitMatchAndRewriteImpl`
class that implements `matchAndRewrite` in terms of `match` and
`rewrite`.

Details:
* The `RewritePattern` and `ConversionPattern` classes are simpler
(fewer functions). Especially the `ConversionPattern` class, which now
has 5 fewer functions. (There were various `rewrite` overloads to
account for 1:1 / 1:N patterns.)
* There is a new class `SplitMatchAndRewriteImpl` that derives from
`RewritePattern` / `OpRewritePatern` / ..., along with a type alias
`RewritePattern::SplitMatchAndRewrite` for convenience.
* Fewer `llvm_unreachable` are needed throughout the code base. Instead,
we can use pure virtual functions. (In cases where users previously had
to implement `rewrite` or `matchAndRewrite`, etc.)
* This PR may also improve the number of [`-Woverload-virtual`
warnings](https://discourse.llvm.org/t/matchandrewrite-hiding-virtual-functions/84933)
that are produced by GCC. (To be confirmed...)

Note for LLVM integration: Patterns with separate `match` / `rewrite`
implementations, must derive from `X::SplitMatchAndRewrite` instead of
`X`.

---------

Co-authored-by: River Riddle <riddleriver@gmail.com>
2025-03-06 08:48:51 +01:00
Andrzej Warzyński
a522c227a1 [mlir][vector] Move tests for rewriteAlignedSubByteInt{Ext|Trunc} (nfc) (#126416)
Moves tests for `rewriteAlignedSubByteIntExt` and
`rewriteAlignedSubByteIntTrunc` into a dedicated files. Also adds +
fixes some comments.

This is merely for better organisation and so that it's easier to
identify the patterns and edge cases being tested.
2025-02-26 07:48:01 +00:00
Prakhar Dixit
da37c76ac6 [mlir][vector] Add a check to ensure input vector rank equals target shape rank (#127706)
Fixes issue #126197

The crash is caused because, during IR transformation, the
vector-unrolling pass (using ExtractStridedSliceOp) attempts to slice an
input vector of higher rank using a target vector of lower rank, which
is not supported.

Specific example :
```
module {
  func.func @func1() {
    %cst_25 = arith.constant dense<3.718400e+04> : vector<4x2x2xf16>
    %cst_26 = arith.constant dense<1.000000e+00> : vector<24x2x2xf32>
    %47 = vector.fma %cst_26, %cst_26, %cst_26 : vector<24x2x2xf32>
    %818 = scf.execute_region -> vector<24x2x2xf32> {
        scf.yield %47 : vector<24x2x2xf32>
      }
    %823 = vector.extract_strided_slice %cst_25 {offsets = [2], sizes = [1], strides = [1]} : vector<4x2x2xf16> to vector<1x2x2xf16>
    return
  }
}
```

---------

Co-authored-by: Kai Sasaki <lewuathe@gmail.com>
2025-02-26 10:39:24 +09:00
Andrzej Warzyński
ad948fa028 [mlir][vector] Document ConvertVectorStore + unify var names (nfc) (#126422)
1. Documents `ConvertVectorStore`. As the generated output is rather complex, I
  have refined the comments + variable names in:
    * "vector-emulate-narrow-type-unaligned-non-atomic.mlir",
  to serve as reference for this pattern.

2. As a follow-on for #123527, renames `isAlignedEmulation` to `isFullyAligned`
  and `numSrcElemsPerDest` to `emulatedPerContainerElem`.
2025-02-15 20:16:25 +00:00
Manupa Karunaratne
db1e15a1da [MLIR][Vector] Add support for inner-parallel masked multi-reductions (#126722)
This commit adds support to lower inner-parallel flavor of masked vector
multi-reductions.
2025-02-14 11:08:24 -08:00
Tomás Longeri
5767e4d4ca [MLIR][NFC] Return MemRefType in memref.subview return type inference functions (#120024)
Avoids the need for cast, and matches the extra build functions, which
take a `MemRefType`
2025-02-14 12:58:20 +00:00
Diego Caballero
2c4dd89902 [mlir][Vector] Introduce poison in LowerVectorBitCast/Broadcast/Transpose (#126180)
This PR continues with the introduction of poison as initialization
vector, in this particular case, in LowerVectorBitCast,
LowerVectorBroadcast and LowerVectorTranspose.
2025-02-07 10:51:24 -08:00
Diego Caballero
5a0075adbb [mlir][Vector] Generate poison vectors in vector.shape_cast lowering (#125613)
This is the first PR that introduces `ub.poison` vectors as part of a
rewrite/conversion pattern in the Vector dialect. It replaces the
`arith.constant dense<0>` vector initialization for
`vector.insert_slice` ops with a poison vector.

This PR depends on all the previous PRs that introduced support for
poison in Vector operations such as `vector.shuffle`, `vector.extract`,
`vector.insert`, including ODS, canonicalization and lowering support.

This PR may improve end-to-end compilation time through LLVM, depending
on the workloads.
2025-02-07 10:42:55 -08:00
Diego Caballero
68325148d3 [mlir][Vector] Fold vector.extract from poison vector (#126122)
This PR adds a folder for `vector.extract(ub.poison) -> ub.poison`. It
also replaces `create` with `createOrFold` insert/extract ops in vector
unroll and transpose lowering patterns to trigger the poison foldings
introduced recently.
2025-02-07 10:20:07 -08:00
Alan Li
f0e1857c84 [MLIR] Support non-atomic RMW option for emulated vector stores (#124887)
This patch is a followup of the previous one: #115922, It adds an option
to turn on emitting non-atomic rmw code sequence instead of atomic rmw.
2025-02-06 13:22:42 -08:00
Andrzej Warzyński
78f690bba7 [mlir][Vector] Update VectorEmulateNarrowType.cpp (2/N) (#123527)
This is PR 2 in a series of N patches aimed at improving
"VectorEmulateNarrowType.cpp". This is mainly minor refactoring, no
major functional changes are made/added.

**CHANGE 1** 

Renames the variable "scale". Note, "scale" could mean either:

  * "container-elements-per-emulated-type", or
  * "emulated-elements-per-container-type".

While from the context it is clear that it's always the former (original
type is always a sub-byte type and the emulated type is usually `i8`),
this PR reduces the cognitive load by making this clear.

**CHANGE 2** 

Replaces `isUnalignedEmulation` with `isFullyAligned`

Note, `isUnalignedEmulation` is always computed following a
"per-element-alignment" condition:
```cpp
// Check per-element alignment.
if (containerBits % emulatedBits != 0) {
  return rewriter.notifyMatchFailure(
    op, "impossible to pack emulated elements into container elements "
    "(bit-wise misalignment)");
}

// (...)

bool isUnalignedEmulation = origElements % emulatedPerContainerElem != 0;
```

Given that `isUnalignedEmulation` captures only one of two conditions
required for "full alignment", it should be re-named as
`isPartiallyUnalignedEmulation`. Instead, I've flipped the condition and
renamed it as `isFullyAligned`:

```cpp
bool isFullyAligned = origElements % emulatedPerContainerElem == 0;
```

**CHANGE 3**
  * Unifies various comments throughout the file (for consistency).
* Adds new comments throughout the file and adds TODOs where high-level
    comments are missing.
    
    
**GitHub issue to track this work**:
https://github.com/llvm/llvm-project/issues/123630
2025-02-06 09:19:18 +00:00
Andrzej Warzyński
978310f1de [mlir][vector][nfc] Fix typos in "VectorEmulateNarrowType.cpp" (#125415)
Updates `emulatedVectorLoad` that was introduced in #115922.
Specifically, ATM `emulatedVectorLoad` mixes "emulated type" and
"container type". This only became clear after #123526 in which the
concepts of "emulated" and "container" types were introduced.

This is an NFC change and simply updates the variable naming.
2025-02-03 10:10:37 +00:00
Andrzej Warzyński
3e5640b22d [mlir][Vector] Update VectorEmulateNarrowType.cpp (1/N) (#123526)
This is PR 1 in a series of N patches aimed at improving
"VectorEmulateNarrowType.cpp". This is mainly minor refactoring, no
major functional changes are made/added.

This PR renames:
* `srcBits`/`dstBits` + `oldElementType`/`newElementType`

to improve consistency in naming within the file. This is illustrated
below:

```cpp
  // Extracted from VectorEmulateNarrowType.cpp

  // BEFORE (mixing old/new and src/dst):
  // Type oldElementType = op.getType().getElementType();
  // Type newElementType = convertedType.getElementType();

  // int srcBits = oldElementType.getIntOrFloatBitWidth();
  // int dstBits = newElementType.getIntOrFloatBitWidth();

  // AFTER (consistently using emulated/container):
  Type emulatedElemType = op.getType().getElementType();
  Type containerElemType = convertedType.getElementType();

  int emulatedBits = emulatedElemTy.getIntOrFloatBitWidth();
  int containerBits = containerElemTy.getIntOrFloatBitWidth();
```

Also adds some comments and unifies related "rewriter notification"
messages.

**GitHub issue to track this work:**
* https://github.com/llvm/llvm-project/issues/123630
2025-02-02 14:58:38 +00:00
Diego Caballero
2b04291830 [mlir][Vector] Fix scalable InsertSlice/ExtractSlice lowering (#124861)
It looks like scalable `vector.insertslice/extractslice` ops made their way
through lowering patterns that generate `vector.shuffle` ops. I'm not
sure why this wasn't caught by the verifier, probably because the
shuffle op was folded into something else as part of the same rewrite
and the IR wasn't verified.

This PR fixes the issue by preventing scalable vector.insertslice/extractslice
ops to be lowered to vector shuffles. Instead, they are now lowered to a
sequence of insertslice/extractelement ops using an existing patter.
2025-01-31 14:21:35 -08:00
Jay Foad
aa2952165c Fix typo "tranpose" (#124929) 2025-01-29 17:49:54 +00:00
Alan Li
cdced8e5bc [MLIR] Implement emulation of static indexing subbyte type vector stores (#115922)
This patch enables unaligned, statically indexed storing of vectors with
sub emulation width element types.

To illustrate the mechanism, consider the example of storing
vector<7xi2> into memref<3x7xi2>[1, 0].
In this case the linearized indices of those bits being overwritten are
[14, 28), which are:

* the last 2 bits of byte no.2
* byte no.3
* first 4 bits of byte no.4

Because memory accesses are in bytes, byte no.2 and no.4 in the above
example are only being modified partially.
In the case of multi-threading scenario, in order to avoid data
contention, these two bytes must be handled atomically.
2025-01-29 12:28:28 +08:00
Diego Caballero
a7a4c16c67 [mlir][Vector] Support efficient shape cast lowering for n-D vectors (#123497)
This PR implements a generalization of the existing more efficient
lowering of shape casts from 2-D to 1D and 1-D to 2-D vectors. This
significantly reduces code size and generates more performant code for
n-D shape casts that make their way to LLVM/SPIR-V.
2025-01-27 14:36:19 -08:00
Chao Chen
bd5d361c05 [mlir][vector] add support for linearizing vector.bitcast in VectorLinearize (#123110)
This PR adds support for converting Vector::BitCastOp working on ND 
(N >1) vectors into the same op working on linearized (1D) vectors.
2025-01-27 14:41:33 -06:00
Andrzej Warzyński
d88293d8a2 [mlir][vector] Disable BreakDownVectorBitCast for scalable vectors (#122725)
`BreakDownVectorBitCast` leverages
  * `vector.extract_strided_slices` + `vector.insert_strided_slices`

As these Ops do not support extracting scalable sub-vectors (i.e.
extracting/inserting a fraction of a scalable dim), it's best to bail
out.
2025-01-24 17:15:06 +00:00
Tejas Vipin
099fd018d1 [mlir][Vector] Remove Vector{Load|Store}ToMemrefLoadLowering (#121454)
0-d vectors are supported now and so these patterns are no longer
required. This covers a part of this issue
https://github.com/llvm/llvm-project/issues/112913 . Additionally this
removes %arg2 in mlir/test/Conversion/GPUCommon/transfer_write.mlir and
renames %arg3 to %arg2 as %arg2 was originally not required.
2025-01-22 02:42:41 -08:00
Han-Chung Wang
9cbc1f29ca [mlir][NFC] Avoid using braced initializer lists to call a constructor. (#123714)
In the LLVM style guide, we prefer not using braced initializer lists to
call a constructor. Also, we prefer using an equal before the open curly
brace if we use a braced initializer list when initializing a variable.

See

https://llvm.org/docs/CodingStandards.html#do-not-use-braced-initializer-lists-to-call-a-constructor
for more details.

The style guide does not explain the reason well. There is an article
from abseil, which mentions few benefits. E.g., we can avoid the most
vexing parse, etc. See https://abseil.io/tips/88 for more details.

Signed-off-by: hanhanW <hanhan0912@gmail.com>
2025-01-21 21:23:32 -08:00
Matthias Springer
6aaa8f25b6 [mlir][IR][NFC] Move free-standing functions to MemRefType (#123465)
Turn free-standing `MemRefType`-related helper functions in
`BuiltinTypes.h` into member functions.
2025-01-21 08:48:09 +01:00
Will Froom
2a044f8a09 [MLIR] Add [[maybe_unused]] to variables on used in assert (#123037)
Add [[maybe_unused]] to suppresses warnings when  `-NDEBUG` is enabled
2025-01-15 10:22:02 +00:00
ziereis
929eb500d4 [mlir] Rewrites for I2 to I8 signed and unsigned extension (#121298)
Adds rewrites for i2 to i8 signed and unsigned extension, similar to the
ones that already exist for i4 to i8 conversion.

I use this for i6 quantized models, and this gives me roughly a 2x
speedup for an i6 4096x4096 dequantization-matmul on an AMD 5950x.

I didn't add the rewrite for i8 to i2 truncation because I currently
don't use it, but if this is needed, I can add it as well.

---------

Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
2025-01-15 08:12:39 +00:00
Twice
b91d5af1ac [MLIR][Vector] Allow any strided memref for one-element vector.load in lowering vector.gather (#122437)
In `Gather1DToConditionalLoads`, currently we will check if the stride
of the most minor dim of the input memref is 1. And if not, the
rewriting pattern will not be applied. However, according to the
verification of `vector.load` here:

4e32271e8b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp (L4971-L4975)

.. if the output vector type of `vector.load` contains only one element,
we can ignore the requirement of the stride of the input memref, i.e.
the input memref can be with any stride layout attribute in such case.

So here we can allow more cases in lowering `vector.gather` by relaxing
such check.

As shown in the test case attached in this patch
[here](1933fbad58/mlir/test/Dialect/Vector/vector-gather-lowering.mlir (L151)),
now `vector.gather` of memref with non-trivial stride can be lowered
successfully if the result vector contains only one element.

---------

Signed-off-by: PragmaTwice <twice@apache.org>
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
2025-01-12 16:02:41 +00:00
Andrzej Warzyński
21ba7aef3b [mlir][vector][nfc] Update alignedConversionPrecondition (#122136)
Adds some comments and re-name variables to clarify the usage.
2025-01-09 15:14:34 +00:00
Matthias Springer
3ace685105 [mlir][Transforms] Support 1:N mappings in ConversionValueMapping (#116524)
This commit updates the internal `ConversionValueMapping` data structure
in the dialect conversion driver to support 1:N replacements. This is
the last major commit for adding 1:N support to the dialect conversion
driver.

Since #116470, the infrastructure already supports 1:N replacements. But
the `ConversionValueMapping` still stored 1:1 value mappings. To that
end, the driver inserted temporary argument materializations (converting
N SSA values into 1 value). This is no longer the case. Argument
materializations are now entirely gone. (They will be deleted from the
type converter after some time, when we delete the old 1:N dialect
conversion driver.)

Note for LLVM integration: Replace all occurrences of
`addArgumentMaterialization` (except for 1:N dialect conversion passes)
with `addSourceMaterialization`.

---------

Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
2025-01-03 16:11:56 +01:00
Jacques Pienaar
09dfc5713d [mlir] Enable decoupling two kinds of greedy behavior. (#104649)
The greedy rewriter is used in many different flows and it has a lot of
convenience (work list management, debugging actions, tracing, etc). But
it combines two kinds of greedy behavior 1) how ops are matched, 2)
folding wherever it can.

These are independent forms of greedy and leads to inefficiency. E.g.,
cases where one need to create different phases in lowering and is
required to applying patterns in specific order split across different
passes. Using the driver one ends up needlessly retrying folding/having
multiple rounds of folding attempts, where one final run would have
sufficed.

Of course folks can locally avoid this behavior by just building their
own, but this is also a common requested feature that folks keep on
working around locally in suboptimal ways.

For downstream users, there should be no behavioral change. Updating
from the deprecated should just be a find and replace (e.g., `find ./
-type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety)
as the API arguments hasn't changed between the two.
2024-12-20 08:15:48 -08:00
Kazu Hirata
6e41483b84 [MemRef] Migrate away from PointerUnion::{is,get} (NFC) (#120382)
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
2024-12-18 10:56:27 -08:00
Petr Kurapov
bc29fc937c [MLIR] Create GPU utils library & move distribution utils (#119264)
Continue the move of `warp_execute_on_lane_0` op to the gpu dialect
(#116994). This patch creates a utils library in GPU and moves generic
helper functions there.
2024-12-13 10:26:57 +01:00
lialan
1669ac434c [MLIR] Refactor mask compression logic when emulating vector.maskedload ops (#116520)
This patch simplifies and extends the logic used when compressing masks
emitted by `vector.constant_mask` to support extracting 1-D vectors from
multi-dimensional vector loads. It streamlines mask computation, making
it applicable for multi-dimensional mask generation, improving the
overall handling of masked load operations.
2024-11-27 13:22:13 -08:00
Petr Kurapov
ecaf2c335c [MLIR] Move warp_execute_on_lane_0 from vector to gpu (#116994)
Please see the related RFC here:
https://discourse.llvm.org/t/rfc-move-execute-on-lane-0-from-vector-to-gpu-dialect/82989.

This patch does exactly one thing - moves the op to gpu.
2024-11-22 15:30:47 +01:00
lialan
f981ee7efc [MLIR] extend getCompressedMaskOp support in VectorEmulateNarrowType (#116122)
Previously when `numFrontPadElems` is not zero, `getCompressedMaskOp`
produces wrong result if the mask generator op is a
`vector.create_mask`.

This patch resolves the issue by including `numFrontPadElems` into the
mask generation.

Signed-off-by: Alan Li <me@alanli.org>
2024-11-19 16:49:05 -08:00
lialan
6626ed6f9f [MLIR] Fix BubbleDownVectorBitCastForExtract crash on non-static index (#116518)
Previously the patch was not expecting to handle non-static index, when
the index is a non constant value it will crash.
    
This patch is to make sure it return gracefully instead of crashing.
2024-11-18 17:25:12 -08:00
Kunwar Grover
2f925d75de [mlir][Vector] Move insert/extractelement distribution patterns to insert/extract (#116425)
This is a NFC-ish change that moves
vector.extractelement/vector.insertelement vector distribution patterns
to vector.insert/vector.extract.

Before:

0-d/1-d vector.extract -> vector.extractelement -> distributed
vector.extractelement
2-d+ vector.extract -> distributed vector.extract

After:

scalar input vector.extract -> distributed vector.extract
vector.extractelement -> distributed vector.extract
2d+ vector.extract -> distributed vector.extract

The same changes are done for insertelement/insert. The change allows us
to remove reliance on vector.extractelement/vector.insertelement, which
are soon to be depreciated:
https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops/71116/8

No extra tests are included because this patch doesn't introduce /
remove any functionality. It only changes the chain of lowerings. This
change can be completly NFC if we make the distributed operation
vector.extractelement/vector.insertelement, but that is slightly weird,
because you are going from extractelement -> extract -> extractelement.
2024-11-18 10:59:49 +00:00
lialan
ef92aba52a [MLIR] Fix VectorEmulateNarrowType constant op mask bug (#116064)
This commit adds support for handling mask constants generated by the
`arith.constant` op in the `VectorEmulateNarrowType` pattern.
Previously, this pattern would not match due to the lack of mask
constant handling in `getCompressedMaskOp`.

The changes include:

1. Updating `getCompressedMaskOp` to recognize and handle
`arith.constant` ops as mask value sources.

2. Handling cases where the mask is not aligned with the emulated load
width. The compressed mask is adjusted to account for the offset.

Limitations:
- The arith.constant op can only have 1-dimensional constant values.

Resolves: #115742

Signed-off-by: Alan Li <me@alanli.org>
2024-11-15 10:06:40 -08:00
Andrzej Warzyński
7a31f3c761 [mlir][vector][nfc] Improve comments in getCompressedMaskOp (#115663) 2024-11-13 17:08:42 +00:00
Kunwar Grover
8e66303916 [mlir][Vector] Remove trivial uses of vector.extractelement/vector.insertelement (1/N) (#116053)
This patch removes trivial usages of
vector.extractelement/vector.insertelement. These operations can be
fully represented by vector.extract/vector.insert. See
https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops/71116
for more information.

Further patches will remove more usages of these ops.
2024-11-13 15:45:59 +00:00