1. We can use `getNumElements()` only for memrefs with trivial layout.
2. Buffer ops expect sizes in i32, but descriptor values can be either
i32 or i64, so add the appropriate casts. This implementation is not
ideal, as it can overflow, but it's still better than generating broken IR.
Found by inspecting AMDGPU assembly, so the arithmetic ops created
there were definitely making their way into the target ISA. An
`LLVM::BitcastOp` seems equivalent and evaporates as expected in the
target asm.
Along the way, I thought that the helper function `mfmaConcatIfNeeded`
could be renamed to `convertMFMAVectorOperand` to better convey its
contract; that way, I don't need to think about whether a bitcast is a
legitimate "concat" :-)
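For item 2, a minimal sketch of the cast insertion, assuming a standard
conversion-pattern context (the helper name is hypothetical):
```
// Hypothetical helper: coerce a descriptor-derived size value to the i32
// that the buffer intrinsics expect. This truncation can overflow for
// sizes >= 2^32, as noted above. (MLIR includes omitted for brevity.)
static Value convertSizeToI32(ConversionPatternRewriter &rewriter,
                              Location loc, Value size) {
  Type i32 = rewriter.getI32Type();
  if (size.getType() == i32)
    return size;
  return rewriter.create<LLVM::TruncOp>(loc, i32, size);
}
```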
---------
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
This commit marks the type converter in `populate...` functions as
`const`. This is useful for debugging.
Patterns already take a `const` type converter. However, some
`populate...` functions do not only add new patterns, but also add
additional type conversion rules. That makes it difficult to find the
place where a type conversion was added in the code base. With this
change, all `populate...` functions that only populate patterns now have
a `const` type converter. Programmers can then conclude from the
function signature that these functions do not register any new type
conversion rules.
Also some minor cleanups around the 1:N dialect conversion
infrastructure, which did not always pass the type converter as a
`const` object internally.
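As an illustration, the convention distinguishes the two kinds of
`populate...` functions by signature (the function names here are
hypothetical):
```
// Only adds patterns: the type converter can be taken as const.
void populateFooToLLVMConversionPatterns(const LLVMTypeConverter &converter,
                                         RewritePatternSet &patterns);

// Also registers new type conversion rules: needs a mutable reference.
void populateFooToLLVMTypeConversions(LLVMTypeConverter &converter);
```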
The AMDGPU backend now implements LLVM's `bfloat` type. Therefore, we no
longer need to type convert MLIR's `bf16` to `i16` during lowerings to
ROCDL.
As a result of this change, we discovered that, while the code for MFMA
and WMMA intrinsics was largely prepared for this change, we were failing
to bitcast the bf16 results of WMMA operations out from the i16 values
they're natively represented as. This commit also fixes that issue.
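A sketch of the fix, assuming the usual conversion-pattern setting (the
helper is illustrative, not the exact code):
```
// WMMA intrinsics return bf16 results as i16 vectors; reinterpret the raw
// result back to the bf16 vector type the MLIR op advertises.
// (MLIR includes omitted for brevity.)
static Value castWMMAResult(ConversionPatternRewriter &rewriter, Location loc,
                            Value rawResult, Type expectedType) {
  if (rawResult.getType() == expectedType)
    return rawResult;
  return rewriter.create<LLVM::BitcastOp>(loc, expectedType, rawResult);
}
```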
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
* Fix a bug introduced by the Chipset refactoring in #107720 where
atomics emulation for adds was mistakenly applied to gfx11+
* Add the case needed for gfx11+ atomic emulation, namely that gfx11
doesn't support atomically adding a v2f16 or v2bf16, thus requiring
MLIR-level legalization for buffer intrinsics that attempt to do such an
addition
* Add tests, including tests for gfx11 atomic emulation
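A sketch of the corrected gating, assuming the emulation decision keys on
the chipset's major version (field names follow the `Chipset` struct
described later in this log; the helper is hypothetical):
```
// gfx11 supports buffer atomic fadd on scalars but not on v2f16/v2bf16,
// so only the packed cases need MLIR-level CAS emulation there; gfx10
// lacks buffer atomic fadd entirely (see the emulation pass below).
static bool requiresFAddEmulation(const Chipset &chipset, bool isPackedHalf) {
  if (chipset.majorVersion >= 11)
    return isPackedHalf;
  return chipset.majorVersion == 10;
}
```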
Co-authored-by: Manupa Karunaratne <manupa.karunaratne@amd.com>
Extend the lowering of atomic.fadd to support the v2f16 variant
available on some AMDGPU chips.
Re-lands #108238 (and addresses review comments from there)
Co-authored-by: Giuseppe Rossini <giuseppe.rossini@amd.com>
Extend the lowering of atomic.fadd to support the v2f16 variant
available on some AMDGPU chips.
Co-authored-by: Giuseppe Rossini <giuseppe.rossini@amd.com>
Update the Chipset struct to follow the `IsaVersion` definition from
llvm's `TargetParser`. This is a follow-up to
https://github.com/llvm/llvm-project/pull/106169#discussion_r1733955012.
* Add the stepping version. Note: This may break downstream code that
compares against the minor version directly.
* Use comparisons with full Chipset version where possible.
Note that we can't use the code in `TargetParser` directly because the
chipset utility lives outside of `mlir/Target`, which is the layer that
re-exports llvm's target library.
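A sketch of the struct's shape after this change (not the verbatim
definition):
```
#include <tuple>

// Mirrors llvm::AMDGPU::IsaVersion: major, minor, and stepping. Ordering
// compares the full triple so that version checks no longer inspect the
// minor version alone.
struct Chipset {
  unsigned majorVersion = 0;
  unsigned minorVersion = 0;
  unsigned steppingVersion = 0;

  friend bool operator<(const Chipset &a, const Chipset &b) {
    return std::tie(a.majorVersion, a.minorVersion, a.steppingVersion) <
           std::tie(b.majorVersion, b.minorVersion, b.steppingVersion);
  }
};
```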
Defined an AMDGPU DPP operation in MLIR to represent DPP semantics.
Introduced a new enumeration attribute for the different permutations and
allowed for different argument types. Implemented constant-attribute
handling for the ROCDL::DPPMovOp operation: the operation now correctly
accepts constant attributes for dppCtrl, rowMask, bankMask, and boundCtrl
and passes them to the corresponding LLVM intrinsic.
On some architectures (currently gfx90a, gfx94*, and gfx10**), we can
implement an LDS barrier using compiler intrinsics instead of inline
assembly, improving optimization possibilities and decreasing the
fragility of the underlying code.
Other AMDGPU chipsets continue to require inline assembly to implement
this barrier because, by default, the LLVM backend will insert waits on
global memory (`s_waitcnt vmcnt(0)`) before barriers in order to ensure
that memory watchpoints set by debuggers work correctly.
Use of amdgpu.lds_barrier on these architectures imposes a tradeoff
between debuggability and performance. The documentation, as well as the
generated inline assembly, has been updated to explicitly call
attention to this fact.
For chipsets that did not require the inline assembly hack, we move to
the s.waitcnt and s.barrier intrinsics, which have been added to the
ROCDL dialect. The magic constants used as an argument to the waitcnt
intrinsic can be derived from
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
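A sketch of the intrinsic-based lowering path (the C++ class names are
assumed from the op names above, and the waitcnt immediate must be
derived per target as described):
```
// Wait only on LDS traffic (leaving vmcnt untouched), then barrier.
// (MLIR includes omitted for brevity.)
static void lowerLdsBarrier(ConversionPatternRewriter &rewriter, Location loc,
                            int32_t ldsOnlyWaitcntImmediate) {
  rewriter.create<ROCDL::SWaitcntOp>(loc, ldsOnlyWaitcntImmediate);
  rewriter.create<ROCDL::SBarrierOp>(loc);
}
```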
Define operations that wrap gfx940's new instructions for converting
between f32 and registers containing packed sets of four 8-bit floats.
Define rocdl operations for the intrinsics and an AMDGPU dialect
wrapper around them (to account for the fact that MLIR distinguishes
the two float formats at the type level but that the LLVM IR does
not).
Define an ArithToAMDGPU pass, meant to run before conversion to LLVM,
that replaces relevant calls to arith.extf and arith.truncf with the
packed operations in the AMDGPU dialect. Note that the conversion
currently only handles scalars and vectors of rank <= 1, as we do not
have a usecase for multi-dimensional vector support right now.
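A sketch of the shape restriction stated above (the helper name is
hypothetical):
```
// The pass only rewrites float scalars and float vectors of rank <= 1.
// (MLIR includes omitted for brevity.)
static bool isEligibleForPackedConversion(Type type) {
  if (auto vecType = dyn_cast<VectorType>(type))
    return vecType.getRank() <= 1 && isa<FloatType>(vecType.getElementType());
  return isa<FloatType>(type); // scalar case
}
```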
Reviewed By: jsjodin
Differential Revision: https://reviews.llvm.org/D152457
The AMDGPU backend now has buffer resource intrinsics that take a ptr
addrspace(8) instead of a vector<4xi32>, improving LLVM's ability to
reason about their memory behavior. This commit moves MLIR to these
new functions.
Reviewed By: jsjodin
Differential Revision: https://reviews.llvm.org/D157053
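Concretely, the resource type the lowering produces changes roughly as
follows (sketch; the helper name is hypothetical and MLIR includes are
omitted):
```
// The buffer resource type before and after the change.
static Type getBufferRsrcType(OpBuilder &rewriter, bool useNewIntrinsics) {
  if (!useNewIntrinsics)
    return VectorType::get({4}, rewriter.getI32Type()); // old: <4 x i32>
  // New: an opaque pointer in the buffer-resource address space (8).
  return LLVM::LLVMPointerType::get(rewriter.getContext(), /*addressSpace=*/8);
}
```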
Much previous AMDGPU dialect code has been incorrect in the
presence of the bf16 type (when lowered to LLVM's bfloat), as it was
developed in a setting that ran a custom bf16-to-i16 pass before LLVM
lowering.
An overall effect of this patch is that you should run
--arith-emulate-unsupported-floats="source-types=bf16 target-type=f32"
on your GPU module before calling --convert-gpu-to-rocdl if your code
performs bf16 arithmetic.
While LLVM now supports software bfloat, initial experiments showed
that using this support on AMDGPU inserted a large number of
conversions around loads and stores, which had substantial performance
impacts. Furthermore, all of the native AMDGPU operations on bf16
types (like the WMMA operations) operate on 16-bit integers instead of
the bfloat type.
First, we make the following changes to preserve compatibility once
the LLVM bfloat type is reenabled.
1. The matrix multiplication operations (MFMA and WMMA) will bitcast
bfloat vectors to i16 vectors.
2. Buffer loads and stores will operate on the relevant integer
datatype and then cast to bfloat if needed.
Second, we add type conversions to convert bf16 and vectors of it to
equivalent i16 types.
Third, we add the bfloat <-> f32 expansion patterns to the set of
operations run before the main LLVM conversion so that MLIR's
implementation of these conversion routines is used.
Finally, we extend the "floats treated as integers" support in the
LLVM exporter to handle types other than fp8.
We also fix a bug in the unsupported floats emulation where it tried
to operate on `arith.bitcast` due to an oversight.
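A sketch of change (1), assuming the standard conversion-pattern context
(the helper name is illustrative):
```
// Reinterpret a bf16 vector as an i16 vector before handing it to an
// MFMA/WMMA intrinsic, which natively operates on 16-bit integers.
// (MLIR includes omitted for brevity.)
static Value bitcastBf16VecToI16(ConversionPatternRewriter &rewriter,
                                 Location loc, Value operand) {
  auto vecType = dyn_cast<VectorType>(operand.getType());
  if (!vecType || !vecType.getElementType().isBF16())
    return operand;
  auto i16Vec = VectorType::get(vecType.getShape(), rewriter.getI16Type());
  return rewriter.create<LLVM::BitcastOp>(loc, i16Vec, operand);
}
```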
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D156361
ConversionPatterns do not (and should not) modify the type converter that they are using.
* Make `ConversionPattern::typeConverter` const.
* Make member functions of the `LLVMTypeConverter` const.
* Conversion patterns take a const type converter.
* Various helper functions (that are called from patterns) now also take a const type converter.
Differential Revision: https://reviews.llvm.org/D157601
This revision removes the createIndexConstant method, which implicitly creates constants of the
getIndexType type and updates all uses to the more explicit createIndexAttrConstant which requires
an explicit Type parameter.
This is an NFC step towards disentangling index type conversion in LLVM lowering.
The selection of which index type to use requires finer granularity than the
existing implementations, which all rely on pass-level flags and end up with
mismatches, especially on GPUs with multiple address spaces of different capacities.
This revision also includes an NFC fix to MemRefToLLVM.cpp that prevents a crash in cases where
an integer memory space cannot be derived for a MemRef.
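The call-site change looks roughly like this (context illustrative;
`createIndexAttrConstant` and its explicit Type parameter are as described
above):
```
// Before, the index type was chosen implicitly:
//   Value c = createIndexConstant(rewriter, loc, /*value=*/1);
// After, the type is an explicit parameter:
Value c = createIndexAttrConstant(rewriter, loc,
                                  getTypeConverter()->getIndexType(),
                                  /*value=*/1);
```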
Differential Revision: https://reviews.llvm.org/D156854
Wave Matrix Multiply Accumulate (WMMA) is the instruction to accelerate
matrix multiplication on RDNA3 architectures. LLVM already provides a
set of intrinsics to generate wmma instructions. This change uses those
intrinsics to enable the feature in MLIR.
Reviewed By: krzysz00
Differential Revision: https://reviews.llvm.org/D152451
* Use `create` instead of `createOrFold` for constant ops. Constants cannot be folded any further.
* Use `create` instead of `createOrFold` for ops that do not have a folder.
* Use C++ op builders that take an `int` instead of creating a `ConstantIndexOp`.
* Create `tensor::DimOp` instead of `linalg::createOrFoldDimOp` when it is certain that the operand is a tensor.
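A minimal illustration of the first and third guidelines (the ops are
chosen for illustration):
```
// Constants cannot fold any further, so plain `create` suffices.
Value zero = builder.create<arith::ConstantIndexOp>(loc, 0);
// A C++ builder taking an `int` avoids materializing a ConstantIndexOp.
Value dim = builder.create<tensor::DimOp>(loc, tensorValue, /*index=*/0);
```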
Differential Revision: https://reviews.llvm.org/D154196
Since LLVM currently doesn't have any types for 8-bit floats, and
since existing 8-bit float APIs (for instance, the AMDGCN
intrinsics) take such floats as (packed) bytes, translate the MLIR
8-bit float types to i8 during LLVM lowering.
In order to not special-case arith.constant for bitcasting constants
to their integer form, amend the MLIR to LLVM translator to turn 8-bit
float constants into i8 constants with the same value (by use of
APFloat's bitcast method).
This change can be reverted once LLVM has 8-bit float types.
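The constant translation step amounts to the following sketch
(`bitcastToAPInt` is the APFloat method referenced above; the attribute
plumbing around it is illustrative):
```
// Turn an 8-bit float constant into an i8 constant with the same bits.
llvm::APInt bits = floatAttr.getValue().bitcastToAPInt(); // 8 bits wide
auto i8Attr = IntegerAttr::get(IntegerType::get(ctx, 8), bits);
```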
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D153160
This patch fixes a minor issue in AMDGPUToROCDL, adding gfx11 support in MLIR.
Reviewed By: krzysz00
Differential Revision: https://reviews.llvm.org/D152450
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Caveats include:
- This clang-tidy script probably has more problems.
- This only touches C++ code, so nothing that is being generated.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
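For reference, the style change looks like this (types illustrative):
```
// Method form (deprecated):
//   if (auto vecType = type.dyn_cast<VectorType>()) ...
// Free-function form (preferred):
if (auto vecType = dyn_cast<VectorType>(type))
  processVector(vecType);
```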
Implementation:
This first patch was created with the following steps. The intention is
to only do automated changes at first, so I waste less time if it's
reverted, and so the first mass change is more clear as an example to
other teams that will need to follow similar steps.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
4. Some changes have been deleted for the following reasons:
- Some files had a variable also named cast
- Some files had not included a header file that defines the cast
functions
- Some files are definitions of the classes that have the casting
methods, so the code still refers to the method instead of the
function without adding a prefix or removing the method declaration
at the same time.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\
mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\
mlir/lib/**/IR/\
mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\
mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\
mlir/test/lib/Dialect/Test/TestTypes.cpp\
mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\
mlir/test/lib/Dialect/Test/TestAttributes.cpp\
mlir/unittests/TableGen/EnumsGenTest.cpp\
mlir/test/python/lib/PythonTestCAPI.cpp\
mlir/include/mlir/IR/
```
Differential Revision: https://reviews.llvm.org/D150123
Not all AMDGPU targets support all atomic operations. For example,
there are no atomic floating-point adds on the gfx10 series. Add a
pass to emulate these operations using a compare-and-swap loop, by
analogy to the generic atomicrmw rewrite in MemRefToLLVM.
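The emitted loop has the usual shape of a CAS-based read-modify-write.
Here is a minimal stand-alone C++ analogue of the control flow the pass
builds in MLIR (not the pass's actual code):
```
#include <atomic>
#include <cstdint>
#include <cstring>

// Emulate an atomic fadd on a 32-bit word via compare-and-swap.
float emulatedAtomicFAdd(std::atomic<uint32_t> &word, float operand) {
  uint32_t expected = word.load();
  uint32_t desired;
  float loaded;
  do {
    std::memcpy(&loaded, &expected, sizeof(float));
    float sum = loaded + operand;
    std::memcpy(&desired, &sum, sizeof(float));
    // On failure, compare_exchange_weak refreshes `expected`.
  } while (!word.compare_exchange_weak(expected, desired));
  return loaded; // atomicrmw returns the old value
}
```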
This pass is named generically, as in the future we may have a
memref-to-amdgpu pass that translates constructs like atomicrmw fmax
(which doesn't generally exist in LLVM) to the relevant intrinsics,
which may themselves require emulation.
Since the AMDGPU dialect now has a pass that operates on it, the
dialect's directory structure is reorganized to match other similarly
complex dialects.
The pass should be run before amdgpu-to-rocdl if desired.
This commit also adds f64 support to atomic_fmax.
Depends on D148722
Reviewed By: nirvedhmeshram
Differential Revision: https://reviews.llvm.org/D148724
This commit adds the buffer cmpswap intrinsic to the ROCDL dialect and
its corresponding AMDGPU dialect wrappers.
Reviewed By: nirvedhmeshram
Differential Revision: https://reviews.llvm.org/D148722
Introduce the ability to load/store scalars via amdgpu.raw_buffer_{load,store}
Reviewed By: krzysz00
Differential Revision: https://reviews.llvm.org/D146413
Replace references to enumerate results with either result_pairs
(a reference-wrapper type) or structured bindings. I did not use
structured bindings everywhere, as it wasn't clear to me that they
would improve readability.
This is in preparation for the switch to zip semantics, which won't
support non-const lvalue references to elements:
https://reviews.llvm.org/D144503.
I chose to use values instead of const lvalue refs because MLIR is
biased towards avoiding `const` local variables. This won't degrade
performance because `result_pair` is currently cheap to copy (a size_t
plus an iterator), and in the future the enumerator's iterator
dereference will return temporaries anyway.
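An example of the rewrite (container contents illustrative):
```
// Before: a non-const lvalue reference to the enumerate element.
//   for (auto &en : llvm::enumerate(values)) use(en.index(), en.value());
// After: bind by value, or use structured bindings.
for (auto [index, value] : llvm::enumerate(values))
  use(index, value);
```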
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D146006
This commit adds support for atomic fmax/smax/umin to the AMDGPU
dialect, along with the changes to dependent dialects needed to allow
such a lowering.
Reviewed By: krzysz00
Differential Revision: https://reviews.llvm.org/D144097
Upcoming AMD hardware will include functions that accept 8-bit floats.
Specifically, there are MFMA instructions that accept 8-bit floats,
either using the same or mixed formats. This patch adds MLIR wrappers
for these intrinsics and explicitly adds support for 8-bit floats in
the gpu-to-rocdl conversion by way of amdgpu-to-rocdl.
Since LLVM does not have f8 types, when targeting LLVM for compilation
on an AMD GPU, both f8 types used on AMD hardware (f8E5M2FNUZ and
f8E4M3FNUZ) are rewritten to i8.
This patch also relaxes the restriction that the types of both source
operands to an amdgpu.mfma instruction match exactly, as this is not
necessarily required for the bf8 (f8E5M2FNUZ) and fp8 (f8E4M3FNUZ)
instructions. In addition, since the buffer_{load,store} operations
maintain a whitelist of permitted types, we add the relevant f8 types
to that list.
This patch does not add any implementations of arithmetic operations
for f8 types.
Reviewed By: jakeh-gc
Differential Revision: https://reviews.llvm.org/D143956
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
This patch fixes:
mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp:128:10: warning:
variable ‘llvm2xI32’ set but not used [-Wunused-but-set-variable]
The last use of llvm2xI32 was removed on July 6, 2022 in commit
6329562249.
The amdgpu.mfma operator is a wrapper around the Matrix Fused Multiply
Add (MFMA) instructions on some AMD GPUs (the CDNA-based MI-* cards).
This interface allows for selecting the operation to be performed by
specifying the dimensions of the multiplication to be performed and
any additional attributes (such as whether to use reduced-precision
floating-point math) that are needed to select the relevant mfma
instruction and set its parameters.
Reviewed By: ThomasRaoux, nirvedhmeshram
Differential Revision: https://reviews.llvm.org/D132956
The patch introduces the required changes to update the pass declarations and definitions to use the new autogenerated files and allow dropping the old infrastructure.
Reviewed By: mehdi_amini, rriddle
Differential Revision: https://reviews.llvm.org/D132838
As a precaution, truncate memory addresses passed to kernels to 48 bits,
since bits 48-63 of the buffer descriptor are used for the stride field
and, on gfx10, to control swizzling.
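A sketch of the masking step (the constant-creation helper is
hypothetical; MLIR includes omitted):
```
// Clear bits 48-63, which the buffer descriptor reuses for the stride
// field and, on gfx10, for swizzle control.
Value mask = createI64Constant(rewriter, loc, (1ull << 48) - 1);
Value truncatedAddr = rewriter.create<LLVM::AndOp>(loc, addr, mask);
```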
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D131016
The lds_barrier op allows workgroups to wait at a barrier for
operations to/from their local data store (LDS) to complete without
incurring the performance penalties of a full memory fence.
Reviewed By: nirvedhmeshram
Differential Revision: https://reviews.llvm.org/D129522
Because the buffer descriptor structure (the V#) has no backwards-compatibility
guarantees, and since said guarantees have been violated in practice
(see https://github.com/llvm/llvm-project/issues/56323), and since
the `targetIsRDNA` attribute isn't something that higher-level clients can set
in general, make the lowering of the amdgpu dialect to rocdl take a `--chipset`
option.
Note that this option is a string because adding a parser for the Chipset
struct to llvm::cl wasn't working out.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D129228
Follow-up from flipping dialects to `_Both`: flip the accessors used to
the prefixed variant ahead of flipping from `_Both` to `_Prefixed`. This
just switches to the accessors introduced in the preceding change, which
are prefixed forms of the existing accessors.
Mechanical change using helper script
https://github.com/jpienaar/llvm-project/blob/main/clang-tools-extra/clang-tidy/misc/AddGetterCheck.cpp and clang-format.