clang-p2996

Author	SHA1	Message	Date
Valentin Clement (バレンタインクレメン)	5be9082fed	[flang][cuda] Carry over the dynamic shared memory size to gpu.launch_func (#132837 )	2025-03-24 18:37:19 -07:00
Valentin Clement (バレンタインクレメン)	478e516140	[flang][cuda] Sync double descriptor after c_f_pointer call (#130194 ) After a global device pointer is set through `c_f_pointer`, we need to sync the double descriptor so the version on the device is also up to date.	2025-03-06 19:19:51 -08:00
Valentin Clement (バレンタインクレメン)	93b2e47f12	[flang][cuda] Avoid assign element mismatch when doing data transfer from a constant (#128252 ) Currently when we do a CUDA data transfer from a constant, we embox it and delegate the assignment to the runtime. When the type of the constant is not exactly the same as the destination descriptor, the runtime will emit an assignment mismatch error. Convert the constant when necessary so the assignment is fine.	2025-02-21 17:46:46 -08:00
Michael Kruse	b815a3942a	[Flang] Move non-common headers to FortranSupport (#124416 ) Move non-common files from FortranCommon to FortranSupport (analogous to LLVMSupport) such that: declarations and definitions that are only used by the Flang compiler, but not by the runtime, are moved to FortranSupport; declarations and definitions that are used by both the compiler and the runtime ("common") remain in FortranCommon; generic STL-like/ADT/utility classes and algorithms remain in FortranCommon. This allows for a cleaner separation between compiler and runtime components, which are compiled differently. For instance, runtime sources must not use STL's `<optional>`, which causes problems with CUDA support. Instead, the surrogate header `flang/Common/optional.h` must be used. This PR fixes this for `fast-int-sel.h`. Declarations in include/Runtime are also used by both, but are header-only. `ISO_Fortran_binding_wrapper.h`, a header used by compiler and runtime, is also moved into FortranCommon.	2025-02-06 15:29:10 +01:00
Valentin Clement (バレンタインクレメン)	f1b075df2e	[flang][cuda] Pass the pinned variable in allocate calls (#125310 )	2025-02-02 18:05:59 -08:00
agozillon	4186805060	[Flang][MLIR] Extend DataLayout utilities to have basic GPU Module support (#123149 ) As there are now certain areas where we may have either a ModuleOp or a GPUModuleOp, both of which can have DataLayouts, and we may need the DataLayout utilities in these areas, I've taken the liberty of trying to extend them for use with both. Those with more knowledge of how they wish the GPUModuleOps to interact with their parent ModuleOp's DataLayout may have further alterations they wish to make in the future, but for the moment it'll simply utilise the basic data layout construction, which I believe combines parent and child DataLayouts from the ModuleOp and GPUModuleOp. If there is no GPUModuleOp DataLayout, it should default to the parent ModuleOp. It's worth noting there is some weirdness if you have two module operations defining builtin dialect DataLayout entries: it appears the combinatorial functionality for DataLayouts doesn't support merging these. This behaviour is useful for areas like https://github.com/llvm/llvm-project/pull/119585/files#diff-19fc4bcb38829d085e25d601d344bbd85bf7ef749ca359e348f4a7c750eae89dR1412 where we have a crossroads between the two different module operations.	2025-01-30 17:31:50 +01:00
Valentin Clement (バレンタインクレメン)	382d3599c2	[flang][cuda] Propagate the data attribute on the converted calls (#124877 )	2025-01-29 08:04:22 -08:00
Valentin Clement (バレンタインクレメン)	ee054404df	[flang][cuda] Carry over the cuf.proc_attr attribute to gpu.launch_func (#124325 )	2025-01-24 13:09:58 -08:00
Valentin Clement (バレンタインクレメン)	67a8857989	[flang][cuda] Handle pointer allocation with double descriptors (#124183 )	2025-01-23 15:54:22 -08:00
Valentin Clement (バレンタインクレメン)	8c138bee6e	[flang][cuda] Handle pointer allocation with source (#124070 )	2025-01-23 09:24:06 -08:00
Valentin Clement (バレンタインクレメン)	6e498bc2cd	[flang][cuda] Handle simple device pointer allocation (#123996 )	2025-01-22 15:59:32 -08:00
Valentin Clement (バレンタインクレメン)	ce30ee53a8	[flang][cuda] Add gpu.launch to device context (#123105 ) `gpu.launch` should also be considered device context.	2025-01-15 14:04:38 -08:00
Valentin Clement (バレンタインクレメン)	a19919f4cd	[flang][cuda] Add cuf.device_address operation (#122975 ) Introduce a new op to get the device address from a host symbol. This simplifies the current conversion and also prepares for some legalization work that needs to be done in cuf kernel and cuf kernel launch, similar to https://github.com/llvm/llvm-project/pull/122802	2025-01-14 17:38:04 -08:00
Valentin Clement (バレンタインクレメン)	ba4dc5a0d6	[flang][cuda] Pass the device address for global descriptor (#122802 )	2025-01-13 17:23:12 -08:00
Valentin Clement (バレンタインクレメン)	7531672712	[flang][cuda][NFC] Remove unused variable (#121533 ) Failed buildbot after https://github.com/llvm/llvm-project/pull/121524	2025-01-02 17:37:44 -08:00
Valentin Clement (バレンタインクレメン)	6dcd2b035d	[flang][cuda] Convert cuf.sync_descriptor to runtime call (#121524 ) Convert the op to a new entry point in the runtime `CUFSyncGlobalDescriptor`	2025-01-02 17:02:59 -08:00
Valentin Clement (バレンタインクレメン)	4b17a8b10e	[flang][cuda] Add operation to sync global descriptor (#121520 ) Introduce cuf.sync_descriptor to be used to sync device global descriptor after pointer association. Also move CUFCommon so it can be used in FIRBuilder lib as well.	2025-01-02 17:02:45 -08:00
Valentin Clement (バレンタインクレメン)	415cfaf339	[flang][cuda][NFC] Fix type in CUFFreeDescriptor (#120799 )	2024-12-20 14:43:12 -08:00
Valentin Clement (バレンタインクレメン)	e650ac1654	[flang][cuda][NFC] Fix typo in CUFAllocDescriptor (#120797 ) Missing `r` in the function name.	2024-12-20 13:57:47 -08:00
Renaud Kauffmann	27e458c8cb	[flang][cuda] Distinguish constant fir.global from globals with a #cuf.cuda<constant> attribute (#118912 ) 1. In `CufOpConversion`, `isDeviceGlobal` was renamed `isRegisteredGlobal` and moved to the common file. `isRegisteredGlobal` excludes constant `fir.global` operations from registration. This is to avoid calls to `_FortranACUFGetDeviceAddress` on globals which do not have any symbols in the runtime. This was done for `_FortranACUFRegisterVariable` in #118582, but also needs to be done here after #118591. 2. `CufDeviceGlobal` no longer adds the `#cuf.cuda<constant>` attribute to the constant global. As discussed in #118582, a module variable with the `#cuf.cuda<constant>` attribute is not a compile-time constant. Yet, the compile-time constant also needs to be copied into the GPU module. The candidates for copy to the GPU module are: the globals needing registration regardless of their uses in device code (they can be referred to in host code as well); and the compile-time constants when used in device code. 3. The registration of "constant" module device variables (`#cuf.cuda<constant>`) can be restored in `CufAddConstructor`.	2024-12-05 18:36:48 -08:00
Valentin Clement (バレンタインクレメン)	7efd6139f2	[flang][cuda] Get device address in fir.declare (#118591 ) Add a pattern that updates the fir.declare memref when it comes from a device global and is not a descriptor. In that case, we recover the device address that needs to be used in ops like `fir.array_coor` and so on.	2024-12-04 13:36:58 -08:00
Valentin Clement (バレンタインクレメン)	b5825963f0	[flang][cuda] Materialize box when needed (#117810 ) Materialize the box when the src comes from an embox or rebox operation. This was done in the case of transfer to a descriptor but not when transferring from a descriptor.	2024-11-26 17:36:25 -08:00
Valentin Clement (バレンタインクレメン)	eb5cda480d	[flang][cuda] cuf.allocate: Carry over stream to the runtime call (#117631 ) - Update the runtime entry points to accept stream information. - Update the conversion of `cuf.allocate` to correctly pass the stream information when present. Note that the stream is not currently used in the runtime. This will be done in a separate patch, as a design/solution needs to be done together with the allocators.	2024-11-25 20:46:24 -08:00
Valentin Clement (バレンタインクレメン)	5802367ddb	[flang][cuda] Add support for allocate with source (#117388 ) Add support for allocate statement with CUDA device variable and a source.	2024-11-22 16:55:26 -08:00
Valentin Clement (バレンタインクレメン)	4d7df40c08	[flang][cuda] Materialize constant src in memory (#116851 ) When the src of the data transfer is a constant, it needs to be materialized in memory to be able to perform a data transfer. ``` subroutine sub1() real, device :: a(10) integer :: I do i = 5, 10 a(i) = -4.0 end do end ```	2024-11-19 14:11:20 -08:00
Valentin Clement (バレンタインクレメン)	de2e270ee6	[flang][cuda] Materialize box when src or dst are rebox (#116494 )	2024-11-18 09:22:12 -08:00
Valentin Clement	42be165dde	Reland '[flang][cuda] Specialize entry point for scalar to desc data transfer'	2024-11-15 19:13:55 -08:00
Valentin Clement (バレンタインクレメン)	70b9440c88	Revert "[flang][cuda] Specialize entry point for scalar to desc data transfer" (#116458 ) Reverts llvm/llvm-project#116457	2024-11-15 17:44:48 -08:00
Valentin Clement (バレンタインクレメン)	43cb424a54	[flang][cuda] Specialize entry point for scalar to desc data transfer (#116457 ) The runtime Assign function is not meant to initialize an array from a scalar. For that we need to use DoAssignFromSource. Update the data transfer from scalar to descriptor to use a new entry point that uses this function underneath.	2024-11-15 17:41:23 -08:00
Valentin Clement (バレンタインクレメン)	b1fa9d154b	[flang][cuda] Correctly embox logical constant (#116445 )	2024-11-15 15:29:41 -08:00
Valentin Clement (バレンタインクレメン)	012fad975e	[flang][cuda] Materialize the box in memory when dst is emboxed (#116320 ) Similar to #116289 but for the dst.	2024-11-15 14:31:36 -08:00
Valentin Clement (バレンタインクレメン)	e8469f1577	[flang][cuda] Add support for character type in cuf.alloc and cuf.data_transfer (#116277 ) Add support for character type in bytes computation	2024-11-15 14:31:21 -08:00
Valentin Clement (バレンタインクレメン)	98daf22638	[flang][cuda] Materialize the box in memory when src is emboxed (#116289 )	2024-11-14 18:33:14 -08:00
Valentin Clement (バレンタインクレメン)	02018cf793	[flang][cuda][NFC] Use mlir::emitError to get location (#116267 ) Use `mlir::emitError` so we can get location information on error.	2024-11-14 10:32:09 -08:00
Valentin Clement (バレンタインクレメン)	d133a3ee9d	[flang][cuda] Add conversion after CUFGetDeviceAddress to avoid issue when emboxing (#116145 )	2024-11-14 09:03:15 -08:00
Valentin Clement (バレンタインクレメン)	ec066d30e2	[flang][cuda] cuf.alloc in device context should be converted to fir.alloc (#116110 ) Update `inDeviceContext` to account for the gpu.func operation.	2024-11-13 14:57:42 -08:00
Valentin Clement (バレンタインクレメン)	e457861647	[flang][cuda] Support shape shift in data transfer op. (#115929 ) When an array is declared with a non default lower bound, the declare op `getShape` will return a `ShapeShiftOp`. This result is used in data transfer operation to compute the number of bytes to transfer. Update the op to support `ShapeShiftOp`.	2024-11-13 11:13:19 -08:00
Valentin Clement (バレンタインクレメン)	2583071fb4	[flang][cuda] Compute size of derived type arrays (#115914 )	2024-11-12 21:23:58 -08:00
Valentin Clement (バレンタインクレメン)	853d52b838	[flang][cuda] Support derived type in cuf.data_transfer conversion (#115557 ) Support derived type in `cuf.data_transfer` conversion by computing their size in bytes.	2024-11-12 10:05:53 -08:00
Valentin Clement (バレンタインクレメン)	d4eb430c9e	[flang][cuda] Support derived type in cuf.alloc (#115550 ) Number of bytes to allocate was not computed when using `cuf.alloc` with a derived type. Update the conversion to compute the number of bytes and emit an error when type is not supported.	2024-11-08 14:32:00 -08:00
Valentin Clement (バレンタインクレメン)	ef8d88ca1a	[flang][cuda] Support scalar to array data transfer (#115273 ) Do it via descriptor assignment until we have a more efficient way.	2024-11-07 09:27:10 -08:00
Valentin Clement (バレンタインクレメン)	db69d6939a	[flang][cuda] Support data transfer from descriptor to a pointer (#115023 ) Data transfer from a variable with a descriptor to a pointer. We create a descriptor for the pointer so we can use the flang runtime to perform the transfer. The Assign function handles all corner cases. We add a new entry point `CUFDataTransferDescDescNoRealloc` to avoid reallocation since the variable on the LHS is not an allocatable.	2024-11-05 11:59:08 -08:00
Valentin Clement (バレンタインクレメン)	652db7e4ff	[flang][cuda] Support data transfer from pointer to a descriptor (#114892 ) When the source is a pointer to an array or a scalar, embox it and use the `CUFDataTransferDescDesc` or `CUFDataTransferGlobalDescDesc` entry points. The runtime is already able to deal with all the corner cases, such as non-contiguous arrays, so we exploit this. Memset might still be used for simple cases where we want to initialize to 0, for example. This will come in a follow-up patch.	2024-11-05 08:56:19 -08:00
Valentin Clement (バレンタインクレメン)	9d09c6fd9c	[flang][cuda] Update device descriptor on data transfer (#114838 ) When the destination of the data transfer is a global we might need to sync the descriptor after the data transfer is done. This is the case when the data transfer is from host/device to device as reallocation might have happened and the descriptor on the device needs to take the new values written on the host. A new entry point is added `CUFDataTransferGlobalDescDesc` with the sync when needed.	2024-11-04 13:22:06 -08:00
Valentin Clement (バレンタインクレメン)	e4e9fea71e	[flang][cuda] Pass descriptor by reference for CUFMemsetDescriptor (#114338 )	2024-10-31 09:02:59 -07:00
Renaud Kauffmann	bfe486fe76	Passing descriptors by reference to CUDA runtime calls (#114288 ) Passing a descriptor as a `const Descriptor &` or a `const Descriptor *` generates a FIR signature where the box is passed by value. This is an issue, as it requires a load of the box to be passed. But since, ultimately, all boxes are passed by reference, a temporary is generated in LLVM and the reference to the temporary is passed. The box addresses are registered with the CUDA runtime but the temporaries are not, thus preventing the runtime from properly mapping a host-side address to its device-side counterpart. To address this issue, this PR changes the signatures of the transfer functions to pass a descriptor as a `Descriptor *`, which will in turn generate a FIR signature that takes a box reference as an argument.	2024-10-30 13:24:47 -07:00
Valentin Clement (バレンタインクレメン)	0d94c7b5ce	[flang][cuda][NFC] Make pattern names homogenous (#114156 ) Dialect name is uppercase. Make all the patterns prefix homogenous.	2024-10-29 20:39:17 -07:00
Valentin Clement (バレンタインクレメン)	0fa2fb3ed0	[flang][cuda] Add conversion pattern for cuf.kernel_launch op (#114129 )	2024-10-29 17:00:41 -07:00
Renaud Kauffmann	b9978f8c77	[flang][cuda] Adding variable registration in constructor (#113976 ) 1) Adding variable registration in constructor 2) Applying feedback from PR https://github.com/llvm/llvm-project/pull/112989	2024-10-29 11:48:48 -07:00
Valentin Clement (バレンタインクレメン)	4e40b71c51	[flang][cuda] Add specialized gpu.launch_func conversion (#113493 )	2024-10-23 15:28:51 -07:00

51 Commits