clang-p2996

Author	SHA1	Message	Date
Valentin Clement (バレンタインクレメン)	7efd6139f2	[flang][cuda] Get device address in fir.declare (#118591 ) Add pattern that update fir.declare memref when it comes from a device global and is not a descriptor. In that case, we recover the device address that needs to be used in ops like `fir.array_coor` and so on.	2024-12-04 13:36:58 -08:00
Renaud Kauffmann	ed2db3be61	[flang][cuda] Do not register global constants (#118582 ) Global constants have no symbols in library files. They are replaced with literal constants during lowering before kernels are moved into a GPU module. Do not register them because they will result in unresolved symbols.	2024-12-04 09:37:08 -08:00
Valentin Clement (バレンタインクレメン)	5522d2462e	[flang][cuda] Allow AbstractResult to run in gpu.module (#118529 ) in CUDA Fortran, device function are converted to `gpu.func` inside the `gpu.module` operation. Update the AbstractResult pass to be able to run on `func.func` and `gpu.func` operations inside the `gpu.module`.	2024-12-03 14:04:49 -08:00
s-watanabe314	f3cf24fcc4	[flang] Apply nocapture attribute to dummy arguments (#116182 ) Apply llvm.nocapture attribute to dummy arguments that do not have the target, asynchronous, volatile, or pointer attributes in a procedure that is not a bind(c). This was discussed in https://discourse.llvm.org/t/applying-the-nocapture-attribute-to-reference-passed-arguments-in-fortran-subroutines/81401	2024-11-28 15:39:26 +09:00
Valentin Clement (バレンタインクレメン)	b5825963f0	[flang][cuda] Materialize box when needed (#117810 ) Materialize the box when the src comes from a embox or rebox operation. This was done in the case of transfer to a descriptor but not when transferring from a descriptor.	2024-11-26 17:36:25 -08:00
jeanPerier	cf602b95d1	[flang] handle fir.call in AliasAnalysis::getModRef (#117164 ) fir.call side effects are hard to describe in a useful way using `MemoryEffectOpInterface` because it is impossible to list which memory location a user procedure read/write without doing a data flow analysis of its body (even PURE procedures may read from any module variable, Fortran SIMPLE procedure from F2023 will allow that, but they are far from common at that point). Fortran language specifications allow the compiler to deduce that a procedure call cannot access a variable in many cases This patch leverages this to extend `fir::AliasAnalysis::getModRef` to deal with fir.call. This will allow implementing "array = array_function()" optimization in a future patch.	2024-11-26 11:17:33 +01:00
Valentin Clement (バレンタインクレメン)	eb5cda480d	[flang][cuda] cuf.allocate: Carry over stream to the runtime call (#117631 ) - Update the runtime entry points to accept a stream information - Update the conversion of `cuf.allocate` to pass correctly the stream information when present. Note that the stream is not currently used in the runtime. This will be done in a separate patch as a design/solution needs to be down together with the allocators.	2024-11-25 20:46:24 -08:00
Valentin Clement (バレンタインクレメン)	5802367ddb	[flang][cuda] Add support for allocate with source (#117388 ) Add support for allocate statement with CUDA device variable and a source.	2024-11-22 16:55:26 -08:00
Valentin Clement (バレンタインクレメン)	a76609dd72	[flang][cuda] Avoid intrinsics simplification in device context (#117026 )	2024-11-21 10:37:38 -08:00
Valentin Clement (バレンタインクレメン)	ecda14069f	[flang][cuda] Adapt ExternalNameConversion to work in gpu module (#117039 )	2024-11-20 15:30:05 -08:00
Valentin Clement (バレンタインクレメン)	01cd7ad2ba	[flang][cuda] Do not generate NVVM target attribute when creating the module (#116882 ) Leave it to the `NVVMAttachTargetPass` so we can set compute capability and features.	2024-11-19 16:55:34 -08:00
Valentin Clement (バレンタインクレメン)	4d7df40c08	[flang][cuda] Materialize constant src in memory (#116851 ) When the src of the data transfer is a constant, it needs to be materialized in memory to be able to perform a data transfer. ``` subroutine sub1() real, device :: a(10) integer :: I do i = 5, 10 a(i) = -4.0 end do end ```	2024-11-19 14:11:20 -08:00
Valentin Clement (バレンタインクレメン)	ca79e12648	[flang][cuda] Handle implicit global in cuf kernel and nested statement (#116846 ) Update the implicit global detection by looking for them in the CUF kernel and also update to a walk so nested `fir.address_of` in nested statement are also accounted for.	2024-11-19 12:38:18 -08:00
Valentin Clement (バレンタインクレメン)	de2e270ee6	[flang][cuda] Materialize box when src or dst are rebox (#116494 )	2024-11-18 09:22:12 -08:00
Abid Qadeer	030179c2cb	[flang][debug] Support ClassType. (#114809 ) This PR adds the handling of `ClassType`. It is treated as pointer to the underlying type. Note that `ClassType` when passed to the function have double indirection so it is represented as pointer to type (compared to other types which may have a single indirection). If `ClassType` wraps a pointer or allocatable then we take care to generate it as PTR -> type (and not PTR -> PTR -> type). This is how it looks like in the debugger. ``` subroutine test_proc (this) class(test_type), intent (inout) :: this allocate (this%b (3, 2)) call fill_array_2d (this%b) print *, this%a end ``` ``` (gdb) p this $6 = (PTR TO -> ( Type test_type )) 0x2052a0 (gdb) p this%a $7 = 0 (gdb) p this%b $8 = ((1, 2, 3) (4, 5, 6)) ```	2024-11-18 11:26:35 +00:00
Valentin Clement	42be165dde	Reland '[flang][cuda] Specialize entry point for scalar to desc data transfer'	2024-11-15 19:13:55 -08:00
Valentin Clement (バレンタインクレメン)	70b9440c88	Revert "[flang][cuda] Specialize entry point for scalar to desc data transfer" (#116458 ) Reverts llvm/llvm-project#116457	2024-11-15 17:44:48 -08:00
Valentin Clement (バレンタインクレメン)	43cb424a54	[flang][cuda] Specialize entry point for scalar to desc data transfer (#116457 ) The runtime Assign function is not meant to initialize an array from a scalar. For that we need to use DoAssignFromSource. Update the data transfer from scalar to descriptor to use a new entry point that use this function underneath.	2024-11-15 17:41:23 -08:00
Valentin Clement (バレンタインクレメン)	b1fa9d154b	[flang][cuda] Correctly embox logical constant (#116445 )	2024-11-15 15:29:41 -08:00
Valentin Clement (バレンタインクレメン)	012fad975e	[flang][cuda] Materialize the box in memory when dst is emboxed (#116320 ) Similar to #116289 but for the dst.	2024-11-15 14:31:36 -08:00
Valentin Clement (バレンタインクレメン)	e8469f1577	[flang][cuda] Add support for character type in cuf.alloc and cuf.data_transfer (#116277 ) Add support for character type in bytes computation	2024-11-15 14:31:21 -08:00
Valentin Clement (バレンタインクレメン)	98daf22638	[flang][cuda] Materialize the box in memory when src is emboxed (#116289 )	2024-11-14 18:33:14 -08:00
Valentin Clement (バレンタインクレメン)	02018cf793	[flang][cuda][NFC] Use mlir::emitError to get location (#116267 ) Use `mlir::emitError` so we can get location information on error.	2024-11-14 10:32:09 -08:00
Valentin Clement (バレンタインクレメン)	d133a3ee9d	[flang][cuda] Add conversion after CUFGetDeviceAddress to avoid issue when emboxing (#116145 )	2024-11-14 09:03:15 -08:00
Valentin Clement (バレンタインクレメン)	ec066d30e2	[flang][cuda] cuf.alloc in device context should be converted to fir.alloc (#116110 ) Update `inDeviceContext` to account for the gpu.func operation.	2024-11-13 14:57:42 -08:00
Valentin Clement (バレンタインクレメン)	e457861647	[flang][cuda] Support shape shift in data transfer op. (#115929 ) When an array is declared with a non default lower bound, the declare op `getShape` will return a `ShapeShiftOp`. This result is used in data transfer operation to compute the number of bytes to transfer. Update the op to support `ShapeShiftOp`.	2024-11-13 11:13:19 -08:00
Valentin Clement (バレンタインクレメン)	2583071fb4	[flang][cuda] Compute size of derived type arrays (#115914 )	2024-11-12 21:23:58 -08:00
Valentin Clement (バレンタインクレメン)	853d52b838	[flang][cuda] Support derived type in cuf.data_transfer conversion (#115557 ) Support derived type in `cuf.data_transfer` conversion by computing their size in bytes.	2024-11-12 10:05:53 -08:00
Valentin Clement (バレンタインクレメン)	d4eb430c9e	[flang][cuda] Support derived type in cuf.alloc (#115550 ) Number of bytes to allocate was not computed when using `cuf.alloc` with a derived type. Update the conversion to compute the number of bytes and emit an error when type is not supported.	2024-11-08 14:32:00 -08:00
Valentin Clement (バレンタインクレメン)	ef8d88ca1a	[flang][cuda] Support scalar to array data transfer (#115273 ) Do it via descriptor assignment until we have a more efficient way.	2024-11-07 09:27:10 -08:00
Valentin Clement (バレンタインクレメン)	db69d6939a	[flang][cuda] Support data transfer from descriptor to a pointer (#115023 ) Data transfer from a variable with a descriptor to a pointer. We create a descriptor for the pointer so we can use the flang runtime to perform the transfer. The Assign function handles all corner cases. We add a new entry points `CUFDataTransferDescDescNoRealloc` to avoid reallocation since the variable on the LHS is not an allocatable.	2024-11-05 11:59:08 -08:00
Abid Qadeer	a993dfcdbf	[flang][debug] Support assumed-rank arrays. (#114404 ) The assumed-rank array are represented by DIGenericSubrange in debug metadata. We have to provide 2 things. 1. Expression to get rank value at the runtime from descriptor. 2. Assuming the dimension number for which we want the array information has been put on the DWARF expression stack, expressions which will extract the lowerBound, count and stride information from the descriptor for the said dimension. With this patch in place, this is how I see an assumed_rank variable being evaluated by GDB. ``` function mean(x) result(y) integer, intent(in) :: x(..) ... end program main use mod implicit none integer :: x1,xvec(3),xmat(3,3),xtens(3,3,3) x1 = 5 xvec = 6 xmat = 7 xtens = 8 print *,mean(xvec), mean(xmat), mean(xtens), mean(x1) end program main (gdb) p x $1 = (6, 6, 6) (gdb) p x $2 = ((7, 7, 7) (7, 7, 7) (7, 7, 7)) (gdb) p x $3 = (((8, 8, 8) (8, 8, 8) (8, 8, 8)) ((8, 8, 8) (8, 8, 8) (8, 8, 8)) ((8, 8, 8) (8, 8, 8) (8, 8, 8))) (gdb) p x $4 = 5 ```	2024-11-05 18:49:29 +00:00
Valentin Clement (バレンタインクレメン)	652db7e4ff	[flang][cuda] Support data transfer from pointer to a descriptor (#114892 ) When source is a pointer to an array or a scalar, embox it and use the `CUFDataTransferDescDesc` or `CUFDataTransferGlobalDescDesc` entry points. The runtime is already able to deal with all the corner cases like non contiguous arrays and so on so we exploit this. Memset might still be used for simple case where we want to initialize to 0 for example. This will come in a follow up patch.	2024-11-05 08:56:19 -08:00
Valentin Clement (バレンタインクレメン)	9d09c6fd9c	[flang][cuda] Update device descriptor on data transfer (#114838 ) When the destination of the data transfer is a global we might need to sync the descriptor after the data transfer is done. This is the case when the data transfer is from host/device to device as reallocation might have happened and the descriptor on the device needs to take the new values written on the host. A new entry point is added `CUFDataTransferGlobalDescDesc` with the sync when needed.	2024-11-04 13:22:06 -08:00
Valentin Clement (バレンタインクレメン)	067ce5ca18	[flang][cuda] Use getOrCreateGPUModule in CUFDeviceGlobal pass (#114468 ) Make the pass functional if gpu module was not created yet.	2024-10-31 18:58:43 -07:00
Valentin Clement (バレンタインクレメン)	e4e9fea71e	[flang][cuda] Pass descriptor by reference for CUFMemsetDescriptor (#114338 )	2024-10-31 09:02:59 -07:00
Renaud Kauffmann	423f35410a	[flang][cuda] Adding support for registration of boxes (#114323 ) Needed to take into account that `fir::getTypeSizeAndAlignmentOrCrash` does not work with box types but requires the `fir::LLVMTypeConverter`	2024-10-31 08:39:08 -07:00
Renaud Kauffmann	bfe486fe76	Passing descriptors by reference to CUDA runtime calls (#114288 ) Passing a descriptor as a `const Descriptor &` or a `const Descriptor ` generates a FIR signature where the box is passed by value. This is an issue, as it requires a load of the box to be passed. But since, ultimately, all boxes are passed by reference a temporary is generated in LLVM and the reference to the temporary is passed. The boxes addresses are registered with the CUDA runtime but the temporaries are not, thus preventing the runtime to properly map a host side address to its device side counterpart. To address this issue, this PR changes the signatures to the transfer functions to pass a descriptor as a `Descriptor `, which will in turn generate a FIR signature with that takes a box reference as an argument.	2024-10-30 13:24:47 -07:00
Abid Qadeer	652988b658	[flang][debug] Support TupleType. (#113917 ) Handling is similar to RecordType with following differences: 1. No check for cyclic references 2. No extra processing for lower bounds of array members. 3. No line information as TupleType is a lowering artefact and does not really represent an entity in the code.	2024-10-30 09:52:56 +00:00
Valentin Clement (バレンタインクレメン)	0d94c7b5ce	[flang][cuda][NFC] Make pattern names homogenous (#114156 ) Dialect name is uppercase. Make all the patterns prefix homogenous.	2024-10-29 20:39:17 -07:00
Valentin Clement (バレンタインクレメン)	0fa2fb3ed0	[flang][cuda] Add conversion pattern for cuf.kernel_launch op (#114129 )	2024-10-29 17:00:41 -07:00
Renaud Kauffmann	b9978f8c77	[flang][cuda] Adding variable registration in constructor (#113976 ) 1) Adding variable registration in constructor 2) Applying feedback from PR https://github.com/llvm/llvm-project/pull/112989	2024-10-29 11:48:48 -07:00
Valentin Clement (バレンタインクレメン)	b05fec97d5	[flang][cuda] Convert gpu.launch_func to CUFLaunchClusterKernel when cluster dims are present (#113959 ) Kernel launch in CUF are converted to `gpu.launch_func`. When the kernel has `cluster_dims` specified these get carried over to the `gpu.launch_func` operation. This patch updates the special conversion of `gpu.launch_func` when cluster dims are present to the newly added entry point.	2024-10-29 10:02:08 -07:00
Abid Qadeer	8239ea3918	[flang][debug] Support IndexType. (#113921 )	2024-10-29 12:22:43 +00:00
Renaud Kauffmann	0eb5c9d2ef	[flang][cuda] Copying device globals in the gpu module (#113955 )	2024-10-28 15:34:27 -07:00
Yusuke MINATO	bd6ab32e6e	Revert "[flang] Integrate the option -flang-experimental-integer-overflow into -fno-wrapv" (#113901 ) Reverts llvm/llvm-project#110063 due to the performance regression on 503.bwaves_r in SPEC2017.	2024-10-28 14:19:20 +00:00
jeanPerier	64d7e45c40	Revert "[flang][debug] Support mlir::NoneType." (#113769 ) Reverts llvm/llvm-project#113550 It turns out this causes compiler crashes with assumed-type arrays and -g. See https://github.com/llvm/llvm-project/pull/113769 for a reproducer.	2024-10-26 21:38:54 +02:00
Renaud Kauffmann	3acf856b50	Adding CUFCommon.{h,cpp} for CUF utilities (#113740 )	2024-10-25 16:08:45 -07:00
Abid Qadeer	85af1926f7	[flang][debug] Support mlir::NoneType. (#113550 )	2024-10-25 11:43:25 +01:00
Yusuke MINATO	96bb375f5c	[flang] Integrate the option -flang-experimental-integer-overflow into -fno-wrapv (#110063 ) nsw is now added to do-variable increment when -fno-wrapv is enabled as GFortran seems to do. That means the option introduced by #91579 isn't necessary any more. Note that the feature of -flang-experimental-integer-overflow is enabled by default.	2024-10-25 15:20:23 +09:00

1 2 3 4 5 ...

414 Commits