This does add a little computational complexity because now every
freemem operation has to be tested against every allocation. This could
be improved with some more memoisation, but I think it is easier to read
this way. Let me know if you would prefer me to change this to
pre-compute the normalised addresses that each freemem operation is using.
Weirdly, this change resulted in a verifier failure for the fir.declare
in the previous test case. Maybe it was previously removed as dead code
and now it isn't. Anyway, I fixed that too.
Currently, fir::isDummyArgument is being used to check whether a DeclareOp
represents a dummy argument. The argument passed to the function is
declOp.getMemref(). This bypasses the code in isDummyArgument that
checks for dummy_scope, because the `Value` returned by getMemref()
may not have a DeclareOp as its defining op.
This bypassing means that sometimes a variable will be marked as an
argument when it should not be. This happened in a case where the same
argument was used for two different result variables through the use of
`entry` in the function.
The solution is to check directly whether the declOp has a dummy_scope. If
it does, we know this is a dummy argument. We can then check whether the
memref points to a BlockArgument and use its argument number. This will
still miss arguments whose memref does not directly point to a
BlockArgument, but those are missed currently too. Note that we can still
evaluate those variables in the debugger; it is just that they are not
marked as arguments.
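A minimal sketch of that check, assuming the fir::DeclareOp accessors are named as below (the exact API may differ):
```cpp
#include "flang/Optimizer/Dialect/FIROps.h"
#include "mlir/IR/Value.h"

// Sketch only: getDummyScope/getMemref are assumed accessor names.
// Returns the 1-based argument number, or 0 when the variable should
// not be marked as an argument.
static unsigned getDummyArgNumber(fir::DeclareOp declOp) {
  // A declare op carrying a dummy_scope describes a dummy argument.
  if (!declOp.getDummyScope())
    return 0;
  // Use the block argument number when the memref points directly at one.
  if (auto blockArg = mlir::dyn_cast<mlir::BlockArgument>(declOp.getMemref()))
    return blockArg.getArgNumber() + 1;
  // Otherwise leave the variable unmarked; it can still be evaluated in
  // the debugger, just without being shown as an argument.
  return 0;
}
```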
Fixes #116525.
Introduce cuf.sync_descriptor, to be used to sync the device global
descriptor after pointer association.
Also move CUFCommon so it can be used in the FIRBuilder lib as well.
We were passing the size in bytes for the sizeInBits field in
DIDerivedTypeAttr with DW_TAG_pointer_type. Although this field is
unused in this case, it is better to be accurate.
Loops resulting from array expressions like array(:,i)
may be versioned for the unit stride of the innermost dimension
when the initial array is an assumed-shape array (such arrays are
contiguous in many Fortran programs).
This speeds up facerec by about 12% due to further vectorization
of the innermost loop produced for the total SUM reduction.
Note that PointerUnion::{is,get} have been soft-deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
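A minimal sketch of the mechanical part of the migration (the union element types here are arbitrary):
```cpp
#include "llvm/ADT/PointerUnion.h"
#include "llvm/Support/Casting.h"

void demo(llvm::PointerUnion<int *, double *> pu) {
  // Deprecated member spellings:
  //   if (pu.is<int *>())
  //     int *i = pu.get<int *>();
  // Free-function replacements:
  if (llvm::isa<int *>(pu)) {
    int *i = llvm::cast<int *>(pu);
    (void)i;
  }
  // dyn_cast is the tricky one: llvm::dyn_cast<int *>(pu) expects a
  // non-null union, while llvm::dyn_cast_if_present<int *>(pu) also
  // accepts a null one.
}
```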
The greedy rewriter is used in many different flows and it has a lot of
conveniences (worklist management, debugging actions, tracing, etc.). But
it combines two kinds of greedy behavior: 1) how ops are matched, and 2)
folding wherever it can.
These are independent forms of greediness, and combining them leads to
inefficiency, e.g., cases where one needs to create different phases in
lowering and is required to apply patterns in a specific order, split
across different passes. Using the driver, one ends up needlessly
retrying folding or running multiple rounds of folding attempts where one
final run would have sufficed.
Of course folks can avoid this behavior locally by building their own
driver, but this is also a commonly requested feature that folks keep
working around locally in suboptimal ways.
For downstream users, there should be no behavioral change. Updating
from the deprecated name should just be a find and replace (e.g., of the
`find ./ -type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety),
as the API arguments haven't changed between the two.
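For illustration, a call site before and after the rename might look like this (a sketch; the surrounding pass code is assumed):
```cpp
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

void runGreedily(mlir::Operation *op, mlir::RewritePatternSet &&patterns) {
  // Deprecated spelling:
  //   (void)mlir::applyPatternsAndFoldGreedily(op, std::move(patterns));
  // New spelling, same arguments:
  if (mlir::failed(mlir::applyPatternsGreedily(op, std::move(patterns))))
    op->emitError("greedy pattern application did not converge");
}
```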
In order to get the pointer to a structure member, `getelementptr`
typically requires two indices: one to indicate the structure itself,
and another to specify the member's position. We are missing the former
in `GPULaunchKernelConversion`, so the generated code may cause stack
corruption. This PR corrects the indices of a structure used as a kernel
launch temp.
For the call to _FortranACUFLaunchKernel, we store the pointer to a
member of a temporary structure in a parameter array. However, when we
obtain an element pointer from the parameter array, its address is
calculated based on the type of the structure. This PR properly treats
the parameter array as an array of pointers.
Example:
```mlir
%30 = llvm.load %29 : !llvm.ptr -> i32
%31 = llvm.mlir.constant(1 : i32) : i32
%32 = llvm.alloca %31 x !llvm.struct<(i64, i64, i32, ptr)> : (i32) -> !llvm.ptr
%33 = llvm.mlir.constant(4 : i32) : i32
%34 = llvm.alloca %33 x !llvm.ptr : (i32) -> !llvm.ptr
%35 = llvm.mlir.constant(0 : i32) : i32
%36 = llvm.getelementptr %32[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>
llvm.store %8, %36 : i64, !llvm.ptr
%37 = llvm.getelementptr %34[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>
llvm.store %36, %37 : !llvm.ptr, !llvm.ptr
...
llvm.call @_FortranACUFLaunchKernel(%47, %8, %8, %8, %2, %8, %8, %7, %34, %48) : (!llvm.ptr, i64, i64, i64, i64, i64, i64, i32, !llvm.ptr, !llvm.ptr) -> ()
```
In this example, `%37 = llvm.getelementptr %34[%35] : (!llvm.ptr, i32)
-> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>` becomes `%37 =
llvm.getelementptr %34[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.ptr`.
Use `pm.nest` to schedule the pass on the nested `func.func` and
`gpu.func` operations in the `gpu.module`.
The AbstractResult pass is not meant to run on a whole gpu.module at once.
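A sketch of the scheduling, assuming a `createAbstractResultPass` factory function (the actual constructor name in flang may differ):
```cpp
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Dialect/GPU/IR/GPUDialect.h"
#include "mlir/Pass/PassManager.h"

void scheduleAbstractResult(mlir::PassManager &pm) {
  // fir::createAbstractResultPass is an assumed factory name.
  // Run on the func.func operations at the module level, as before.
  pm.addNestedPass<mlir::func::FuncOp>(fir::createAbstractResultPass());
  // Nest into gpu.module, then schedule the pass on the func.func and
  // gpu.func operations it contains, not on the gpu.module itself.
  mlir::OpPassManager &gpuPm = pm.nest<mlir::gpu::GPUModuleOp>();
  gpuPm.addNestedPass<mlir::func::FuncOp>(fir::createAbstractResultPass());
  gpuPm.addNestedPass<mlir::gpu::GPUFuncOp>(fir::createAbstractResultPass());
}
```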
1. In `CufOpConversion`, `isDeviceGlobal` was renamed
`isRegisteredGlobal` and moved to the common file. `isRegisteredGlobal`
excludes constant `fir.global` operations from registration. This is to
avoid calls to `_FortranACUFGetDeviceAddress` on globals which do not
have any symbols in the runtime. This was done for
`_FortranACUFRegisterVariable` in #118582, but it also needs to be done
here after #118591.
2. `CufDeviceGlobal` no longer adds the `#cuf.cuda<constant>` attribute
to the constant global. As discussed in #118582, a module variable with
the `#cuf.cuda<constant>` attribute is not a compile-time constant. Yet
the compile-time constant also needs to be copied into the GPU module.
The candidates for copying to the GPU module are:
- the globals needing registration, regardless of their uses in device
code (they can be referred to in host code as well)
- the compile-time constants when used in device code
3. The registration of "constant" module device variables
(`#cuf.cuda<constant>`) can be restored in `CufAddConstructor`.
Add a pattern that updates the fir.declare memref when it comes from a
device global and is not a descriptor. In that case, we recover the
device address that needs to be used in ops like `fir.array_coor` and
so on.
Global constants have no symbols in library files. They are replaced
with literal constants during lowering, before kernels are moved into a
GPU module. Do not register them, because they would result in
unresolved symbols.
In CUDA Fortran, device functions are converted to `gpu.func` inside the
`gpu.module` operation. Update the AbstractResult pass so it can run
on `func.func` and `gpu.func` operations inside the `gpu.module`.
Materialize the box when the src comes from an embox or rebox operation.
This was done in the case of a transfer to a descriptor, but not when
transferring from a descriptor.
fir.call side effects are hard to describe in a useful way using
`MemoryEffectOpInterface` because it is impossible to list which memory
locations a user procedure reads/writes without doing a data flow
analysis of its body (even PURE procedures may read from any module
variable; Fortran SIMPLE procedures from F2023 will address that, but
they are far from common at this point).
Fortran language specifications allow the compiler to deduce, in many
cases, that a procedure call cannot access a variable.
This patch leverages this to extend `fir::AliasAnalysis::getModRef` to
deal with fir.call.
This will allow implementing the "array = array_function()" optimization
in a future patch.
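A hedged usage sketch of the extended analysis (the exact getModRef signature and the surrounding names are assumptions):
```cpp
#include "flang/Optimizer/Analysis/AliasAnalysis.h"
#include "mlir/Analysis/AliasAnalysis.h"

// Sketch: query whether a fir.call may read or write a given variable.
void queryCall(fir::AliasAnalysis &aliasAnalysis, mlir::Operation *firCall,
               mlir::Value var) {
  mlir::ModRefResult result = aliasAnalysis.getModRef(firCall, var);
  if (result.isNoModRef()) {
    // Fortran aliasing rules prove the call can neither read nor write
    // `var` (e.g. a local that is not a target and is not passed to the
    // callee), which is what "array = array_function()" would rely on.
  }
}
```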
- Update the runtime entry points to accept stream information.
- Update the conversion of `cuf.allocate` to correctly pass the stream
information when present.
Note that the stream is not currently used in the runtime. This will be
done in a separate patch, as a design/solution needs to be worked out
together with the allocators.
When the src of the data transfer is a constant, it needs to be
materialized in memory in order to perform the data transfer.
```fortran
subroutine sub1()
  real, device :: a(10)
  integer :: i
  do i = 5, 10
    a(i) = -4.0
  end do
end
```
Update the implicit global detection by looking for such globals in the
CUF kernel, and switch to a walk so that `fir.address_of` operations in
nested statements are also accounted for.
This PR adds handling of `ClassType`. It is treated as a pointer to
the underlying type. Note that a `ClassType` passed to a function
has double indirection, so it is represented as a pointer to the type
(compared to other types, which may have a single indirection).
If `ClassType` wraps a pointer or allocatable, then we take care to
generate it as PTR -> type (and not PTR -> PTR -> type).
This is how it looks in the debugger:
```fortran
subroutine test_proc(this)
  class(test_type), intent(inout) :: this
  allocate (this%b(3, 2))
  call fill_array_2d(this%b)
  print *, this%a
end
```
```
(gdb) p this
$6 = (PTR TO -> ( Type test_type )) 0x2052a0
(gdb) p this%a
$7 = 0
(gdb) p this%b
$8 = ((1, 2, 3) (4, 5, 6))
```
The runtime Assign function is not meant to initialize an array from a
scalar. For that we need to use DoAssignFromSource. Update the data
transfer from scalar to descriptor to use a new entry point that uses
this function underneath.
When an array is declared with a non-default lower bound, the declare
op's `getShape` will return a `ShapeShiftOp`. This result is used in the
data transfer operation to compute the number of bytes to transfer.
Update the op to support `ShapeShiftOp`.
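A sketch of the extent gathering under that update (accessor names are assumptions):
```cpp
#include "flang/Optimizer/Dialect/FIROps.h"
#include "llvm/ADT/SmallVector.h"

// Sketch only: collect the extents whether the shape value comes from a
// fir.shape or a fir.shape_shift; the lower bounds carried by the latter
// do not change the number of bytes to transfer.
static llvm::SmallVector<mlir::Value> gatherExtents(mlir::Value shape) {
  if (auto shapeOp = shape.getDefiningOp<fir::ShapeOp>())
    return llvm::SmallVector<mlir::Value>(shapeOp.getExtents().begin(),
                                          shapeOp.getExtents().end());
  if (auto shiftOp = shape.getDefiningOp<fir::ShapeShiftOp>()) {
    // fir.shape_shift interleaves (lower bound, extent) operand pairs;
    // keep only the extents.
    llvm::SmallVector<mlir::Value> extents;
    auto pairs = shiftOp.getPairs();
    for (unsigned i = 1, e = pairs.size(); i < e; i += 2)
      extents.push_back(pairs[i]);
    return extents;
  }
  return {};
}
// bytes to transfer = element size * product of all extents
```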