clang-p2996

Author	SHA1	Message	Date
Jean-Didier PAILLEUX	e811cb00e5	[flang] Implement !DIR$ UNROLL [N] (#123331) This patch implements support for the UNROLL directive to control how many times a loop should be unrolled. It must be placed immediately before a `DO LOOP` and applies only to the loop that follows. N is an integer that specifies the unrolling factor. This is done by adding an attribute to the branch into the loop in LLVM to indicate that the loop should be unrolled. The code pushed to support the directive `VECTOR ALWAYS` has been modified to take into account the fact that several directives can be used before a `DO LOOP`.	2025-01-29 09:44:09 +01:00
Valentin Clement (バレンタインクレメン)	654b76321a	[flang][cuda] Allow to set the stack limit size (#124859) This patch adds a call to the CUFInit function just after `ProgramStart` when CUDA Fortran is enabled to initialize the CUDA context. This allows us to set up some context information like the stack limit that can be defined by an environment variable `ACC_OFFLOAD_STACKSIZE=<value>`.	2025-01-28 20:57:33 -08:00
Kaviya Rajendiran	daa18205c6	[Flang][OpenMP] Fix copyin allocatable lowering to MLIR (#122097) Fixes https://github.com/llvm/llvm-project/issues/113191 Issue: [flang][OpenMP] Runtime segfault when an allocatable variable is used with copyin Root cause: The value of the threadprivate variable is not being copied from the primary thread to the other threads within a parallel region. As a result, it tries to access a null pointer inside a parallel region, which causes a segfault. Fix: When allocatables are used with the copyin clause, we need to ensure that, on entry to any parallel region, each thread’s copy of a variable acquires the allocation status of the primary thread before the value of the primary thread's threadprivate variable is copied to the threadprivate variable of each other member of the team.	2025-01-23 11:14:00 +05:30
Kareem Ergawy	a0406ce823	[flang][OpenMP] Add `hostIsSource` parameter to `copyHostAssociateVar` (#123162) This fixes a bug when the same variable is used in `firstprivate` and `lastprivate` clauses on the same construct. The issue boils down to the fact that `copyHostAssociateVar` was deciding the direction of the copy assignment (i.e. the `lhs` and `rhs`) based on whether the `copyAssignIP` parameter is set. This is not the best way to do it since it is not related to whether we are doing a copy from the host to the localized copy or the other way around. When we set the insertion point for `firstprivate` in delayed privatization, this resulted in switching the direction of the copy assignment. Instead, this PR adds a new parameter to explicitly tell the function the direction of the assignment. This is a follow-up PR for https://github.com/llvm/llvm-project/pull/122471; only the latest commit is relevant.	2025-01-16 19:10:12 +01:00
jeanPerier	d82d53b2e3	[flang][openmp] initialize allocatable components of firstprivate copies (#121808) Descriptors of allocatable components of firstprivate derived type copies need to be set up. Otherwise, the program later dies when manipulating them inside the OpenMP region.	2025-01-07 10:04:27 +01:00
Valentin Clement (バレンタインクレメン)	9165848c82	[flang][cuda] Sync global descriptor when nullifying pointer (#121595)	2025-01-03 14:37:14 -08:00
Matthias Springer	c870632ef6	[flang] Fix some memory leaks (#121050) This commit fixes some but not all memory leaks in Flang. There are still 91 tests that fail with ASAN. - Use `mlir::OwningOpRef` instead of `std::unique_ptr`. The latter does not free allocations of nested blocks. - Pass `ModuleOp` as value instead of reference. - Add a few missing deallocations in test cases and other places.	2024-12-25 09:42:03 +01:00
Leandro Lupori	1fcb6a9754	[flang][OpenMP] Initialize allocatable members of derived types (#120295) Allocatable members of privatized derived types must be allocated, with the same bounds as the original object, whenever that member is also allocated in it, but Flang was not performing such initialization. The `Initialize` runtime function can't perform this task unless its signature is changed to receive an additional parameter, the original object, that is needed to find out which allocatable members, with their bounds, must also be allocated in the clone. As `Initialize` is used not only for privatization, sometimes this other object won't even exist, so this new parameter would need to be optional. Because of this, it seemed better to add a new runtime function: `InitializeClone`. To avoid unnecessary calls, lowering inserts a call to it only for privatized items that are derived types with allocatable members. Fixes https://github.com/llvm/llvm-project/issues/114888 Fixes https://github.com/llvm/llvm-project/issues/114889	2024-12-19 17:26:50 -03:00
Peter Klausler	fc97d2e68b	[flang] Add UNSIGNED (#113504) Implement the UNSIGNED extension type and operations under control of a language feature flag (-funsigned). This is nearly identical to the UNSIGNED feature that has been available in Sun Fortran for years, is now implemented in GNU Fortran for gfortran 15, and has been proposed for ISO standardization in J3/24-116.txt. See the new documentation for details; but in short, this is C's unsigned type, with guaranteed modular arithmetic for +, -, and *, and the related transformational intrinsic functions SUM et al.	2024-12-18 07:02:37 -08:00
Kareem Ergawy	e532241b02	Re-apply (#117867): [flang][OpenMP] Implicitly map allocatable record fields (#120374) This re-applies #117867 with a small fix that hopefully prevents build bot failures. The fix avoids `dyn_cast` for the result of `getOperation()`. Instead, we can assign the result to `mlir::ModuleOp` directly since the type of the operation is known statically (`OpT` in `OperationPass`).	2024-12-18 09:19:45 +01:00
Kareem Ergawy	dc936f3c19	Revert "[flang][OpenMP] Implicitly map allocatable record fields (#117867)" (#120360)	2024-12-18 06:52:24 +01:00
Kareem Ergawy	db09014a07	[flang][OpenMP] Implicitly map allocatable record fields (#117867) This is a starting PR to implicitly map allocatable record fields. This PR contains the following changes: 1. Re-purposes some of the utils used in `Lower/OpenMP.cpp` so that these utils work on the `mlir::Value` level rather than the `semantics::Symbol` level. This takes one step towards enabling MLIR passes to more easily do some lowering themselves (e.g. creating `omp.map.bounds` ops for implicitly captured data like this PR does). 2. Adds support for implicitly capturing and mapping allocatable fields in record types. There is still quite some distance to cover to have full support for this. I added a number of todos to guide further development. Co-authored-by: Andrew Gozillon <andrew.gozillon@amd.com>	2024-12-18 05:37:58 +01:00
Slava Zakharin	9d33874936	[flang] Support -f[no-]realloc-lhs. (#120165) -frealloc-lhs is the default. If -fno-realloc-lhs is specified, then an allocatable on the left side of an intrinsic assignment is not implicitly (re)allocated to conform with the right-hand side. The Fortran runtime will issue an error if there is a mismatch in shape/type/allocation status.	2024-12-17 09:06:05 -08:00
Valentin Clement (バレンタインクレメン)	0469bb91aa	[flang][cuda] Fix lowering when step is a variable (#119421) Add missing conversion.	2024-12-10 09:48:15 -08:00
Yusuke MINATO	a88677edc0	Reland "[flang] Integrate the option -flang-experimental-integer-overflow into -fno-wrapv" (#118933) This relands #110063. The performance issue on 503.bwaves_r is found not to be related to the patch, and is resolved by `fbd89bcc` when LTO is enabled.	2024-12-10 16:26:53 +09:00
Michael Kruse	c91ba04328	[Flang][NFC] Split runtime headers in preparation for cross-compilation. (#112188) Split some headers into headers for public and private declarations in preparation for #110217. Moving the runtime-private headers into the runtime-private include directory will occur in #110298. * Do not use `sizeof(Descriptor)` in the compiler. The size of the descriptor is target-dependent, while `sizeof(Descriptor)` is the size of the Descriptor for the host platform, which might be too small when cross-compiling to a different platform. Another problem is that the emitted assembly ((cross-)compiling to the same target) is not identical between Flang instances running on different systems. Moving the declaration of `class Descriptor` out of the included header will also reduce the amount of #included sources. * Do not use `sizeof(ArrayConstructorVector)` and `alignof(ArrayConstructorVector)` in the compiler. Same reason as with `Descriptor`. * Compute the descriptor's extra flags without instantiating a Descriptor. `Fortran::runtime::Descriptor` is defined in the runtime source, but not the compiler source. * Move `InquiryKeywordHashDecode` into a runtime-private header. The function is defined in the runtime sources and trying to call it in the compiler would lead to a link error. * Move allocator-kind magic numbers into a common header. They are the only declarations out of `allocator-registry.h` in the compiler as well. This does not make Flang cross-compile ready yet; the main goal is to avoid transitive header dependencies from Flang to clang-rt. There are more assumptions that the host platform is the same as the target platform.	2024-12-06 15:29:00 +01:00
jeanPerier	ff78cd5f3d	[flang] fix private pointers and default initialized variables (#118494) Both OpenMP privatization and DO CONCURRENT LOCAL lowering were incorrect for pointers and derived types with default initialization. For pointers, the descriptor was not established with the rank/type code/element size, leading to undefined behavior if any inquiry was made to it prior to a pointer assignment (and if/when using the runtime for pointer assignments, the descriptor must have been established). For derived types with default initialization, the copies were not default initialized.	2024-12-05 14:09:48 +01:00
vdonaldson	6003be7ef1	[flang] IEEE_GET_UNDERFLOW_MODE, IEEE_SET_UNDERFLOW_MODE (#118551) Implement IEEE_GET_UNDERFLOW_MODE and IEEE_SET_UNDERFLOW_MODE. Update IEEE_SUPPORT_UNDERFLOW_CONTROL to enable support for individual REAL kinds.	2024-12-04 16:21:11 -05:00
Yusuke MINATO	e573c6b67e	[flang] Add nsw to DO loop parameters (#113854) nsw is added to DO loop parameters (initial parameters, terminal parameters, and incrementation parameters). This can help vectorization in some cases like #110609. See also the discussion in https://discourse.llvm.org/t/rfc-add-nsw-flags-to-arithmetic-integer-operations-using-the-option-fno-wrapv/77584/20.	2024-11-28 08:58:09 +09:00
Valentin Clement (バレンタインクレメン)	3433e4140d	[flang][cuda] Detect constant on the rhs of data transfer (#117806) When the rhs expression has some constants and a device symbol, an implicit data transfer needs to be generated for the device symbol and the computation with the constant is done on the host.	2024-11-26 17:04:00 -08:00
jeanPerier	bb8bf858e8	[flang] add internal_assoc flag to mark variable captured in internal procedure (#117161) This patch adds a flag to mark hlfir.declare of host variables that are captured in some internal procedure. It enables implementing a simple fir.call handling in fir::AliasAnalysis::getModRef leveraging Fortran language specifications and without a data flow analysis. This will allow implementing an optimization for "array = array_function()" where array storage is passed directly into the hidden result argument to "array_function" when it can be proven that array_function does not reference "array". Captured host variables are very tricky because they may be accessed indirectly in any call if the internal procedure address was captured via some global procedure pointer. Without flagging them, there is no way around doing a complex interprocedural data flow analysis: - checking that the call is not made to an internal procedure is not enough because of the possibility of indirect calls made to internal procedures inside the callee. - checking that the current func.func has no internal procedure is not enough because this would be invalid with inlining when a procedure with internal procedures is inlined inside a procedure without internal procedures.	2024-11-26 09:21:13 +01:00
khaki3	ff7fca7fa8	[flang][cuda] Support memory cleanup at a return statement (#116304) We generate `cuf.free` and `func.return` twice if a return statement exists at the end of the program. ```f90 program test integer, device :: a(10) return end ``` ``` % flang -x cuda test.cuf -mmlir --mlir-print-ir-after-all error: loc("/path/to/test.cuf":3:3): 'func.return' op must be the last operation in the parent block // -----// IR Dump After Fortran::lower::VerifierPass Failed () //----- // ``` Dumped IR: ```mlir "func.func"() <{function_type = () -> (), sym_name = "_QQmain"}> ({ ... "cuf.free"(%5#1) <{data_attr = #cuf.cuda<device>}> : (!fir.ref<!fir.array<10xi32>>) -> () "func.return"() : () -> () "cuf.free"(%5#1) <{data_attr = #cuf.cuda<device>}> : (!fir.ref<!fir.array<10xi32>>) -> () "func.return"() : () -> () } ... ``` The routine `genExitRoutine` in `Bridge.cpp` is guarded by `blockIsUnterminated()` to make sure that `func.return` is generated only at the end of a block. However, we redundantly run `bridge.fctCtx().finalizeAndKeep()` before `genExitRoutine` in this case, resulting in two pairs of `cuf.free` and `func.return`. This PR fixes `Bridge.cpp` by using `blockIsUnterminated()` to guard `finalizeAndKeep` as well.	2024-11-15 08:44:42 -08:00
Valentin Clement (バレンタインクレメン)	37143fe27e	[flang][cuda] Make launch configuration optional for cuf kernel (#115947)	2024-11-12 16:49:44 -08:00
Kareem Ergawy	0698482506	[flang][MLIR] Hoist `do concurrent` nest bounds/steps outside the nest (#114020) If you have the following multi-range `do concurrent` loop: ```fortran do concurrent(i=1:n, j=1:bar(n*m, n/m)) a(i) = n end do ``` Currently, flang generates the following IR: ```mlir fir.do_loop %arg1 = %42 to %44 step %c1 unordered { ... %53:3 = hlfir.associate %49 {adapt.valuebyref} : (i32) -> (!fir.ref<i32>, !fir.ref<i32>, i1) %54:3 = hlfir.associate %52 {adapt.valuebyref} : (i32) -> (!fir.ref<i32>, !fir.ref<i32>, i1) %55 = fir.call @_QFPbar(%53#1, %54#1) fastmath<contract> : (!fir.ref<i32>, !fir.ref<i32>) -> i32 hlfir.end_associate %53#1, %53#2 : !fir.ref<i32>, i1 hlfir.end_associate %54#1, %54#2 : !fir.ref<i32>, i1 %56 = fir.convert %55 : (i32) -> index ... fir.do_loop %arg2 = %46 to %56 step %c1_4 unordered { ... } } ``` However, if `bar` is impure, then we have a direct violation of the standard: ``` C1143 A reference to an impure procedure shall not appear within a DO CONCURRENT construct. ``` Moreover, the standard describes the execution of a `do concurrent` construct in multiple stages: ``` 11.1.7.4 Execution of a DO construct ... 11.1.7.4.2 DO CONCURRENT loop control The concurrent-limit and concurrent-step expressions in the concurrent-control-list are evaluated. ... 11.1.7.4.3 The execution cycle ... The block of a DO CONCURRENT construct is executed for every active combination of the index-name values. Each execution of the block is an iteration. The executions may occur in any order. ``` From the above 2 points, it seems to me that execution is divided into multiple consecutive stages: 11.1.7.4.2 is the stage where we evaluate all control expressions including the step and then 11.1.7.4.3 is the stage to execute the block of the concurrent loop itself using the combination of possible iteration values.	2024-10-31 09:19:18 +01:00
Yusuke MINATO	bd6ab32e6e	Revert "[flang] Integrate the option -flang-experimental-integer-overflow into -fno-wrapv" (#113901) Reverts llvm/llvm-project#110063 due to the performance regression on 503.bwaves_r in SPEC2017.	2024-10-28 14:19:20 +00:00
Yusuke MINATO	96bb375f5c	[flang] Integrate the option -flang-experimental-integer-overflow into -fno-wrapv (#110063) nsw is now added to do-variable increment when -fno-wrapv is enabled as GFortran seems to do. That means the option introduced by #91579 isn't necessary any more. Note that the feature of -flang-experimental-integer-overflow is enabled by default.	2024-10-25 15:20:23 +09:00
Tarun Prabhu	839344f025	[clang][flang][mlir] Reapply "Support -frecord-command-line option (#102975)" The underlying issue was caused by a file included in two different places which resulted in duplicate definition errors when linking individual shared libraries. This was fixed in `c3201ddaea` [#109874].	2024-10-14 08:44:24 -06:00
jeanPerier	c4204c0b29	[flang] replace fir.complex usages with mlir complex (#110850) Core patch of https://discourse.llvm.org/t/rfc-flang-replace-usages-of-fir-complex-by-mlir-complex-type/82292. After that, the last step is to remove fir.complex from FIR types.	2024-10-03 17:10:57 +02:00
David Spickett	737c414e1d	Revert "[clang][flang][mlir] Support -frecord-command-line option (#102975)" This reverts commit `b3533a156d`. It caused test failures in shared library builds: https://lab.llvm.org/buildbot/#/builders/80/builds/3854	2024-09-20 11:30:50 +00:00
Tarun Prabhu	b3533a156d	[clang][flang][mlir] Support -frecord-command-line option (#102975) Add support for the -frecord-command-line option that will produce the llvm.commandline metadata which will eventually be saved in the object file. This behavior is also supported in clang. Some refactoring of the code in flang to handle these command line options was carried out. The corresponding -grecord-command-line option which saves the command line in the debug information has not yet been enabled for flang.	2024-09-19 18:28:50 -06:00
Tom Eccles	5aaf384b16	[flang][NFC] use llvm.intr.stacksave/restore instead of opaque calls (#108562) The new LLVM stack save/restore intrinsic operations are more convenient than function calls because they do not add function declarations to the module and therefore do not block the parallelisation of passes. Furthermore they could be much more easily marked with memory effects than function calls if that ever proved useful. This builds on top of #107879. Resolves #108016	2024-09-16 12:33:37 +01:00
Mats Petersson	8e10a3f80e	[flang][OpenMP] don't privatise loop index marked shared (#108176) Mark the symbol with OmpShared, and then check that later in lowering to avoid making a local loop index. OpenMP 5.2 says: "Loop iteration variables of loops that are not associated with any OpenMP directive may be listed in data-sharing attribute clauses on the surrounding teams, parallel or task-generating construct, and on enclosed constructs, subject to other restrictions." Tests updated to match the extra OmpShared attribute. Add regression test for lowering to hlfir. Closes #102961 --------- Co-authored-by: Tom Eccles <tom.eccles@arm.com>	2024-09-13 12:57:11 +01:00
David Truby	53b59022b0	[flang][OpenMP] Implement copyin for pointers and allocatables. (#107425) The copyin clause currently forbids pointer and allocatable variables, which are allowed by the OpenMP 1.1 and 3.0 specifications respectively.	2024-09-10 14:59:21 +01:00
Sergio Afonso	433ca3ebbe	[Flang][Lower] Introduce SymMapScope helper class (NFC) (#107866) This patch creates a simple RAII wrapper class for `SymMap` to make it easier to use and prevent a missing matching `popScope()` for a `pushScope()` call in simple use cases. Some push-pop pairs are replaced with instances of the new class by this patch.	2024-09-10 11:09:25 +01:00
Leandro Lupori	797f01198e	[flang][OpenMP] Make lastprivate work with reallocated variables (#106559) Fixes https://github.com/llvm/llvm-project/issues/100951	2024-09-05 14:55:01 -03:00
Valentin Clement (バレンタインクレメン)	c81b43074a	[flang][cuda] Fix lowering of cuf kernel with unstructured nested construct (#107149) Lowering was crashing when a cuf kernel had an unstructured construct. Blocks created by PFT need to be re-created inside of the operation, as is done for OpenACC constructs.	2024-09-04 08:43:13 -07:00
vdonaldson	8586d0330e	[flang] Don't generate empty else blocks (#106618) Code lowering always generates fir.if else blocks for source level if statements, whether needed or not. Change this to only generate else blocks that are needed.	2024-08-30 09:07:30 -04:00
Valentin Clement (バレンタインクレメン)	d4c519e7b2	[flang][cuda] Do inline allocation/deallocation in device code (#106628) ALLOCATE and DEALLOCATE statements can be inlined in device functions. This patch updates the condition that determines whether these actions are inlined in lowering. This avoids runtime calls in device function code and can speed up execution. Also move `isCudaDeviceContext` from `Bridge.cpp` so it can be used elsewhere.	2024-08-29 22:37:20 -07:00
Valentin Clement (バレンタインクレメン)	0a41c8e7a0	[flang][cuda] Avoid generating cuf.data_transfer in OpenACC region (#106435) `cuf.data_transfer` will be converted to calls to the CUDA runtime API, and these are not supported in device code. Assignments in an OpenACC region will be handled by the OpenACC codegen, so we avoid generating data transfers for them.	2024-08-29 11:27:42 -07:00
Valentin Clement (バレンタインクレメン)	ccbee7116b	[flang][cuda] Use declare op results instead of memref (#106287) #106120 simplifies the data transfer when possible by using the reference and a shape. This bypasses the declare op. In order to keep the declare op around, use the second result of the declare op, which achieves the same.	2024-08-27 17:36:31 -07:00
Valentin Clement (バレンタインクレメン)	900cd62758	[flang][cuda] Simplify data transfer when possible (#106120) When possible, avoid using descriptors and use the reference and the shape for data_transfer.	2024-08-27 10:03:15 -07:00
Valentin Clement (バレンタインクレメン)	7af61d5cf4	[flang][cuda] Add shape to cuf.data_transfer operation (#104631) When doing data transfer with dynamically sized arrays, we are currently generating a data transfer between two descriptors. If the shape values can be provided, we can keep the data transfer between two references. This patch adds the shape operands to the operation. This will be exploited in lowering in a follow-up patch.	2024-08-26 09:50:17 -07:00
Tarun Prabhu	90aac06c7f	[flang][mlir] Add llvm.ident metadata when compiling with flang This brings the behavior of flang in line with clang which also adds this metadata unconditionally. Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>	2024-08-12 11:56:19 -06:00
Valentin Clement (バレンタインクレメン)	0ee0eeb4bb	[flang] Enhance location information (#95862) Add inclusion location information by using FusedLocation with attribute. More context here: https://discourse.llvm.org/t/rfc-enhancing-location-information/79650	2024-07-23 09:49:17 -07:00
Valentin Clement (バレンタインクレメン)	3ad7108c3c	[flang][cuda] Avoid temporary when RHS is a logical constant (#99078) Enhance the detection of constant on the RHS for logical cases so we don't create a temporary.	2024-07-17 08:39:18 -07:00
Alexis Perry-Holby	f1d3fe7aae	Add basic -mtune support (#98517) Initial implementation for the -mtune flag in Flang. This PR is a clean version of PR #96688, which is a re-land of PR #95043	2024-07-16 16:48:24 +01:00
Valentin Clement (バレンタインクレメン)	9b6504e983	[flang][cuda] Make sure to issue freemem for the allocated temp (#98078) When implicit data transfer is created, make sure we generate the `freemem` op on the `allocmem` result value and not the declare op value.	2024-07-11 17:15:54 -07:00
Valentin Clement (バレンタインクレメン)	bd7b16217b	[flang][cuda] Add conversion for stream value in cuf kernel directive (#98082) The stream value is defined as an i32 value in the operation. Add a conversion so the declared integer can be of a type other than i32.	2024-07-09 10:13:00 -07:00
jeanPerier	66d5ca2a3d	Reland "[flang] add extra component information in fir.type_info" (#97404) Reland #96746 with the proper Support/CMakelist.txt change. fir.type does not contain all Fortran level information about components. For instance, component lower bounds and default initial value are lost. For correctness purposes, this does not matter because this information is "applied" in lowering (e.g., when addressing the components, the lower bounds are reflected in the hlfir.designate). However, this "loss" of information will prevent the generation of correct debug info for the type (needs to know about lower bounds). The initial value could help building some optimization pass to get rid of initialization runtime calls. This patch adds lower bound and initial value information into fir.type_info via a new fir.dt_component operation. This operation is generated only for components that need it, which helps keep the IR small for "boring" types. In general, adding Fortran level info in fir.type_info will allow delaying the generation of "type descriptors" globals that are very verbose in FIR and make it hard to work with FIR dumps from applications with many derived types.	2024-07-02 15:19:49 +02:00
Leandro Lupori	29cdc8f9ca	[flang][OpenMP] Fix nested privatization of allocatable (#96968) In nested constructs where a given variable is privatized more than once, using the default clause, the innermost host association symbol will point to the previous host association symbol. Such a symbol lacks the allocatable attribute and can't be used to generate the type of the symbol to be cloned. Use the ultimate symbol instead. Fixes #85594, #80398	2024-07-01 14:10:35 -03:00


364 Commits