clang-p2996

Author	SHA1	Message	Date
vdonaldson	6003be7ef1	[flang] IEEE_GET_UNDERFLOW_MODE, IEEE_SET_UNDERFLOW_MODE (#118551 ) Implement IEEE_GET_UNDERFLOW_MODE and IEEE_SET_UNDERFLOW_MODE. Update IEEE_SUPPORT_UNDERFLOW_CONTROL to enable support for individual REAL kinds.	2024-12-04 16:21:11 -05:00
Yusuke MINATO	e573c6b67e	[flang] Add nsw to DO loop parameters (#113854 ) nsw is added to DO loop parameters (initial parameters, terminal parameters, and incrementation parameters). This can help vectorization in some cases like #110609. See also the discussion in https://discourse.llvm.org/t/rfc-add-nsw-flags-to-arithmetic-integer-operations-using-the-option-fno-wrapv/77584/20.	2024-11-28 08:58:09 +09:00
Valentin Clement (バレンタインクレメン)	3433e4140d	[flang][cuda] Detect constant on the rhs of data transfer (#117806 ) When the rhs expression has some constants and a device symbol, an implicit data transfer needs to be generated for the device symbol and the computation with the constant is done on the host.	2024-11-26 17:04:00 -08:00
jeanPerier	bb8bf858e8	[flang] add internal_assoc flag to mark variable captured in internal procedure (#117161 ) This patch adds a flag to mark hlfir.declare of host variables that are captured in some internal procedure. It enables implementing a simple fir.call handling in fir::AliasAnalysis::getModRef leveraging Fortran language specifications and without a data flow analysis. This will allow implementing an optimization for "array = array_function()" where array storage is passed directly into the hidden result argument to "array_function" when it can be proven that array_function does not reference "array". Captured host variables are very tricky because they may be accessed indirectly in any calls if the internal procedure address was captured via some global procedure pointer. Without flagging them, there is no way around doing a complex inter procedural data flow analysis: - checking that the call is not made to an internal procedure is not enough because of the possibility of indirect calls made to internal procedures inside the callee. - checking that the current func.func has no internal procedure is not enough because this would be invalid with inlining when a procedure with internal procedures is inlined inside a procedure without internal procedure.	2024-11-26 09:21:13 +01:00
khaki3	ff7fca7fa8	[flang][cuda] Support memory cleanup at a return statement (#116304 ) We generate `cuf.free` and `func.return` twice if a return statement exists at the end of program. ```f90 program test integer, device :: a(10) return end ``` ``` % flang -x cuda test.cuf -mmlir --mlir-print-ir-after-all error: loc("/path/to/test.cuf":3:3): 'func.return' op must be the last operation in the parent block // -----// IR Dump After Fortran::lower::VerifierPass Failed () //----- // ``` Dumped IR: ```mlir "func.func"() <{function_type = () -> (), sym_name = "_QQmain"}> ({ ... "cuf.free"(%5#1) <{data_attr = #cuf.cuda<device>}> : (!fir.ref<!fir.array<10xi32>>) -> () "func.return"() : () -> () "cuf.free"(%5#1) <{data_attr = #cuf.cuda<device>}> : (!fir.ref<!fir.array<10xi32>>) -> () "func.return"() : () -> () } ... ``` The routine `genExitRoutine` in `Bridge.cpp` is guarded by `blockIsUnterminated()` to make sure that `func.return` is generated only at the end of a block. However, we redundantly run `bridge.fctCtx().finalizeAndKeep()` before `genExitRoutine` in this case, resulting in two pairs of `cuf.free` and `func.return`. This PR fixes `Bridge.cpp` by using `blockIsUnterminated()` to guard `finalizeAndKeep` as well.	2024-11-15 08:44:42 -08:00
Valentin Clement (バレンタインクレメン)	37143fe27e	[flang][cuda] Make launch configuration optional for cuf kernel (#115947 )	2024-11-12 16:49:44 -08:00
Kareem Ergawy	0698482506	[flang][MLIR] Hoist `do concurrent` nest bounds/steps outside the nest (#114020 ) If you have the following multi-range `do concurrent` loop: ```fortran do concurrent(i=1:n, j=1:bar(n*m, n/m)) a(i) = n end do ``` Currently, flang generates the following IR: ```mlir fir.do_loop %arg1 = %42 to %44 step %c1 unordered { ... %53:3 = hlfir.associate %49 {adapt.valuebyref} : (i32) -> (!fir.ref<i32>, !fir.ref<i32>, i1) %54:3 = hlfir.associate %52 {adapt.valuebyref} : (i32) -> (!fir.ref<i32>, !fir.ref<i32>, i1) %55 = fir.call @_QFPbar(%53#1, %54#1) fastmath<contract> : (!fir.ref<i32>, !fir.ref<i32>) -> i32 hlfir.end_associate %53#1, %53#2 : !fir.ref<i32>, i1 hlfir.end_associate %54#1, %54#2 : !fir.ref<i32>, i1 %56 = fir.convert %55 : (i32) -> index ... fir.do_loop %arg2 = %46 to %56 step %c1_4 unordered { ... } } ``` However, if `bar` is impure, then we have a direct violation of the standard: ``` C1143 A reference to an impure procedure shall not appear within a DO CONCURRENT construct. ``` Moreover, the standard describes the execution of `do concurrent` construct in multiple stages: ``` 11.1.7.4 Execution of a DO construct ... 11.1.7.4.2 DO CONCURRENT loop control The concurrent-limit and concurrent-step expressions in the concurrent-control-list are evaluated. ... 11.1.7.4.3 The execution cycle ... The block of a DO CONCURRENT construct is executed for every active combination of the index-name values. Each execution of the block is an iteration. The executions may occur in any order. ``` From the above 2 points, it seems to me that execution is divided in multiple consecutive stages: 11.1.7.4.2 is the stage where we evaluate all control expressions including the step and then 11.1.7.4.3 is the stage to execute the block of the concurrent loop itself using the combination of possible iteration values.	2024-10-31 09:19:18 +01:00
Yusuke MINATO	bd6ab32e6e	Revert "[flang] Integrate the option -flang-experimental-integer-overflow into -fno-wrapv" (#113901 ) Reverts llvm/llvm-project#110063 due to the performance regression on 503.bwaves_r in SPEC2017.	2024-10-28 14:19:20 +00:00
Yusuke MINATO	96bb375f5c	[flang] Integrate the option -flang-experimental-integer-overflow into -fno-wrapv (#110063 ) nsw is now added to do-variable increment when -fno-wrapv is enabled as GFortran seems to do. That means the option introduced by #91579 isn't necessary any more. Note that the feature of -flang-experimental-integer-overflow is enabled by default.	2024-10-25 15:20:23 +09:00
Tarun Prabhu	839344f025	[clang][flang][mlir] Reapply "Support -frecord-command-line option (#102975 )" The underlying issue was caused by a file included in two different places which resulted in duplicate definition errors when linking individual shared libraries. This was fixed in `c3201ddaea` [#109874].	2024-10-14 08:44:24 -06:00
jeanPerier	c4204c0b29	[flang] replace fir.complex usages with mlir complex (#110850 ) Core patch of https://discourse.llvm.org/t/rfc-flang-replace-usages-of-fir-complex-by-mlir-complex-type/82292. After that, the last step is to remove fir.complex from FIR types.	2024-10-03 17:10:57 +02:00
David Spickett	737c414e1d	Revert "[clang][flang][mlir] Support -frecord-command-line option (#102975 )" This reverts commit `b3533a156d`. It caused test failures in shared library builds: https://lab.llvm.org/buildbot/#/builders/80/builds/3854	2024-09-20 11:30:50 +00:00
Tarun Prabhu	b3533a156d	[clang][flang][mlir] Support -frecord-command-line option (#102975 ) Add support for the -frecord-command-line option that will produce the llvm.commandline metadata which will eventually be saved in the object file. This behavior is also supported in clang. Some refactoring of the code in flang to handle these command line options was carried out. The corresponding -grecord-command-line option which saves the command line in the debug information has not yet been enabled for flang.	2024-09-19 18:28:50 -06:00
Tom Eccles	5aaf384b16	[flang][NFC] use llvm.intr.stacksave/restore instead of opaque calls (#108562 ) The new LLVM stack save/restore intrinsic operations are more convenient than function calls because they do not add function declarations to the module and therefore do not block the parallelisation of passes. Furthermore they could be much more easily marked with memory effects than function calls if that ever proved useful. This builds on top of #107879. Resolves #108016	2024-09-16 12:33:37 +01:00
Mats Petersson	8e10a3f80e	[flang][OpenMP] don't privatise loop index marked shared (#108176 ) Mark the symbol with OmpShared, and then check that later in lowering to avoid making a local loop index. OpenMP 5.2 says: "Loop iteration variables of loops that are not associated with any OpenMP directive may be listed in data-sharing attribute clauses on the surrounding teams, parallel or task generating construct, and on enclosed constructs, subject to other restrictions." Tests updated to match the extra OmpShared attribute. Add regression test for lowering to hlfir. Closes #102961 --------- Co-authored-by: Tom Eccles <tom.eccles@arm.com>	2024-09-13 12:57:11 +01:00
David Truby	53b59022b0	[flang][OpenMP] Implement copyin for pointers and allocatables. (#107425 ) The copyin clause currently forbids pointer and allocatable variables, which are allowed by the OpenMP 1.1 and 3.0 specifications respectively.	2024-09-10 14:59:21 +01:00
Sergio Afonso	433ca3ebbe	[Flang][Lower] Introduce SymMapScope helper class (NFC) (#107866 ) This patch creates a simple RAII wrapper class for `SymMap` to make it easier to use and prevent a missing matching `popScope()` for a `pushScope()` call on simple use cases. Some push-pop pairs are replaced with instances of the new class by this patch.	2024-09-10 11:09:25 +01:00
Leandro Lupori	797f01198e	[flang][OpenMP] Make lastprivate work with reallocated variables (#106559 ) Fixes https://github.com/llvm/llvm-project/issues/100951	2024-09-05 14:55:01 -03:00
Valentin Clement (バレンタインクレメン)	c81b43074a	[flang][cuda] Fix lowering of cuf kernel with unstructured nested construct (#107149 ) Lowering was crashing when a cuf kernel has an unstructured construct. Blocks created by PFT need to be re-created inside of the operation like it is done for OpenACC constructs.	2024-09-04 08:43:13 -07:00
vdonaldson	8586d0330e	[flang] Don't generate empty else blocks (#106618 ) Code lowering always generates fir.if else blocks for source level if statements, whether needed or not. Change this to only generate else blocks that are needed.	2024-08-30 09:07:30 -04:00
Valentin Clement (バレンタインクレメン)	d4c519e7b2	[flang][cuda] Do inline allocation/deallocation in device code (#106628 ) ALLOCATE and DEALLOCATE statements can be inlined in device functions. This patch updates the condition that determines whether to inline these actions in lowering. This avoids runtime calls in device function code and can speed up execution. Also move `isCudaDeviceContext` from `Bridge.cpp` so it can be used elsewhere.	2024-08-29 22:37:20 -07:00
Valentin Clement (バレンタインクレメン)	0a41c8e7a0	[flang][cuda] Avoid generating cuf.data_transfer in OpenACC region (#106435 ) `cuf.data_transfer` will be converted to calls to the CUDA runtime API, and these are not supported in device code. Assignments in an OpenACC region will be handled by the OpenACC code gen, so we avoid generating data transfers for them.	2024-08-29 11:27:42 -07:00
Valentin Clement (バレンタインクレメン)	ccbee7116b	[flang][cuda] Use declare op results instead of memref (#106287 ) #106120 simplifies the data transfer when possible by using the reference and a shape. This bypasses the declare op. In order to keep the declare op around, use the second result of the declare op, which achieves the same.	2024-08-27 17:36:31 -07:00
Valentin Clement (バレンタインクレメン)	900cd62758	[flang][cuda] Simplify data transfer when possible (#106120 ) When possible, avoid using descriptors and use the reference and the shape for data_transfer.	2024-08-27 10:03:15 -07:00
Valentin Clement (バレンタインクレメン)	7af61d5cf4	[flang][cuda] Add shape to cuf.data_transfer operation (#104631 ) When doing data transfer with dynamic sized array, we are currently generating a data transfer between two descriptors. If the shape values can be provided, we can keep the data transfer between two references. This patch adds the shape operands to the operation. This will be exploited in lowering in a follow up patch.	2024-08-26 09:50:17 -07:00
Tarun Prabhu	90aac06c7f	[flang][mlir] Add llvm.ident metadata when compiling with flang This brings the behavior of flang in line with clang which also adds this metadata unconditionally. Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>	2024-08-12 11:56:19 -06:00
Valentin Clement (バレンタインクレメン)	0ee0eeb4bb	[flang] Enhance location information (#95862 ) Add inclusion location information by using FusedLocation with attribute. More context here: https://discourse.llvm.org/t/rfc-enhancing-location-information/79650	2024-07-23 09:49:17 -07:00
Valentin Clement (バレンタインクレメン)	3ad7108c3c	[flang][cuda] Avoid temporary when RHS is a logical constant (#99078 ) Enhance the detection of constant on the RHS for logical cases so we don't create a temporary.	2024-07-17 08:39:18 -07:00
Alexis Perry-Holby	f1d3fe7aae	Add basic -mtune support (#98517 ) Initial implementation for the -mtune flag in Flang. This PR is a clean version of PR #96688, which is a re-land of PR #95043	2024-07-16 16:48:24 +01:00
Valentin Clement (バレンタインクレメン)	9b6504e983	[flang][cuda] Make sure to issue freemem for the allocated temp (#98078 ) When implicit data transfer is created, make sure we generate the `freemem` op on the `allocmem` result value and not the declare op value.	2024-07-11 17:15:54 -07:00
Valentin Clement (バレンタインクレメン)	bd7b16217b	[flang][cuda] Add conversion for stream value in cuf kernel directive (#98082 ) The stream value is defined as an i32 value in the operation. Add a conversion so the declared integer can have a different kind and still be passed as an i32 value.	2024-07-09 10:13:00 -07:00
jeanPerier	66d5ca2a3d	Reland "[flang] add extra component information in fir.type_info" (#97404 ) Reland #96746 with the proper Support/CMakeLists.txt change. fir.type does not contain all Fortran level information about components. For instance, component lower bounds and default initial value are lost. For correctness purposes, this does not matter because this information is "applied" in lowering (e.g., when addressing the components, the lower bounds are reflected in the hlfir.designate). However, this "loss" of information will prevent the generation of correct debug info for the type (needs to know about lower bounds). The initial value could help building some optimization pass to get rid of initialization runtime calls. This patch adds lower bound and initial value information into fir.type_info via a new fir.dt_component operation. This operation is generated only for components that need it, which helps keeping the IR small for "boring" types. In general, adding Fortran level info in fir.type_info will allow delaying the generation of "type descriptors" globals that are very verbose in FIR and make it hard to work with FIR dumps from applications with many derived types.	2024-07-02 15:19:49 +02:00
Leandro Lupori	29cdc8f9ca	[flang][OpenMP] Fix nested privatization of allocatable (#96968 ) In nested constructs where a given variable is privatized more than once, using the default clause, the innermost host association symbol will point to the previous host association symbol. Such symbol lacks the allocatable attribute and can't be used to generate the type of the symbol to be cloned. Use the ultimate symbol instead. Fixes #85594, #80398	2024-07-01 14:10:35 -03:00
jeanPerier	6a66b8224d	Revert "[flang] add extra component information in fir.type_info" (#96937 ) Reverts llvm/llvm-project#96746 Breaking shared library builds: https://lab.llvm.org/buildbot/#/builders/89/builds/931	2024-06-27 19:22:48 +02:00
jeanPerier	1448ed2000	[flang] add extra component information in fir.type_info (#96746 ) fir.type does not contain all Fortran level information about components. For instance, component lower bounds and default initial value are lost. For correctness purposes, this does not matter because this information is "applied" in lowering (e.g., when addressing the components, the lower bounds are reflected in the hlfir.designate). However, this "loss" of information will prevent the generation of correct debug info for the type (needs to know about lower bounds). The initial value could help building some optimization pass to get rid of initialization runtime calls. This patch adds lower bound and initial value information into fir.type_info via a new fir.dt_component operation. This operation is generated only for components that need it, which helps keeping the IR small for "boring" types. In general, adding Fortran level info in fir.type_info will allow delaying the generation of "type descriptors" globals that are very verbose in FIR and make it hard to work with FIR dumps from applications with many derived types.	2024-06-27 18:59:03 +02:00
Tarun Prabhu	8dd9494056	Revert "[flang] Add basic -mtune support" (#96678 ) Reverts llvm/llvm-project#95043	2024-06-25 13:25:39 -06:00
Alexis Perry-Holby	a790279bf2	[flang] Add basic -mtune support (#95043 ) This PR adds -mtune as a valid flang flag and passes the information through to LLVM IR as an attribute on all functions. No specific architecture optimizations are added at this time.	2024-06-25 18:39:35 +01:00
Leandro Lupori	952bdaaf79	[flang][OpenMP] Fix copyprivate allocatable/pointer lowering (#95975 ) The lowering of copyprivate clauses with allocatable or pointer variables was incorrect. This happened because the values passed to copyVar() are always wrapped in SymbolBox::Intrinsic, which resulted in allocatable/pointer variables being handled as regular ones. This is fixed by providing to copyVar() the attributes of the variables being copied, to make it possible to detect and handle allocatable/pointer variables correctly. Fixes #95801	2024-06-25 09:25:41 -03:00
Valentin Clement (バレンタインクレメン)	8e8dccdecd	[flang][cuda] Do not consider PINNED as device attribute (#95988 ) PINNED is a CUDA data attribute meant for the host variables. Do not consider it when computing the number of device variables in assignment for the cuda data transfer.	2024-06-19 13:35:02 -07:00
David Truby	506b4cdae0	[flang] Change vector always errors to warnings (#95908 )	2024-06-18 14:25:56 +01:00
Alexander Shaposhnikov	77d8cfb3c5	[Flang] Switch to common::visit more call sites (#90018 ) Switch to common::visit more call sites. Test plan: ninja check-all	2024-06-17 12:59:04 -07:00
khaki3	85f4593e85	[flang] Add a REDUCE clause to each nested loop (#95555 ) For DO CONCURRENT REDUCE, every nested loop should have a REDUCE clause so that we can lower reduction without analysis.	2024-06-17 09:21:30 -07:00
David Truby	c6b6e18c4d	[flang] Implement !DIR$ VECTOR ALWAYS (#93830 ) This patch implements support for the VECTOR ALWAYS directive, which forces vectorization to occur when possible regardless of a decision by the cost model. This is done by adding an attribute to the branch into the loop in LLVM to indicate that the loop should always be vectorized. This patch only implements this directive on plain structured do loops without labels. Support for unstructured loops and array expressions is planned for future patches.	2024-06-14 14:10:41 +01:00
Iman Hosseini	7665d3d90d	[flang] Add reductions for CUF Kernels: Lowering (#95184 ) * Add reductionOperands and reductionAttrs to cuf's KernelOp. * Parsing is already working and the tree has the info: here I make the Bridge emit the updated KernelOp with reduction information added. * Check |reductionAttrs| = |reductionOperands| in verifier * Add a test @clementval @vzakhari --------- Co-authored-by: Iman Hosseini <imanh@nvidia.com> Co-authored-by: Valentin Clement (バレンタインクレメン) <clementval@gmail.com>	2024-06-12 19:18:41 +01:00
vdonaldson	87374a8cff	[flang] Add support for lowering directives at the CONTAINS level (#95123 ) There is currently support for lowering directives that appear outside of a module or procedure, or inside the body of a module or procedure. Extend this to support directives at the CONTAINS level of a module or procedure, such as directives 3, 5, 7, 9, and 10 in: !dir$ some directive 1 module m !dir$ some directive 2 contains !dir$ some directive 3 subroutine p !dir$ some directive 4 contains !dir$ some directive 5 subroutine s1 !dir$ some directive 6 end subroutine s1 !dir$ some directive 7 subroutine s2 !dir$ some directive 8 end subroutine s2 !dir$ some directive 9 end subroutine p !dir$ some directive 10 end module m !dir$ some directive 11 This is done by looking for CONTAINS statements at the module or procedure level, while ignoring CONTAINS statements at the derived type level.	2024-06-12 09:35:14 -04:00
Valentin Clement	e7d569a0fa	[flang] Fix copy creation in #94718	2024-06-10 08:50:08 -07:00
khaki3	f11e08fb26	[flang] Generate fir.do_loop reduce from DO CONCURRENT REDUCE clause (#94718 ) Derived from #92480. This PR updates the lowering process of DO CONCURRENT to support F'2023 REDUCE clause. The structure `IncrementLoopInfo` is extended to have both reduction operations and symbols in `reduceSymList`. The function `getConcurrentControl` constructs `reduceSymList` for the innermost loop. Finally, `genFIRIncrementLoopBegin` builds `fir.do_loop` with reduction operands.	2024-06-10 08:41:05 -07:00
Peter Klausler	c7593344f4	[flang] Better error message for RANK(NULL()) (#93577 ) We currently complain that the argument may not be a procedure, which is confusing. Distinguish the NULL() case from other error cases (which are indeed procedures). And clean up the utility predicates used for these tests -- the current IsProcedure() is really just a test for a procedure designator.	2024-06-03 12:58:39 -07:00
jeanPerier	d1aa9bac3c	[flang] lower select rank (#93967 ) Lower select rank according to [assumed-rank lowering design doc](https://github.com/llvm/llvm-project/blob/main/flang/docs/AssumedRank.md). The construct is lowered using fir.box_rank and fir.select_case operation and, for the non pointer/allocatable case, a fir.is_assumed_size + conditional branch before the select_case to deal with the assumed-size case. The way the CFG logic is generated, apart from the extra conditional branch for assumed-size, is similar to what is done for SELECT CASE lowering, hence the sharing of the construct level visitor for the CFG parts. The main difference is that we need to keep track of the selector to cook it and map it inside the cases (hence the new members of the ConstructContext). The only TODOs left are to deal with the RANK() case for polymorphic entities and PDTs. I will do the polymorphic case in a distinct patch, this patch has enough content. Fortran::evaluate::IsSimplyContiguous change is needed to avoid generating copy-in/copy-out runtime calls when passing the RANK() associating entity to some implicit interface.	2024-06-03 17:20:07 +02:00
Kareem Ergawy	6af4118f15	Reapply #91116 with fix (#93160 ) This PR contains 2 commits: 1. A commit to reapply changes introduced in #91116 (was reverted earlier due to test suite failures) 2. A commit containing a possible solution for the issue causing the test suite failures. In particular, it introduces a simple symbol visitor class to keep track of the current active OMP construct and mark this active construct as the scope defining the symbol being visited.	2024-05-27 14:26:52 +02:00

347 Commits