This patch addresses post commit review comments from #124859.
The extra compile definition is not necessary and goes against the
effort to separate the runtimes from the flang compiler itself. The
function declaration for `CUFInit` can be accessed anyway since the
headers are always present. The insertion of the call is only based on
the language feature options from the folding context.
A program compiled with CUDA enabled but without the CUF runtime will
simply fail at link time, as expected.
For now, parse METADIRECTIVE as a standalone executable directive.
This will allow testing the parser code.
There is no lowering yet, not even clause conversion. There is also no
verification of the allowed values for trait sets and trait properties.
An hlfir.elemental with a shape of `(0, HUGE)` still runs `HUGE`
iterations when expanded into a loop nest. As a result, HLFIR
transformational operations inlined as hlfir.elemental may execute
slower than the Fortran runtime implementation.
This patch adds an option for the BufferizeHLFIR pass to reset all
upper bounds in the elemental loop nests to zero if the result
is an empty array.
A separate patch will enable this option in the driver after I do
more performance testing. The option is off by default now.
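For reference, a minimal Fortran sketch of the kind of case this targets (the shape is illustrative, not taken from a real benchmark):
```
! The result is an empty array, yet a naive elemental loop nest over
! shape (0, n) still iterates n times in the outer dimension.
subroutine s(n, y)
  integer, intent(in) :: n        ! imagine n as large as HUGE(1)
  real, allocatable :: y(:, :)
  allocate (y(0, n))
  y = y + 1.0                     ! elemental expression with shape (0, n)
end subroutine
```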
A trait property can be one of several alternatives (name, expression,
etc.), and each property in a list was parsed as if it could be any of
these alternatives independently of the other properties. This made the
parsing vulnerable to certain ambiguities in the trait grammar (as given
in the OpenMP spec).
At the same time the OpenMP spec gives the expected types of properties
for almost every trait: all properties listed for a given trait are
usually of the same type, e.g. names, clauses, etc.
Incorporate these restrictions into the parser, and additionally use
property extensions as the fallback if the parsing of the expected
property type failed. This is intended to allow the parser to succeed,
and instead let the semantic-checking code emit a more user-friendly
message.
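For illustration, a hedged sketch of the trait property types involved (the selectors and directive variants here are made up for the example, not taken from the patch's tests):
```
! arch/kind properties are names, condition takes an expression; each
! trait's property list is parsed with its expected property type.
subroutine s(x, use_gpu)
  real :: x(:)
  logical :: use_gpu
  integer :: i
  !$omp metadirective &
  !$omp&  when(device={arch(nvptx), kind(gpu)}: target teams distribute parallel do) &
  !$omp&  when(user={condition(use_gpu)}: parallel do) &
  !$omp&  default(simd)
  do i = 1, size(x)
    x(i) = x(i) + 1.0
  end do
end subroutine
```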
Add the implementation of the IERRNO intrinsic to get the last system
error number, as given by the C errno variable.
This intrinsic is also used in RAMSES
(https://github.com/ramses-organisation/ramses/).
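A hedged usage sketch (the failing OPEN is just one way the underlying errno may get set; the exact value depends on the C library):
```
! IERRNO() returns the current value of the C errno variable.
program show_errno
  implicit none
  integer :: ios, last_err
  open (unit=10, file='no-such-file.dat', status='old', iostat=ios)
  if (ios /= 0) then
    last_err = ierrno()
    print *, 'last system error number:', last_err
  end if
end program
```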
This patch implements support for the UNROLL directive to control how
many times a loop should be unrolled.
It must be placed immediately before a `DO` loop and applies only to the
loop that follows. N is an integer specifying the unrolling factor.
This is done by adding an attribute to the branch into the loop in LLVM
to indicate that the loop should be unrolled.
The code pushed to support the `VECTOR ALWAYS` directive has been
modified to take into account that several directives can appear
before a `DO` loop.
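A sketch of the intended usage (assuming the `!dir$ unroll N` spelling; the factor here is 4):
```
! The directive is placed immediately before the DO loop it applies to.
subroutine saxpy(n, a, x, y)
  integer, intent(in) :: n
  real, intent(in) :: a, x(n)
  real, intent(inout) :: y(n)
  integer :: i
  !dir$ unroll 4
  do i = 1, n
    y(i) = y(i) + a * x(i)
  end do
end subroutine
```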
This patch adds a call to the CUFInit function just after `ProgramStart`
when CUDA Fortran is enabled, to initialize the CUDA context. This allows
us to set up context information such as the stack limit, which can be
defined via the environment variable `ACC_OFFLOAD_STACKSIZE=<value>`.
Lower Fortran RESHAPE intrinsic into hlfir.reshape, and then lower
hlfir.reshape into a runtime call.
A later patch will add hlfir.reshape inlining as hlfir.elemental.
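For reference, a simple example that should now go through hlfir.reshape (and from there to the runtime call):
```
! RESHAPE of a rank-1 array into a 3x4 matrix.
subroutine pack_matrix(a, b)
  real, intent(in)  :: a(12)
  real, intent(out) :: b(3, 4)
  b = reshape(a, shape(b))
end subroutine
```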
When constructing the characteristics of a particular reference to an
intrinsic procedure that was passed a non-coindexed reference to local
coarray data as an actual argument, don't add the corank of the actual
argument to those characteristics.
Also clean up the TypeAndShape characteristics class a little; the
Attr::Coarray is redundant since the corank() accessor can be used to
the same effect.
GetShape() needed to be called with a FoldingContext in order to
properly construct an extent expression for the shape of an array
constructor whose elements (nested in an implied DO loop) were not
scalars.
Fixes https://github.com/llvm/llvm-project/issues/124191.
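A sketch of the triggering pattern (illustrative, not the exact reproducer from the issue):
```
! Each implied-DO element is a rank-1 section, not a scalar, so the
! constructor's total extent (n*n) must be computed as an expression.
subroutine flatten(a, r, n)
  integer, intent(in) :: n
  real, intent(in) :: a(n, n)
  real, intent(out) :: r(n*n)
  integer :: j
  r = [(a(:, j), j = 1, n)]
end subroutine
```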
The event variable in an EVENT POST/WAIT statement can be a coarray
reference, and need not be an entire coarray.
Variables and potential subobject components with EVENT_TYPE/LOCK_TYPE
must be coarrays, unless they are potential subobjects nested within
coarrays or pointers.
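A hedged sketch of what is now accepted (image numbers are illustrative):
```
! The event variable may be an element of a coarray of EVENT_TYPE rather
! than an entire coarray; in EVENT POST it may also be coindexed.
program events_demo
  use iso_fortran_env, only: event_type
  type(event_type) :: done(2)[*]
  if (this_image() == 2) then
    event post (done(1)[1])   ! coindexed element on image 1
  else if (this_image() == 1) then
    event wait (done(1))      ! non-coindexed element of the coarray
  end if
end program
```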
This allows the Flang parser to accept the !$OMP DISPATCH construct and
its related clauses.
Lowering is currently not implemented. Tests for unparse and parse-tree
dump are provided, along with one checking that lowering ends in a "not
yet implemented" error.
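A hedged sketch of the construct now accepted by the parser (the callee and clause are illustrative; lowering still reaches the TODO):
```
! !$omp dispatch wraps the call that may be replaced by a declared variant.
subroutine caller(x, n)
  integer, intent(in) :: n
  real, intent(inout) :: x(n)
  !$omp dispatch nowait
  call work(x, n)
end subroutine
```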
---------
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
The backtrace may at least print the backtrace name in the call stack,
but this does not happen with the release builds of the runtime.
Surprisingly, specifying "no-omit-frame-pointer" did not work
with GCC, so I decided to fall back to -O0 for these functions.
With these changes, CUF atomic operations are handled as cudadevice
intrinsics and are converted straight to the LLVM dialect with the
`llvm.atomicrmw` operation.
I am only submitting changes for `atomicadd` to gather feedback. If we
are to proceed with these changes I will add support for all other
applicable atomic operations following this pattern.
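A hedged sketch at the CUDA Fortran source level (reduced to the atomic operation itself; assumes the usual implicit availability of the cudadevice intrinsics in device code):
```
! Device kernel using atomicadd, which now maps straight to
! llvm.atomicrmw instead of going through a runtime call.
module kernels
contains
  attributes(global) subroutine accumulate(counter)
    real, device :: counter
    real :: old
    old = atomicadd(counter, 1.0)   ! atomically add 1.0, return the old value
  end subroutine
end module
```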
Move the OpenACC/OpenMP helpers from Lower/DirectivesCommon.h that are
also used in OpenACC and OpenMP MLIR passes into a new
Optimizer/Builder/DirectivesCommon.h, so that parser and evaluate headers
are not included in Optimizer libraries (which introduces pointless
compile-time and link-time overhead).
This should fix https://github.com/llvm/llvm-project/issues/123377
Because `c_devptr` has a `c_ptr` field, any assignment was done via the
Assign runtime function. This leads to stack overflows on the device and
excessive memory use. Since we know `c_devptr` can be copied directly
on assignment, make it a special case.
This reverts commit afc43a7b62.
Fixed declaration of hlfir::genExtentsVector().
Some good results for induct2, where dot_product is applied
to a vector of unknown size and a known 3-element vector:
the inlining ends up generating a 3-iteration loop, which
is then fully unrolled. With late FIR simplification
it is not happening even when the simplified intrinsics
implementation is inlined by LLVM (because the loop bounds
are not known).
This change just follows the current approach to expose
the loops for later worksharing application.
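A sketch of the pattern referred to above (the real induct2 code differs):
```
! dot_product of an assumed-shape vector with a known 3-element vector:
! inlining can use the known extent and produce a 3-iteration loop that
! is then fully unrolled.
function proj(a, b) result(r)
  real, intent(in) :: a(:)
  real, intent(in) :: b(3)
  real :: r
  r = dot_product(a, b)
end function
```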
Runtime function calls to void functions produce an SSA value
because the FunctionType result is set to NoneType, which is later
translated to an empty struct. This is not an issue when going to LLVM IR,
but it breaks when lowering a GPU module to PTX. This patch updates the
RTModel to correctly set the FunctionType result type to nothing.
This is one such runtime call before this patch, at the LLVM dialect step:
```
%45 = llvm.call @_FortranAAssign(%arg0, %1, %44, %4) : (!llvm.ptr, !llvm.ptr, !llvm.ptr, i32) -> !llvm.struct<()>
```
After the patch, the call is correctly formed:
```
llvm.call @_FortranAAssign(%arg0, %1, %44, %4) : (!llvm.ptr, !llvm.ptr, !llvm.ptr, i32) -> ()
```
Without the patch, it leads to errors like:
```
ptxas /tmp/mlir-cuda_device_mod-nvptx64-nvidia-cuda-sm_60-e804b6.ptx, line 10; error : Output parameter cannot be an incomplete array.
ptxas /tmp/mlir-cuda_device_mod-nvptx64-nvidia-cuda-sm_60-e804b6.ptx, line 125; error : Call has wrong number of parameters
```
The change is pretty much mechanical.
This fixes a bug when the same variable is used in `firstprivate` and
`lastprivate` clauses on the same construct. The issue boils down to the
fact that `copyHostAssociateVar` was deciding the direction of the copy
assignment (i.e. the `lhs` and `rhs`) based on whether the `copyAssignIP`
parameter is set. This is not the best way to do it since it is not
related to whether we are doing a copy from the host variable to the
localized copy or the other way around. When we set the insertion point
for `firstprivate` in delayed privatization, this resulted in switching
the direction of the copy assignment. Instead, this PR adds a new
parameter to explicitly tell the function the direction of the
assignment.
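A hedged sketch of the problematic pattern (reduced; the original reproducer may differ):
```
! The same variable appears in both firstprivate and lastprivate, so both
! a host-to-private copy and a private-to-host copy are generated.
subroutine s(n, x)
  integer, intent(in) :: n
  real :: x
  integer :: i
  !$omp parallel do firstprivate(x) lastprivate(x)
  do i = 1, n
    x = x + real(i)
  end do
  !$omp end parallel do
end subroutine
```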
This is a follow-up PR to
https://github.com/llvm/llvm-project/pull/122471; only the latest commit
is relevant.
Inlining `hlfir.matmul` as `hlfir.eval_in_mem` does not allow us
to get rid of a temporary array in many cases, but it may still be
much better, allowing us to:
* Get rid of any overhead related to calling the runtime MATMUL
(such as descriptor creation).
* Use CPU-specific vectorization cost model for matmul loops,
which Fortran runtime cannot currently do.
* Optimize matmul of known-size arrays by complete unrolling.
One of the drawbacks of `hlfir.eval_in_mem` inlining is that
the ops inside it with store memory effects block the current
MLIR CSE, so I decided to run this inlining late in the pipeline.
There is a source comment explaining the CSE issue in more detail.
Straightforward inlining of `hlfir.matmul` as an `hlfir.elemental`
is not good for performance, and I got performance regressions
with it compared to the Fortran runtime implementation. I put it
under an engineering option for experiments.
At the same time, inlining `hlfir.matmul_transpose` as `hlfir.elemental`
seems to be a good approach, e.g. it allows getting rid of a temporary
array in cases like `A(:)=B(:)+MATMUL(TRANSPOSE(C(:,:)),D(:))`.
This patch improves performance of galgel and tonto a little bit.
Intrinsic module procedures ieee_get_modes, ieee_set_modes,
ieee_get_status, and ieee_set_status store and retrieve opaque data
values whose size varies by machine and OS environment. These data
values are usually, but not always, small. Their sizes are not directly
known in a cross compilation environment. Address this issue by
implementing two mechanisms for processing these data values.
Environments that use typical small data sizes can access storage
defined at compile time. When this is not valid, data storage of any
size can be allocated at runtime.
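For context, a sketch of the user-level pattern these procedures serve (save the opaque value, then restore it later):
```
! The saved status value is opaque; its size is what the two mechanisms
! described above have to accommodate.
program fp_status_demo
  use ieee_exceptions, only: ieee_status_type, ieee_get_status, ieee_set_status
  implicit none
  type(ieee_status_type) :: saved
  call ieee_get_status(saved)   ! capture the current floating-point status
  ! ... computation that may raise exception flags ...
  call ieee_set_status(saved)   ! restore the saved status
end program
```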
Introduce a new op to get the device address from a host symbol. This
simplifies the current conversion and is also in preparation for some
legalization work that needs to be done in cuf kernel and cuf kernel
launch, similar to
https://github.com/llvm/llvm-project/pull/122802
A module can't USE itself, either directly within the top-level module
or from one of its submodules. Add a test for this case (which we
already caught), and improve the diagnostic for the more confusing case
involving a submodule.
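A sketch of the two rejected cases (the error text shown is illustrative):
```
! Case 1: a module USE-ing itself directly (already diagnosed).
module m1
  use m1          ! error: module cannot USE itself
end module

! Case 2: a submodule USE-ing its own ancestor module (clearer message now).
module m2
end module
submodule (m2) s2
  use m2          ! error: a submodule cannot USE its ancestor module
end submodule
```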
Add tests for negative array extents where necessary, motivated by a
compiler crash exposed by yet another fuzzer test, and improve overall
error message quality for RESHAPE().
Fixes https://github.com/llvm/llvm-project/issues/122060.
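A hedged example of the kind of invalid input involved (constructed, not the fuzzer's exact test):
```
! A negative extent in the SHAPE argument should be diagnosed cleanly
! rather than crash constant folding.
program bad_reshape
  implicit none
  print *, reshape([1, 2, 3, 4, 5, 6], [2, -3])
end program
```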
The expression traversal library needs to use interfaces into triplets
(and substrings) that return pointers to nested expressions, rather than
optional copies of them, since at least one semantic analysis collects a
set of references to some subexpression representation class instances,
and those references obviously can't point to local copies of objects.
Fixes https://github.com/llvm/llvm-project/issues/121999.
The newly introduced MappableType interface in `acc` dialect was
primarily intended to allow variables with non-materialized storage to
be used in acc data clauses (previously everything was required to be
`pointer-like`). One motivator for this was `fir.box`, since it can be
passed to functions without a wrapping `fir.ref` and can also be
generated directly via operations like `fir.embox`. And,
unlike other variable representations in FIR, the underlying storage for
it does not get materialized until LLVM codegen.
The new interface is being attached to both `fir.box` and `fir.array`.
Strictly speaking, attaching to the latter is primarily for consistency
since the MappableType interface requires implementation of utilities to
compute byte size - and it made sense that a
`fir.box<fir.array<10xi32>>` and `fir.array<10xi32>` would have a
consistently computable size. This decision may be revisited as
MappableType interface evolves.
The new interface attachments are made in a new library named
`FIROpenACCSupport`. The reason for this is to avoid circular
dependencies since the implementation of this library is reusing code
from lowering of OpenACC. More specifically, the types are defined in
`FIRDialect` and `FortranLower` depends on it. Thus we cannot attach
these interfaces in `FIRDialect`.