clang-p2996

Author	SHA1	Message	Date
Kazu Hirata	4435b7d8d3	[flang] Migrate away from PointerUnion::{is,get} (NFC) (#122585 ) Note that PointerUnion::{is,get} have been soft deprecated in PointerUnion.h: // FIXME: Replace the uses of is(), get() and dyn_cast() with // isa<T>, cast<T> and the llvm::dyn_cast<T>	2025-01-11 02:06:47 -08:00
Krzysztof Parzyszek	70e96dc3fb	[flang][OpenMP] Parsing context selectors for METADIRECTIVE (#121815 ) This is just adding parsers for context selectors. There are no tests because there is no way to execute these parsers yet.	2025-01-10 11:05:23 -06:00
Peter Klausler	3a8a52f4a5	[flang] Make IsCoarray() more accurate (#121415 ) A designator without cosubscripts can have subscripts, component references, substrings, &c. and still have corank. The current IsCoarray() predicate only seems to work for whole variable/component references. This was breaking some cases of THIS_IMAGE().	2025-01-08 13:16:56 -08:00
Peter Klausler	eb77f442b3	[flang] Accept L0 (#121998 ) Accept a zero field width for formatted logical output (L0), interpreting it as if it had been L1.	2025-01-08 13:15:51 -08:00
Peter Klausler	9496391901	[flang] Fold LCOBOUND & UCOBOUND (#121411 ) Implement constant folding for LCOBOUND and UCOBOUND intrinsic functions. Moves some error detection code from intrinsics.cpp to fold-integer.cpp so that erroneous calls get properly flagged and converted into known errors.	2025-01-08 13:13:30 -08:00
Valentin Clement (バレンタインクレメン)	878a57468b	[flang][cuda] Add c_devloc as intrinsic and inline it during lowering (#120648 ) Add `c_devloc` as intrinsic and inline it during lowering. `c_devloc` is used in CUDA Fortran to get the address of device variables. For the moment, we borrow almost all semantic checks from `c_loc` except for the pointer or target restriction. The specifications of `c_devloc` are pretty vague and we will relax/enforce the restrictions based on library and apps usage comparing them to the reference compiler.	2025-01-08 11:23:05 -08:00
Slava Zakharin	2e637dbbb8	[flang] Canonicalize redundant pointer converts. (#121864 ) This patch adds a canonicalization pattern for optimizing redundant "pointer" fir.converts. Such converts prevent the StackArrays pass from recognizing fir.freemem for the corresponding fir.allocmem, e.g.: ``` %69 = fir.allocmem !fir.array<2xi32> %71:2 = hlfir.declare %69(%70) {uniq_name = ".tmp.arrayctor"} : (!fir.heap<!fir.array<2xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<2xi32>>, !fir.heap<!fir.array<2xi32>>) %95 = fir.convert %71#1 : (!fir.heap<!fir.array<2xi32>>) -> !fir.ref<!fir.array<2xi32>> %100 = fir.convert %95 : (!fir.ref<!fir.array<2xi32>>) -> !fir.heap<!fir.array<2xi32>> fir.freemem %100 : !fir.heap<!fir.array<2xi32>> ``` I found this in `tonto`, but the change does not affect performance at all. Anyway, it looks like a reasonable thing to do, and it makes it easier to compare the performance profiles with other compilers'.	2025-01-07 08:35:43 -08:00
Mats Petersson	4df366cd80	[FLANG][OpenMP]Add support for ALIGN clause on OMP ALLOCATE (#120791 ) This is trivially additional support for the existing ALLOCATE directive, which allows an ALIGN clause. The ALLOCATE directive is currently not implemented, so this is just adding the necessary parser parts to allow the compiler to not say "Huh? I don't get this" [or "Expected OpenMP construct"] when it encounters the ALIGN clause. Some parser testing is updated and a new todo test, just in case the feature of align clause is not supported by the initial support for ALLOCATE.	2025-01-06 11:02:31 +00:00
Valentin Clement (バレンタインクレメン)	9165848c82	[flang][cuda] Sync global descriptor when nullifying pointer (#121595 )	2025-01-03 14:37:14 -08:00
Slava Zakharin	3c700d131a	[flang] Extract hlfir.assign inlining from opt-bufferization. (#121544 ) Optimized bufferization can transform hlfir.assign into a loop nest doing element per element assignment, but it avoids doing so for RHS that is hlfir.expr. This is done to let ElementalAssignBufferization pattern to try to do a better job. This patch moves the hlfir.assign inlining after opt-bufferization, and enables it for hlfir.expr RHS. The hlfir.expr RHS cases are present in tonto, and this patch results in some nice improvements. Note that those cases are handled by other compilers also using array temporaries, so this patch seems to just get rid of the Assign runtime overhead/inefficiency.	2025-01-03 08:33:14 -08:00
Krzysztof Parzyszek	adeff9f63a	[flang][OpenMP] Allow utility constructs in specification part (#121509 ) Allow utility constructs (error and nothing) to appear in the specification part as well as the execution part. The exception is "ERROR AT(EXECUTION)" which should only be in the execution part. In case of ambiguity (the boundary between the specification and the execution part), utility constructs will be parsed as belonging to the specification part. In such cases move them to the execution part in the OpenMP canonicalization code.	2025-01-03 09:21:36 -06:00
Krzysztof Parzyszek	df859f90aa	[flang][OpenMP] Frontend support for NOTHING directive (#120606 ) Create OpenMPUtilityConstruct and put the two utility directives in it (error and nothing). Rename OpenMPErrorConstruct to OmpErrorDirective.	2025-01-03 08:36:34 -06:00
Valentin Clement (バレンタインクレメン)	6dcd2b035d	[flang][cuda] Convert cuf.sync_descriptor to runtime call (#121524 ) Convert the op to a new entry point in the runtime `CUFSyncGlobalDescriptor`	2025-01-02 17:02:59 -08:00
Valentin Clement (バレンタインクレメン)	4b17a8b10e	[flang][cuda] Add operation to sync global descriptor (#121520 ) Introduce cuf.sync_descriptor to be used to sync device global descriptor after pointer association. Also move CUFCommon so it can be used in FIRBuilder lib as well.	2025-01-02 17:02:45 -08:00
Matthias Springer	c870632ef6	[flang] Fix some memory leaks (#121050 ) This commit fixes some but not all memory leaks in Flang. There are still 91 tests that fail with ASAN. - Use `mlir::OwningOpRef` instead of `std::unique_ptr`. The latter does not free allocations of nested blocks. - Pass `ModuleOp` as value instead of reference. - Add few missing deallocations in test cases and other places.	2024-12-25 09:42:03 +01:00
Ivan Aksamentov	2d3d62d77e	[flang] fix: split ifndef for CHECK and CHECK_MSG (#114707 ) Resolves https://github.com/llvm/llvm-project/issues/114703 I think it's the best practice that each macro has its own `ifndef` check and this way the build issue is resolved for me. I also find the names of these macros a bit too generic - an easy recipe for conflicts. In my case, the error was likely caused by something else defining `CHECK` but not `CHECK_MSG`, so likely these `CHECK` and `CHECK_MSG` weren't actually working at all because the result of `ifndef` is always false. As a definitive fix, perhaps it makes sense to rename them to something more specific, e.g. `FLANG_CHECK` and `FLANG_CHECK_MSG`.	2024-12-25 07:47:30 +00:00
Valentin Clement (バレンタインクレメン)	4cb2a519db	Revert "Reland '[flang] Allow to pass an async id to allocate the descriptor (#118713 )' and #118733 " (#121029 ) This still causes issues for the device runtime build.	2024-12-23 21:27:34 -08:00
Valentin Clement (バレンタインクレメン)	5b74fb75d9	Reland '[flang] Allow to pass an async id to allocate the descriptor (#118713 )' and #118733 (#120997 ) The device runtime build has been fixed. Attempt to re-land these patches that have been approved before. https://github.com/llvm/llvm-project/pull/118713 https://github.com/llvm/llvm-project/pull/118733	2024-12-23 12:13:56 -08:00
vdonaldson	c28a7c1efd	[flang] Modifications to ieee_support_halting (#120747 ) The F23 standard requires that a call to intrinsic module procedure ieee_support_halting be foldable to a constant at compile time in some contexts. See for example F23 Clause 10.1.11 [Specification expression] list item (13), Clause 10.1.12 [Constant expression] list item (11), and references to specification and constant expressions elsewhere, such as constraints C1012, C853, and C704. Some Arm processors allow a user to control processor behavior when an arithmetic exception is signaled, and some Arm processors do not have this capability. An Arm executable will run on either type of processor, so it is effectively unknown at compile time whether or not this support will be available at runtime. This is in conflict with the standard requirement. This patch addresses this conflict by implementing ieee_support_halting calls on Arm processors to check if this capability is present at runtime. A call to ieee_support_halting in a constant context, such as in the specification part of a program unit, will generate a compile time "cannot be computed as a constant value" error. The expectation is that such calls are unlikely to appear in production code. Code generation for other processors will continue to generate a compile time constant result for ieee_support_halting calls.	2024-12-23 09:30:45 -05:00
Valentin Clement (バレンタインクレメン)	415cfaf339	[flang][cuda][NFC] Fix type in CUFFreeDescriptor (#120799 )	2024-12-20 14:43:12 -08:00
Valentin Clement (バレンタインクレメン)	e650ac1654	[flang][cuda][NFC] Fix typo in CUFAllocDescriptor (#120797 ) Missing `r` in the function name.	2024-12-20 13:57:47 -08:00
Leandro Lupori	1fcb6a9754	[flang][OpenMP] Initialize allocatable members of derived types (#120295 ) Allocatable members of privatized derived types must be allocated, with the same bounds as the original object, whenever that member is also allocated in it, but Flang was not performing such initialization. The `Initialize` runtime function can't perform this task unless its signature is changed to receive an additional parameter, the original object, that is needed to find out which allocatable members, with their bounds, must also be allocated in the clone. As `Initialize` is used not only for privatization, sometimes this other object won't even exist, so this new parameter would need to be optional. Because of this, it seemed better to add a new runtime function: `InitializeClone`. To avoid unnecessary calls, lowering inserts a call to it only for privatized items that are derived types with allocatable members. Fixes https://github.com/llvm/llvm-project/issues/114888 Fixes https://github.com/llvm/llvm-project/issues/114889	2024-12-19 17:26:50 -03:00
Renaud Kauffmann	cb0effc0e6	[flang][cuda] Using nvvm intrinsics for the syncthread and threadfence families of calls (#120020 )	2024-12-18 11:44:30 -08:00
Peter Klausler	fc97d2e68b	[flang] Add UNSIGNED (#113504 ) Implement the UNSIGNED extension type and operations under control of a language feature flag (-funsigned). This is nearly identical to the UNSIGNED feature that has been available in Sun Fortran for years, and now implemented in GNU Fortran for gfortran 15, and proposed for ISO standardization in J3/24-116.txt. See the new documentation for details; but in short, this is C's unsigned type, with guaranteed modular arithmetic for +, -, and *, and the related transformational intrinsic functions SUM & al.	2024-12-18 07:02:37 -08:00
Kareem Ergawy	e532241b02	Re-apply (#117867 ): [flang][OpenMP] Implicitly map allocatable record fields (#120374 ) This re-applies #117867 with a small fix that hopefully prevents build bot failures. The fix is avoiding `dyn_cast` for the result of `getOperation()`. Instead we can assign the result to `mlir::ModuleOp` directly since the type of the operation is known statically (`OpT` in `OperationPass`).	2024-12-18 09:19:45 +01:00
Kareem Ergawy	dc936f3c19	Revert "[flang][OpenMP] Implicitly map allocatable record fields (#117867 )" (#120360 )	2024-12-18 06:52:24 +01:00
Kareem Ergawy	db09014a07	[flang][OpenMP] Implicitly map allocatable record fields (#117867 ) This is a starting PR to implicitly map allocatable record fields. This PR contains the following changes: 1. Re-purposes some of the utils used in `Lower/OpenMP.cpp` so that these utils work on the `mlir::Value` level rather than the `semantics::Symbol` level. This takes one step towards enabling MLIR passes to more easily do some lowering themselves (e.g. creating `omp.map.bounds` ops for implicitly captured data like this PR does). 2. Adds support for implicitly capturing and mapping allocatable fields in record types. There is quite some distance to still cover to have full support for this. I added a number of todos to guide further development. Co-authored-by: Andrew Gozillon <andrew.gozillon@amd.com>	2024-12-18 05:37:58 +01:00
Peter Klausler	a957cedea9	[flang] Handle substring in data statement constant (#120130 ) The case of a constant substring wasn't handled in the parser for data statement constants. Fixes https://github.com/llvm/llvm-project/issues/119005.	2024-12-17 12:10:50 -08:00
Slava Zakharin	9d33874936	[flang] Support -f[no-]realloc-lhs. (#120165 ) -frealloc-lhs is the default. If -fno-realloc-lhs is specified, then an allocatable on the left side of an intrinsic assignment is not implicitly (re)allocated to conform with the right hand side. Fortran runtime will issue an error if there is a mismatch in shape/type/allocation-status.	2024-12-17 09:06:05 -08:00
Slava Zakharin	a00946fc94	[flang] Simplify hlfir.sum total reductions. (#119482 ) I am trying to switch to keeping the reduction value in a temporary scalar location so that I can use hlfir::genLoopNest easily. This also allows using omp.loop_nest with worksharing for OpenMP.	2024-12-13 13:08:28 -08:00
Mats Petersson	75e6d0eb4d	[flang][OpenMP]Add support for OpenMP ERROR directive (#119582 ) Lowering leads to a TODO, with a test to confirm. Also testing unparse. --------- Co-authored-by: Krzysztof Parzyszek <Krzysztof.Parzyszek@amd.com>	2024-12-13 14:05:48 +00:00
Slava Zakharin	139e69b7bc	[flang] Simple folding for hlfir.shape_of. (#119649 ) This folding makes sure there are no hlfir.shape_of users of hlfir.elemental - this may enable more InlineElementals matches, because it is looking for exactly two uses of an hlfir.elemental.	2024-12-12 10:38:34 -08:00
Krzysztof Parzyszek	03cbe42627	[flang][OpenMP] Rework LINEAR clause (#119278 ) The OmpLinearClause class was a variant of two classes, one for when the linear modifier was present, and one for when it was absent. These two classes did not follow the conventions for parse tree nodes, (i.e. tuple/wrapper/union formats), which necessitated specialization of the parse tree visitor. The new form of OmpLinearClause is the standard tuple with a list of modifiers and an object list. The specialization of parse tree visitor for it has been removed. Parsing and unparsing of the new form bears additional complexity due to syntactical differences between OpenMP 5.2 and prior versions: in OpenMP 5.2 the argument list is post-modified, while in the prior versions, the step modifier was a post-modifier while the linear modifier had an unusual syntax of `modifier(list)`. With this change the LINEAR clause is no different from any other clauses in terms of its structure and use of modifiers. Modifier validation and all other checks work the same as with other clauses.	2024-12-12 12:19:35 -06:00
Krzysztof Parzyszek	58f9c4fc00	[flang][OpenMP] Semantic checks for IN_REDUCTION and TASK_REDUCTION (#118841 ) Update parsing of these two clauses and add semantic checks for them. Simplify some code in IsReductionAllowedForType and CheckReductionOperator.	2024-12-12 12:19:12 -06:00
Valentin Clement (バレンタインクレメン)	151901c762	[flang][rt][device] Use enum-set.h as Fortran.h (#119611 )	2024-12-11 15:38:38 -08:00
Mats Petersson	00e1cc4c9d	[flang][OpenMP]Add support for fail clause (#118683 ) Support the atomic compare option of a fail(memory-order) clause. Additional tests introduced to check that parsing and semantics checks for the new clause are handled. Lowering for atomic compare is still unsupported and will end in a TODO (aka "Not yet implemented"). A test for this case with the fail clause is also present.	2024-12-11 16:29:02 +00:00
执着	e8baa792e7	Backtrace support for flang (#118179 ) Fixed build failures in old PRs due to missing files	2024-12-10 10:31:48 +00:00
Yusuke MINATO	a88677edc0	Reland "[flang] Integrate the option -flang-experimental-integer-overflow into -fno-wrapv" (#118933 ) This relands #110063. The performance issue on 503.bwaves_r is found not to be related to the patch, and is resolved by `fbd89bcc` when LTO is enabled.	2024-12-10 16:26:53 +09:00
Slava Zakharin	1ca392764a	[flang] Added definition of hlfir.cshift operation. (#118732 ) CSHIFT intrinsic will be lowered to this operation, which then can be optimized as inline sequence or lowered into a runtime call.	2024-12-09 07:55:22 -08:00
Valentin Clement (バレンタインクレメン)	16c2a1016e	Revert "[flang] Allow to pass an async id to allocate the descriptor (#118713 )" (#119109 ) This reverts commit `7d1c661381`. This commit breaks some device runtime builds. Need time to investigate.	2024-12-07 19:55:12 -08:00
Michael Kruse	c91ba04328	[Flang][NFC] Split runtime headers in preparation for cross-compilation. (#112188 ) Split some headers into headers for public and private declarations in preparation for #110217. Moving the runtime-private headers in runtime-private include directory will occur in #110298. * Do not use `sizeof(Descriptor)` in the compiler. The size of the descriptor is target-dependent while `sizeof(Descriptor)` is the size of the Descriptor for the host platform which might be too small when cross-compiling to a different platform. Another problem is that the emitted assembly ((cross-)compiling to the same target) is not identical between Flang's running on different systems. Moving the declaration of `class Descriptor` out of the included header will also reduce the amount of #included sources. * Do not use `sizeof(ArrayConstructorVector)` and `alignof(ArrayConstructorVector)` in the compiler. Same reason as with `Descriptor`. * Compute the descriptor's extra flags without instantiating a Descriptor. `Fortran::runtime::Descriptor` is defined in the runtime source, but not the compiler source. * Move `InquiryKeywordHashDecode` into runtime-private header. The function is defined in the runtime sources and trying to call it in the compiler would lead to a link-error. * Move allocator-kind magic numbers into common header. They are the only declarations out of `allocator-registry.h` in the compiler as well. This does not make Flang cross-compile ready yet, the main goal is to avoid transitive header dependencies from Flang to clang-rt. There are more assumptions that host platform is the same as the target platform.	2024-12-06 15:29:00 +01:00
Renaud Kauffmann	27e458c8cb	[flang][cuda] Distinguish constant fir.global from globals with a #cuf.cuda<constant> attribute (#118912 ) 1. In `CufOpConversion` `isDeviceGlobal` was renamed `isRegisteredGlobal` and moved to the common file. `isRegisteredGlobal` excludes constant `fir.global` operation from registration. This is to avoid calls to `_FortranACUFGetDeviceAddress` on globals which do not have any symbols in the runtime. This was done for `_FortranACUFRegisterVariable` in #118582, but also needs to be done here after #118591 2. `CufDeviceGlobal` no longer adds the `#cuf.cuda<constant>` attribute to the constant global. As discussed in #118582 a module variable with the #cuf.cuda<constant> attribute is not a compile time constant. Yet, the compile time constant also needs to be copied into the GPU module. The candidates for copy to the GPU modules are - the globals needing registration regardless of their uses in device code (they can be referred to in host code as well) - the compile time constant when used in device code 3. The registration of "constant" module device variables ( #cuf.cuda<constant>) can be restored in `CufAddConstructor`	2024-12-05 18:36:48 -08:00
Valentin Clement (バレンタインクレメン)	83ccaad473	[flang][cuda] Use async id for device stream allocation (#118733 ) When stream is specified use cudaMallocAsync with the specified stream	2024-12-05 08:57:10 -08:00
jeanPerier	ff78cd5f3d	[flang] fix private pointers and default initialized variables (#118494 ) Both OpenMP privatization and DO CONCURRENT LOCAL lowering were incorrect for pointers and derived types with default initialization. For pointers, the descriptor was not established with the rank/type code/element size, leading to undefined behavior if any inquiry was made to it prior to a pointer assignment (and if/when using the runtime for pointer assignments, the descriptor must have been established). For derived types with default initialization, the copies were not default initialized.	2024-12-05 14:09:48 +01:00
Michael Kruse	0cda970ecc	[Flang][NFC] Split common headers to reduce dependencies. (#110244 ) Fortran.h and target.h are defining symbols where some are used by both, the Fortran runtime (Flang-RT) and Fortran compiler (Flang), and others are used by Flang only. With the upcoming refactoring of the Fortran runtime into its own subproject (#110217), move the declarations that are used by both into new headers to minimize the amount of code that will need to be shared by Flang-RT and Flang. Details: * `Fortran.h`: Flang-RT only uses some enum definitions out of this file, but not `AsFortran` which is defined in `Fortran.cpp`. Moving the enums into `Fortran-consts.h` allows keeping `Fortran.cpp` within Flang. * `target.h`: Contains some floating-point definitions that are used by the non-GTest unittests in `fp-testing.h`. Flang-RT also uses some non-GTest unittests. Moving those definitions avoids the dependence on the entire FortranEvaluate library.	2024-12-05 11:29:32 +01:00
Valentin Clement (バレンタインクレメン)	7d1c661381	[flang] Allow to pass an async id to allocate the descriptor (#118713 ) This is a patch in preparation for the support stream ordered memory allocator in CUDA Fortran. This patch adds an asynchronous id to the AllocatableAllocate runtime function and to Descriptor::Allocate so it can be passed down to the registered allocator. It is up to the allocator to use this value or not. A follow up patch will implement that asynchronous allocator for CUDA Fortran.	2024-12-04 18:24:40 -08:00
Valentin Clement (バレンタインクレメン)	7efd6139f2	[flang][cuda] Get device address in fir.declare (#118591 ) Add a pattern that updates the fir.declare memref when it comes from a device global and is not a descriptor. In that case, we recover the device address that needs to be used in ops like `fir.array_coor` and so on.	2024-12-04 13:36:58 -08:00
vdonaldson	6003be7ef1	[flang] IEEE_GET_UNDERFLOW_MODE, IEEE_SET_UNDERFLOW_MODE (#118551 ) Implement IEEE_GET_UNDERFLOW_MODE and IEEE_SET_UNDERFLOW_MODE. Update IEEE_SUPPORT_UNDERFLOW_CONTROL to enable support for individual REAL kinds.	2024-12-04 16:21:11 -05:00
Valentin Clement (バレンタインクレメン)	5522d2462e	[flang][cuda] Allow AbstractResult to run in gpu.module (#118529 ) In CUDA Fortran, device functions are converted to `gpu.func` inside the `gpu.module` operation. Update the AbstractResult pass to be able to run on `func.func` and `gpu.func` operations inside the `gpu.module`.	2024-12-03 14:04:49 -08:00
jeanPerier	cd7e65398f	[flang] optimize array function calls using hlfir.eval_in_mem (#118070 ) This patch encapsulates array function call lowering into hlfir.eval_in_mem and allows directly evaluating the call into the LHS when possible. The conditions are: LHS is contiguous, not accessed inside the function, it is not a whole allocatable, and the function result need not be finalized. All these conditions are tested in the previous hlfir.eval_in_mem optimization (#118069) that is leveraging the extension of getModRef to handle function calls (#117164). This yields a 25% speed-up on polyhedron channel2 benchmark (from 1min to 45s measured on an X86-64 Zen 2).	2024-12-03 10:04:52 +01:00


2359 Commits