clang-p2996

Author	SHA1	Message	Date
Valentin Clement (バレンタインクレメン)	60105ac6ba	[flang][cuda] Fix kernel registration (#113372 ) The registration needs the function pointer and the name. This patch updates the entry point with an extra argument, and the translation as well.	2024-10-23 11:25:58 -07:00
Valentin Clement (バレンタインクレメン)	d37bc32a65	[flang][cuda] Translate cuf.register_kernel and cuf.register_module (#112972 ) Add LLVM IR Translation for `cuf.register_module` and `cuf.register_kernel`. These are lowered to function calls to the CUF runtime entries.	2024-10-18 21:31:47 -07:00
Scott Manley	e6a4346b5a	[flang] add getElementType() to fir::SequenceType and fir::VectorType (#112770 ) getElementType() was missing from Sequence and Vector types. Did a replace of the obvious places getEleTy() was used for these two types and updated to use this name instead. Co-authored-by: Scott Manley <scmanley@nvidia.com>	2024-10-18 09:29:25 +02:00
Valentin Clement (バレンタインクレメン)	834d001e10	[flang][cuda] Relax the verifier for cuf.register_kernel op (#112585 ) Relax the verifier since the `gpu.func` might be converted to `llvm.func` before `cuf.register_kernel` is converted.	2024-10-17 08:30:13 -07:00
Valentin Clement (バレンタインクレメン)	7e72e5ba86	Reland '[flang][cuda] Add cuf.register_kernel operation' (#112389 ) The operation will be used in the CUF constructor to register the kernel functions. This allows registration to be delayed until codegen, when the gpu.binary will be available. Reland of #112268 with correct shared library build support.	2024-10-15 11:12:03 -07:00
Valentin Clement (バレンタインクレメン)	2a68f82989	Revert "[flang][cuda] Add cuf.register_kernel operation" (#112306 ) Reverts llvm/llvm-project#112268	2024-10-14 21:06:59 -07:00
Valentin Clement (バレンタインクレメン)	cbe76a2ac3	[flang][cuda] Add cuf.register_kernel operation (#112268 ) The operation will be used in the CUF constructor to register the kernel functions. This allows registration to be delayed until codegen, when the gpu.binary will be available.	2024-10-14 20:57:21 -07:00
Tarun Prabhu	839344f025	[clang][flang][mlir] Reapply "Support -frecord-command-line option (#102975 )" The underlying issue was caused by a file included in two different places which resulted in duplicate definition errors when linking individual shared libraries. This was fixed in `c3201ddaea` [#109874].	2024-10-14 08:44:24 -06:00
Leandro Lupori	390943f25b	[flang] Implement conversion of compatible derived types (#111165 ) With some restrictions, BIND(C) derived types can be converted to compatible BIND(C) derived types. Semantics already support this, but ConvertOp was missing the conversion of such types. Fixes https://github.com/llvm/llvm-project/issues/107783	2024-10-09 10:37:46 -03:00
jeanPerier	1753de2d95	[flang][FIR] remove fir.complex type and its fir.real element type (#111025 ) Final patch of https://discourse.llvm.org/t/rfc-flang-replace-usages-of-fir-complex-by-mlir-complex-type/82292 Since fir.real was only still used as fir.complex element type, this patch removes it at the same time.	2024-10-04 09:57:03 +02:00
jeanPerier	c4204c0b29	[flang] replace fir.complex usages with mlir complex (#110850 ) Core patch of https://discourse.llvm.org/t/rfc-flang-replace-usages-of-fir-complex-by-mlir-complex-type/82292. After that, the last step is to remove fir.complex from FIR types.	2024-10-03 17:10:57 +02:00
jeanPerier	c2601f1769	[flang][NFC] remove unused fir.constc operation (#110821 ) As part of [RFC to replace fir.complex usages by mlir.complex type](https://discourse.llvm.org/t/rfc-flang-replace-usages-of-fir-complex-by-mlir-complex-type/82292). fir.constc is unused so instead of porting it, just remove it. Complex constants are currently created with inserts in lowering already. When using mlir complex, we may just want to start using [complex.constant](`4f6ad17adc/mlir/include/mlir/Dialect/Complex/IR/ComplexOps.td (L131C5-L131C16)`).	2024-10-02 16:16:57 +02:00
David Spickett	737c414e1d	Revert "[clang][flang][mlir] Support -frecord-command-line option (#102975 )" This reverts commit `b3533a156d`. It caused test failures in shared library builds: https://lab.llvm.org/buildbot/#/builders/80/builds/3854	2024-09-20 11:30:50 +00:00
Tarun Prabhu	b3533a156d	[clang][flang][mlir] Support -frecord-command-line option (#102975 ) Add support for the -frecord-command-line option that will produce the llvm.commandline metadata which will eventually be saved in the object file. This behavior is also supported in clang. Some refactoring of the code in flang to handle these command line options was carried out. The corresponding -grecord-command-line option which saves the command line in the debug information has not yet been enabled for flang.	2024-09-19 18:28:50 -06:00
Youngsuk Kim	84d7f294c4	[flang] Tidy uses of raw_string_ostream (NFC) As specified in the docs, 1) raw_string_ostream is always unbuffered and 2) the underlying buffer may be used directly ( `65b13610a5` for further reference ) Avoid unneeded calls to raw_string_ostream::str(), to avoid excess indirection.	2024-09-18 13:26:29 -05:00
Valentin Clement (バレンタインクレメン)	bc54e5636f	[flang][cuda] Add new entry points function for data transfer (#108244 ) Add new entry points for more complex data transfer involving descriptors. These functions will be called when converting `cuf.data_transfer` operations.	2024-09-16 09:45:44 -07:00
jeanPerier	b65fc7e91a	[flang][fir] allow fir.convert from and to !llvm.ptr type (#106590 ) Allow some interaction between the LLVM and FIR dialects by allowing conversion between FIR memory types and the llvm.ptr type. This is meant to help experimentation where the FIR and LLVM dialects coexist, and is useful to deal with cases where an LLVM type makes it early into the MLIR produced by flang, like when inserting LLVM stack intrinsics here: `0a00d32c5f/flang/lib/Optimizer/Transforms/StackReclaim.cpp (L57)`	2024-08-30 08:20:17 +02:00
Valentin Clement (バレンタインクレメン)	900cd62758	[flang][cuda] Simplify data transfer when possible (#106120 ) When possible, avoid using descriptors and use the reference and the shape for data_transfer.	2024-08-27 10:03:15 -07:00
Abid Qadeer	d07dc73bcf	[flang][debug] Support derived types. (#99476 ) This PR adds initial debug support for derived types. It handles `RecordType` and generates the appropriate `DICompositeTypeAttr`. The `TypeInfoOp` is used to get information about the parent and location of the derived type. We use `getTypeSizeAndAlignment` to get the size and alignment of the components of the derived types. This function needed a few changes to be suitable for use here: 1. `getTypeSizeAndAlignment` errored out on unsupported types, which would not work with the incremental way we are building debug support. A new variant of this function has been added that returns a std::optional. The original function has been renamed to `getTypeSizeAndAlignmentOrCrash` as it will call `TODO()` for unsupported types. 2. The Character type was returning the size of just one element and not the whole string, which has been fixed. The testcase checks for offsets of the components, which had to be hardcoded in the test. So the testcase is currently enabled on x86_64. With this PR in place, this is what debugging of derived types looks like: ``` type :: t_date integer :: year, month, day end type type :: t_address integer :: house_number end type type, extends(t_address) :: t_person character(len=20) name end type type, extends(t_person) :: t_employee type(t_date) :: hired_date real :: monthly_salary end type type(t_employee) :: employee (gdb) p employee $1 = ( t_person = ( t_address = ( house_number = 1 ), name = 'John', ' ' <repeats 16 times> ), hired_date = ( year = 2020, month = 1, day = 20 ), monthly_salary = 3.1400001 ) ```	2024-08-27 10:30:49 +01:00
Valentin Clement (バレンタインクレメン)	7af61d5cf4	[flang][cuda] Add shape to cuf.data_transfer operation (#104631 ) When doing data transfer with dynamic sized array, we are currently generating a data transfer between two descriptors. If the shape values can be provided, we can keep the data transfer between two references. This patch adds the shape operands to the operation. This will be exploited in lowering in a follow up patch.	2024-08-26 09:50:17 -07:00
jeanPerier	2051a7bcd3	[flang][NFC] turn fir.call is_bind_c into enum for procedure flags (#105691 ) First patch to fix a BIND(C) ABI issue (https://github.com/llvm/llvm-project/issues/102113). I need to keep track of BIND(C) in more locations (fir.dispatch and func.func operations), and I need to fix a few passes that are dropping the attribute on the floor. Since I expect more procedure attributes that cannot be reflected in mlir::FunctionType will be needed for ABI, optimizations, or debug info, this NFC patch adds a new enum attribute to keep track of procedure attributes in the IR. This patch is not updating lowering to lower more attributes, this will be done in a separate patch to keep the test changes low here. Adding the attribute on fir.dispatch and func.func will also be done in separate patches.	2024-08-23 14:32:43 +02:00
Tarun Prabhu	90aac06c7f	[flang][mlir] Add llvm.ident metadata when compiling with flang This brings the behavior of flang in line with clang which also adds this metadata unconditionally. Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>	2024-08-12 11:56:19 -06:00
Valentin Clement (バレンタインクレメン)	0ee0eeb4bb	[flang] Enhance location information (#95862 ) Add inclusion location information by using FusedLocation with attribute. More context here: https://discourse.llvm.org/t/rfc-enhancing-location-information/79650	2024-07-23 09:49:17 -07:00
Valentin Clement (バレンタインクレメン)	33cb29cc3e	[flang][cuda] Use cuf.alloc/cuf.free for local descriptor (#98518 ) Local descriptors for cuda allocatables need to be handled on host and device. One solution is to duplicate the descriptor (one on the host and one on the device) and keep them in sync; another is to have the descriptor in managed/unified memory so we don't have to take care of any sync. The second solution is probably the one we will implement. In order to have more flexibility in how descriptors representing cuda allocatables are allocated, this patch updates the lowering to use the cuf operations alloc and free to manage them.	2024-07-17 13:52:36 -07:00
jeanPerier	31087c5e4c	[flang] handle alloca outside of entry blocks in MemoryAllocation (#98457 ) This patch generalizes the MemoryAllocation pass (alloca -> heap) to handle fir.alloca regardless of its position in the IR. Currently, it only deals with fir.alloca in function entry blocks. The logic is placed in a utility that can be used to replace alloca in an operation on demand with whatever kind of allocation the utility user wants via callbacks (allocmem, or custom runtime calls to instrument the code...). To do so, a concept of ownership, which was already somewhat implied and used in passes like stack-reclaim, is formalized. Any operation with the LoopLikeInterface, AutomaticAllocationScope, or IsolatedFromAbove owns the alloca directly nested inside its regions, and they must not be used after the operation. The pass then looks for the exit points of regions with such an interface, and uses them to insert deallocation. If dominance is not proved, the pass falls back to storing the new address into a C pointer variable created in the entry of the owning region, which allows inserting deallocation as needed, including near the alloca itself to avoid leaks when the alloca is executed multiple times due to block CFG loops. This should fix https://github.com/llvm/llvm-project/issues/88344. In a next step, I will try to refactor lowering a bit to introduce a lifetime operation for alloca so that the deallocation points can be inserted as soon as possible.	2024-07-17 09:15:47 +02:00
Alexis Perry-Holby	f1d3fe7aae	Add basic -mtune support (#98517 ) Initial implementation for the -mtune flag in Flang. This PR is a clean version of PR #96688, which is a re-land of PR #95043	2024-07-16 16:48:24 +01:00
jeanPerier	66d5ca2a3d	Reland "[flang] add extra component information in fir.type_info" (#97404 ) Reland #96746 with the proper Support/CMakeLists.txt change. fir.type does not contain all Fortran level information about components. For instance, component lower bounds and default initial values are lost. For correctness purposes, this does not matter because this information is "applied" in lowering (e.g., when addressing the components, the lower bounds are reflected in the hlfir.designate). However, this "loss" of information will prevent the generation of correct debug info for the type (which needs to know about lower bounds). The initial value could help build an optimization pass to get rid of initialization runtime calls. This patch adds lower bound and initial value information into fir.type_info via a new fir.dt_component operation. This operation is generated only for components that need it, which helps keep the IR small for "boring" types. In general, adding Fortran level info in fir.type_info will allow delaying the generation of "type descriptor" globals that are very verbose in FIR and make it hard to work with FIR dumps from applications with many derived types.	2024-07-02 15:19:49 +02:00
Ramkumar Ramachandra	db791b278a	mlir/LogicalResult: move into llvm (#97309 ) This patch is part of a project to move the Presburger library into LLVM.	2024-07-02 10:42:33 +01:00
jeanPerier	6a66b8224d	Revert "[flang] add extra component information in fir.type_info" (#96937 ) Reverts llvm/llvm-project#96746 Breaking shared library builds: https://lab.llvm.org/buildbot/#/builders/89/builds/931	2024-06-27 19:22:48 +02:00
jeanPerier	1448ed2000	[flang] add extra component information in fir.type_info (#96746 ) fir.type does not contain all Fortran level information about components. For instance, component lower bounds and default initial values are lost. For correctness purposes, this does not matter because this information is "applied" in lowering (e.g., when addressing the components, the lower bounds are reflected in the hlfir.designate). However, this "loss" of information will prevent the generation of correct debug info for the type (which needs to know about lower bounds). The initial value could help build an optimization pass to get rid of initialization runtime calls. This patch adds lower bound and initial value information into fir.type_info via a new fir.dt_component operation. This operation is generated only for components that need it, which helps keep the IR small for "boring" types. In general, adding Fortran level info in fir.type_info will allow delaying the generation of "type descriptor" globals that are very verbose in FIR and make it hard to work with FIR dumps from applications with many derived types.	2024-06-27 18:59:03 +02:00
Tarun Prabhu	8dd9494056	Revert "[flang] Add basic -mtune support" (#96678 ) Reverts llvm/llvm-project#95043	2024-06-25 13:25:39 -06:00
Alexis Perry-Holby	a790279bf2	[flang] Add basic -mtune support (#95043 ) This PR adds -mtune as a valid flang flag and passes the information through to LLVM IR as an attribute on all functions. No specific architecture optimizations are added at this time.	2024-06-25 18:39:35 +01:00
donald chen	2c1ae801e1	[mlir][side effect] refactor(*): Include more precise side effects (#94213 ) This patch adds more precise side effects to the current ops with memory effects, allowing us to determine which OpOperand/OpResult/BlockArgument the operation reads or writes, rather than just recording the reading and writing of values. This allows for convenient use of precise side effects to achieve analysis and optimization. Related discussions: https://discourse.llvm.org/t/rfc-add-operandindex-to-sideeffect-instance/79243	2024-06-19 22:10:34 +08:00
jeanPerier	a786919256	[flang] allow assumed-rank box in fir.store (#95980 ) Codegen is done with a memcpy using the rank from the "value" descriptor, as in the fir.load case. The rationale is described in https://github.com/llvm/llvm-project/blob/main/flang/docs/AssumedRank.md.	2024-06-19 10:12:19 +02:00
jeanPerier	bacbf26b4c	[flang] allow assumed-rank box in fir.alloca (#95947 ) The alloca can be sized for the maximum number of ranks, which is reasonable (15 currently as per the standard). Introducing a rank based dynamic allocation would complicate alloca hoisting and stack size analysis (this can be revisited if the standard changes to allow more ranks). No change is needed since this is already reflected in how the fir.box type is translated to LLVM.	2024-06-19 09:56:36 +02:00
Valentin Clement (バレンタインクレメン)	5e20785edc	[flang][cuda] Relax cuf.data_transfer verifier (#95974 ) Allow data transfer between array reference and array described by a descriptor.	2024-06-18 13:09:37 -07:00
jeanPerier	9f44d5d9d0	[flang] Simplify copy-in copy-out runtime API (#95822 ) The runtime API for copy-in copy-out currently has an entry only for the copy-out. This entry has a "skipInit" boolean that is never set to false by lowering and it does not deal with the deallocation of the temporary. The generated code was a mix of inline code and runtime calls. This is not a big deal, but it is unneeded compiler and generated code complexity. With assumed-rank, it is also more cumbersome to establish a temporary descriptor. Instead, this patch: - Adds a CopyInAssignment API that deals with establishing the temporary descriptor and does the copy. - Removes the unused arg to CopyOutAssign, and pushes destruction/deallocation responsibility inside it. Note that these runtime APIs are still not responsible for deciding the need for copying in and out. This is kept as a separate runtime call to IsContiguous, which is easier to inline/replace by inline code with the hope of removing the copy-in/out calls after user function inlining. @vzakhari has already shown that always inlining all the copy part increases Fortran compilation time due to loop optimization attempts for loops that are known to have little optimization profitability (the variable being copied from and to is not contiguous).	2024-06-18 12:04:04 +02:00
Iman Hosseini	7665d3d90d	[flang] Add reductions for CUF Kernels: Lowering (#95184 ) * Add reductionOperands and reductionAttrs to cuf's KernelOp. * Parsing is already working and the tree has the info: here I make the Bridge emit the updated KernelOp with reduction information added. * Check |reductionAttrs| = |reductionOperands| in verifier * Add a test @clementval @vzakhari --------- Co-authored-by: Iman Hosseini <imanh@nvidia.com> Co-authored-by: Valentin Clement (バレンタインクレメン) <clementval@gmail.com>	2024-06-12 19:18:41 +01:00
Valentin Clement (バレンタインクレメン)	0babff9675	[flang] Lower REDUCE intrinsic with no DIM argument and rank 1 (#94652 ) This patch lowers the `REDUCE` intrinsic call to the runtime equivalent for scalar results. Calls with array results will follow.	2024-06-10 14:12:57 -07:00
khaki3	88cdd99055	[flang] Add reduction semantics to fir.do_loop (#93934 ) Derived from #92480. This PR introduces reduction semantics into loops for DO CONCURRENT REDUCE. The `fir.do_loop` operation now invisibly has the `operandSegmentsizes` attribute and takes variable-length reduction operands with their operations given as `fir.reduce_attr`. For the sake of compatibility, `fir.do_loop`'s builder has additional arguments at the end. The `iter_args` operand should be placed in front of the declaration of result types, so the new operand for reduction variables (`reduce`) is put in the middle of arguments.	2024-06-06 11:16:40 -07:00
Slava Zakharin	e42864ecfb	[flang] Fixed buildbots: removed std::move preventing copy elision.	2024-06-04 15:31:39 -07:00
Slava Zakharin	ae4f300133	[flang] Canonicalize fir.array_coor by pulling in embox/rebox. (#92858 ) In a simple case like this: ``` program test integer :: u(120, 2) u(1:120,1:2) = u(1:120,1:2) + 2 end program ``` Flang is creating a copy loop with fir.array_coor using a result of fir.embox inserted before the loop. This results in split address computations before and inside the loop, which can be seen as many more arithmetic operations than required after converting FIR to LLVM dialect. Even though LLVM SROA/mem2reg are able to optimize the temporary descriptor, and then LICM is able to hoist the invariant computations, we seem to get a better mix of LLVM dialect operations after FIR-to-LLVM codegen. This may also slightly reduce the compilation time taken by LLVM to optimize the generated LLVM IR. This may also slightly reduce the time spent by FIR AliasAnalysis to reach the memory reference source.	2024-06-04 15:21:19 -07:00
Slava Zakharin	6cd86d0fae	[flang] Use fir.declare/fir.dummy_scope for TBAA tags attachments. (#92472 ) With MLIR inlining (e.g. `flang-new -mmlir -inline-all=true`) the current TBAA tags attachment is suboptimal, because we may lose information about the callee's dummy arguments (by bypassing fir.declare in AliasAnalysis::getSource). This is a conservative first step to improve the situation. This patch makes AddAliasTagsPass account for the fir.dummy_scope hierarchy after MLIR inlining and use it to place the TBAA tags into TBAA trees corresponding to different function scopes. The pass uses a special mode of AliasAnalysis to find the instantiation point of a Fortran variable (a [hl]fir.declare) when searching for the source of a memory reference. In this mode, AliasAnalysis will always stop at fir.declare operations that have dummy_scope operands - there should not be a reason to pass through them for the purpose of TBAA tags attachment.	2024-06-04 08:33:40 -07:00
Kareem Ergawy	5bfc444524	[flang] Emit `argNo` debug info only for `func` block args (#93921 ) Fixes a bug uncovered by [pr43337.f90](https://github.com/llvm/llvm-test-suite/blob/main/Fortran/gfortran/regression/gomp/pr43337.f90) in the test suite. In particular, this emits `argNo` debug info only if the parent op of a block is a `func.func` op. This avoids DI conflicts when a function contains a nested OpenMP region that itself has block arguments with DI attached to them; for example, `omp.parallel` with delayed privatization enabled.	2024-06-03 11:33:00 +02:00
jeanPerier	fd8b2d2046	[flang] lower RANK intrinsic (#93694 ) First commit is reviewed in https://github.com/llvm/llvm-project/pull/93682. Lower RANK using fir.box_rank. This patch updates fir.box_rank to accept a box reference, which avoids the need to generate an assumed-rank fir.load just for the sake of reading ALLOCATABLE/POINTER rank. The fir.load would generate a "dynamic" memcpy that is hard to optimize without further knowledge. A read effect is conditionally given to the operation.	2024-05-30 11:02:09 +02:00
jeanPerier	f1d13bbd66	[flang] add FIR to FIR pass to lower assumed-rank operations (#93344 ) Add pass to lower assumed-rank operations. The current patch adds codegen for fir.rebox_assumed_rank. It will also be the pass lowering fir.select_rank. fir.rebox_assumed_rank is lowered to a call to the CopyAndUpdateDescriptor runtime API. Note that the lowering ends up allocating two new descriptors at the LLVM level (one alloca created by the pass for the CopyAndUpdateDescriptor result descriptor argument, the second one created by the fir.load of the result descriptor in codegen). LLVM is currently unable to properly optimize and merge those allocas. The "nocapture" attribute added to CopyAndUpdateDescriptor arguments gives part of the information to LLVM, but the fir.load codegen of descriptors must be updated to use llvm.memcpy instead of llvm.load+store to allow LLVM to optimize it. This will be done in a later patch.	2024-05-27 11:45:39 +02:00
jeanPerier	b0b3596404	[flang] add fir.rebox_assumed_rank operation (#93334 ) As described in https://github.com/llvm/llvm-project/blob/main/flang/docs/AssumedRank.md, add an operation to make copies of assumed-rank descriptors where lower bounds, attributes, or dynamic type may have been changed.	2024-05-27 10:53:31 +02:00
Valentin Clement (バレンタインクレメン)	0bc710f7c1	[flang][cuda] Accept constant as src for cuf.data_transfer (#92951 ) Assignment of a constant (host) to a device variable is a special case that can be further lowered to `cudaMemset` or similar functions. This patch updates the lowering to avoid the creation of a temporary when we assign a constant to a device variable.	2024-05-21 12:42:30 -07:00
Valentin Clement	7847b1ca00	[flang][cuda][NFC] Silence warning triggered in buildbot	2024-05-21 11:35:51 -07:00
Valentin Clement (バレンタインクレメン)	1fc3ce1cdb	[flang][cuda] Enable data transfer for descriptors (#92804 ) Remove the TODO when data transfer is done with descriptor variables.	2024-05-21 11:23:55 -07:00

368 Commits