clang-p2996

Author	SHA1	Message	Date
Billy Zhu	6f6336858e	[MLIR][LLVM] Add DebugNameTableKind to DICompileUnit (#87974 ) Add the DebugNameTableKind field to DICompileUnit, along with its importer & exporter.	2024-04-09 06:18:07 -07:00
jeanPerier	a4798bb0b6	[flang][NFC] use mlir::SymbolTable in lowering (#86673 ) Whenever lowering is checking if a function or global already exists in the mlir::Module, it was doing module->lookup. On big programs (~5000 globals and functions), this causes important slowdowns because these lookups are linear. Use mlir::SymbolTable to speed-up these lookups. The SymbolTable has to be created from the ModuleOp and maintained in sync. It is therefore placed in the converter, and FirOPBuilders can take a pointer to it to speed-up the lookups. This patch does not bring mlir::SymbolTable to FIR/HLFIR passes, but some passes creating a lot of runtime calls could benefit from it too. More analysis will be needed. As an example of the speed-ups, this patch speeds-up compilation of Whizard compare_amplitude_UFO.F90 from 5 mins to 2 mins on my machine (there is still room for speed-ups).	2024-04-02 14:29:29 +02:00
jeanPerier	2d14ea68b8	[flang][NFC] speed-up external name conversion pass (#86814) The ExternalNameConversion pass can be surprisingly slow on big programs. On an example with a 50kloc Fortran file with about 10000 calls to external procedures, the pass alone took 25s on my machine. This patch reduces this to 0.16s. The root cause is that using `replaceAllSymbolUses` on each modified FuncOp is very expensive: it walks all operations and attributes every time. An alternative would be to use mlir::SymbolUserMap to avoid walking the module again and again, but this is still much more expensive than what is needed because it essentially caches all symbol uses of the module, and there is no need for such caching here. Instead: - Do a shallow walk of the module (only top level operations) to detect FuncOp/GlobalOp that need to be updated. Update them and place the name remapping in a DenseMap. - If any remapping was done, do a single deep walk of the module operation, and update any SymbolRefAttr that matches a name that was remapped.	2024-04-02 10:22:03 +02:00
Tom Eccles	1f1e0948f2	[flang] run CFG conversion on omp reduction declare ops (#84953 ) Most FIR passes only look for FIR operations inside of functions (either because they run only on func.func or they run on the module but iterate over functions internally). But there can also be FIR operations inside of fir.global, some OpenMP and OpenACC container operations. This has worked so far for fir.global and OpenMP reductions because they only contained very simple FIR code which doesn't need most passes to be lowered into LLVM IR. I am not sure how OpenACC works. In the long run, I hope to see a more systematic approach to making sure that every pass runs on all of these container operations. I will write an RFC for this soon. In the meantime, this pass duplicates the CFG conversion pass to also run on omp reduction operations. This is similar to how the AbstractResult pass is already duplicated for fir.global operations. OpenMP array reductions 2/6 Previous PR: https://github.com/llvm/llvm-project/pull/84952 Next PR: https://github.com/llvm/llvm-project/pull/84954 --------- Co-authored-by: Mats Petersson <mats.petersson@arm.com>	2024-03-20 09:47:49 +00:00
Krzysztof Parzyszek	c4a89f1538	[flang][OpenMP] Fix use-after-free in OMPFunctionFiltering (#84373 ) When walking over functions (in pre-order), if the function being visited needs to be erased, skip visiting its regions. This was detected by address sanitizer.	2024-03-08 09:15:59 -06:00
Matthias Springer	354deba10a	[flang] Fix use-after-free in `MemoryAllocation.cpp` (#83768) `AllocaOpConversion` takes an `ArrayRef<Operation *>`, but the underlying `SmallVector<Operation *>` was dead by the time the pattern ran.	2024-03-04 15:54:22 +09:00
David Green	2a95fe481d	[Flang] Allow Intrinsic simpification with min/maxloc dim and scalar result (#81619 ) This makes an adjustment to the existing fir minloc/maxloc generation code to handle functions with a dim=1 that produce a scalar result. This should allow us to get the same benefits as the existing generated minmax reductions. This is a recommit of #76194 with an extra alteration to the end of genRuntimeMinMaxlocBody to make sure we convert the output array to the correct type (a `box<heap<i32>>`, not `box<heap<array<1xi32>>>`) to prevent writing the wrong type of box into it. This still allocates the data as a `array<1xi32>`, converting it into a i32 assuming that is safe. An alternative would be to allocate the data as a i32 and change more of the accesses to it throughout genRuntimeMinMaxlocBody.	2024-03-02 14:39:59 +00:00
David Green	7242896233	[Flang] Attempt to fix Nan handling in Minloc/Maxloc intrinsic simplification (#82313) In certain cases "extreme" values like NaN, Inf and 0xffffffff could lead to generating different code via the inline-generated intrinsics vs the versions in the runtimes (and other compilers like gfortran). There are some examples I was using for testing in https://godbolt.org/z/x4EfqEss5. This changes the generation for the intrinsics to be more like the runtimes, using a condition that is similar to: isFirst || (prev != prev && elem == elem) || elem < prev The middle part is only used for floating point operations, and checks if the values are NaN. This should then hopefully make the logic closer to: return the first element with the lowest value, with NaNs ignored unless there are only NaNs. The initial limit value for floats is also changed from the largest float to Inf, to make sure it is handled correctly. The integer reductions are also changed to use a similar scheme to make sure they work with masked values. This means that the preamble after the loop can be removed.	2024-02-21 09:31:29 +00:00
David Green	815a846552	[Flang] Move genMinMaxlocReductionLoop to Transforms/Utils.cpp (#81380) This is one option for attempting to move genMinMaxlocReductionLoop to a better location. It moves it into Transforms and makes HLFIRTransforms depend upon FIRTransforms. It passes a build locally, both with and without -DBUILD_SHARED_LIBS, and does OK on the windows CI.	2024-02-13 08:31:07 +00:00
Valentin Clement (バレンタインクレメン)	7d9c38a040	[flang][NFC] Remove hardcoded attr name for fir.global op (#81347) These hardcoded attribute names are a leftover from the upstreaming period when there was no way to get the attribute name without an instance of the operation. It has since become possible to do without them and they should be removed to avoid duplication. This PR cleans up the fir.global op by removing these hardcoded attribute names and using their generated getters. Some other PRs will follow to clean up other operations.	2024-02-12 08:56:30 -08:00
Alex Bradbury	22544e2a54	[flang] Set fast math related function attributes for -Ofast/-ffast-math (#79301 ) The implemented logic matches the logic used for Clang in emitting these attributes. Although it's hoped that function attributes won't be needed in the future (vs using fast math flags in individual IR instructions), there are codegen differences currently with/without these attributes, as can be seen in issues like #79257 or by hacking Clang to avoid producing these attributes and observing codegen changes.	2024-02-05 19:39:12 +00:00
agozillon	95fe47ca7e	[Flang][OpenMP] Initial mapping of Fortran pointers and allocatables for target devices (#71766) This patch seeks to add an initial lowering for pointers and allocatable variables captured by implicit and explicit map in Flang OpenMP for Target operations that take map clauses, e.g. Target, Target Update, Target Exit/Enter, etc. Currently this is done by treating the type that lowers to a descriptor (allocatable/pointer/assumed shape) as a map of a record type (e.g. a structure) as that's effectively what descriptor types lower to in LLVM-IR and what they're represented as in the Fortran runtime (written in C/C++). The descriptor effectively lowers to a structure containing scalar and array elements that represent various aspects of the underlying data being mapped (lower bound, upper bound, extent being the main ones of interest in most cases) and a pointer to the allocated data. In this current iteration of the mapping we map the structure in its entirety and then attach the underlying data pointer and map the data to the device; this allows most of the required data to be resident on the device for use. Currently we do not support the addendum (another block of pointer data), but it shouldn't be too difficult to extend this to support it. The MapInfoOp generation for descriptor types is primarily handled in an optimization pass, where it expands BoxType (descriptor types) map captures into two maps, one for the structure (scalar elements) and the other for the pointer data (base address) and links them in a Parent <-> Child relationship. The later lowering processes will then treat them as a conjoined structure with a pointer member map.	2024-02-05 18:45:07 +01:00
Slava Zakharin	3e47e75feb	[flang] Use DataLayout for computing type size in LoopVersioning. (#79778 ) The existing type size computation in LoopVersioning does not work for REAL*10, because the compute element size is 10 bytes, which violates the power-of-two assertion. We'd better use the DataLayout for computing the storage size of each element of an array of the given type.	2024-01-29 09:14:47 -08:00
David Green	202917f86e	[Flang] Move genMinMaxlocReductionLoop to a common location. The shared library build doesn't like references of genMinMaxlocReductionLoop, in Optimizer/Transforms, from HLFIR/Optimizer/Transforms. For the moment I've moved the code to the header file where it can be shared, like other methods in Utils.h	2024-01-25 13:31:18 +00:00
David Green	223d3dabc8	[Flang] Minloc elemental intrinsic lowering (#74828) Currently the lowering of a minloc intrinsic with a mask will look something like: %e = hlfir.elemental %shape ({ ... }) %m = hlfir.minloc %array mask %e hlfir.assign %m to %result hlfir.destroy %m The elemental will be expanded into a temporary+loop, the minloc into a FortranAMinloc call (which hopefully gets simplified to a specialized call that can be inlined at the call site), and the assign might get expanded to a FortranAAssign. It would be better to generate the entire construct as a single loop if we can: one that performs the minloc calculation with the mask elemental computed inline. This patch attempts to do that, adding an hlfir version of the expansion code from SimplifyIntrinsics that turns a minloc+elemental into a single combined loop nest. It attempts to reuse the methods in genMinlocReductionLoop for constructing the loop with a modified loop body. The declaration for the function is currently in Optimizer/Support/Utils.h, but there might be a better place for it. It is added as part of the OptimizedBufferizationPass, like the similar count/any/all that have been added recently.	2024-01-25 12:17:12 +00:00
jeanPerier	27cfe7a07f	[flang] Set assumed-size last extent to -1 (#79156 ) Currently lowering sets the extents of assumed-size array to "undef" which was OK as long as the value was not expected to be read. But when interfacing with the runtime and when passing assumed-size to assumed-rank, this last extent may be read and must be -1 as specified in the BIND(C) case in 18.5.3 point 5. Set this value to -1, and update all the lowering code that was looking for an undef defining op to identify assumed-size: much safer to propagate and use semantic info here, the previous check actually did not work if the array was used in an internal procedure (defining op not visible anymore). @clementval and @agozillon, I left assumed-size extent to zero in the acc/omp bounds op as it was, please double check that is what you want (I can imagine -1 may create troubles here, and 0 makes some sense as it would lead to no data transfer). This also allows removing special cases in UBOUND/LBOUND lowering. Also disable allocation of cray pointee. This was never intended and would now lead to crashes with the -1 value for assumed-size cray pointee.	2024-01-24 13:23:55 +01:00
David Green	49212d1601	[Flang] Fix for replacing loop uses in LoopVersioning pass (#77899 ) The added test case has a loop that is versioned, which has a use of the loop in an if block after the loop. The current code replaces all uses of the loop with the new version If, but only if the parent blocks match. As far as I can see it should be safe to replace all the uses, then construct the result for the If with op.op.	2024-01-20 22:16:05 +00:00
Sergio Afonso	2747193058	[Flang][MLIR][OpenMP] Remove the early outlining interface (#78450 ) After the removal of the OpenMP early outlining MLIR pass in #67319, the `EarlyOutliningInterface` stopped doing any useful work. It used to be necessary to tie the name of the function from which a target region was outlined to that new function, so it would be used when translating to LLVM IR in place of the outlined function's name. This is not necessary anymore, so this patch removes all references to this interface and uses of the `omp.outline_parent_name` discardable attribute in tests.	2024-01-18 15:33:43 +00:00
Matthias Springer	5fcf907b34	[mlir][IR] Rename "update root" to "modify op" in rewriter API (#78260 ) This commit renames 4 pattern rewriter API functions: * `updateRootInPlace` -> `modifyOpInPlace` * `startRootUpdate` -> `startOpModification` * `finalizeRootUpdate` -> `finalizeOpModification` * `cancelRootUpdate` -> `cancelOpModification` The term "root" is a misnomer. The root is the op that a rewrite pattern matches against (https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional). A rewriter must be notified of all in-place op modifications, not just in-place modifications of the root (https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old function names were confusing and have contributed to various broken rewrite patterns. Note: The new function names use the term "modify" instead of "update" for consistency with the `RewriterBase::Listener` terminology (`notifyOperationModified`).	2024-01-17 11:08:59 +01:00
David Green	4056287d3a	[Flang] Clean up LoopVersioning LLVM_DEBUG blocks. NFC (#77818 ) Just a little trick to put LLVM_DEBUG blocks into separate { } scopes, so they clang-format better.	2024-01-15 11:23:50 +00:00
Christian Ulmann	fa5255eee2	[MLIR][LLVM] Enable export of DISubprograms on function declarations (#78026) This commit changes the MLIR to LLVMIR export to also attach subprogram debug attachments to function declarations. This commit additionally fixes the two passes that produce subprograms to not attach the "Definition" flag to function declarations. This otherwise results in invalid LLVM IR.	2024-01-15 07:34:13 +01:00
Christian Ulmann	bae1fdea71	[MLIR][LLVM] Add distinct identifier to the DISubprogram attribute (#77093) This commit adds an optional distinct attribute parameter to the DISubprogramAttr. This enables modeling of distinct subprograms, as required for LLVM IR. This change is required to avoid accidental uniquing of subprograms on functions that would lead to invalid LLVM IR post export.	2024-01-08 08:25:30 +01:00
Christian Ulmann	b3037ae1fc	[MLIR][LLVM] Add distinct identifier to DICompileUnit attribute (#77070) This commit adds a distinct attribute parameter to the DICompileUnit to enable the modeling of distinctness. LLVM requires DICompileUnits to be distinct, and there are cases where one gets two equivalent compilation units but LLVM still requires them to be differentiated. We observed such cases for combinations of LTO and inline functions. This patch also changes the DIScopeForLLVMFuncOp pass to a module pass, to ensure that only one distinct DICompileUnit is created, instead of one for each function.	2024-01-08 07:42:33 +01:00
Pete Steinfeld	4f59a38821	Revert #76194 (#76987 ) [Flang] Revert "Allow Intrinsic simpification with min/maxloc dim and…scalar result (#76194)" This reverts commit `9b7cf5bfb0`. See merge request #76194. This change was causing several failures in our internal tests. I'm reverting now and will work on creating a test that David Green can use to reproduce the problem.	2024-01-04 10:19:50 -08:00
David Green	9b7cf5bfb0	[Flang] Allow Intrinsic simpification with min/maxloc dim and scalar result (#76194 ) This makes an adjustment to the existing fir minloc/maxloc generation code to handle functions with a dim=1 that produce a scalar result. This should allow us to get the same benefits as the existing generated minmax reductions. This is a recommit of #75820 with the typename added to the generated function.	2024-01-02 11:09:18 +00:00
Radu Salavat	0487377382	[flang] Pass to add frame pointer attribute (#74598 ) Pass to add frame pointer attribute in Flang	2023-12-28 15:41:27 +00:00
Pete Steinfeld	0cf3af0c51	Revert "[Flang] Allow Intrinsic simpification with min/maxloc dim and… (#76184 ) … scalar result. (#75820)" This reverts commit `701f647905`. The commit breaks some uses of the 'maxloc' intrinsic. See PR #75820	2023-12-21 13:14:05 -08:00
Kazu Hirata	c50de57feb	[flang] Fix a warning This patch fixes: flang/lib/Optimizer/Transforms/StackArrays.cpp:452:7: error: ignoring return value of function declared with 'nodiscard' attribute [-Werror,-Wunused-result]	2023-12-21 10:30:36 -08:00
David Green	701f647905	[Flang] Allow Intrinsic simpification with min/maxloc dim and scalar result. (#75820 ) This makes an adjustment to the existing fir minloc/maxloc generation code to handle functions with a dim=1 that produce a scalar result. This should allow us to get the same benefits as the existing generated minmax reductions.	2023-12-20 12:12:12 +00:00
David Green	9bb47f7f8b	[Flang] Add Maxloc to fir simplify intrinsics pass (#75463 ) This takes the code from D144103 and extends it to maxloc, to allow the simplifyMinMaxlocReduction method to work with both min and max intrinsics by switching condition and limit/initial value.	2023-12-18 07:59:51 +00:00
Kazu Hirata	11efccea8f	[flang] Use StringRef::{starts,ends}_with (NFC) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-13 23:48:53 -08:00
Tom Eccles	ba3d0241e2	[flang] Record the original name of a function during ExternalNameCoversion (#74065 ) We pass TBAA alias information with separate TBAA trees per function (to prevent incorrect alias information after inlining). These TBAA trees are identified by a unique string per function. Naturally, we use the mangled name of the function. TBAA tags are added in two places: during a dedicated pass relatively early (structured control flow makes fir::AliasAnalysis more accurate), then again during CodeGen (when implied box loads and stores become visible). In between these two passes, the ExternalNameConversion pass changes the name of some functions. These functions with changed names previously ended up with separate TBAA trees from the TBAA tags pass and from CodeGen - leading LLVM to think that all data accesses alias with all descriptor accesses. This patch solves this by storing the original name of a function in an attribute during the ExternalNameConversion pass, and using the name from that attribute when creating TBAA trees during CodeGen.	2023-12-03 20:37:10 +00:00
Valentin Clement	208a4510d4	[flang][NFC] Fix typo	2023-11-17 10:54:45 -08:00
Akash Banerjee	8701b178e0	[MLIR][OpenMP] Changes to function-filtering pass (#71850) Currently, when deleting the device functions in the second stage of filtering during MLIR to LLVM translation we can end up with invalid calls to these functions. This is because of the removal of the EarlyOutliningPass which would have otherwise gotten rid of any such calls. This patch aims to alter the function filtering pass in the following way: - Any host function is completely removed. - Calls to the host function are also removed and their uses replaced with Undef values. - Any host function with target region code is marked to be removed during the second stage. - Calls to such functions are still removed and their uses replaced with Undef values. Co-authored-by: Sergio Afonso <sergio.afonsofumero@amd.com>	2023-11-14 12:43:31 +00:00
Akash Banerjee	63752399f8	[OpenMP][MLIR]OMPEarlyOutliningPass removal This patch removes the OMPEarlyOutliningPass as it is no longer required. The implicit map operand capture has now been moved to the PFT lowering stage. Depends on #67318.	2023-11-06 13:24:02 +00:00
Tom Eccles	e215324185	[flang][StackArrays] skip analysis of very large functions (#71047 ) The stack arrays pass uses data flow analysis to determine whether heap allocations are freed on all paths out of the function. `interp_domain_em_part2` in spec2017 wrf generates over 120k operations, including almost 5k fir.if operations and over 200 fir.do_loop operations, all in the same function. The MLIR data flow analysis framework cannot provide reasonable performance for such cases because there is a combinatorial explosion in the number of control flow paths through the function, all of which must be checked to determine if the heap allocations will be freed. This patch skips the stack arrays pass for ridiculously large functions (defined as having more than 1000 fir.allocmem operations). This threshold is configurable at runtime with a command line argument. With this patch, compiling this file is more than 80% faster.	2023-11-03 10:29:33 +00:00
Tom Eccles	6242c8ca18	[flang] add TBAA tags to global and direct variables These turn out to be useful for spec2017/fotonik3d and safe so long as they are not used along side TBAA tags for local allocations. LLVM may be able to figure out local allocations by itself anyway. PR #68727	2023-10-25 10:47:51 +00:00
Sergio Afonso	4b15c0ed0a	[Flang][HLFIR][OpenMP] Fix offloading tests broken by HLFIR (#69457) This patch makes changes to the early outlining pass to avoid compiler crashes due to not handling `hlfir.declare` operations correctly. That pass is intended to eventually be removed (#67319), but in the meantime this fixes some issues arising in different parts of the OpenMP offloading compilation process. The main changes included in this patch are the following: - Added support for mapped values defined by an `hlfir.declare` operation. These operations are now kept in outlined target functions, so that both of their outputs (base and original base) are available to the corresponding `omp.target`'s map arguments and region. - Added a fix by @agozillon to prevent unused map clauses from producing a compiler crash. All these unused mapped variables are added to the outlined function's inputs. - Added a fix to the OpenMP translation to MLIR to support integer arguments to these outlined functions. This enables successfully compiling and running the tests in openmp/libomptarget/test/offloading/fortran using HLFIR. Co-authored-by: agozillon <Andrew.Gozillon@amd.com>	2023-10-23 17:40:55 +02:00
Mats Petersson	8dcee5800c	[flang]Check for dominance in loop versioning (#68797 ) This avoids trying to version loops that can't be versioned, and thus avoids hitting an assert. Co-authored with Slava Zakharin (who provided the test-code).	2023-10-12 13:07:16 +01:00
Tom Eccles	c0f453c023	[flang] add missing dependency FIRTransforms -> FIRAnalysis	2023-10-11 15:36:47 +00:00
Tom Eccles	df5c27869c	[flang][FIR] add FIR TBAA pass See RFC at https://discourse.llvm.org/t/rfc-propagate-fir-alias-analysis-information-using-tbaa/73755 This pass adds TBAA tags to all accesses to non-pointer/target dummy arguments. These TBAA tags tell LLVM that these accesses cannot alias: allowing better dead code elimination, hoisting out of loops, and vectorization. Each function has its own TBAA tree so that accesses between functions MayAlias after inlining. I also included code for adding tags for local allocations and for global variables. Enabling all three kinds of tag is known to produce a miscompile and so these are disabled by default. But it isn't much code and I thought it could be interesting to play with these later if one is looking at a benchmark which looks like it would benefit from more alias information. I'm open to removing this code too. TBAA tags are also added separately by TBAABuilder during CodeGen. TBAABuilder has to run during CodeGen because it adds tags to box accesses, many of which are implicit in FIR. This pass cannot (easily) run in CodeGen because fir::AliasAnalysis has difficulty tracing values between blocks, and by the time CodeGen runs, structured control flow has already been lowered. Coming in follow up patches - Change CodeGen/TBAABuilder to use TBAAForest to add tags within the same per-function trees as are used here (delayed to a later patch to make it easier to revert) - Command line argument processing to actually enable the pass	2023-10-11 14:29:47 +00:00
jeanPerier	4ccd57ddb1	[flang][nfc] replace fir.dispatch_table with more generic fir.type_info (#68309) The goal is to progressively propagate all the derived type info that is currently in the runtime type info globals into a FIR operation that can be easily queried and used by FIR/HLFIR passes. When this is complete, the last step will be to stop generating the runtime info global in lowering, but to do that later in or just before codegen to keep the FIR files readable (on the added type-info.f90 tests, the lowered runtime info globals take a whopping 2.6 million characters on 1600 lines of the FIR textual output. The fir.type_info that contains all the info required to generate those globals for such "trivial" types takes 1721 characters on 9 lines). So far this patch simply starts by replacing the fir.dispatch_table operation with the fir.type_info operation and adding the noinit/nofinal/nodestroy flags to it. These flags will soon be used in HLFIR to better rewrite hlfir.assign with derived types.	2023-10-06 09:29:57 +02:00
Mats Petersson	6180964a01	[flang]Pass to add vscale range attribute (#68103) Add the vscale range attribute for the Scalable Vector Extension (SVE) if provided on the command line (options in a previous commit). If no command-line option is provided, but the SVE target feature is specified and the architecture is AArch64, it defaults to 128-2048, in other words a vscale-min of 1 and a vscale-max of 16. A pass is used to add the attribute to all functions. The vectorizer will use this attribute to generate SVE instructions that match the specified range. The attribute is harmless if there are no vectorizable operations in the function.	2023-10-05 11:06:00 +01:00
Andrew Gozillon	171d8c4028	[Flang][OpenMP][MLIR] Fix memory leak caused by D149368 causing sanitizer error and fix iterator invalidation error This patch fixes two issues introduced by the D149368 patch, one is a memory leak from using the removeFromParent rather than eraseFromParent (the erase also had to be moved to not create use after deletes). And the other is a possible iterator invalidation bug, better to be safe than sorry.	2023-09-20 22:28:11 -05:00
Andrew Gozillon	76916669b9	[MLIR][OpenMP] Initial Lowering of Declare Target for Data This patch adds initial lowering for DeclareTargetAttr on GlobalOp's utilising registerTargetGlobalVariable and getAddrOfDeclareTargetVar from the OMPIRBuilder. It also adds initial processing of declare target map operands, populating the combinedInfo that the OMPIRBuilder requires to generate kernels and its kernel argument structure. The combination of these additions allows simple mapping of declare target globals to Target regions; as such, a simple runtime test showcasing this and testing it has been added. The patch currently does not factor in filtering based on device_type clauses (e.g. no emission of globals for device if host specified); this will come in a future iteration. And for the moment it's only been tested with 1-D arrays and basic Fortran data types; more complex types (such as user defined derived types from Fortran, allocatables or Fortran pointers) may need further work. reviewers: kiranchandramohan, skatrak Differential Revision: https://reviews.llvm.org/D149368	2023-09-20 13:31:15 -05:00
jeanPerier	1062c140f8	[flang] Prevent IR name clashes between BIND(C) and external procedures (#66777 ) Defining a procedure with a BIND(C, NAME="...") where the binding label matches the assembly name of a non BIND(C) external procedure in the same file causes a failure when generating the LLVM IR because of the assembly symbol name clash. Prevent this crash with a clearer semantic error.	2023-09-20 10:00:28 +02:00
Andrew Gozillon	eaa0d281b6	[Flang][MLIR][OpenMP] Update OMPEarlyOutlining to support Bounds, MapEntry and declare target globals This patch is a required change for the device side IR to maintain appropriate links for declare target variables to their global variables for later lowering. It is also a requirement to clone over map bounds and entry operations to maintain the correct information for later lowering of the IR. It simply tries to clone over the relevant information, maintaining the appropriate links they would have had prior to the pass, rather than redirecting them to new function arguments, which causes a loss of information in the case of Declare Target and map information. Depends on D158734 reviewers: TIFitis, razvanlupusoru Differential Revision: https://reviews.llvm.org/D158735	2023-09-19 08:26:46 -05:00
Slava Zakharin	7beb65ae2d	[flang] Fixed LoopVersioning for array slices. (#65703 ) The first test case added in the LIT test demonstrates the problem. Even though we did not consider the inner loop as a candidate for the transformation due to the array_coor with a slice, we decided to version the outer loop for the same function argument. During the cloning of the outer loop we dropped the slicing completely producing invalid code. I restructured the code so that we record all arg uses that cannot be transformed (regardless of the reason), and then fixup the usage information across the loop nests. I also noticed that we may generate redundant contiguity checks for the inner loops, so I fixed it since it was easy with the new way of keeping the usage data.	2023-09-08 09:01:10 -07:00
jeanPerier	6ffea74f7c	[flang] Use BIND name, if any, when consolidating common blocks (#65613) This patch changes how common blocks are aggregated and named in lowering in order to: * fix one obvious issue where BIND(C) and non BIND(C) with the same Fortran name were "merged" * go further and deal with a derivative where the BIND(C) C name matches the assembly name of a Fortran common block. This is a bit unspecified IMHO, but gfortran, ifort, and nvfortran "merge" the common block without complaints as a linker would have done. This required getting rid of all the common block mangling early in FIR (_QC) instead of leaving that to the phase that emits LLVM from FIR because BIND(C) common blocks did not have mangled names. Care has to be taken to deal with the underscoring option of flang-new. See added flang/test/Lower/HLFIR/common-block-bindc-conflicts.f90 for an illustration.	2023-09-08 10:43:55 +02:00
Tom Eccles	ad9af7de90	[flang][LoopVersioning] support fir.array_coor This is the last piece required for the loop versioning patch to work on code lowered via HLFIR. With this patch, HLFIR performance on spec2017 roms is now similar to the FIR lowering. Adding support for fir.array_coor means that many more loops will be versioned, even in the FIR lowering. So far as I have seen, these do not seem to have an impact on performance for the benchmarks I tried, but I expect it would speed up some programs, if the loop being versioned happened to be the hot code. The main difference between fir.array_coor and fir.coordinate_of is that fir.coordinate_of uses zero-based indices, whereas fir.array_coor uses the indices as specified in the Fortran program (starting from 1 by default, but also supporting non default lower bounds). I opted to transform fir.array_coor operations into fir.coordinate_of operations because this allows both to share the same offset calculation logic. The tricky bit of this patch is getting the correct lower bounds for the array operand to subtract from the fir.array_coor indices to get zero-based indices. So far as I can tell, the FIR lowering will always provide lower bounds (shift) information in the shape operand to the fir.array_coor when non-default lower bounds are used. If none is given, I originally tried falling back to reading lower bounds from the box, but this led to miscompilation in SPEC2017 cam4. Therefore the pass instead assumes that if it can't already find an SSA value for the shift information, the default lower bound (1) should be used. I suspect the incorrect lower bounds in the box for the FIR lowering was already a known issue (see https://reviews.llvm.org/D158119). Differential Revision: https://reviews.llvm.org/D158597	2023-09-04 10:40:40 +00:00
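
Note on commit 7242896233 (NaN handling in Minloc/Maxloc): the selection condition quoted in that message, isFirst || (prev != prev && elem == elem) || elem < prev, can be illustrated with a small Python sketch. This is not the code Flang generates. The function name `minloc_sketch`, the 1-based result, and the return value 0 when no element is selected are assumptions made here for illustration, loosely modelled on how Fortran's MINLOC behaves.

```python
import math


def minloc_sketch(values, mask=None):
    """Hypothetical helper: 1-based position of the first smallest element.

    Returns 0 when no element is selected (empty input or all-false mask).
    NaNs are ignored unless every selected element is NaN.
    """
    best_pos = 0      # 0 means nothing has been selected yet (is_first)
    prev = math.inf   # initial limit value; Inf rather than the largest float
    for pos, elem in enumerate(values, start=1):
        if mask is not None and not mask[pos - 1]:
            continue  # masked-out elements never take part in the reduction
        is_first = best_pos == 0
        prev_is_nan = prev != prev   # NaN is the only value unequal to itself
        elem_is_num = elem == elem   # true for every non-NaN value
        if is_first or (prev_is_nan and elem_is_num) or elem < prev:
            best_pos, prev = pos, elem
    return best_pos


if __name__ == "__main__":
    nan = math.nan
    print(minloc_sketch([nan, 3.0, 1.0, 1.0]))  # leading NaN is replaced
    print(minloc_sketch([nan, nan]))            # only NaNs: first one wins
    print(minloc_sketch([5.0, 1.0, 2.0], mask=[True, False, True]))
```

The is_first term is what lets an all-Inf or all-NaN input still report position 1: without it, the strict elem < prev test would never fire against the Inf starting value.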


226 Commits