clang-p2996

Author	SHA1	Message	Date
Mehdi Amini	369ce54bb3	Revert "[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration." This reverts commit `bcfc0a9051`. The build is broken with shared library enabled.	2022-06-04 08:35:45 +00:00
Christian Sigg	bcfc0a9051	[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration. This is correct for all values, i.e. the same as promoting the division to fp32 in the NVPTX backend. But it is faster (~10% on average, sometimes more) because:
- it performs fewer Newton iterations
- it avoids the slow path for e.g. denormals
- it allows reuse of the reciprocal for multiple divisions by the same divisor

Test program:
```
#include <stdio.h>
#include "cuda_fp16.h"

// This is a variant of CUDA's own __hdiv which is faster than hdiv_promote below
// and doesn't suffer from the perf cliff of div.rn.fp32 with 'special' values.
__device__ half hdiv_newton(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float rcp;
  asm("{rcp.approx.ftz.f32 %0, %1;\n}" : "=f"(rcp) : "f"(fb));

  float result = fa * rcp;
  auto exponent = reinterpret_cast<const unsigned&>(result) & 0x7f800000;
  if (exponent != 0 && exponent != 0x7f800000) {
    float err = __fmaf_rn(-fb, result, fa);
    result = __fmaf_rn(rcp, err, result);
  }

  return __float2half(result);
}

// Surprisingly, this is faster than CUDA's own __hdiv.
__device__ half hdiv_promote(half a, half b) {
  return __float2half(__half2float(a) / __half2float(b));
}

// This is an approximation that is accurate up to 1 ulp.
__device__ half hdiv_approx(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float result;
  asm("{div.approx.ftz.f32 %0, %1, %2;\n}" : "=f"(result) : "f"(fa), "f"(fb));
  return __float2half(result);
}

__global__ void CheckCorrectness() {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  half x = reinterpret_cast<const half&>(i);
  for (int j = 0; j < 65536; ++j) {
    half y = reinterpret_cast<const half&>(j);
    half d1 = hdiv_newton(x, y);
    half d2 = hdiv_promote(x, y);
    auto s1 = reinterpret_cast<const short&>(d1);
    auto s2 = reinterpret_cast<const short&>(d2);
    if (s1 != s2) {
      printf("%f (%u) / %f (%u), got %f (%hu), expected: %f (%hu)\n",
             __half2float(x), i, __half2float(y), j, __half2float(d1), s1,
             __half2float(d2), s2);
      //__trap();
    }
  }
}

__device__ half dst;

__global__ void ProfileBuiltin(half x) {
#pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = x / x;
  }
  dst = x;
}

__global__ void ProfilePromote(half x) {
#pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_promote(x, x);
  }
  dst = x;
}

__global__ void ProfileNewton(half x) {
#pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_newton(x, x);
  }
  dst = x;
}

__global__ void ProfileApprox(half x) {
#pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_approx(x, x);
  }
  dst = x;
}

int main() {
  CheckCorrectness<<<256, 256>>>();
  half one = __float2half(1.0f);
  ProfileBuiltin<<<1, 1>>>(one);  // 1.001s
  ProfilePromote<<<1, 1>>>(one);  // 0.560s
  ProfileNewton<<<1, 1>>>(one);   // 0.508s
  ProfileApprox<<<1, 1>>>(one);   // 0.304s
  auto status = cudaDeviceSynchronize();
  printf("%s\n", cudaGetErrorString(status));
}
```
Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D126158	2022-06-04 08:03:29 +02:00
wren romano	3cf03f1c56	[mlir][sparse] Adding IsSparseTensorPred and updating ops to use it Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D126994	2022-06-03 17:15:31 -07:00
Diego Caballero	9a79b1b04c	[mlir] Add peeling xform to Codegen Strategy This patch adds the knobs to use peeling in the codegen strategy infrastructure. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D126842	2022-06-03 21:31:43 +00:00
Krzysztof Drewniak	95aff23e29	Re-land "[mlir] Add integer range inference analysis" This reverts commit `4e5ce2056e`. This relands commit `1350c9887d`. Reinstates the range analysis with the build issue fixed. Differential Revision: https://reviews.llvm.org/D126926	2022-06-03 17:13:48 +00:00
Nicolas Vasilache	72de7588cc	[mlir][SCF] Add bufferization hook for scf.foreach_thread and terminator. `scf.foreach_thread` results alias with the underlying `scf.foreach_thread.parallel_insert_slice` destination operands and they bufferize to equivalent buffers in the absence of other conflicts. `scf.foreach_thread.parallel_insert_slice` conflict detection is similar to `tensor.insert_slice` conflict detection. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D126769	2022-06-03 07:14:05 +00:00
Thomas Raoux	205c08b54d	[mlir][scf] Add option to loop pipelining to not peel the epilogue Add an option to predicate the epilogue within the kernel instead of peeling the epilogue. This is a useful option to prevent generating a large amount of code for deep pipelines. This currently requires a user lambda to implement operation predication. Differential Revision: https://reviews.llvm.org/D126753	2022-06-03 04:20:20 +00:00
River Riddle	ee1cf1f645	[mlir][NFC] Simplify the various `parseSourceFile<T>` overloads These effectively all share the same implementation, i.e. forward to the non-templated overload and then construct the container op.	2022-06-02 19:18:55 -07:00
River Riddle	bf352e0b2e	[mlir:PDLL] Add better support for providing Constraint/Pattern/Rewrite documentation This commit enables providing long-form documentation more seamlessly to the LSP by revamping decl documentation. For ODS imported constructs, we now also import descriptions and attach them to decls when possible. For PDLL constructs, the LSP will now try to provide documentation by parsing the comments directly above the decls location within the source file. This commit also adds a new parser flag `enableDocumentation` that gates the import and attachment of ODS documentation, which is unnecessary in the normal build process (i.e. it should only be used/consumed by tools). Differential Revision: https://reviews.llvm.org/D124881	2022-06-02 16:31:07 -07:00
Arjun P	8bc2cff95a	[MLIR][Presburger] Simplex: remove redundant member vars nRow, nCol Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D126790	2022-06-03 00:30:48 +01:00
Mehdi Amini	4e5ce2056e	Revert "[mlir] Add integer range inference analysis" This reverts commit `1350c9887d`. Shared library build is broken with undefined references.	2022-06-02 21:24:06 +00:00
Krzysztof Drewniak	1350c9887d	[mlir] Add integer range inference analysis This commit defines a dataflow analysis for integer ranges, which uses a newly-added InferIntRangeInterface to compute the lower and upper bounds on the results of an operation from the bounds on the arguments. The range inference is a flow-insensitive dataflow analysis that can be used to simplify code, such as by statically identifying bounds checks that cannot fail in order to eliminate them. The InferIntRangeInterface has one method, inferResultRanges(), which takes a vector of inferred ranges for each argument to an op implementing the interface and a callback allowing the implementation to define the ranges for each result. These ranges are stored as ConstantIntRanges, which hold the lower and upper bounds for a value. Bounds are tracked separately for the signed and unsigned interpretations of a value, which ensures that the impact of arithmetic overflows is correctly tracked during the analysis. The commit also adds a -test-int-range-inference pass to test the analysis until it is integrated into SCCP or otherwise exposed. Finally, this commit fixes some bugs relating to the handling of region iteration arguments and terminators in the data flow analysis framework. Depends on D124020 Depends on D124021 Reviewed By: rriddle, Mogball Differential Revision: https://reviews.llvm.org/D124023	2022-06-02 20:24:11 +00:00
Aart Bik	bf7dbc2a30	[mlir][sparse][bufferization] fix doc on new init operation The example was still using the -now- removed sparse_tensor.init_tensor. Also, I made the input operands of the matrix multiplication sparse too (since it looks a bit strange to multiply two dense matrices into a sparse). Reviewed By: bixia Differential Revision: https://reviews.llvm.org/D126897	2022-06-02 12:04:36 -07:00
Alex Zinenko	ce2e198bc2	[mlir] add decompose and generalize to structured transform ops These ops complement the tiling/padding transformations by transforming higher-level named structured operations such as depthwise convolutions into lower-level and/or generic equivalents that are better handled by some downstream transformations. Differential Revision: https://reviews.llvm.org/D126698	2022-06-02 15:25:18 +02:00
Nicolas Vasilache	311967701a	[mlir][SCF] Add scf.foreach_thread.parallel_insert_slice canonicalization. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D126761	2022-06-02 11:53:25 +00:00
jacquesguan	19e285477e	[mlir][Arithmetic] Add constant folder for RemF. This patch adds the constant folder for RemF. Differential Revision: https://reviews.llvm.org/D126045	2022-06-02 06:24:37 +00:00
Matthias Springer	6232a8f3d6	[mlir][sparse][NFC] Switch InitOp to bufferization::AllocTensorOp Now that we have an AllocTensorOp (previously InitTensorOp) in the bufferization dialect, the InitOp in the sparse dialect is no longer needed. Differential Revision: https://reviews.llvm.org/D126180	2022-06-02 00:03:52 +02:00
wren romano	b364c76683	[mlir][sparse] Using non-empty function name suffix for OverheadType::kIndex The trick of using an empty token in the `FOREVERY_O` x-macro relies on preprocessor behavior which is only standard since C99 6.10.3/4 and C++11 N3290 16.3/4 (whereas it was undefined behavior up through C++03 16.3/10). Since the `ExecutionEngine/SparseTensorUtils.cpp` file is required to be compile-able under C++98 compatibility mode (unlike the C++11 used elsewhere in MLIR), we shouldn't rely on that behavior. Also, using a non-empty suffix helps improve uniformity of the API, since all other primary/overhead suffixes are also non-empty. I'm using the suffix `0` since that's the value used by the `SparseTensorEncoding` attribute for indicating the index overhead-type. Depends On D126720 Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D126724	2022-06-01 14:18:42 -07:00
Rob Suderman	f3bdb56d61	[mlir][math] Add math.ctlz expansion to control flow + arith operations Ctlz is an intrinsic in LLVM but does not have equivalent operations in SPIR-V. Including a decomposition gives an alternative path for these platforms. Reviewed By: NatashaKnk Differential Revision: https://reviews.llvm.org/D126261	2022-06-01 11:45:04 -07:00
Stella Laurenzo	3bb7999339	[mlir] Add global_load and global_store ops to ml_program. * Adds simple, non-atomic, non-volatile, non-synchronized direct load/store ops. Differential Revision: https://reviews.llvm.org/D126230	2022-06-01 11:32:15 -07:00
Arjun P	ec145ba2a3	[MLIR][Presburger] Matrix: inline trivial accessors This resolves a comment from https://reviews.llvm.org/D126708 that was previously missed.	2022-06-01 16:56:46 +01:00
Arjun P	d5e31cf38a	[MLIR][Presburger] Move Matrix accessors inline This gives a 1.5x speedup on the Presburger unittests. Reviewed By: Groverkss Differential Revision: https://reviews.llvm.org/D126708	2022-06-01 16:51:42 +01:00
PeixinQiao	fe2cc16035	[NFC][MLIR] Fix -Wtype-limits warning Fix the warning: comparison of unsigned expression in ‘>= 0’ is always true. Reviewed By: kiranchandramohan, shraiysh Differential Revision: https://reviews.llvm.org/D126784	2022-06-01 23:42:07 +08:00
Nicolas Vasilache	59b273a166	[mlir][SCF] Add parallel abstraction on tensors. This revision adds `scf.foreach_thread` and other supporting abstractions that allow connecting parallel abstractions and tensors. Discussion is available [here](https://discourse.llvm.org/t/rfc-parallel-abstraction-for-tensors-and-buffers/62607). Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D126555	2022-06-01 09:16:01 +00:00
Nicolas Vasilache	beab8e871e	Revert "[mlir][SCF] Add parallel abstraction on tensors." This reverts commit `9b7193f852`. This is an older branch that was committed by mistake and does not include addressed review comments, an updated version will come next.	2022-06-01 09:04:20 +00:00
Nicolas Vasilache	9b7193f852	[mlir][SCF] Add parallel abstraction on tensors. This revision adds `scf.foreach_thread` and other supporting abstractions that allow connecting parallel abstractions and tensors. Discussion is available [here](https://discourse.llvm.org/t/rfc-parallel-abstraction-for-tensors-and-buffers/62607).	2022-06-01 09:02:16 +00:00
lewuathe	6d75c89783	[mlir][complex] Add tan op for complex dialect Add tangent operation for complex dialect. This is the follow-up change of https://reviews.llvm.org/D126521 Differential Revision: https://reviews.llvm.org/D126685	2022-06-01 09:20:42 +02:00
wren romano	a4c53f8cd6	[mlir][sparse] Factoring out SparseTensorFile class for readSparseTensorShape The primary goal of this change is to define readSparseTensorShape; the SparseTensorFile class is merely introduced as a way to reduce code duplication along the way. Depends On D126106 Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D126233	2022-05-31 13:24:28 -07:00
lorenzo chelini	850dbff708	[MLIR][Math] Improve docs (NFC) Remove boilerplate examples and add a text at the dialect level to describe what kind of operands the operations accept (i.e., scalar, tensor or vector). Left a shorter sentence describing the input operands for each operation as this redundancy is convenient when browsing the documentation using the website. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D126648	2022-05-31 18:16:59 +02:00
jacquesguan	42c17073fc	[mlir] Support import llvm intrinsics. This patch adds support for converting LLVM intrinsics to the corresponding ops. It still leaves some intrinsics to be handled specially. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D126639	2022-05-31 11:08:23 +00:00
River Riddle	1c2edb026e	[mlir:PDLL] Rework the C++ generation of native Constraint/Rewrite arguments and results The current translation uses the old "ugly"/"raw" form which used PDLValue for the arguments and results. This commit updates the C++ generation to use the recently added sugar that allows for directly using the desired types for the arguments and result of PDL functions. In addition, this commit also properly imports the C++ class for ODS operations, constraints, and interfaces. This allows for a much more convenient C++ API than previously granted with the raw/low-level types. Differential Revision: https://reviews.llvm.org/D124817	2022-05-30 17:35:34 -07:00
River Riddle	91b8d96fd1	[mlir:PDLL] Add proper support for operation result type inference This allows for the results of operations to be inferred in certain contexts, and matches the support in PDL for result type inference. The main two initial circumstances are when used as a replacement of another operation, or when the operation being created implements InferTypeOpInterface. Differential Revision: https://reviews.llvm.org/D124782	2022-05-30 17:35:33 -07:00
Mehdi Amini	940e290860	Apply clang-tidy fixes for performance-unnecessary-value-param in OneShotModuleBufferize.cpp (NFC)	2022-05-30 18:44:28 +00:00
Alex Zinenko	cc6c159203	[mlir] add VectorizeOp to structured transform ops Vectorization is a key transformation to achieve high performance on most architectures. In the transform dialect, vectorization is implemented as a parameterizable transform op. It currently applies to a scope of payload IR delimited by some isolated-from-above op, mainly because several enabling transformations (such as affine simplification) are needed to perform vectorization and these transformations would apply to ops other than the "main" computational payload op. A separate "navigation" transform op that obtains the isolated-from-above ancestor of an op is introduced in the core transform dialect. Even though it is currently only useful for vectorization, isolated-from-above ops are a common anchor for transformations (usually implemented as passes) that is likely to be reused in the future. Depends On D126374 Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D126542	2022-05-30 17:37:50 +02:00
Alex Zinenko	5cde5a5739	[mlir] add interchange, pad and scalarize to structured transform dialect Add ops to the structured transform extension of the transform dialect that perform interchange, padding and scalarization on structured ops. Along with tiling that is already defined, this provides a minimal set of transformations necessary to build vectorizable code for a single structured op. Define two helper traits: one that implements TransformOpInterface by applying a function to each payload op independently and another that provides a simple "functional-style" producer/consumer list of memory effects for the transform ops. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D126374	2022-05-30 11:42:40 +02:00
Christian Sigg	bcf3d52486	[MLIR][GPU] Expose GpuParallelLoopMapping as non-test pass. Reviewed By: bondhugula, herhut Differential Revision: https://reviews.llvm.org/D126199	2022-05-30 09:20:48 +02:00
Groverkss	dac27da7b9	[MLIR][Presburger] Add applyDomain/Range to IntegerRelation This patch adds support for applying a relation on domain/range of a relation. Reviewed By: arjunp, ftynse Differential Revision: https://reviews.llvm.org/D126339	2022-05-29 02:06:11 +05:30
Matthias Springer	2f0a634c5e	[mlir][bufferization] Add extra filter mechanism to bufferizeOp Differential Revision: https://reviews.llvm.org/D126569	2022-05-28 04:49:23 +02:00
Matthias Springer	f470f8cbce	[mlir][bufferize][NFC] Split analysis+bufferization of ModuleBufferization Analysis and bufferization can now be run separately. Differential Revision: https://reviews.llvm.org/D126572	2022-05-28 04:43:50 +02:00
Matthias Springer	3490aadf56	[mlir][bufferization][NFC] Remove post-analysis step infrastructure Now that analysis and bufferization are better separated, post-analysis steps are no longer needed. Users can directly interleave analysis and bufferization as needed. Differential Revision: https://reviews.llvm.org/D126571	2022-05-28 04:37:13 +02:00
Matthias Springer	1534177f8f	[mlir][bufferization][NFC] Move OpFilter out of BufferizationOptions Differential Revision: https://reviews.llvm.org/D126568	2022-05-28 01:47:39 +02:00
Aart Bik	a5d7e2a8ac	[OpenMP][mlir] fix broken build Reviewed By: Mogball Differential Revision: https://reviews.llvm.org/D126556	2022-05-27 10:06:01 -07:00
PeixinQiao	042ae89556	[OpenMP] Support operation conversion to LLVM for threadprivate directive This supports the operation conversion for threadprivate directive. The support for memref type conversion is not implemented. Reviewed By: kiranchandramohan, shraiysh Differential Revision: https://reviews.llvm.org/D124610	2022-05-28 00:06:57 +08:00
Daniil Dudkin	52d79b04b2	[mlir][llvm] Fix compiler error on GCC 9 This patch fixes the following compiler error: error: declaration of ‘mlir::LLVM::cconv::CConv mlir::LLVM::detail::CConvAttrStorage::CConv’ changes meaning of ‘CConv’ [-fpermissive] CConv as a member variable name was shadowing CConv as an enumeration, hence the compiler error. Reviewed By: ftynse, alexbatashev Differential Revision: https://reviews.llvm.org/D126530	2022-05-27 15:33:43 +03:00
Groverkss	f168a65943	[MLIR][Presburger] Add intersectDomain/Range to IntegerRelation This patch adds support for intersecting a set with a relation. Reviewed By: arjunp Differential Revision: https://reviews.llvm.org/D126328	2022-05-27 15:51:54 +05:30
Alexander Batashev	0252357b3e	[mlir][LLVM] Add support for Calling Convention in LLVMFuncOp This patch adds support for Calling Convention attribute in LLVM dialect, including enums, custom syntax and import from LLVM IR. Additionally fix import of dso_local attribute. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D126161	2022-05-27 09:43:31 +03:00
wren romano	2046e11ac4	[mlir][sparse] Improving ExecutionEngine/SparseTensorUtils.h This change makes the public API of SparseTensorUtils.cpp explicit, whereas before the publicity of these functions was only implicit. Implicit publicity is sufficient for mlir-opt to generate calls to these functions, but it's not enough to enable C/C++ code to call them directly in the usual way (i.e., without going through codegen). Thus, leaving the publicity implicit prevents development of other tools (e.g., microbenchmarks). In addition this change also marks the functions MLIR_CRUNNERUTILS_EXPORT, which is required by the JIT under certain configurations (albeit not for anything in our test suite). Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D126105	2022-05-26 17:22:08 -07:00
Alex Zinenko	73c3dff1b3	[mlir] Use-after-free checker for the Transform dialect The Transform dialect uses the side effect modeling mechanism to record the effects of the transform ops on the mapping between Transform IR values and Payload IR ops. Introduce a checker pass that warns if a Transform IR value is used after it has been freed (consumed). This pass is mostly intended as a debugging aid in addition to the verification/assertion mechanisms in the transform interpreter. It reports all potential use-after-free situations. The implementation makes a series of simplifying assumptions to be simple and conservative. A more advanced implementation would rely on the data flow-like analysis associated with a side-effect resource rather than a value, which is currently not supported by the analysis infrastructure. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D126381	2022-05-26 12:28:41 +02:00
Matthias Springer	52698a33d0	[mlir][bufferization] Clean up imports and code comments Differential Revision: https://reviews.llvm.org/D126427	2022-05-26 05:48:52 +02:00
bixia1	a14057d4bd	[mlir][sparse] Add more complex operations. Support complex operations sqrt, expm1, and tanh. Add tests. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D126393	2022-05-25 16:38:09 -07:00
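
The `math.ctlz` expansion entry above (f3bdb56d61) describes lowering count-leading-zeros to control flow plus arithmetic operations for targets without a native intrinsic. The following is only a minimal sketch of that general idea in plain C++, not the code from D126261: the function name `countLeadingZeros32` and the one-bit-per-iteration loop are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only; not the MLIR implementation.
// Counts the leading zero bits of a 32-bit value using nothing but a loop,
// a comparison, a shift, and an add.
inline unsigned countLeadingZeros32(std::uint32_t x) {
  // Zero has no set bit, so all 32 bits count as leading zeros.
  if (x == 0)
    return 32;
  unsigned count = 0;
  // Shift left until the most significant bit is set, counting the shifts.
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    ++count;
  }
  return count;
}
```

With this sketch, `countLeadingZeros32(1)` evaluates to 31 and `countLeadingZeros32(0)` to 32. A production lowering would more likely use a binary search over the bit width to bound the number of steps; the linear loop is chosen here only for readability.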

6171 Commits