clang-p2996

Author	SHA1	Message	Date
Adam Paszke	fbfff1caff	[MLIR][CAPI] Add C API dialect registration methods for Arith, Math, MemRef and Vector dialects Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D155450	2023-07-17 14:45:49 +00:00
Alex Zinenko	371366ce27	[mlir][nvgpu] add simple pipelining for shared memory copies Add a simple transform operation to the NVGPU extension that performs software pipelining of copies to shared memory. The functionality is extremely minimalistic in this version and only supports copies from global to shared memory inside an `scf.for` loop with either `vector.transfer` or `nvgpu.device_async_copy` operations when pipelining preconditions are already satisfied in the IR. This is the minimally useful version that uses the more general loop pipeliner in an NVGPU-specific way. Further extensions and orthogonalizations will be necessary. This required a change to the loop pipeliner itself to properly propagate errors should the predicate generator fail. This is loosely inspired by the version in IREE, but has fewer unsafe assumptions and a more principled way of communicating decisions. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D155223	2023-07-17 14:29:12 +00:00
Matthias Springer	a4f4d82c35	[mlir][NVGPU][NFC] Clean up code structure * Move passes to `Transforms` directory. * Add `Utils.h` (will be utilized in a subsequent change). Differential Revision: https://reviews.llvm.org/D155427	2023-07-17 14:15:42 +02:00
Guillaume Chatelet	b38dda74fa	[libc][NFC] Split memcmp implementations per platform This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif. Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D155181	2023-07-17 11:35:31 +00:00
Guillaume Chatelet	83f3920854	[libc][NFC] Split memset implementations per platform This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif. Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D155174	2023-07-17 11:12:19 +00:00
Oleg Shyshkov	4592543a01	[mlir][bazel] Fix build.	2023-07-17 11:00:20 +02:00
Matthias Springer	88f4292a16	[mlir][bufferization] OneShotBufferizeOp: Add options to use linalg.copy This new option allows users to specify a custom memcpy op. Differential Revision: https://reviews.llvm.org/D155280	2023-07-14 13:34:22 +02:00
Hanhan Wang	8fc433f055	[mlir][MemRef] Move narrow type emulation common methods to MemRefUtils. It also unifies the computation of StridedLayoutAttr. If the stride is static known value, we can just use it. Differential Revision: https://reviews.llvm.org/D155017	2023-07-13 14:43:21 -07:00
Guillaume Chatelet	8cc440b3e7	[libc][NFC] Split memcpy implementations per platform This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif. Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D155099	2023-07-13 10:30:38 +00:00
Guillaume Chatelet	1c4e4e03bd	[libc][NFC] Split bcmp implementations per platform This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif. Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D155076	2023-07-13 10:19:00 +00:00
Sterling Augustine	39d6fe790c	Add bazel support for new DebugBTF component.	2023-07-12 14:57:15 -07:00
Andrés Villegas	4f92557bfc	[NFC][llvm-dwp] Switch from llvm::cl to OptTable Switch the parse of command line options from llvm::cl to OptTable. The motivation for this change is to continue adding llvm based tools to the llvm driver multicall. For more information about the proposal and motivation, please see https://discourse.llvm.org/t/rfc-llvm-busybox-proposal/58494 Reviewed By: abrachet Differential Revision: https://reviews.llvm.org/D154642	2023-07-12 19:12:48 +00:00
Adrian Kuegel	a69b2e3d1c	[clang][Bazel] Add dependency to the right target.	2023-07-12 10:19:06 +02:00
Adrian Kuegel	93e7ef5907	[clang][Bazel] Add missing dependency.	2023-07-12 10:14:14 +02:00
Sterling Augustine	8df8f01065	Fix bazel build for `5a1cdcbd86`	2023-07-11 14:27:40 -07:00
Arthur Eubanks	4cca3de87e	[bazel][docs] Update build documentation Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D155004	2023-07-11 13:36:27 -07:00
Alex Zinenko	8a918c54bb	[mlir] add backward dense dataflow analysis This is the counterpart to the forward dense dataflow analysis and integrates into the dataflow framework. The implementation follows the structure of existing dataflow analyses. Reviewed By: Mogball, phisiart Differential Revision: https://reviews.llvm.org/D154713	2023-07-11 16:47:53 +00:00
Aliia Khasanova	be29fe2f98	Fix bazel build file for D154060. Differential Revision: https://reviews.llvm.org/D154976	2023-07-11 17:33:58 +02:00
NAKAMURA Takumi	82371e68e4	[Bazel] Fixup for D153758, D153850, and D153861 (global-isel-combiner-matchtable)	2023-07-11 22:53:38 +09:00
Fangrui Song	7f7f4a6b17	[bazel] Adjust llvm:DebugInfo after D149501 (BTF.h)	2023-07-10 15:05:51 -07:00
Guillaume Chatelet	bfd94882f2	[libc][NFC] Move aligned access implementations to separate header Follow up on https://reviews.llvm.org/D154770 Differential Revision: https://reviews.llvm.org/D154800	2023-07-09 22:17:05 +00:00
Guillaume Chatelet	dbaa5838c1	[libc][NFC] Move memfunction's byte per byte implementations to a separate header There will be subsequent patches to move things around and make the file layout more principled. Differential Revision: https://reviews.llvm.org/D154770	2023-07-09 07:21:58 +00:00
Alex Zinenko	9ab34689b0	[mlir] add a simple gpu barrier elimination mechanism GPU code generation, and specifically the shared memory copy insertion may introduce spurious barriers guarding read-after-read dependencies or read-after-write on non-aliasing data, which degrades performance due to unnecessary synchronization. Add a pattern and transform op that removes such barriers by analyzing memory effects that the barrier actually guards that are not also guarded by other barriers. The code is adapted from the Polygeist incubator project. Co-authored-by: William Moses <gh@wsmoses.com> Co-authored-by: Ivan Radanov Ivanov <ivanov.i.aa@m.titech.ac.jp> Reviewed By: nicolasvasilache, wsmoses Differential Revision: https://reviews.llvm.org/D154720	2023-07-07 18:51:49 +00:00
Guillaume Chatelet	cb1468d3cb	[libc] Adding a version of memcpy w/ software prefetching For machines with a lot of cores, hardware prefetchers can saturate the memory bus when utilization is high. In this case it is desirable to turn off the hardware prefetcher completely. This has a big impact on the performance of memory functions such as `memcpy` that rely on the fact that the next cache line will be readily available. This patch adds the 'LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING' compile time option that generates a version of memcpy with software prefetching. While not fully restoring the original performance, it mitigates the impact to an acceptable level. Reviewed By: rtenneti Differential Revision: https://reviews.llvm.org/D154494	2023-07-07 10:37:32 +00:00
Haojian Wu	99074aafc3	[bazel] Port for `88e95c1e4b`	2023-07-07 09:02:05 +02:00
Michael Jones	cfbcbc8f88	[libc] fix MPFR rounding problems in fuzz test The accuracy for the MPFR numbers in the strtofloat fuzz test was set too high, causing rounding issues when rounding to a smaller final result. Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D154150	2023-07-05 10:53:40 -07:00
Alexander Belyaev	594643177f	Fix bazel build after https://reviews.llvm.org/D150578 .	2023-07-05 14:33:49 +02:00
Matthias Springer	cb7bda2ace	[mlir][NFC] Use `getConstantIntValue` instead of casting to `ConstantIndexOp` `getConstantIntValue` extracts constant values from all constant-like ops, not just `arith::ConstantIndexOp`. Differential Revision: https://reviews.llvm.org/D154356	2023-07-04 14:08:37 +02:00
Benjamin Kramer	9846b9e2ca	[bazel] Add missing dependency for `d9d9be63a5`	2023-07-04 13:34:03 +02:00
Matthias Springer	8b8e62d3f6	[mlir][SCF] Add `loop.promote_if_one_iteration` transform op This transform op promotes loops with one iteration. I.e., the loop op is replaced by just the loop body. Differential Revision: https://reviews.llvm.org/D154361	2023-07-04 08:58:49 +02:00
Matthias Springer	fa1a23a720	[mlir][transform] Add `transform.apply_licm` op This op applies loop-invariant code motion to the targeted loop-like op. Differential Revision: https://reviews.llvm.org/D154327	2023-07-03 15:28:53 +02:00
Adrian Kuegel	630b8d36c0	[mlir][Bazel] Add missing dependencies after `564713c471`	2023-07-03 13:16:28 +02:00
Matthias Springer	180f9ef8b7	[mlir][linalg] LinalgOp-anchored empty tensor elimination This revision adds a pre-bufferization transform that can reduce the number of allocations. It is similar to `bufferization.eliminate_empty_tensors`, but specific to LinalgOp. The transform looks for `tensor.empty` ops where the SSA use-def chain ends in an "ins" operand of a `LinalgOp`. If the same `LinalgOp` has an unused "outs" operand (and some other conditions are met), this "outs" operand can be used instead of the `tensor.empty` and the "ins" operand can be turned into an "outs" operand. Differential Revision: https://reviews.llvm.org/D153952	2023-07-03 09:17:48 +02:00
Haojian Wu	b28296c500	[bazel] Port bazel support for `5bf8efd269`	2023-07-01 08:27:26 +02:00
Guillaume Chatelet	1c814c99aa	[libc] Improve memcmp latency and codegen This is based on ideas from @nafi to: - use a branchless version of 'cmp' for 'uint32_t', - completely resolve the lexicographic comparison through vector operations when wide types are available. We also get rid of byte reloads and serializing '__builtin_ctzll'. I did not include the suggestion to replace comparisons of 'uint16_t' with two 'uint8_t' as it did not seem to help the codegen. This can be revisited in subsequent patches. The code has been rewritten to reduce nested function calls, making the job of the inliner easier and preventing harmful code duplication. Reviewed By: nafi3000 Differential Revision: https://reviews.llvm.org/D148717	2023-06-30 13:00:58 +00:00
Aart Bik	6b88c852b6	[mlir][sparse] Start migration to new surface syntax for STEA We are in the process of migrating to a much improved surface syntax for the Sparse Tensor Encoding Attribute (STEA). You can see a preview of this in the StableHLO RFC at https://github.com/openxla/stablehlo/blob/main/rfcs/20230210-sparsity.md //This design is courtesy Wren Romano.// This initial revision (1) Introduces the first version of a new parser written by Wren Romano (2) Introduces a simple "migration plan" using NEW_SYNTAX on the STEA, which will allow us to test the new parser with new examples, as well as migrate existing examples over without the need to rewrite them all This first "drop" merely provides the entry points to parse the new syntax. The parser is still under active development. For example, we need to address the "lookahead" issue when parsing the lvl spec (viz. do we see l0 = d0 or a direct d0). Another larger task is to actually implement "affine" parsing (since the MLIR affine parser is not accessible in other parts of the tree). EXAMPLE: Currently, CSR looks like #CSR = #sparse_tensor.encoding<{ lvlTypes = ["dense","compressed"], dimToLvl = affine_map<(i,j) -> (i,j)> }> but you can "force" the new parser with #CSR = #sparse_tensor.encoding<{ NEW_SYNTAX = (d0, d1) -> (l0 = d0 : dense, l1 = d1 : compressed) }> Reviewed By: Peiming Differential Revision: https://reviews.llvm.org/D153997	2023-06-29 11:32:07 -07:00
Tue Ly	f320fefc4a	[libc][math] Implement erff function correctly rounded to all rounding modes. Implement correctly rounded `erff` functions. For `x >= 4`, `erff(x) = 1` for `FE_TONEAREST` or `FE_UPWARD`, `0x1.ffffep-1` for `FE_DOWNWARD` or `FE_TOWARDZERO`. For `0 <= x < 4`, we divide into 32 sub-intervals of length `1/8`, and use a degree-15 odd polynomial to approximate `erff(x)` in each sub-interval: ``` erff(x) ~ x * (c0 + c1 * x^2 + c2 * x^4 + ... + c7 * x^14). ``` For `x < 0`, we can use the same formula as above, since the odd part is factored out. Performance tested with `perf.sh` tool from the CORE-MATH project on AMD Ryzen 9 5900X: Reciprocal throughput (clock cycles / op) ``` $ ./perf.sh erff --path2 GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH reciprocal throughput -- with -march=native (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 11.790 + 0.182 clc/call; Median-Min = 0.154 clc/call; Max = 12.255 clc/call; -- CORE-MATH reciprocal throughput -- with -march=x86-64-v2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 14.205 + 0.151 clc/call; Median-Min = 0.159 clc/call; Max = 15.893 clc/call; -- System LIBC reciprocal throughput -- [####################] 100 % Ntrial = 20 ; Min = 45.519 + 0.445 clc/call; Median-Min = 0.552 clc/call; Max = 46.345 clc/call; -- LIBC reciprocal throughput -- with -mavx2 -mfma (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 9.595 + 0.214 clc/call; Median-Min = 0.220 clc/call; Max = 9.887 clc/call; -- LIBC reciprocal throughput -- with -msse4.2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 10.223 + 0.190 clc/call; Median-Min = 0.222 clc/call; Max = 10.474 clc/call; ``` and latency (clock cycles / op): ``` $ ./perf.sh erff --path2 GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with -march=native (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 38.566 + 0.391 clc/call; Median-Min = 0.503 clc/call; Max = 39.170 clc/call; -- CORE-MATH latency -- with -march=x86-64-v2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 43.223 + 0.667 clc/call; Median-Min = 0.680 clc/call; Max = 43.913 clc/call; -- System LIBC latency -- [####################] 100 % Ntrial = 20 ; Min = 111.613 + 1.267 clc/call; Median-Min = 1.696 clc/call; Max = 113.444 clc/call; -- LIBC latency -- with -mavx2 -mfma (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 40.138 + 0.410 clc/call; Median-Min = 0.536 clc/call; Max = 40.729 clc/call; -- LIBC latency -- with -msse4.2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 44.858 + 0.872 clc/call; Median-Min = 0.814 clc/call; Max = 46.019 clc/call; ``` Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D153683	2023-06-28 13:58:37 -04:00
Nicolas Vasilache	13f4e889c5	Revert "Revert "[mlir][Transform] Add support for mma.sync m16n8k16 f16 rewrite." and "[mlir][Transform] Introduce nvgpu transform extensions"" This reverts commit `6506692fe6`. Differential Revision: https://reviews.llvm.org/D153845	2023-06-28 06:50:05 +00:00
Mehdi Amini	6506692fe6	Revert "[mlir][Transform] Add support for mma.sync m16n8k16 f16 rewrite." and "[mlir][Transform] Introduce nvgpu transform extensions" This reverts commit `40deed40ae` and commit `1660f2174d`. The buildbot is broken, the two tests aren't passing.	2023-06-27 08:46:18 +02:00
Benjamin Kramer	a18266473b	[bazel][mlir] Add missing dependencies for `5a1cdcbd86`	2023-06-27 01:24:15 +02:00
Andres Villegas	939c03512d	[llvm-libtool-darwin] Switch to OptTable Summary Switch the parse of command line options from llvm::cl to OptTable. The motivation for this change is to continue adding llvm based tools to the llvm driver multicall. Differential Revision: https://reviews.llvm.org/D153665	2023-06-26 14:37:51 -07:00
Fangrui Song	19e9b9b589	[bazel] Add includes after `5a63b2b304`	2023-06-26 12:55:48 -07:00
Nicolas Vasilache	40deed40ae	[mlir][Transform] Introduce nvgpu transform extensions Mapping to NVGPU operations such as mma.sync with mixed precision and ldmatrix with transposes and various data types involves complex matchings from low-level IR. This is akin to raising complex patterns after unnecessarily having lost structural information. To avoid such unnecessary complexity, introduce a direct mapping step from a matmul on memrefs to distributed NVGPU vector abstractions. In this context, mapping to specific mma.sync operations is trivial and consists in simply translating the documentation into indexing expressions. Correctness is demonstrated with an end-to-end integration test. Differential Revision: https://reviews.llvm.org/D153420	2023-06-26 16:21:28 +00:00
Christian Sigg	9feed59a91	[Bazel][llvm] Fix after `8de9f2b`	2023-06-26 14:55:03 +02:00
Benjamin Kramer	4340ef141c	[bazel] Add TargetParser dep to tblgen after `8de9f2b558`	2023-06-26 12:04:54 +02:00
Christian Sigg	cd482968dc	[Bazel][mlir] Avoid ODR violation introduced in `7ab749c`. This change also prepares for `9119325` to land again. Adds `mlir_c_runner_utils_hdrs` and `mlir_runner_utils_hdrs` targets which do not depend on `//llvm::Support`. These can be used by other 'runner.so' targets if they are loaded along with the 'runner_utils.so' without calling `__mlir_execution_engine_init()` twice.	2023-06-22 08:00:50 +02:00
Guillaume Chatelet	bd1cba9f4f	Revert D148717 "[libc] Improve memcmp latency and codegen" Once integrated in our codebase the patch triggered a bunch of failing tests. We do not yet understand where the bug is but we revert it to move forward with integration. This reverts commit `5e32765c15`.	2023-06-21 12:37:14 +00:00
Christian Sigg	699e64c0d9	Revert "[Bazel][mlir] Fix ODR violation introduced in 7ab749c." This reverts commit `e83c8c3600`. Depending only on the support header files is not sufficient.	2023-06-21 14:29:44 +02:00
Christian Sigg	e83c8c3600	[Bazel][mlir] Fix ODR violation introduced in `7ab749c`.	2023-06-21 11:15:09 +02:00
Christian Sigg	7ab749c3a8	[Bazel][mlir] Fix after `bba2b65611`	2023-06-20 23:00:38 +02:00


1342 Commits