Danila Kutenin
68c011aa08
[builtins] Optimize udivmodti4 for many platforms.
Summary:
While benchmarking uint128 division, we found that it has huge latency for small divisors (see https://reviews.llvm.org/D83027):
```
Benchmark Time(ns) CPU(ns) Iterations
--------------------------------------------------------------------------------------------------
BM_DivideIntrinsic128UniformDivisor<unsigned __int128> 13.0 13.0 55000000
BM_DivideIntrinsic128UniformDivisor<__int128> 14.3 14.3 50000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128> 13.5 13.5 52000000
BM_RemainderIntrinsic128UniformDivisor<__int128> 14.1 14.1 50000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128> 153 153 5000000
BM_DivideIntrinsic128SmallDivisor<__int128> 170 170 3000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128> 153 153 5000000
BM_RemainderIntrinsic128SmallDivisor<__int128> 155 155 5000000
```
This patch suggests a more optimized version of the division:
If the divisor fits in 64 bits, we can use the divq instruction on x86 or constant-multiplication mechanisms on other platforms. When both the divisor and the dividend are at least 2**64, we use a branch-free subtract algorithm that takes at most 64 iterations. With this change, our benchmarks improved significantly:
```
Benchmark Time(ns) CPU(ns) Iterations
--------------------------------------------------------------------------------------------------
BM_DivideIntrinsic128UniformDivisor<unsigned __int128> 11.0 11.0 64000000
BM_DivideIntrinsic128UniformDivisor<__int128> 13.8 13.8 51000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128> 11.6 11.6 61000000
BM_RemainderIntrinsic128UniformDivisor<__int128> 13.7 13.7 52000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128> 27.1 27.1 26000000
BM_DivideIntrinsic128SmallDivisor<__int128> 29.4 29.4 24000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128> 27.9 27.8 26000000
BM_RemainderIntrinsic128SmallDivisor<__int128> 29.1 29.1 25000000
```
Without the divq instruction, the results are still much better:
```
Benchmark Time(ns) CPU(ns) Iterations
--------------------------------------------------------------------------------------------------
BM_DivideIntrinsic128UniformDivisor<unsigned __int128> 12.2 12.2 58000000
BM_DivideIntrinsic128UniformDivisor<__int128> 13.5 13.5 52000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128> 12.7 12.7 56000000
BM_RemainderIntrinsic128UniformDivisor<__int128> 13.7 13.7 51000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128> 30.2 30.2 24000000
BM_DivideIntrinsic128SmallDivisor<__int128> 33.2 33.2 22000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128> 31.4 31.4 23000000
BM_RemainderIntrinsic128SmallDivisor<__int128> 33.8 33.8 21000000
```
PowerPC benchmarks:
Before this patch:
```
Benchmark Time(ns) CPU(ns) Iterations
--------------------------------------------------------------------------------------------------
BM_DivideIntrinsic128UniformDivisor<unsigned __int128> 22.3 22.3 32000000
BM_DivideIntrinsic128UniformDivisor<__int128> 23.8 23.8 30000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128> 22.5 22.5 32000000
BM_RemainderIntrinsic128UniformDivisor<__int128> 24.9 24.9 29000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128> 394 394 2000000
BM_DivideIntrinsic128SmallDivisor<__int128> 397 397 2000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128> 399 399 2000000
BM_RemainderIntrinsic128SmallDivisor<__int128> 397 397 2000000
```
With this patch:
```
Benchmark Time(ns) CPU(ns) Iterations
--------------------------------------------------------------------------------------------------
BM_DivideIntrinsic128UniformDivisor<unsigned __int128> 21.7 21.7 33000000
BM_DivideIntrinsic128UniformDivisor<__int128> 23.0 23.0 31000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128> 21.9 21.9 33000000
BM_RemainderIntrinsic128UniformDivisor<__int128> 23.9 23.9 30000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128> 32.7 32.6 23000000
BM_DivideIntrinsic128SmallDivisor<__int128> 33.4 33.4 21000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128> 31.1 31.1 22000000
BM_RemainderIntrinsic128SmallDivisor<__int128> 33.2 33.2 22000000
```
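For illustration, the two paths described in the summary could be sketched roughly as follows. This is not the code from the patch. The names div128by64_sketch and udivmod128_sketch are hypothetical, the 128-by-64 helper is a plain bit-by-bit loop standing in for divq or a multiplication-based fallback, and the subtract mask is derived from a comparison, which may differ from how the patch computes it. The sketch assumes a GCC- or Clang-style compiler that provides unsigned __int128 and __builtin_clzll.

```c
#include <stdint.h>

typedef unsigned __int128 u128; /* GCC/Clang extension (assumption) */

/* Hypothetical helper: divide the 128-bit value (hi:lo) by d, assuming
   hi < d so the quotient fits in 64 bits. On x86-64 a single divq can do
   this; here a portable bit-by-bit loop is used as a stand-in. */
static uint64_t div128by64_sketch(uint64_t hi, uint64_t lo, uint64_t d,
                                  uint64_t *rem) {
    uint64_t q = 0, r = hi;
    for (int i = 63; i >= 0; --i) {
        uint64_t carry = r >> 63;            /* bit shifted out of r */
        r = (r << 1) | ((lo >> i) & 1);      /* bring down the next bit */
        if (carry || r >= d) {               /* true value of r is >= d */
            r -= d;                          /* wraps to the right value */
            q |= (uint64_t)1 << i;
        }
    }
    *rem = r;
    return q;
}

/* Hypothetical sketch of unsigned 128-bit divide-with-remainder.
   Behavior for b == 0 is undefined, as for ordinary integer division. */
static u128 udivmod128_sketch(u128 a, u128 b, u128 *rem) {
    if (b > a) { *rem = a; return 0; }

    uint64_t a_hi = (uint64_t)(a >> 64), a_lo = (uint64_t)a;
    uint64_t b_hi = (uint64_t)(b >> 64), b_lo = (uint64_t)b;

    if (b_hi == 0) {
        /* Divisor fits in 64 bits: two 64-bit-quotient steps. */
        uint64_t r;
        uint64_t q_hi = a_hi / b_lo;
        uint64_t q_lo = div128by64_sketch(a_hi % b_lo, a_lo, b_lo, &r);
        *rem = r;
        return ((u128)q_hi << 64) | q_lo;
    }

    /* Both operands are >= 2**64 and a >= b, so both high halves are
       nonzero and the quotient fits in 64 bits. Align the leading bits,
       then shift-subtract; shift is in 0..63, so at most 64 iterations. */
    int shift = __builtin_clzll(b_hi) - __builtin_clzll(a_hi);
    b <<= shift;
    uint64_t q = 0;
    for (; shift >= 0; --shift) {
        u128 mask = (u128)0 - (u128)(a >= b); /* all ones if a >= b */
        q = (q << 1) | (uint64_t)(mask & 1);
        a -= b & mask;                        /* subtract only if a >= b */
        b >>= 1;
    }
    *rem = a;
    return q;
}
```

A caller would pass the dividend, the divisor, and a pointer that receives the remainder, then check that quotient * divisor + remainder equals the dividend and that the remainder is smaller than the divisor.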
My email: danilak@google.com. I don't have commit rights.
Reviewers: howard.hinnant, courbet, MaskRay
Reviewed By: courbet
Subscribers: steven.zhang, #sanitizers
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D81809
2020-07-10 09:59:16 +02:00