Commit Graph

603 Commits

Author SHA1 Message Date
Simon Pilgrim
0776924a17 [CostModel][X86] getCmpSelInstrCost - treat BAD_PREDICATEs the same as the worst case cost predicates for ICMP/FCMP instructions
As suggested on D111024, we should treat getCmpSelInstrCost calls without a specific predicate as matching the worst case predicate cost.

These regressions will be addressed with a mixture of D111024 and fixing other specific getCmpSelInstrCost calls to have realistic predicates.
2021-10-06 10:14:56 +01:00
Roman Lebedev
3f9b235482 [X86][Costmodel] Load/store i64/f64 Stride=6 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1jfGddcre - for intels `Block RThroughput: =36.0`; for ryzens, `Block RThroughput: =12.0`
So could pick cost of `36`

For store we have:
https://godbolt.org/z/ao9srMT8r - for intels `Block RThroughput: =30.0`; for ryzens, `Block RThroughput: =12.0`
So we could pick cost of `30`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111094
2021-10-05 16:58:58 +03:00
Roman Lebedev
e2784c5d8c [X86][Costmodel] Load/store i64/f64 Stride=6 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/rc8jYxW6M - for intels `Block RThroughput: =18.0`; for ryzens, `Block RThroughput: =6.0`
So could pick cost of `18`.

For store we have:
https://godbolt.org/z/9PhPEr65G - for intels `Block RThroughput: =15.0`; for ryzens, `Block RThroughput: =6.0`
So we could pick cost of `15`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111093
2021-10-05 16:58:58 +03:00
Roman Lebedev
3960693048 [X86][Costmodel] Load/store i64/f64 Stride=6 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/onese7rec - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =3.0`
So could pick cost of `6`.

For store we have:
https://godbolt.org/z/bMd7dddnT - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=6.0`
So we could pick cost of `8`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111092
2021-10-05 16:58:58 +03:00
Roman Lebedev
79d6d12d95 [X86][Costmodel] Load/store i32/f32 Stride=6 VF=16 interleaving costs
This one required quite a bit of an assembly surgery, but i think it's in the right ballpark..

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/na97Kb96o - for intels `Block RThroughput: <=64.0`; for ryzens, `Block RThroughput: <=32.0`
So could pick cost of `64`.

For store we have:
https://godbolt.org/z/GG1WeoKar - for intels `Block RThroughput: =66.0`; for ryzens, `Block RThroughput: <=27.5`
So we could pick cost of `66`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111091
2021-10-05 16:58:58 +03:00
Roman Lebedev
2996a2b50f [X86][Costmodel] Load/store i32/f32 Stride=6 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/jK85GWKaK - for intels `Block RThroughput: =31.0`; for ryzens, `Block RThroughput: <=17.0`
So could pick cost of `31`.

For store we have:
https://godbolt.org/z/hPWWhEEf9 - for intels `Block RThroughput: =33.0`; for ryzens, `Block RThroughput: <=13.8`
So we could pick cost of `33`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111089
2021-10-05 16:58:57 +03:00
Roman Lebedev
d51532d8aa [X86][Costmodel] Load/store i32/f32 Stride=6 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/szEj1ceee - for intels `Block RThroughput: =15.0`; for ryzens, `Block RThroughput: <=8.8`
So could pick cost of `15`.

For store we have:
https://godbolt.org/z/81bq4fTo1 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=10.0`
So we could pick cost of `12`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111087
2021-10-05 16:58:57 +03:00
Roman Lebedev
764fd5f463 [X86][Costmodel] Load/store i32/f32 Stride=6 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/aec96Thee - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.3`
So could pick cost of `6`.

For store we have:
https://godbolt.org/z/aec96Thee - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=3.0`
So we could pick cost of `9`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111083
2021-10-05 16:58:57 +03:00
Roman Lebedev
c800119c46 [X86][Costmodel] Load/store i64/f64 Stride=4 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/3M3hbq7n8 - for intels `Block RThroughput: =20.0`; for ryzens, `Block RThroughput: =8.0`
So could pick cost of `20`.

For store we have:
https://godbolt.org/z/zvnPYWTx7 - for intels `Block RThroughput: =20.0`; for ryzens, `Block RThroughput: =8.0`
So we could pick cost of `20`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111076
2021-10-05 16:58:57 +03:00
Roman Lebedev
000ce0bfd5 [X86][Costmodel] Load/store i64/f64 Stride=4 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/MTKdzjvnr - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So could pick cost of `8`.

For store we have:
https://godbolt.org/z/cMYEvqoah - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So we could pick cost of `8`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111075
2021-10-05 16:58:57 +03:00
Roman Lebedev
dcc2b0d933 [X86][Costmodel] Load/store i64/f64 Stride=4 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/z197317d1 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =2.0`
So could pick cost of `6`.

For store we have:
https://godbolt.org/z/8dzszjf9q - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=4.0`
So we could pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111073
2021-10-05 16:58:57 +03:00
Roman Lebedev
7d91037fd2 [X86][Costmodel] Load/store i32/f32 Stride=4 VF=16 interleaving costs
This one required quite a bit of assembly surgery, but the trend continues, so i think this is right.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/EKWdj8cKT - for intels `Block RThroughput: <=32.0`; for ryzens, `Block RThroughput: <=24.0`
So could pick cost of `32`.

For store we have:
https://godbolt.org/z/zj4bb9P75 - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=16.0`
So we could pick cost of `32`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111064
2021-10-05 16:58:57 +03:00
Roman Lebedev
4aee1e5b93 [X86][Costmodel] Load/store i32/f32 Stride=4 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/a6rxMG6ec - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=12.0`
So could pick cost of `16`.

For store we have:
https://godbolt.org/z/ced1bdqc9 - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=8.0`
So we could pick cost of `16`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111063
2021-10-05 16:58:57 +03:00
Roman Lebedev
3c2e22b795 [X86][Costmodel] Load/store i32/f32 Stride=4 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/avq1oz98W - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: =4.0`
So could pick cost of `8`.

For store we have:
https://godbolt.org/z/89PGMc1qs - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=6.0`
So we could pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111061
2021-10-05 16:58:57 +03:00
Roman Lebedev
b6234c1edf [X86][Costmodel] Load/store i32/f32 Stride=4 VF=2 interleaving costs
Finally, we are getting to the heavy-hitter stuff!

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/7crGWoar6 - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So could pick cost of `4`.

For store we have:
https://godbolt.org/z/T8aq3MszM - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=2.0`
So we could pick cost of `5`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111060
2021-10-05 16:58:56 +03:00
Roman Lebedev
cef0a693b6 [X86][Costmodel] Load/store i64/f64 Stride=3 VF=16 interleaving costs
This required huge amount of assembly surgery, but i think this is about right.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/z11crMEcj - for intels `Block RThroughput: =20.0`; for ryzens, `Block RThroughput: <=18.0`
So could pick cost of `25`.

For store we have:
https://godbolt.org/z/eqT4ze3j4 - for intels `Block RThroughput: =24.0`; for ryzens, `Block RThroughput: <=16.0`
So we could pick cost of `24`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111031
2021-10-04 14:35:17 +03:00
Roman Lebedev
ede0611e79 [X86][Costmodel] Load/store i64/f64 Stride=3 VF=8 interleaving costs
This one required quite a bit of assembly surgery.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/oYWv4cTnK - for intels `Block RThroughput: =10.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `10`.

For store we have:
https://godbolt.org/z/33GMhrsG9 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `12`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111027
2021-10-04 14:35:01 +03:00
Roman Lebedev
eb9a694c17 [X86][Costmodel] Load/store i64/f64 Stride=3 VF=4 interleaving costs
This one required quite a bit of assembly surgery.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Tce3osvcz - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `5`.

For store we have:
https://godbolt.org/z/oc3arEcnE - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111026
2021-10-04 14:34:47 +03:00
Roman Lebedev
d3bbe781ea [X86][Costmodel] Load/store i64/f64 Stride=3 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/sz5qdKnr4 - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `1`.

For store we have:
https://godbolt.org/z/Kzdjff63v - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111025
2021-10-04 14:34:33 +03:00
Roman Lebedev
4ca5bc07af [X86][Costmodel] Load/store i32/f32 Stride=3 VF=16 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/5fqrh4qqo - for intels `Block RThroughput: =14.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `14`.

For store we have:
https://godbolt.org/z/5fqrh4qqo - for intels `Block RThroughput: =22.0`; for ryzens, `Block RThroughput: <=16.0`
So pick cost of `22`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111022
2021-10-04 14:34:19 +03:00
Roman Lebedev
198aa84973 [X86][Costmodel] Load/store i32/f32 Stride=3 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/zdz5Ga6fs - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `7`.

For store we have:
https://godbolt.org/z/qn71513ac - for intels `Block RThroughput: =11.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `11`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111021
2021-10-04 14:34:05 +03:00
Roman Lebedev
a93411c3af [X86][Costmodel] Load/store i32/f32 Stride=3 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/d8PdhEszo - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/WojonfG5n - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `5`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111020
2021-10-04 14:34:03 +03:00
Roman Lebedev
3e93fcdfc8 [X86][Costmodel] Load/store i32/f32 Stride=3 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/z8qa14bs3 - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: =1.5`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/GYGajoc4K - for intels `Block RThroughput: <=4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111019
2021-10-04 14:31:50 +03:00
Roman Lebedev
67f1ee2e38 [X86][Costmodel] Load/store i16 Stride=3 VF=32 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/rMaYr67hz - for intels `Block RThroughput: =56.0`; for ryzens, `Block RThroughput: <=17.8`
So pick cost of `56`.

For store we have:
https://godbolt.org/z/eMsbKqnvv - for intels `Block RThroughput: <=54.0`; for ryzens, `Block RThroughput: <=15.0`
So pick cost of `54`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111018
2021-10-03 23:40:35 +03:00
Roman Lebedev
3cbc0a07f9 [X86][Costmodel] Load/store i16 Stride=3 VF=16 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: =28.0`; for ryzens, `Block RThroughput: <=8.5`
So pick cost of `28`.

For store we have:
https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `27`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111017
2021-10-03 23:40:21 +03:00
Roman Lebedev
72f8a9244a [X86][Costmodel] Load/store i16 Stride=3 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=2.3`
So pick cost of `9`.

For store we have:
https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: <=12.0`; for ryzens, `Block RThroughput: <=3.3`
So pick cost of `12`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111016
2021-10-03 23:40:05 +03:00
Roman Lebedev
04f1469cb4 [X86][Costmodel] Load/store i16 Stride=3 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `7`.

For store we have:
https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111015
2021-10-03 23:39:51 +03:00
Roman Lebedev
8e8fb77aa4 [X86][Costmodel] Load/store i16 Stride=3 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/xnE988aej - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=2.5`
So pick cost of `5`.

For store we have:
https://godbolt.org/z/rMGT31Tnh - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111014
2021-10-03 23:39:36 +03:00
Roman Lebedev
a5e5883ef5 [X86][Costmodel] Load/store i8 Stride=6 VF=32 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/c1jjKqP7b - for intels `Block RThroughput: <=82.0`; for ryzens, `Block RThroughput: <=26.0`
So pick cost of `82`.

For store we have:
https://godbolt.org/z/YM4ErY8x7 - for intels `Block RThroughput: <=90.0`; for ryzens, `Block RThroughput: <=25.5`
So pick cost of `90`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111013
2021-10-03 23:39:22 +03:00
Roman Lebedev
bd5ba437fd [X86][Costmodel] Load/store i8 Stride=6 VF=16 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Gz8hhqfTM - for intels `Block RThroughput: <=43.0`; for ryzens, `Block RThroughput: <=14.0`
So pick cost of `43`.

For store we have:
https://godbolt.org/z/9vrdssYa8 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `27`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111012
2021-10-03 23:39:08 +03:00
Roman Lebedev
0b27f9c088 [X86][Costmodel] Load/store i8 Stride=6 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/v98qPTTf6 - for intels `Block RThroughput: =18.0`; for ryzens, `Block RThroughput: =6.0`
So pick cost of `18`.

For store we have:
https://godbolt.org/z/rn5T9E8q6 - for intels `Block RThroughput: <=16.0`; for ryzens, `Block RThroughput: <=4.5`
So pick cost of `16`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111011
2021-10-03 23:38:54 +03:00
Roman Lebedev
6fe4cce558 [X86][Costmodel] Load/store i8 Stride=6 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =14.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `14`.

For store we have:
https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `9`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111010
2021-10-03 23:38:40 +03:00
Roman Lebedev
396b95e5c9 [X86][Costmodel] Load/store i8 Stride=6 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/jvj6jzns5 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.

For store we have:
https://godbolt.org/z/ros7eebMP - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `7`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111008
2021-10-03 23:38:10 +03:00
Simon Pilgrim
b85bf520dc [CostModel][X86] X86TTIImpl::getCmpSelInstrCost - try to use Predicate argument directly first (PR48337)
There's still a lot of cases where getCmpSelInstrCost fails to specify a predicate, once those are in place we should be able to remove the fallback to the Instruction argument entirely.
2021-10-03 17:16:51 +01:00
Roman Lebedev
acb459574a [X86][Costmodel] Load/store i8 Stride=4 VF=32 interleaving costs
While we already model this tuple, the load cost is divergent from reality, so fix it.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/zWMhhnPYa - for intels `Block RThroughput: =56.0`; for ryzens, `Block RThroughput: <=24.0`
So pick cost of `56`.

For store we have:
https://godbolt.org/z/vnqqjWx51 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `12`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110971
2021-10-02 13:40:21 +03:00
Roman Lebedev
0e71ae6da8 [X86][Costmodel] Load/store i8 Stride=4 VF=16 interleaving costs
While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/TrGW7cKsE - for intels `Block RThroughput: =24.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `24`.

For store we have:
https://godbolt.org/z/Mh7qaqEfe - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `8`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110970
2021-10-02 13:40:21 +03:00
Roman Lebedev
74e4a0e327 [X86][Costmodel] Load/store i8 Stride=4 VF=8 interleaving costs
While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/v7746Wcf7 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `12`.

For store we have:
https://godbolt.org/z/aEeEohEbP - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110969
2021-10-02 13:40:20 +03:00
Roman Lebedev
ae08362cb8 [X86][Costmodel] Load/store i8 Stride=4 VF=4 interleaving costs
While we already model this tuple, the store cost is divergent from reality, so fix it.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1n4bPh7Tn - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

For store we have:
https://godbolt.org/z/r8K9sveqo - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110968
2021-10-02 13:40:20 +03:00
Roman Lebedev
935b9693ae [X86][Costmodel] Load/store i8 Stride=4 VF=2 interleaving costs
While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/KP6nn36zs - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

For store we have:
https://godbolt.org/z/ov95zhrq6 - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110966
2021-10-02 13:40:20 +03:00
Roman Lebedev
448c939839 [X86][Costmodel] Load/store i8 Stride=3 VF=32 interleaving costs
For VF=16, costs are correct.
For VF=32, load cost is divergent.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/qKjevqf4W - for intels `Block RThroughput: <=14.0`; for ryzens, `Block RThroughput: <=4.5`
So pick cost of `14`.

For store we have:
https://godbolt.org/z/xTssTq319 - for intels `Block RThroughput: =13.0`; for ryzens, `Block RThroughput: <=5.5`
So pick cost of `13`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110961
2021-10-02 13:39:15 +03:00
Roman Lebedev
d1460c88a6 [X86][Costmodel] Load/store i8 Stride=3 VF=8 interleaving costs
While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1jeocxj55 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.

For store we have:
https://godbolt.org/z/fr7xfa3K5 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110960
2021-10-02 13:39:15 +03:00
Roman Lebedev
f1df2d8eaf [X86][Costmodel] Load/store i8 Stride=3 VF=4 interleaving costs
While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/obWz3PrfK - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=1.5`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/orjPshn3h - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110958
2021-10-02 13:39:10 +03:00
Roman Lebedev
8a3c64c3a2 [X86][Costmodel] Load/store i8 Stride=3 VF=2 interleaving costs
While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/WYscYMcW4 - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=1.5`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/e9qvYdbbs - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110956
2021-10-02 13:39:05 +03:00
Roman Lebedev
3e260efdfc [X86][Costmodel] Load/store i64/f64 Stride=2 VF=16 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1WMTojvfW - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `16`.

For store we have:
https://godbolt.org/z/1WMTojvfW - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=16.0`
So pick cost of `16`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110840
2021-10-01 17:48:14 +03:00
Roman Lebedev
abd37de63e [X86][Costmodel] Load/store i64/f64 Stride=2 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/PGYbYKPq8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `8`.

For store we have:
https://godbolt.org/z/PGYbYKPq8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `8`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110838
2021-10-01 17:48:14 +03:00
Roman Lebedev
71bc31b907 [X86][Costmodel] Load/store i64/f64 Stride=2 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/j5co1qWEW - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

For store we have:
https://godbolt.org/z/j5co1qWEW - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110837
2021-10-01 17:48:14 +03:00
Roman Lebedev
612e5b05a2 [X86][Costmodel] Load/store i64/f64 Stride=2 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/8a1cfGeMn - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: =1.0`
So pick cost of `2`.

For store we have:
https://godbolt.org/z/jMdcM47bx - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `2`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110835
2021-10-01 17:48:14 +03:00
Roman Lebedev
ea76cb87ee [X86][Costmodel] Load/store i32/f32 Stride=2 VF=32 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

Here for `store` pattern we are starting to have spilling,
so accurate modelling may be problematic,
although if i drop the spilling, the measurements don't change.

For load we have:
https://godbolt.org/z/1oTTnncbx - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `16`.

For store we have:
https://godbolt.org/z/1oTTnncbx - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: =8.0`
So pick cost of `16`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110761
2021-10-01 17:48:14 +03:00
Roman Lebedev
80cd8da78d [X86][Costmodel] Load/store i32/f32 Stride=2 VF=16 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/M9eev3xe8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `8`.

For store we have:
https://godbolt.org/z/M9eev3xe8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: =4.0`
So pick cost of `8`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110756
2021-10-01 17:48:14 +03:00
Roman Lebedev
3a0643e9c2 [X86][Costmodel] Load/store i32/f32 Stride=2 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/n8aMKeo4E - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

For store we have:
https://godbolt.org/z/n8aMKeo4E - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: =2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110755
2021-10-01 17:48:13 +03:00