Commit Graph

1593 Commits

Author SHA1 Message Date
Baptiste Saleil
c2892978e9 [PowerPC] Rename the vector pair intrinsics and builtins to replace the _mma_ prefix by _vsx_
On PPC, the vector pair instructions are independent from MMA.
This patch renames the vector pair LLVM intrinsics and Clang builtins to replace the _mma_ prefix by _vsx_ in their names.
We also move the vector pair type/intrinsic/builtin tests to their own files.

Differential Revision: https://reviews.llvm.org/D91974
2020-12-17 13:19:27 -05:00
QingShan Zhang
ebdd20f430 Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128
X86 and AArch64 expand it as libcall inside the target. And PowerPC also
want to expand them as libcall for P8. So, propose an implement in the
legalizer to common the logic and remove the code for X86/AArch64 to
avoid the duplicate code.

Reviewed By: Craig Topper

Differential Revision: https://reviews.llvm.org/D91331
2020-12-17 07:59:30 +00:00
QingShan Zhang
08e287aaf3 [PowerPC][FP128] Fix the incorrect signature for math library call
The runtime library has two family library implementation for ppc_fp128 and fp128.
For IBM Long double(ppc_fp128), it is suffixed with 'l', i.e(sqrtl). For
IEEE Long double(fp128), it is suffixed with "ieee128" or "f128".
We miss to map several libcall for IEEE Long double.

Reviewed By: qiucf

Differential Revision: https://reviews.llvm.org/D91675
2020-12-14 07:52:56 +00:00
Zarko Todorovski
ce4040a43d [PPC] Check for PPC64 when emitting 64bit specific VSX nodes when pattern matching built vectors
Some of the pattern matching in PPCInstrVSX.td and node lowering involving vectors assumes 64bit mode.  This patch disables some of the unsafe pattern matching and lowering of BUILD_VECTOR in 32bit mode.

Reviewed By: Xiangling_L

Differential Revision: https://reviews.llvm.org/D92789
2020-12-12 15:28:28 -05:00
diggerlin
997d286f2d [AIX][XCOFF] emit traceback table for function in aix
SUMMARY:
 1. added a new option -xcoff-traceback-table to control whether generate traceback table for function.
 2. implement the functionality of emit traceback table of a function.

Reviewers: hubert.reinterpretcast, Jason Liu
Differential Revision: https://reviews.llvm.org/D92398
2020-12-11 17:50:25 -05:00
Stefan Pintilie
2812c15156 [PowerPC] Fix missing nop after call to weak callee.
Weak functions can be replaced by other functions at link time. Previously it
was assumed that no matter what the weak callee function was replaced with it
would still share the same TOC as the caller. This is no longer true as a weak
callee with a TOC setup can be replaced by another function that was compiled
with PC Relative and does not have a TOC at all.

This patch makes sure that all calls to functions defined as weak from a caller
that has a valid TOC have a nop after the call to allow a place for the linker
to restore the TOC.

Reviewed By: NeHuang

Differential Revision: https://reviews.llvm.org/D91983
2020-12-08 09:38:44 -06:00
Qiu Chaofan
efdd463050 [PowerPC] Fix chain for i1-to-fp operation
A simple SELECT is used for converting i1 to floating types on ppc32,
but in constrained cases, the chain is not handled properly. This patch
will fix that.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D92365
2020-12-07 10:38:56 +08:00
QingShan Zhang
c25b039e21 [PowerPC] Fix the regression caused by commit 9c588f53fc
Add a TypeLegal check for MVT::i1 and add the test.
2020-12-04 10:22:13 +00:00
QingShan Zhang
9bf0fea372 [PowerPC] Add the hw sqrt test for vector type v4f32/v2f64
PowerPC ISA support the input test for vector type v4f32 and v2f64.
Replace the software compare with hw test will improve the perf.

Reviewed By: ChenZheng

Differential Revision: https://reviews.llvm.org/D90914
2020-12-03 03:19:18 +00:00
Qiu Chaofan
ffa2dce590 [PowerPC] Fix FLT_ROUNDS_ on little endian
In lowering of FLT_ROUNDS_, FPSCR content will be moved into FP register
and then GPR, and then truncated into word.

For subtargets without direct move support, it will store and then load.
The load address needs adjustment (+4) only on big-endian targets. This
patch fixes it on using generic opcodes on little-endian and subtargets
with direct-move.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D91845
2020-12-02 17:16:32 +08:00
QingShan Zhang
47f784ace6 [PowerPC] Promote the i1 to i64 for SINT_TO_FP/FP_TO_SINT
i1 is the native type for PowerPC if crbits is enabled. However, we need
to promote the i1 to i64 as we didn't have the pattern for i1.

Reviewed By: Qiu Chao Fang

Differential Revision: https://reviews.llvm.org/D92067
2020-12-02 05:37:45 +00:00
QingShan Zhang
4d83aba422 [DAGCombine] Adding a hook to improve the precision of fsqrt if the input is denormal
For now, we will hardcode the result as 0.0 if the input is denormal or 0. That will
have the impact the precision. As the fsqrt added belong to the cold path of the
cmp+branch, it won't impact the performance for normal inputs for PowerPC, but improve
the precision if the input is denormal.

Reviewed By: Spatel

Differential Revision: https://reviews.llvm.org/D80974
2020-11-27 02:10:55 +00:00
Zarko Todorovski
6d648e69c0 [AIX] Add support for non var_arg extended vector ABI calling convention on AIX
This patch enables passing non variadic vector type parameters on the caller and callee side and vector return on AIX that are passed in vector registers only.

So far, support is enabled for only the AIX extended Altivec ABI Calling convention.

Reviewed By: sfertile, DiggerLin

Differential Revision: https://reviews.llvm.org/D86476
2020-11-26 12:03:51 -05:00
Simon Pilgrim
0637dfe88b [DAG] Legalize abs(x) -> smax(x,sub(0,x)) iff smax/sub are legal
If smax() is legal, this is likely to result in smaller codegen expansion for abs(x) than the xor(add,ashr) method.

This is also what PowerPC has been doing for its abs implementation, so it lets us get rid of a load of custom lowering code there (and which was never updated when they added smax lowering).

Alive2: https://alive2.llvm.org/ce/z/xRk3cD

Differential Revision: https://reviews.llvm.org/D92095
2020-11-25 15:03:03 +00:00
QingShan Zhang
9c588f53fc [DAGCombine] Add hook to allow target specific test for sqrt input
PowerPC has instruction ftsqrt/xstsqrtdp etc to do the input test for software square root.
LLVM now tests it with smallest normalized value using abs + setcc. We should add hook to
target that has test instructions.

Reviewed By: Spatel, Chen Zheng, Qiu Chao Fang

Differential Revision: https://reviews.llvm.org/D80706
2020-11-25 05:37:15 +00:00
QingShan Zhang
fa42f08b26 [PowerPC][FP128] Fix the incorrect calling convention for IEEE long double on Power8
For now, we are using the GPR to pass the arguments/return value for fp128 on Power8,
which is incorrect. It should be VSR. The reason why we do it this way is that,
we are setting the fp128 as illegal which make LLVM try to emulate it with i128 on
Power8. So, we need to correct it as legal.

Reviewed By: Nemanjai

Differential Revision: https://reviews.llvm.org/D91527
2020-11-25 01:43:48 +00:00
Zarko Todorovski
c92f29b05e [AIX] Add mabi=vec-extabi options to enable the AIX extended and default vector ABIs.
Added support for the options mabi=vec-extabi and mabi=vec-default which are analogous to qvecnvol and qnovecnvol when using XL on AIX.
The extended Altivec ABI on AIX is enabled using mabi=vec-extabi in clang and vec-extabi in llc.

Reviewed By: Xiangling_L, DiggerLin

Differential Revision: https://reviews.llvm.org/D89684
2020-11-24 18:17:53 -05:00
Sean Fertile
4f5355ee73 [PowerPC] Don't reuse an illegal typed load for int_to_fp conversion.
When the operand to an (s/u)int_to_fp node is an illegally typed load we
cannot reuse the load address since we can not build a proper dependancy
chain. The legalized loads will use a different chain output then the
illegal load. If we reuse the load address then we will build a
conversion node that uses the chain of the illegal load and operations
which modify the memory address in the other dependancy chain can be
scheduled before the floating point load which feeds the conversion.

Differential Revision: https://reviews.llvm.org/D91265
2020-11-24 15:45:33 -05:00
Craig Topper
a7eae62a42 [SelectionDAG][X86][PowerPC][Mips] Replace the default implementation of LowerOperationWrapper with the X86 and PowerPC version.
The default version only works if the returned node has a single
result. The X86 and PowerPC versions support multiple results
and allow a single result to be returned from a node with
multiple outputs. And allow a single result that is not result 0
of the node.

Also replace the Mips version since the new version should work
for it. The original version handled multiple results, but only
if the new node and original node had the same number of results.

Differential Revision: https://reviews.llvm.org/D91846
2020-11-20 10:06:53 -08:00
Simon Pilgrim
5f3a8074a4 [PPC] Fix dead store value clang static analyzer warning. NFCI.
Simplify the SplatBits 2-byte -> 4-byte 'splat'.
2020-11-17 16:27:45 +00:00
Baptiste Saleil
3f78605a8c [PowerPC] Add paired vector load and store builtins and intrinsics
This patch adds the Clang builtins and LLVM intrinsics to load and store vector pairs.

Differential Revision: https://reviews.llvm.org/D90799
2020-11-13 12:35:10 -06:00
Qiu Chaofan
3204ffeade [PowerPC] [NFC] Rename VCMPo to VCMP_rec
Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D90581
2020-11-03 11:10:59 +08:00
Nemanja Ivanovic
5459d08795 [PowerPC] Fix single-use check and update chain users for ld-splat
When converting a BUILD_VECTOR or VECTOR_SHUFFLE to a splatting load
as of 1461fb6e78, we inaccurately check
for a single user of the load and neglect to update the users of the
output chain of the original load. As a result, we can emit a new
load when the original load is kept and the new load can be reordered
after a dependent store. This patch fixes those two issues.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47891
2020-10-27 16:49:38 -05:00
Victor Huang
2e1a737f46 [PowerPC][PCRelative] Turn on TLS support for PCRel by default
Turn on TLS support for PCRel by default and update the test cases.

Differential Revision: https://reviews.llvm.org/D88738
Reviewed by: stefanp, kamaub
2020-10-27 13:58:44 -05:00
Amy Kwan
6a946fd06f [DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook, Use isOperationLegalOrCustom directly instead.
MULH is often expanded on targets.
This patch removes the isMulhCheaperThanMulShift hook and uses
isOperationLegalOrCustom instead.

Differential Revision: https://reviews.llvm.org/D80485
2020-10-19 12:23:04 -05:00
Kai Luo
354d3106c6 [PowerPC] Skip combining (uint_to_fp x) if x is not simple type
Current powerpc64le backend hits
```
Combining: t7: f64 = uint_to_fp t6
llc: llvm-project/llvm/include/llvm/CodeGen/ValueTypes.h:291: llvm::MVT llvm::EVT::getSimpleVT() const: Assertion `isSimple() && "Expected a SimpleValueType!"' failed.
```
This patch fixes it by skipping combination if `t6` is not simple type.
Fixed https://bugs.llvm.org/show_bug.cgi?id=47660.

Reviewed By: #powerpc, steven.zhang

Differential Revision: https://reviews.llvm.org/D88388
2020-10-19 05:23:46 +00:00
Albion Fung
d30155feaa [PowerPC] Implementation of 128-bit Binary Vector Rotate builtins
This patch implements 128-bit Binary Vector Rotate builtins for PowerPC10.

Differential Revision: https://reviews.llvm.org/D86819
2020-10-16 18:03:22 -04:00
David Sherwood
47f2dc7e5f [SVE][NFC] Replace some TypeSize comparisons in non-AArch64 Targets
In most of lib/Target we know that we are not dealing with scalable
types so it's perfectly fine to replace TypeSize comparison operators
with their fixed width equivalents, making use of getFixedSize()
and so on.

Differential Revision: https://reviews.llvm.org/D89101
2020-10-15 09:01:21 +01:00
Ahsan Saghir
f3202b30b8 [PowerPC] Add assemble disassemble intrinsics for MMA
This patch adds support for assemble disassemble intrinsics
for MMA.

Reviewed By: bsaleil, #powerpc

Differential Revision: https://reviews.llvm.org/D88739
2020-10-13 13:21:58 -05:00
Simon Pilgrim
2c3e4a21f9 [PowerPC] ReplaceNodeResults - bail on funnel shifts and let generic legalizers deal with it
Fixes regression raised on D88834 for 32-bit triple + 64-bit cpu cases (which apparently is a thing).
2020-10-10 19:13:16 +01:00
Fangrui Song
2bd4730850 [PowerPC] Fix signed overflow in decomposeMulByConstant after D88201
Caught by multipliers LONG_MAX (after +1) and LONG_MIN (after -1) in CodeGen/PowerPC/mul-const-i64.ll
2020-10-09 18:29:12 -07:00
Esme-Yi
e9fd8823ba [DAGCombiner] Add decomposition patterns for Mul-by-Imm.
Summary: This patch is derived from D87384.
In this patch we expand the existing decomposition of mul-by-constant to be more general by implementing 2 patterns:
```
  mul x, (2^N + 2^M) --> (add (shl x, N), (shl x, M))
  mul x, (2^N - 2^M) --> (sub (shl x, N), (shl x, M))
```
The conversion will be trigged if the multiplier is a big constant that the target can't use a single multiplication instruction to handle. This is controlled by the hook `decomposeMulByConstant`.
More over, the conversion benefits from an ILP improvement since the instructions are independent. A case with the sequence like following also gets benefit since a shift instruction is saved.

```
*res1 = a * 0x8800;
*res2 = a * 0x8080;
```

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D88201
2020-10-09 08:51:40 +00:00
Chen Zheng
0492dd91c4 [PowerPC] add more builtins for PPCTargetLowering::getTgtMemIntrinsic
Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D88374
2020-10-06 23:48:33 -04:00
Craig Topper
1127662c6d [SelectionDAG] Make sure FMF are propagated when getSetcc canonicalizes FP constants to RHS.
getNode handling for ISD:SETCC calls FoldSETCC which can canonicalize
FP constants to the RHS. When this happens we should create the node
with the FMF that was requested. By using FlagInserter when can ensure
any calls to getNode/getSetcc during canonicalization will also get the flags.

Differential Revision: https://reviews.llvm.org/D88063
2020-10-05 14:55:23 -07:00
Zarko Todorovski
052c5bf40a [PPC] Do not emit extswsli in 32BIT mode when using -mcpu=pwr9
It looks like in some circumstances when compiling with `-mcpu=pwr9` we create an EXTSWSLI node when which causes llc to fail. No such error occurs in pwr8 or lower.

This occurs in 32BIT AIX and BE Linux. the cause seems to be that the default return in combineSHL is to create an EXTSWSLI node.  Adding a check for whether we are in PPC64 before that fixes the issue.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D87046
2020-09-30 11:06:20 -04:00
Sean Fertile
dfb717da1f [PowerPC] Remove support for VRSAVE save/restore/update.
After removal of Darwin as a PowerPC subtarget, the VRSAVE
save/restore/spill/update code is no longer needed by any supported
subtarget, so remove it while keeping support for vrsave and related instruction
aliases for inline asm. I've pre-commited tests to document the existing vrsave
handling in relation to @llvm.eh.unwind.init and inline asm usage, as
well as a test which shows a beahviour change on AIX related to
returning vector type as we were wrongly emiting VRSAVE_UPDATE on AIX.
2020-09-30 10:05:53 -04:00
Baptiste Saleil
0156914275 [PowerPC] Legalize v256i1 and v512i1 and implement load and store of these types
This patch legalizes the v256i1 and v512i1 types that will be used for MMA.

It implements loads and stores of these types.
v256i1 is a pair of VSX registers, so for this type, we load/store the two
underlying registers. v512i1 is used for MMA accumulators. So in addition to
loading and storing the 4 associated VSX registers, we generate instructions to
prime (copy the VSX registers to the accumulator) after loading and unprime
(copy the accumulator back to the VSX registers) before storing.

This patch also adds the UACC register class that is necessary to implement the
loads and stores. This class represents accumulator in their unprimed form and
allow the distinction between primed and unprimed accumulators to avoid invalid
copies of the VSX registers associated with primed accumulators.

Differential Revision: https://reviews.llvm.org/D84968
2020-09-28 14:39:37 -05:00
Amy Kwan
2e7117f847 [PowerPC] Implement the 128-bit vec_[all|any]_[eq | ne | lt | gt | le | ge] builtins in Clang/LLVM
This patch implements the vec_[all|any]_[eq | ne | lt | gt | le | ge] builtins for vector signed/unsigned __int128.

Differential Revision: https://reviews.llvm.org/D87910
2020-09-23 16:49:40 -04:00
Albion Fung
88cdbeab41 [PowerPC] Implement Vector signed/unsigned __int128 overloads for the comparison builtins
This patch implements Vector signed/unsigned __int128 overloads for the comparison builtins.

Differential Revision: https://reviews.llvm.org/D87804
2020-09-23 16:49:40 -04:00
Victor Huang
652a8f150d [PowerPC][PCRelative] Thread Local Storage Support for Local Dynamic
This patch is the initial support for the Local Dynamic Thread Local Storage
model to produce code sequence and relocation correct to the ABI for the model
when using PC relative memory operations.

Differential Revision: https://reviews.llvm.org/D87721
2020-09-23 13:48:06 -05:00
Albion Fung
d7eb917a7c [PowerPC] Implementation of 128-bit Binary Vector Mod and Sign Extend builtins
This patch implements 128-bit Binary Vector Mod and Sign Extend builtins for PowerPC10.

Differential: https://reviews.llvm.org/D87394#inline-815858
2020-09-23 01:18:14 -05:00
Qiu Chaofan
1d782c2987 [PowerPC] Pass nofpexcept flag to custom lowered constrained ops
This is a follow-up of D86605. For strict DAG FP node, if its FP
exception behavior metadata is ignore, it should have nofpexcept flag.
But during custom lowering, this flag isn't passed down.

This is also seen on X86 target.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D87390
2020-09-21 10:44:25 +08:00
Qiu Chaofan
ebfbdebe96 [PowerPC] Fix store-fptoi combine of f128 on Power8
llc would crash for (store (fptosi-f128-i32)) when -mcpu=pwr8, we should
not generate FP_TO_(S|U)INT_IN_VSR for f128 types at this time. This
patch fixes it.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D86686
2020-09-17 10:21:35 +08:00
Albion Fung
05aa997d51 [PowerPC] Implement __int128 vector divide operations
This patch implements __int128 vector divide operations for ISA3.1.

Differential Revision: https://reviews.llvm.org/D85453
2020-09-15 15:19:35 -04:00
Kamau Bridgeman
c0f199e566 [PowerPC] Implement Thread Local Storage Support for Local Exec
This patch is the initial support for the Local Exec Thread Local
Storage model to produce code sequence and relocations correct
to the ABI for the model when using PC relative memory operations.

Patch by: Kamau Bridgeman

Differential Revision: https://reviews.llvm.org/D83404
2020-09-14 14:16:28 -05:00
Qiu Chaofan
6afb279100 [PowerPC] [FPEnv] Disable strict FP mutation by default
22a0edd0 introduced a config IsStrictFPEnabled, which controls the
strict floating point mutation (transforming some strict-fp operations
into non-strict in ISel). This patch disables the mutation by default
since we've finished PowerPC strict-fp enablement in backend.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D87222
2020-09-10 13:28:09 +08:00
Qiu Chaofan
88ff4d2ca1 [PowerPC] Fix STRICT_FRINT/STRICT_FNEARBYINT lowering
In standard C library, both rint and nearbyint returns rounding result
in current rounding mode. But nearbyint never raises inexact exception.
On PowerPC, x(v|s)r(d|s)pic may modify FPSCR XX, raising inexact
exception. So we can't select constrained fnearbyint into xvrdpic.

One exception here is xsrqpi, which will not raise inexact exception, so
fnearbyint f128 is okay here.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D87220
2020-09-09 22:40:58 +08:00
Brad Smith
88b368a1c4 [PowerPC] Set setMaxAtomicSizeInBitsSupported appropriately for 32-bit PowerPC in PPCTargetLowering
Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D86165
2020-09-08 21:21:14 -04:00
Craig Topper
b1e68f885b [SelectionDAGBuilder] Pass fast math flags to getNode calls rather than trying to set them after the fact.:
This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130.

This required adding a SDNodeFlags to SelectionDAG::getSetCC.

Now we manage to contant fold some stuff undefs during the
initial getNode that we don't do in later DAG combines.

Differential Revision: https://reviews.llvm.org/D87200
2020-09-08 15:27:21 -07:00
Qiu Chaofan
705271d9cd [PowerPC] Expand constrained ppc_fp128 to i32 conversion
Libcall __gcc_qtou is not available, which breaks some tests needing
it. On PowerPC, we have code to manually expand the operation, this
patch applies it to constrained conversion. To keep it strict-safe,
it's using the algorithm similar to expandFP_TO_UINT.

For constrained operations marking FP exception behavior as 'ignore',
we should set the NoFPExcept flag. However, in some custom lowering
the flag is missed. This should be fixed by future patches.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D86605
2020-09-05 13:16:20 +08:00