3662 Commits

Author SHA1 Message Date
Wael Yehia
9d4e8c09f4 [XCOFF] Do not put MergeableCStrings in their own section
The current implementation generates a csect with a
".rodata.str.x.y" prefix for a MergeableCString variable definition.
However, a reference to such variable does not get the prefix in its
name because there's not enough information in the containing IR.
In particular, without seeing the initializer and absent some other
indicators, we cannot tell that the referenced variable is a
null-terminated string.

When the AIX codegen in llvm was being developed, the prefixing was copied
from ELF without having the linker take advantage of the info.
Currently, the AIX linker does not have the capability to merge
MergeableCString variables. If such a feature is ever implemented,
the contract between the linker and compiler would have to be reconsidered.

Here's the before and after of this change:
```
@a = global i64 320255973571806, align 8
@strA = unnamed_addr constant [7 x i8] c"hello\0A\00", align 1  ;; Mergeable1ByteCString
@strB = unnamed_addr constant [8 x i8] c"Blahah\0A\00", align 1 ;; Mergeable1ByteCString
@strC = unnamed_addr constant [2 x i16] [i16 1, i16 0], align 2 ;; Mergeable2ByteCString
@strD = unnamed_addr constant [2 x i16] [i16 1, i16 1], align 2 ;; !isMergeableCString
@strE = external unnamed_addr constant [2 x i16], align 2

-fdata-sections:
  .text  extern        .rodata.str1.1strA        .text  extern        strA
    0    SD       RO                               0    SD       RO
  .text  extern        .rodata.str1.1strB        .text  extern        strB
    0    SD       RO                               0    SD       RO
  .text  extern        .rodata.str2.2strC  ===>  .text  extern        strC
    0    SD       RO                               0    SD       RO
  .text  extern        strD                      .text  extern        strD
    0    SD       RO                               0    SD       RO
  .data  extern        a                         .data  extern        a
    0    SD       RW                               0    SD       RW
  undef  extern        strE                      undef  extern        strE
    0    ER       UA                               0    ER       UA

-fno-data-sections:
  .text  unamex        .rodata.str1.1            .text  unamex        .rodata
    0    SD       RO                               0    SD       RO
  .text  extern        strA                      .text  extern        strA
    0    LD       RO                               0    LD       RO
  .text  extern        strB                      .text  extern        strB
    0    LD       RO                               0    LD       RO
  .text  unamex        .rodata.str2.2      ===>  .text  extern        strC
    0    SD       RO                               0    LD       RO
  .text  extern        strC                      .text  extern        strD
    0    LD       RO                               0    LD       RO
  .text  unamex        .rodata                   .data  unamex        .data
    0    SD       RO                               0    SD       RW
  .text  extern        strD                      .data  extern        a
    0    LD       RO                               0    LD       RW
  .data  unamex        .data                     undef  extern        strE
    0    SD       RW                               0    ER       UA
  .data  extern        a
    0    LD       RW
  undef  extern        strE
    0    ER       UA
```

Reviewed by: David Tenty, Fangrui Song

Differential Revision: https://reviews.llvm.org/D156202
2023-07-29 03:24:21 +00:00
Kevin P. Neal
7e0e8b7ace [FPEnv][PowerPC] Correct strictfp tests.
Correct PowerPC strictfp tests to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

Mostly these tests just needed the strictfp attribute on function
definitions.  I've also removed the strictfp attribute from uses
of the constrained intrinsics because it comes by default since
D154991, but I only did this in tests I was changing anyway.

I have removed attributes added to declare lines of intrinsics. The
attributes of intrinsics cannot be changed in a test so I eliminated
attempts to do so.

Test changes verified with D146845.
2023-07-26 09:12:29 -04:00
esmeyi
e83b8a5e71 [XCOFF] Enable available_externally linkage for functions.
Summary: D80642 added support for emitting AvailableExternally linkage on AIX. However, the assertion "Trying to get csect representation of this symbol but none was set." fires when a function is declared as available_externally, because no csect is generated for the function. This patch fixes it.

Reviewed By: hubert.reinterpretcast, shchenz

Differential Revision: https://reviews.llvm.org/D156213

Signed-off-by: Esme Yi <esme.yi@ibm.com>
2023-07-25 22:47:11 -04:00
Kai Luo
f26af16e2c [PowerPC][AIX] Enable quadword atomics by default for AIX
On AIX, a libatomic supporting inline quadword atomic operations has been released, so compatibility is no longer an issue and we can enable quadword atomics by default.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D151312
2023-07-25 08:21:07 +08:00
esmeyi
776195865d [XCOFF] Write source language ID and CPU version ID into C_FILE symbol.
Summary: The source language ID and CPU version ID are required by debuggers on AIX. AIX's system assembler determines the source language ID based on the source file's name suffix, and the behavior in this patch is consistent with it.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D155684
2023-07-24 00:35:24 -04:00
Kishan Parmar
41af6ece6c [PowerPC/SPE] powerpcspe load and store instructions have
an 8-bit offset instead of the 16-bit offset of other load/store instructions,
so if the stack grows beyond the 8-bit range, create one emergency slot
for spilling.
2023-07-23 13:24:35 +05:30
Jake Egan
311abf5fc0 Implement -frecord-command-line for XCOFF integrated assembler path
The patch D153600 implemented `-frecord-command-line` for the XCOFF direct assembly path. This patch adds support for the XCOFF integrated assembly path.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D154921
2023-07-20 09:45:37 -04:00
Konstantina Mitropoulou
4c42ab1199 [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns
CMP(A,C)||CMP(B,C) => CMP(MIN/MAX(A,B), C)
CMP(A,C)&&CMP(B,C) => CMP(MIN/MAX(A,B), C)

This first patch handles integer types.
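
As a sanity check of the integer case, the rewrite can be brute-forced over a small range. This is only a sketch in Python, not the DAGCombiner code: the function names are hypothetical, and the pairing shown (less-than, with `||` mapping to MIN and `&&` mapping to MAX) is one instance of the pattern; other predicates pair up differently.

```python
from itertools import product


def fold_or_lt(a, b, c):
    """(a < c) or (b < c), rewritten as one compare against min(a, b)."""
    return min(a, b) < c


def fold_and_lt(a, b, c):
    """(a < c) and (b < c), rewritten as one compare against max(a, b)."""
    return max(a, b) < c


def check_folds(values):
    """Compare the original and folded forms for every triple in `values`."""
    for a, b, c in product(values, repeat=3):
        if ((a < c) or (b < c)) != fold_or_lt(a, b, c):
            return False
        if ((a < c) and (b < c)) != fold_and_lt(a, b, c):
            return False
    return True
```

Calling `check_folds(range(-4, 5))` exercises both rewrites on every triple in that range.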

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153502
2023-07-17 17:13:47 -07:00
Amy Kwan
8e0e442c1d [AIX][TLS] Account for local-exec accesses in XCOFFObjectWriter
This is a follow up to D149722 and aims to address https://github.com/llvm/llvm-project/issues/63885.
Local-exec accesses were not previously accounted for in XCOFFObjectWriter.
Specifically, the R_TLS_LE relocation was not previously handled, which led to
an incorrect value being written for the relocation target.

Within this patch, the value being written is set to the symbol's virtual
address and extra relocation tests are added.

Differential Revision: https://reviews.llvm.org/D155415
2023-07-17 12:15:44 -05:00
Stephen Peckham
ac5d5351d4 Use empty symbol name for XCOFF text csect
When generating XCOFF, the compiler generates a csect with an internal
name.  Each function results in a label within the csect.  This patch
replaces the internal name ".text" with an empty string "".  This avoids
adding special code to handle a function text() in the source file, and
works better with some XCOFF tools that are confused when the csect and
the first function have the same address.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D154854
2023-07-15 16:13:48 -04:00
Kamau Bridgeman
62c1cf7c63 [PowerPC][Future] Enable __builtin_mma_xxm[t|f]acc
Future cpu instructions dmxxinstdmr512 and dmxxextfdmr512 insert and extract
quad vectors from the new wide accumulator(wacc) register class.
The introduction of these new instructions renders the p10 instructions
xxmtacc and xxmfacc obsolete since the new wacc register class is a better
choice for handling quad vector operations. This patch ensures that, for
future cpu, instructions dmxxinstdmr512 and dmxxextfdmr512 are generated
by custom lowering the intrinsics for xxm[t|f]acc to produce no instructions.

Reviewed By: amyk, lei

Differential Revision: https://reviews.llvm.org/D153034
2023-07-14 13:38:40 -05:00
Sean Fertile
5e28d30f1f [XCOFF][AIX] Peephole optimization for toc-data.
Followup to D101178 - peephole optimization that converts a
load address instruction and a consuming load/store into just the
load/store when it's safe to do so.

eg: converts the 2 instruction code sequence
  la 4, i[TD](2)
  stw 3, 0(4)
to
  stw 3, i[TD](2)

Differential Revision: https://reviews.llvm.org/D101470
2023-07-13 20:40:09 -04:00
Nemanja Ivanovic
329b8cd3e3 [PowerPC] Improve code gen for vector add
Improve codegen for vector modulo additions.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D154447
2023-07-13 15:21:49 -04:00
Nikita Popov
edb2fc6dab [llvm] Remove explicit -opaque-pointers flag from tests (NFC)
Opaque pointers mode is enabled by default, no need to explicitly
enable it.
2023-07-12 14:35:55 +02:00
Jake Egan
bbd0d123d3 Implement -frecord-command-line for XCOFF
This patch extends support of the option `-frecord-command-line` to XCOFF. XCOFF doesn’t have custom sections like ELF, so the command line data is emitted to a .info section instead. A C_INFO symbol is generated with the .info section to preserve the command line data past the link step. Multiple command lines are separated by newlines and null bytes. The command line data can be retrieved on AIX with command `what file_name`.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D153600
2023-07-10 12:47:07 -04:00
Matt Arsenault
310f839612 DAG: Lower is.fpclass fcInf to fcmp of fabs
InstCombine should have taken care of this, but I think
this is more useful in the future when the expansion
tries to handle multiple cases at a time with fcmp.

x87 looks worse to me but the only thing I know about it is that
I aggressively do not care about it.

https://reviews.llvm.org/D143198
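
The idea behind the lowering can be illustrated outside of LLVM: an infinity test is equivalent to comparing the absolute value against positive infinity. The sketch below is plain Python with a hypothetical helper name and uses the standard `math` module; it is not the SelectionDAG expansion itself.

```python
import math


def is_inf_via_fabs(x):
    """Detect +inf or -inf with a single equality compare on fabs(x)."""
    # NaN compares unequal to everything, so it is correctly rejected.
    return math.fabs(x) == math.inf
```

For any float input this agrees with `math.isinf`, including NaN and the largest finite values.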
2023-07-07 17:00:10 -04:00
Nemanja Ivanovic
b0e249d5e2 Reland "[PowerPC] Remove extend between shift and and"
The commit originally caused a bootstrap failure on the big endian
PPC bot as the combine was interfering with the legalizer when
applied on illegal types. This update restricts the combine to
the only types for which it is actually needed. Tested on PPC BE
bootstrap locally.
2023-07-07 14:45:05 -04:00
Qiu Chaofan
a2b5117df7 [PowerPC] Update InputOps of Power10 SchedModel
The count of input operands affects pipeline forwarding in the scheduling model.
The previous Power10 model definition arranged some instructions into
incorrect groups by counting the wrong number of input operands.

This patch updates the model, setting the input operand count correctly
by excluding irrelevant immediate operands and counting the memory operands of
load instructions correctly.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D153842
2023-07-07 22:46:22 +08:00
zhijian
d6d7f7b1d2 [AIX][XCOFF] print out the traceback info
Summary:

  Add a new option -traceback-table to print out the traceback info of an XCOFF object file.

Reviewers: James Henderson, Fangrui Song, Stephen Peckham, Xing Xue

Differential Revision: https://reviews.llvm.org/D89049
2023-07-06 11:47:08 -04:00
Amy Kwan
598cccea80 [AIX][TLS] Generate optimized local-exec access code sequence using X-Form loads/stores
This patch is a follow up to D149722, D152669 and D153645, where a slightly more
optimized code sequence is generated for 64-bit and 32-bit local-exec accesses
when optimizations are turned on.

Handling is added to PPCISelDAGToDAG.cpp in order to check if any D-form loads or
stores that follow a PPCISD::ADD_TLS can be optimized to use an X-Form load or
store. In this particular situation, this allows the ADD_TLS node to be removed
completely.

Differential Revision: https://reviews.llvm.org/D150367
2023-07-06 07:57:05 -05:00
Nemanja Ivanovic
7cd9084c69 Revert "[PowerPC] Remove extend between shift and and"
This reverts commit a57236de4e.
Causes a bootstrap failure on ppc64be.
2023-07-05 20:04:49 -04:00
Nemanja Ivanovic
a57236de4e [PowerPC] Remove extend between shift and and
The SDAG will sometimes insert an extend between
the shift and an and (immediate) even though the
immediate is narrower than the narrow size.
This does not allow us to produce a rotate
instruction (such as rlwinm).
This patch just adds a combine to move the extend
onto the and.

Differential revision: https://reviews.llvm.org/D152911
2023-07-05 16:33:07 -04:00
esmeyi
2d74cf1f24 [XCOFF] Force recording a relocation for weak symbol label.
Summary: Currently, if there are multiple definitions of the same symbol declared with weak linkage, the linker may choose the wrong one when they are compiled with integrated-as. This patch fixes the issue. If the target symbol is a weak label, we must not attempt to resolve the fixup directly. Instead, emit a relocation and leave resolution of the final target address to the linker.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D153839
2023-07-05 01:58:18 -04:00
Lei Huang
c7c3d71414 [PowerPC] add testcase for vector add and shift 2023-07-04 10:45:19 -04:00
Ting Wang
0b955fee90 [PowerPC][NFC] add SADDO/SSUBO test case
Differential Revision: https://reviews.llvm.org/D152339

Reviewed By: qiucf
2023-06-29 20:35:59 -04:00
Ting Wang
919588fd10 [PowerPC][NFC] expose issue on absol-jump-table-enabled.ll (relocation-model=pic + ppc-use-absolute-jumptables)
Differential Revision: https://reviews.llvm.org/D154047
2023-06-29 20:32:15 -04:00
Matt Arsenault
003b58f65b IR: Add llvm.frexp intrinsic
Add an intrinsic which returns the two pieces as multiple return
values. Alternatively could introduce a pair of intrinsics to
separately return the fractional and exponent parts.

AMDGPU has native instructions to return the two halves, but could use
some generic legalization and optimization handling. For example, we
should be able to handle legalization of f16 on older targets, and for
bf16. Additionally antique targets need a hardware workaround which
would be better handled in the backend rather than in library code
where it is now.
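
For readers unfamiliar with the operation, the two pieces are a fraction and a power-of-two exponent whose product reproduces the input. The sketch below uses Python's `math.frexp` as a stand-in for the C library behaviour; it does not call the intrinsic, and the helper name is hypothetical.

```python
import math


def split_and_rebuild(x):
    """Split x into (fraction, exponent) and rebuild it with ldexp."""
    frac, exp = math.frexp(x)          # x == frac * 2**exp, 0.5 <= |frac| < 1
    rebuilt = math.ldexp(frac, exp)    # inverse operation
    return frac, exp, rebuilt
```

For example, `split_and_rebuild(8.0)` yields the fraction `0.5` and exponent `4`, and the rebuilt value equals the input.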
2023-06-28 14:50:16 -04:00
Amy Kwan
11b71ade51 [PowerPC][TLS] Add additional TLS X-Form loads/store instructions
This patch is a follow up to D43315, and adds the following new load/store
TLS specific instructions for integer and floating point scalar types:
```
LHAXTLS
LWAXTLS
LHAXTLS_32
LWAXTLS_32
LFSXTLS
LFDXTLS
STFSXTLS
STFDXTLS
```
These instructions can be used to optimize TLS sequences when D-Form
loads/stores follow an ADD_TLS instruction.

Duplicate versions of these instructions are also added within an isAsmParserOnly=1
block (similar to D47382) to allow llvm-mc to assemble these instructions.

Differential Revision: https://reviews.llvm.org/D153645
2023-06-27 11:33:38 -05:00
Matthias Braun
02ba5b8c6b Ignore load/store until stack address computation
No longer conservatively assume a load/store accesses the stack when we
can prove that we did not compute any stack-relative address up to this
point in the program.

We do this in a cheap not-quite-a-dataflow-analysis: Assume
`NoStackAddressUsed` when all predecessors of a block already guarantee
it. Process blocks in reverse post order to guarantee that except for
loop headers we have processed all predecessors of a block before
processing the block itself. For loops we accept the conservative answer
as they are unlikely to be shrink-wrappable anyway.
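
The propagation described above can be sketched on a toy control-flow graph. Everything here is an assumption made for illustration: the CFG is a dict of successor lists, `computes_stack_addr` is a set of block names, and the function names are hypothetical. It is not the shrink-wrapping pass itself.

```python
def reverse_post_order(succs, entry):
    """Return the blocks reachable from `entry` in reverse post order."""
    visited, post = set(), []

    def dfs(block):
        visited.add(block)
        for s in succs.get(block, []):
            if s not in visited:
                dfs(s)
        post.append(block)

    dfs(entry)
    return post[::-1]


def no_stack_address_used(succs, entry, computes_stack_addr):
    """Map each block to whether no stack address was computed before it."""
    preds = {}
    for block, ss in succs.items():
        for s in ss:
            preds.setdefault(s, []).append(block)

    at_entry, at_exit = {}, {}
    for block in reverse_post_order(succs, entry):
        # A predecessor not processed yet (a loop back edge) is missing from
        # at_exit and is treated conservatively as "no guarantee".
        ok = all(at_exit.get(p, False) for p in preds.get(block, []))
        at_entry[block] = ok
        at_exit[block] = ok and block not in computes_stack_addr
    return at_entry
```

A single pass is enough because the result is allowed to be conservative: a loop header simply loses the guarantee rather than triggering another iteration.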

Differential Revision: https://reviews.llvm.org/D152213
2023-06-26 13:50:36 -07:00
Matthias Braun
759b217626 Switch tests to use update_llc_test_checks
Switch and update some tests to use `update_llc_test_checks` to reduce
clutter in upcoming change.

Differential Revision: https://reviews.llvm.org/D152215
2023-06-26 13:50:36 -07:00
Matt Arsenault
f2596b754c SeparateConstOffsetFromGEP: Don't use SCEV
This was only using the SCEV expressions as a map key, which we can do
just as well with the value pointers. This also allows it to handle
vectors.
2023-06-26 13:58:06 -04:00
Amaury Séchet
632a8aca07 [NFC] Autogenerate CodeGen/PowerPC/tail-dup-break-cfg.ll 2023-06-25 22:55:49 +00:00
Amaury Séchet
e345b9ca7a [NFC] Autogenerate CodeGen/PowerPC/pr40922.ll 2023-06-25 21:05:06 +00:00
Amaury Séchet
93af6bdcaf [NFC] Autogenerate CodeGen/PowerPC/select-i1-vs-i1.ll 2023-06-25 01:27:29 +00:00
Matt Arsenault
80e2c26dfd RegisterCoalescer: Fix name of pass
I finally snapped and fixed this inconsistency.
2023-06-21 10:30:43 -04:00
Kishan Parmar
c42f0a6e64 PowerPC/SPE: Add phony registers for high halves of SPE SuperRegs
The intent of this patch is to model the upper halves of the SPE SuperRegs (s0,..,s31)
as artificial regs, similar to how X86 has done it,
and to emit store/reload instructions for the required halves.

PR : https://github.com/llvm/llvm-project/issues/57307

Reviewed By: jhibbits

Differential Revision: https://reviews.llvm.org/D152437
2023-06-21 10:24:40 +00:00
tianleli
1c27275813 [DAG] Unroll and expand illegal result of LDEXP and POWI instead of widen.
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D153104
2023-06-21 14:27:39 +08:00
Fangrui Song
e0a6561ec9 [XRay] Make xray_fn_idx entries PC-relative
As mentioned by commit c5d38924dc (Apr 2020),
PC-relative entries avoid dynamic relocations and can therefore make the
section read-only.

This is similar to D78082 and D78590. We cannot commit to support
compiler/runtime built at different versions, so just don't play with versions.

For Mach-O support (incomplete yet), we use non-temporary `lxray_fn_idx[0-9]+`
symbols. Label differences are represented as a pair of UNSIGNED and SUBTRACTOR
relocations. The SUBTRACTOR external relocation requires r_extern==1 (needs to
reference a symbol table entry) which can be satisfied by `lxray_fn_idx[0-9]+`.
A `lxray_fn_idx[0-9]+` symbol also serves as the atom for this dead-strippable
section (follow-up to commit b9a134aa62).

Differential Revision: https://reviews.llvm.org/D152661
2023-06-20 22:40:56 -07:00
Amy Kwan
f5ae075048 [AIX][TLS] Generate 32-bit local-exec access code sequence
This patch adds support for the TLS local-exec access model on AIX to allow
for the ability to generate the 32-bit (specifically, non-optimized) code sequence.
This work is a follow up of D149722.

The particular code sequence that is generated is as follows:
```
.tc var[TC],var[TL]@le.   // variable offset, with the le relocation specifier

bla .__get_tpointer()     // get the thread pointer, modifies r3
lwz reg1, var[TC](2)      // load the variable offset
add reg2, r3, reg1        // add the variable offset to the retrieved thread pointer
```

Differential Revision: https://reviews.llvm.org/D152669
2023-06-20 11:57:38 -05:00
Simon Pilgrim
ff23856c1c [DAG] Fold (abds x, y) -> (abdu x, y) iff both args are known positive
This is a generic DAG combine version of D151055 which recognizes when a signed ABDS can be safely replaced with an unsigned ABDU instruction if it is legal.

Alive2: https://alive2.llvm.org/ce/z/pb5BjG
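
The equivalence is easy to check exhaustively at a small bit width. The sketch below models 8-bit values in Python and takes "known positive" to mean that the sign bit is known to be zero, which is an assumption about the wording; the helper names and the choice of 8 bits are hypothetical.

```python
BITS = 8
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)


def to_signed(x):
    """Interpret the low BITS bits of x as a two's complement value."""
    x &= MASK
    return x - (1 << BITS) if x & SIGN else x


def abds(x, y):
    """Signed absolute difference, returned as a BITS-wide bit pattern."""
    a, b = to_signed(x), to_signed(y)
    return (max(a, b) - min(a, b)) & MASK


def abdu(x, y):
    """Unsigned absolute difference of the low BITS bits."""
    a, b = x & MASK, y & MASK
    return max(a, b) - min(a, b)


def agree_when_sign_bits_clear():
    """Check abds == abdu for every pair whose sign bits are both zero."""
    return all(abds(x, y) == abdu(x, y)
               for x in range(SIGN) for y in range(SIGN))
```

The restriction matters: for the bit patterns `0x80` and `0x01` the two operations disagree, which is why the fold needs both sign bits to be known zero.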

Differential Revision: https://reviews.llvm.org/D153328
2023-06-20 15:31:22 +01:00
Amy Kwan
d5659808b2 [AIX][TLS] Generate 64-bit local-exec access code sequence
This patch adds support for the TLS local-exec access model on AIX to allow
for the ability to generate the 64-bit (specifically, non-optimized) code sequence.

For this patch in particular, the sequence that is generated involves a load of the
variable offset, followed by an add of the loaded variable offset to r13 (which holds
the thread pointer). This code sequence looks like the following:
```
ld reg1,var[TC](2)
add reg2, reg1, r13     // r13 contains the thread pointer
```
The TOC (.tc pseudo-op) entries generated in the assembly files are also
changed where we add the @le relocation for the variable offset.

Differential Revision: https://reviews.llvm.org/D149722
2023-06-19 12:17:30 -05:00
Fangrui Song
49b61ead47 [XRay][test] Make tests less sensitive to .Ltmp/Ltmp label changes 2023-06-18 13:32:40 -07:00
esmeyi
028a261350 [XCOFF] FixupOffsetInCsect should be 0 for R_REF relocation.
Summary: The FixupOffsetInCsect should be 0 for an R_REF relocation since it specifies a nonrelocating reference. Otherwise the linker would try to relocate the symbol through its address, and an error like the following would occur.
```
ld: 0711-547 SEVERE ERROR: Object /tmp/1-2a7ea1.o cannot be processed.
	RLD address 0x65 for section 2 (.data) is
	not contained in the section.
```

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D152777
2023-06-15 01:28:45 -04:00
Amaury Séchet
a70d5e25f3 [DAGCombine] Make sure combined nodes are added back to the worklist in topological order.
Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D127115
2023-06-13 09:14:37 +00:00
Matt Arsenault
eece6ba283 IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.

Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
2023-06-06 17:07:18 -04:00
JP Lehr
c9998ec145 Revert "[DAGCombine] Make sure combined nodes are added back to the worklist in topological order."
This reverts commit e69fa03ddd.

This patch led to build timeouts on the AMDGPU OpenMP runtime
buildbot.
2023-06-05 10:55:58 -04:00
Amaury Séchet
e69fa03ddd [DAGCombine] Make sure combined nodes are added back to the worklist in topological order.
Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D127115
2023-06-05 11:09:18 +00:00
Qiu Chaofan
9e17e08324 [PowerPC] Combine fptoint-store under strict cases
Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D141249
2023-06-05 16:24:02 +08:00
esmeyi
6f57d8df2d Revert "[XCOFF][DWARF] XCOFF64 should be able to select the dwarf format in intergrated-as mode."
This reverts commit 4054c68644.

The AIX system linker requires DWARF64 for XCOFF64.
2023-06-05 02:50:47 -04:00
Qiu Chaofan
69bc8ff766 Reland "[PowerPC] Simplify fp-to-int store optimization"
The build failure should be fixed by de681d53. Follow-up refactor will
be done in future patches.

This reverts commit e7c5ced0b9.
2023-06-05 13:53:08 +08:00