Materialize zeros by copying from %g0, which is now marked as constant.
This makes it possible for some common operations (like integer negation) to be
performed in fewer instructions.
This continues @arichardson's patch at D132561.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D138887
This reverts commit a888825aee.
This changes the default output of UTC, and as such introduces
spurious changes whenever existing tests are regenerated.
I've indicated in https://reviews.llvm.org/D139006#3989954 how
this can be implemented without causing test churn.
Previously, the label also matched function calls with the function
name, which caused tests to fail because the label matched on the wrong
line.
Add the `define` prefix, so only function definitions are matched.
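A minimal sketch of why the prefix helps, not the script's actual patterns: both regexes and the helper `first_matching_line` are hypothetical, and the IR is a toy example in which `@foo` is called before it is defined.

```python
import re

# Toy IR: @foo is called on line 2 and defined on line 5.
IR = """\
define void @caller() {
  call void @foo()
  ret void
}
define void @foo() {
  ret void
}
"""

# Hypothetical label patterns; the script's real ones may differ.
BARE_LABEL = re.compile(r"@foo\(")
DEFINE_LABEL = re.compile(r"define\b[^@]*@foo\(")


def first_matching_line(pattern, text):
    """Return the 1-based number of the first line the pattern matches."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        if pattern.search(line):
            return lineno
    return None


print(first_matching_line(BARE_LABEL, IR))    # lands on the call site
print(first_matching_line(DEFINE_LABEL, IR))  # lands on the definition
```

With the bare pattern the label anchors on the call inside `@caller`, so the checks that follow are compared against the wrong function body.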
Differential Revision: https://reviews.llvm.org/D139006
As a reminder, UTC_ARGS is used by lit test cases to specify which
arguments need to be passed to update_XXXX_test_checks.py to be
auto-updated properly.
The support is achieved by relying on common.itertests, which is what
other test updaters use to iterate over test files.
This commit also changes how the --llc-binary option is saved in args.
It used to be saved as "llc", but it is here changed to the standard
"llc_binary" to make use of an existing ignore mechanism for specific
arguments. Without that change, the option would not be ignored and
would appear in UTC_ARGS. This would be different from what e.g.
update_llc_test_checks does. As update_mir_test_checks.py now supports
UTC_ARGS, it became important to ensure the option is ignored.
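The effect of the rename can be sketched as follows. This is an illustration under stated assumptions, not the code in the update scripts: the `IGNORED` set and the `utc_args` helper are hypothetical, and only the option names `--llc-binary` and `--print-fixed-stack` come from the commit messages.

```python
import argparse

# Hypothetical ignore list: option names that must not leak into UTC_ARGS.
IGNORED = {"llc_binary", "tests"}

parser = argparse.ArgumentParser()
parser.add_argument("--llc-binary", dest="llc_binary", default=None)
parser.add_argument("--print-fixed-stack", action="store_true")
parser.add_argument("tests", nargs="*")


def utc_args(args, defaults):
    """Render non-default, non-ignored options as a UTC_ARGS-style string."""
    parts = []
    for name, value in sorted(vars(args).items()):
        if name in IGNORED or value == defaults[name]:
            continue
        flag = "--" + name.replace("_", "-")
        parts.append(flag if value is True else f"{flag} {value}")
    return " ".join(parts)


defaults = vars(parser.parse_args([]))
args = parser.parse_args(
    ["--llc-binary", "/tmp/llc", "--print-fixed-stack", "t.mir"]
)
print(utc_args(args, defaults))
```

Had the option been stored under the name `llc`, it would not be in the ignore set and the machine-specific binary path would end up recorded in the test file.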
Differential Revision: https://reviews.llvm.org/D135580
Over the past day or so, I've taken a large swing at our tests,
and reduced the number of tests that were still using the old syntax
from ~1800 to just 200.
Left to handle (as seen in this patch):
* Transforms/LSR
* Transforms/CGP
* Transforms/TypePromotion
* Transforms/HardwareLoops
* Analysis/*
* some misc.
I think this is the right point to start actively refusing
to honor the old syntax, except for the old tests,
to prevent the old syntax from creeping back in.
Thus, let's add a temporary default-off flag,
and, if it is not passed, refuse to accept the old syntax.
The tests that still need porting are annotated with this flag.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D139647
Generation of CHECK lines for fixedStack can be enabled with --print-fixed-stack.
This is particularly useful for tests which need to inspect how the
stack looks, e.g. for ABI tests.
See the other stacked revision building on top of this one which enables UTC_ARGS (in a similar fashion to other test updaters in utils/).
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D135579
Materialize zeros by copying from %g0, which is now marked as constant.
This makes it possible for some common operations (like integer negation) to be
performed in fewer instructions.
This continues @arichardson's patch at D132561.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D138887
When the branch target is out of the range represented by the current
branch instruction's immediate, branch relaxation is required. There
are three types of immediates for branch instructions on LoongArch:
simm16, simm21 and simm26. The real branch target
address is PC + sext(simmXX << 2). In addition, an indirect branch
is implemented to support more distant branch targets.
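The reachable range of each immediate width follows directly from the formula PC + sext(simmXX << 2). A small sketch of that arithmetic (the helper names are made up for illustration and are not part of the backend):

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def branch_offset(imm_field, imm_bits):
    """sext(simmXX << 2): the shifted field is imm_bits + 2 bits wide."""
    return sign_extend(imm_field << 2, imm_bits + 2)


def in_range(offset, imm_bits):
    """True if a byte offset is encodable with a simm<imm_bits> branch."""
    limit = 1 << (imm_bits + 1)
    return offset % 4 == 0 and -limit <= offset < limit


for bits in (16, 21, 26):
    lo = branch_offset(1 << (bits - 1), bits)        # most negative field
    hi = branch_offset((1 << (bits - 1)) - 1, bits)  # most positive field
    print(f"simm{bits}: [{lo}, {hi}] bytes from PC")
```

A target outside the range of the instruction's immediate is what triggers relaxation.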
The BranchRelaxation pass calls `RenumberBlocks` to renumber all of the
machine basic blocks in the function, so the machine basic block
numbers changed in some test cases.
Differential Revision: https://reviews.llvm.org/D137233
This switches everything to use the memory attribute proposed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly
attributes are dropped. The readnone, readonly and writeonly attributes
are restricted to parameters only.
The old attributes are auto-upgraded both in bitcode and IR.
The bitcode upgrade is a policy requirement that has to be retained
indefinitely. The IR upgrade is mainly there so it's not necessary
to update all tests using memory attributes in this patch, which
is already large enough. We could drop that part after migrating
tests, or retain it longer term, to make it easier to import IR
from older LLVM versions.
High-level Function/CallBase APIs like doesNotAccessMemory() or
setDoesNotAccessMemory() are mapped transparently to the memory
attribute. Code that directly manipulates attributes (e.g. via
AttributeList) on the other hand needs to switch to working with
the memory attribute instead.
Differential Revision: https://reviews.llvm.org/D135780
We have a downstream project with a command-line utility that operates
pretty much exactly like `opt`. So it would make sense for us to
maintain tests with update_test_checks.py with our custom tool
substituted for `opt`, as this change allows.
Differential Revision: https://reviews.llvm.org/D136329
Some instructions are not matched by update_mir_test_checks.py because the MIFlags and
the regex in the script are not synchronized.
Differential Revision: https://reviews.llvm.org/D136170
Functions with `aarch64_sme_pstatesm_body` will emit a SMSTART at the start
of the function, and a SMSTOP at the end of the function, such that all
operations use the right value for vscale.
Because the placement of these nodes is critically important (i.e. no
vscale-dependent operations should be done before SMSTART has been issued),
we require glueing the CopyFromReg to the Entry node such that we can
insert the SMSTART as part of that glued chain.
More details about the SME attributes and design can be found
in D131562.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D131582
While this does not matter for most targets, when building for Arm Morello,
we have to mark the symbol as a function and add size information, so that
LLD can correctly evaluate relocations against the local symbol.
Since Morello is an out-of-tree target, I tried to reproduce this with
in-tree backends and with the previous reviews applied this results in
a noticeable difference when targeting Thumb.
Background: Morello uses a method similar to Thumb where the encoding mode is
specified in the LSB of the symbol. If we don't mark the target as a
function, the relocation will not have the LSB set and calls will end up
using the wrong encoding mode (which will almost certainly crash).
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D131429
While working on https://reviews.llvm.org/D131429, I got a test diff in
one of the VE tests and running update_llc_test_checks.py deleted all the
code for that function. This updates the regex to handle this new output.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D131431
While working on https://reviews.llvm.org/D131429, I got a test diff in
one of the VE tests and running update_llc_test_checks.py deleted all the
code for that function. This is a baseline test for this bug (incorrect
regex for VE when .Lfoo$local symbols are used).
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D131434
This patch only adds the %pc_hi20/%pc_lo12/%plt relocations, in order
to be able to generate a GNU ld linkable relocation file for the
`hello world` IR:
```
@.str = private unnamed_addr constant [14 x i8] c"Hello world!\0A\00", align 1
define dso_local signext i32 @main() nounwind {
entry:
%call = call signext i32 (ptr, ...) @printf(ptr noundef @.str)
ret i32 0
}
declare dso_local signext i32 @printf(ptr noundef, ...)
```
This patch also updates some test cases due to new modifiers introduced.
New test: test/MC/LoongArch/Relocations/relocations.s
Differential Revision: https://reviews.llvm.org/D132108
This allows a number of optimisation passes to work.
E.g. BranchFolding and MachineBlockPlacement.
Differential Revision: https://reviews.llvm.org/D131316
There is at least one Clang test (clang/test/CodeGen/arm_acle.c) which
has functions guarded by #if's that cause those functions to be compiled
only for a subset of RUN lines.
This results in a case where one RUN line has a body for the function
and another doesn't. Treat this case as a conflict for any prefixes that
the two RUN lines have in common.
This change exposed a bug where functions with '$' in the name weren't
properly recognized in ARM assembly (despite there being a test case
that was supposed to catch the problem!). This bug is fixed as well.
Differential Revision: https://reviews.llvm.org/D130089
The first attempt missed changing test files for tools
(update_llc_test_checks.py).
Original commit message:
This implements the main suggested change from issue #56498.
Using the shorter (non-extending) instruction with only
-Oz ("minsize") rather than -Os ("optsize") is left as a
possible follow-up.
As noted in the bug report, the zero-extending load may have
shorter latency/better throughput across a wide range of x86
micro-arches, and it avoids a potential false dependency.
The cost is an extra instruction byte.
This could cause perf ups and downs from secondary effects,
but I don't think it is possible to account for those in
advance, and that will likely also depend on exact micro-arch.
This does bring LLVM x86 codegen more in line with existing
gcc codegen, so if problems are exposed they are more likely
to occur for both compilers.
Differential Revision: https://reviews.llvm.org/D129775
Add LoongArch assembly scrubbing and triple support to update_llc_test_checks.
Depends on D128432
Reviewed By: MaskRay, xen0n
Differential Revision: https://reviews.llvm.org/D128433
Previously, when we appended check lines at the end, we could not share
prefixes. This patch should make it possible and allow us to reduce
some check line counts (especially for Clang/OpenMP tests).
See also: https://reviews.llvm.org/D128686
Differential Revision: https://reviews.llvm.org/D128684
Use the query that doesn't assert if TracksLiveness isn't set, which
needs to always be available. We also need to start printing liveins
regardless of TracksLiveness.
Support the pattern where a test file uses multiple prefixes per run line:
one prefix that is unique to the run line, and additional prefixes that are
common with other run lines.
Decide on a per-function basis which prefix(es) to emit, based on which run
lines have the same output.
Move the renaming of vregs earlier, so that we can compare the output as it
would actually be printed in check lines.
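One way to picture the per-function decision is the sketch below. It is a simplified illustration rather than the script's algorithm: `choose_prefixes` and its input shape are hypothetical, and the prefix names are placeholders.

```python
def choose_prefixes(run_lines):
    """Pick which prefix carries the checks for one function.

    run_lines is a list of (prefixes, output) pairs, one per RUN line.
    A prefix is usable only if every RUN line that lists it produced
    the same output for this function.
    """
    outputs_by_prefix = {}
    for prefixes, output in run_lines:
        for prefix in prefixes:
            outputs_by_prefix.setdefault(prefix, set()).add(output)

    chosen = {}
    for prefixes, output in run_lines:
        # Take the first listed prefix whose RUN lines all agree.
        for prefix in prefixes:
            if len(outputs_by_prefix[prefix]) == 1:
                chosen[prefix] = output
                break
    return chosen


same = [(["CHECK", "A"], "body"), (["CHECK", "B"], "body")]
differ = [(["CHECK", "A"], "body1"), (["CHECK", "B"], "body2")]
print(choose_prefixes(same))    # shared prefix is enough
print(choose_prefixes(differ))  # falls back to the unique prefixes
```

Comparing the output after vreg renaming matters here because two run lines should count as identical only if their printed check lines would be identical.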
Differential Revision: https://reviews.llvm.org/D126411
This is scoped to autogenerated tests.
The goal is to support having each RUN line specify a list of
check-prefixes where one can specify potentially redundant prefixes. For example,
for X86, if one specified prefixes for both AVX1 and AVX2, and the codegen happened to
match today, one of the prefixes would be used and the other one not.
If the unused prefix were dropped, and later, codegen differences were
introduced, one would have to go figure out where to add what prefix
(paraphrasing
https://lists.llvm.org/pipermail/llvm-dev/2021-February/148326.html)
To avoid getting errors due to unused prefixes, whole directories can be
opted out (as discussed on that thread), but that means that tests that
aren't autogenerated in such directories could have undetected unused
prefix bugs.
This patch proposes an alternative that both avoids the above, dir-level
optout, and supports the main autogen scenario discussed first. The autogen
tool appends at the end of the test file the list of unused prefixes,
together with a note explaining that is the case. Each prefix is set up
to always pass.
This way, unexpected unused prefixes are easily discoverable, and
expected cases "just work".
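As a rough sketch of the footer described above: the helper name and the exact wording of the note are assumptions for illustration. The one load-bearing detail is that a check whose pattern is `{{.*}}` matches anything, so each listed prefix trivially passes.

```python
# Assumed wording; the tool's actual note may differ.
NOTE = "NOTE: These prefixes are unused and the list is autogenerated."


def unused_prefix_footer(all_prefixes, used_prefixes, comment=";"):
    """Return the lines to append for prefixes that produced no checks."""
    unused = sorted(set(all_prefixes) - set(used_prefixes))
    if not unused:
        return []
    lines = [f"{comment}{comment} {NOTE}"]
    # '{{.*}}' is a FileCheck regex that matches any text, including none.
    lines += [f"{comment} {prefix}: {{{{.*}}}}" for prefix in unused]
    return lines


for line in unused_prefix_footer(["AVX", "AVX1", "AVX2"], ["AVX"]):
    print(line)
```

If codegen for AVX1 and AVX2 later diverges, regenerating the test moves those prefixes out of the footer and back into real check lines.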
Differential Revision: https://reviews.llvm.org/D124306
I had initially assumed this was the problem with
https://github.com/llvm/llvm-project/issues/55271#issuecomment-1133426243
But it turns out that was a simpler issue. This patch is still
more correct than what we were doing before so figured I'd submit
it anyway.
No test case because I'm not sure how to get an undef around
until expansion.
Looking at the test deltas I wonder if it would be valid to combine
(sext_inreg (freeze (aextload X))) -> (freeze (sextload X)).
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D126175
Variable captures such as `<MCInst #` can change based on unrelated changes
to the LLVM backends. To avoid the generated test cases being different,
use an incrementing counter for variable names instead of using the
actual value from the output file.
This change may also be beneficial for some nameless IR variables
(especially when combined with filtering of output), but for now I've
restricted this change to the obvious candidates (--asm-show-inst output).
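A minimal sketch of the counter idea, assuming the captures are FileCheck numeric variables: the regex, the `MCINST<n>` naming scheme and the helper are hypothetical, not copied from the script.

```python
import re

MCINST_RE = re.compile(r"<MCInst #(\d+)")


def number_captures(text):
    """Replace each opcode number with a capture named by order of appearance."""
    names = {}

    def repl(match):
        value = match.group(1)
        if value in names:
            # Seen before: reuse the variable instead of redefining it.
            return f"<MCInst #[[#{names[value]}]]"
        names[value] = f"MCINST{len(names) + 1}"
        return f"<MCInst #[[#{names[value]}:]]"

    return MCINST_RE.sub(repl, text)


before = "<MCInst #1234 ADD\n<MCInst #77 SUB\n<MCInst #1234 ADD"
after = "<MCInst #2000 ADD\n<MCInst #88 SUB\n<MCInst #2000 ADD"
print(number_captures(before))
```

Because the names depend only on order of appearance, `before` and `after` produce identical check lines even though every opcode number changed.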
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D125405
To avoid test churn when backends add/rename new instructions/registers,
it makes sense to use FileCheck captures for the exact MCInst/Reg number.
This is motivated by D125307, where I use --asm-show-inst to differentiate
the output for multiple instructions with the same mnemonic.
This does not quite fix the churn issue yet: While files with the generated
checks will be immune to the numbers changing, the update script test
still suffers from this problem since the number is encoded in the
FileCheck variable name. I plan to address this in a follow-up patch.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D125307
To avoid test churn when backends add/rename new instructions/registers,
it makes sense to scrub the exact MCInst/Reg number.
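Scrubbing can be as simple as swapping the number for a FileCheck regex. The sketch below assumes the `--asm-show-inst` output contains fragments of the form `<MCInst #N` and `Reg:N`; the pattern and the `scrub_numbers` helper are illustrative only.

```python
import re

# Assumed shapes of the numbered fragments in --asm-show-inst output.
NUMBERED = re.compile(r"(<MCInst #|Reg:)\d+")


def scrub_numbers(text):
    """Replace exact opcode/register numbers with a match-any-number regex."""
    return NUMBERED.sub(r"\1{{[0-9]+}}", text)


sample = "<MCInst #1234 ADDrr\n <MCOperand Reg:23>"
print(scrub_numbers(sample))
```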
Differential Revision: https://reviews.llvm.org/D125305
Subj, or in other words, we have a lot of tests that are driven by
the LoopVectorizer's debug output, but we don't have
any meaningful way to autogenerate checklines in them,
which means that an insurmountable amount of manual work
is required when modifying the appropriate cost models.
That is not sustainable, so this presents a solution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D121133
This patch makes it possible to generate NVPTX assembly check lines with
the update_llc_test_checks.py utility.
Differential Revision: https://reviews.llvm.org/D122986
Place PersistentId declaration under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to
reduce memory usage when it is not needed.
Differential Revision: https://reviews.llvm.org/D120714
Re-commit of 32e8b550e5
This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.
The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
mov x29, sp
.cfi_def_cfa w29, 16
...
```
After the `stp` is executed, the (initial) rule for the CFA still says
the CFA is in the `sp`, even though it's already offset by 16 bytes.
Correct unwind info could look like:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov x29, sp
.cfi_def_cfa w29, 16
...
```
Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards which this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.
Reviewed By: danielkiss, MaskRay
Differential Revision: https://reviews.llvm.org/D111411
Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve their values when we have calls, so that they get saved and restored
around the calls.
But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvement of understanding the control flow.
This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.
Doing the above poses an issue: the return instruction now uses the undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm, ronlieb
Differential Revision: https://reviews.llvm.org/D114652
Previously we used sra+add+xor if ADDCARRY is supported. This changes
to sra+xor+sub if SUBCARRY is available.
This is consistent with the recent change to the default expansion
in LegalizeDAG.
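The commit message does not name the operation being expanded. Assuming it is integer absolute value (the usual use of these two instruction sequences), the sketch below checks that both forms agree on 32-bit values. It models only the arithmetic, not the carry-based splitting of wide integers that ADDCARRY/SUBCARRY are used for.

```python
BITS = 32  # illustrative width, chosen for the example


def to_signed(x):
    """Wrap a Python int to a signed BITS-bit value."""
    x &= (1 << BITS) - 1
    return x - (1 << BITS) if x >> (BITS - 1) else x


def abs_sra_add_xor(x):
    s = x >> (BITS - 1)  # arithmetic shift: 0 for x >= 0, -1 for x < 0
    return to_signed((x + s) ^ s)


def abs_sra_xor_sub(x):
    s = x >> (BITS - 1)
    return to_signed((x ^ s) - s)


INT_MIN = -(1 << (BITS - 1))
for value in (0, 7, -5, INT_MIN + 1, INT_MIN):
    assert abs_sra_add_xor(value) == abs_sra_xor_sub(value)
    print(value, abs_sra_xor_sub(value))
```

Both sequences wrap the most negative value back to itself, as two's complement negation does.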
Differential Revision: https://reviews.llvm.org/D121039
We can't just split by space: that's not going to give us the same
argv we'd have gotten from the shell, since a space could be inside a
quoted string. We must actually parse the command line as argv.
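The difference is easy to see with Python's standard `shlex` module, which is one way to get shell-like splitting (whether the patch uses `shlex` is not stated here). The command line below is a made-up example.

```python
import shlex

# Hypothetical argument string containing quoted spaces.
cmd = "--opt 'a b' --name=\"x y\" last"

naive = cmd.split(" ")     # breaks quoted strings apart, keeps the quotes
parsed = shlex.split(cmd)  # POSIX shell-style word splitting

print(naive)
print(parsed)
```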