This updates the list of features in 'generic' and 'bleeding-edge' CPUs
in the backend to match
4e0a0eae58/clang/lib/Basic/Targets/WebAssembly.cpp (L150-L178)
This updates existing CodeGen tests in a way that, if a test has
separate RUN lines for a reference-types test and a non-reference-types
test, I added -mattr=-reference-types to the no-reftype test's RUN
command line. I didn't delete existing -mattr=+reference-types lines in
reftype tests because having it helps readability.
Also, when tests is not really about reference-types but they have to
updated because they happen to contain call_indirect lines because now
call_indirect will take __indirect_function_table as an argument, I just
added the table argument to the expected output.
`target-features-cpus.ll` has been updated reflecting the newly added
features.
This splits `target-features.ll` into two tests:
`target-features-attrs.ll` and `target-features-cpus.ll`.
Now `target-features-attrs.ll` contains tests with bitcode function
attributes and `-mattr=` options. The current `target-features.ll`
file's FileCheck lines are confusing, mainly because it is unclear how
`CHECK` and `ATTRS` lines are meant to be different. Turns out, before
67ec8744d7,
`-mattr=` options used to override any existing bitcode function
attributes, but after the commit that's not the case anymore. So the
original test had a line that tested `i32.atomic.rmw.cmpxchg` was not
generated when `-mattr=+simd128` was given (because the existing
`+atomics` in the function attributes is overriden). That commit deleted
that line and changed some `ATTRS` lines into `CHECK`, which was
confusing. This PR simplifies that part and does not test the absence of
any instructions, and the effect of `-mattr=` option is only tested with
the target features section.
And `target-features-cpus.ll` only tests the sets of features enabled by
`-mcpu=` lines. It is better to have this as a separate file because
once you have bitcode function attributes they end up in the target
features section too, making the testing of only the `-mcpu=` options
difficult.
I'm planning on a PR that splits `target-features.ll` into two different
files and fix some other stuff on them:
- `target-features-attrs.ll` that tests target features by bitcode
function attributes and `-mattr=` options
- `target-features-cpus.ll` that tests target features by `-mcpu=`
options
But `target-features-attrs.ll` will share a bulk of the lines with the
current `target-features.ll`. And if I remove `target-features.ll` and
create the two new files in a single PR, git doesn't recognize either of
them as a copy (I hoped at least `target-features-attrs.ll` would be
recognized as a copy because it shares many lines with the current file)
So to make the diff smaller and easier to review, I'm renaming the file
first. I'll follow up with the PR that does the actual splitting.
`rethrow` instruction is a terminator, but when when its DAG is built in
`SelectionDAGBuilder` in a custom routine, it was NOT treated as such.
```ll
rethrow: ; preds = %catch.start
invoke void @llvm.wasm.rethrow() #1 [ "funclet"(token %1) ]
to label %unreachable unwind label %ehcleanup
ehcleanup: ; preds = %rethrow, %catch.dispatch
%tmp = phi i32 [ 10, %catch.dispatch ], [ 20, %rethrow ]
...
```
In this bitcode, because of the `phi`, a `CONST_I32` will be created in
the `rethrow` BB. Without this patch, the DAG for the `rethrow` BB looks
like this:
```
t0: ch,glue = EntryToken
t3: ch = CopyToReg t0, Register:i32 %9, Constant:i32<20>
t5: ch = llvm.wasm.rethrow t0, TargetConstant:i32<12161>
t6: ch = TokenFactor t3, t5
t8: ch = br t6, BasicBlock:ch<unreachable 0x562532e43c50>
```
Note that `CopyToReg` and `llvm.wasm.rethrow` don't have dependence so
either can come first in the selected code, which can result in the code
like
```mir
bb.3.rethrow:
RETHROW 0, implicit-def dead $arguments
%9:i32 = CONST_I32 20, implicit-def dead $arguments
BR %bb.6, implicit-def dead $arguments
```
After this patch, `llvm.wasm.rethrow` is treated as a terminator, and
the DAG will look like
```
t0: ch,glue = EntryToken
t3: ch = CopyToReg t0, Register:i32 %9, Constant:i32<20>
t5: ch = llvm.wasm.rethrow t3, TargetConstant:i32<12161>
t7: ch = br t5, BasicBlock:ch<unreachable 0x5555e3d32c70>
```
Note that now `rethrow` takes a token from `CopyToReg`, so `rethrow` has
to come after `CopyToReg`. And the resulting code will be
```mir
bb.3.rethrow:
%9:i32 = CONST_I32 20, implicit-def dead $arguments
RETHROW 0, implicit-def dead $arguments
BR %bb.6, implicit-def dead $arguments
```
I'm not very familiar with the internals of `getRoot` vs.
`getControlRoot`, but other terminator instructions seem to use the
latter, and using it for `rethrow` too worked.
The wasm backend fetches the tan runtime lib call in
`llvm/include/llvm/IR/RuntimeLibcalls.def` via `StaticLibcallNameMap()`,
but ignores the runtime function because a function sinature mapping is
not specified in RuntimeLibcallSignatureTable(). The fix is to specify
the function signatures for float32-128.
This is a fix for a build break reported on PR
https://github.com/llvm/llvm-project/pull/94559#issuecomment-2159923215.
There is simply way too much going on inside getNode. The complicated
constant folding of vector handling works by looking for build_vector
operands, and then tries to getNode the scalar element and then checks
if
constants were the result. As a side effect, this produces unused scalar
operation nodes (previously, without flags). If the vector operation
were later scalarized, it would find the flagless constant folding
temporary and lose the flag. I don't think this is a reasonable way for
constant folding to operate, but for now fix this by ensuring flags
on the original operation are preserved in the temporary.
This yields a clear code improvement for AMDGPU when f16 isn't legal.
The Wasm cases switch from using a libcall to compare and select. We are
evidently
missing the fcmp+select to fminimum/fmaximum handling, but this would be
further
improved when that's handled. AArch64 also avoids the libcall, but looks
worse and
has a different call for some reason.
This is a mostly-target-independent variadic function optimisation and
lowering pass. It is only enabled for AMDGPU in this initial commit.
The purpose is to make C style variadic functions a zero cost
abstraction. They are lowered to equivalent IR which is then amenable to
other optimisations. This is inherently slightly target specific but
much less so than one might expect - the C varargs interface heavily
constrains the ABI design divergence.
The pass is primarily tested from webassembly. This is because wasm has
a straightforward variadic lowering strategy which coincides exactly
with what this pass transforms code into and a struct passing convention
with few cases to check. Adding further targets conventions is
straightforward and elided from this patch primarily to simplify the
review. Implemented in other branches are Linux X86, AMD64, AArch64 and
NVPTX.
Testing for targets that have existing lowering for va_arg from clang is
most efficiently done by checking that clang | opt completely elides the
variadic syntax from test cases. The lowering produces a struct for each
call site which can be inspected to check the various alignment and
indirections are correct.
AMDGPU presently has no variadic support other than some ad hoc printf
handling. Combined with the pass being inactive on all other targets
landing this represents strict increase in capability with zero risk.
Testing and refining will continue post commit.
In addition to the compiler tests included here, a self contained x64
clang/musl toolchain was constructed using the "lowering" instead of the
systemv ABI and used to build various C programs like lua and libxml2.
Remove support for the icmp and fcmp constant expressions.
This is part of:
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179
As usual, many of the updated tests will no longer test what they were
originally intended to -- this is hard to preserve when constant
expressions get removed, and in many cases just impossible as the
existence of a specific kind of constant expression was the cause of the
issue in the first place.
This reuses most of the code that was created for f32x4 and f64x2 binary
instructions and tries to follow how they were implemented.
add/sub/mul/div - use regular LL instructions
min/max - use the minimum/maximum intrinsic, and also have builtins
pmin/pmax - use the wasm.pmax/pmin intrinsics and also have builtins
Specified at:
29a9b9462c/proposals/half-precision/Overview.md
I think test files for the legacy and the new EH (exnref) are better be
separate, and I'd like to use the current test file names for the new
EH, rather than keeping the current files and naming the new ones as
`-new` or something.
This adds tests for EH/SjLj option errors and swaps the error checking
order for unimportant cosmetic reasons (I think checking EH/SjLj
conflicts is more important than the model checking)
Adds a builtin and intrinsic for the f16x8.splat instruction.
Specified at:
29a9b9462c/proposals/half-precision/Overview.md
Note: the current spec has f16x8.splat as opcode 0x123, but this is
incorrect and will be changed to 0x120 soon.
At the moment, Clang is rather liberal in assuming that 0 (and by extension unqualified) is always a safe default. This does not work for targets that actually use a different value for the default / generic AS (for example, the SPIRV that obtains from HIPSPV or SYCL). This patch is a first, fairly safe step towards trying to clear things up by querying a modules' default AS from the target, rather than assuming it's 0, alongside fixing a few places where things break / we encode the 0 == DefaultAS assumption. A bunch of existing tests are extended to check for non-zero default AS usage.
Adds a builtin and intrinsic for the f32.store_f16 instruction.
The instruction stores an f32 value as an f16 memory. Specified at:
29a9b9462c/proposals/half-precision/Overview.md
Note: the current spec has f32.store_f16 as opcode 0xFD0121, but this is
incorrect and will be changed to 0xFC31 soon.
Adds a builtin and intrinsic for the f32.load_f16 instruction.
The instruction loads an f16 value from memory and puts it in an f32.
Specified at:
29a9b9462c/proposals/half-precision/Overview.md
Note: the current spec has f32.load_f16 as opcode 0xFD0120, but this is
incorrect and will be changed to 0xFC30 soon.
`llvm.trap` will be convert as unreachable which is terminator.
Instruction after terminator will cause validation failed.
This PR introduces a pass to clean instruction after terminator.
Fixes: https://github.com/llvm/llvm-project/issues/68770
Reapply: #90207
`llvm.trap` will be convert as `unreachable` which is terminator.
Instruction after terminator will cause validation failed.
This PR introduces a pass to clean instruction after terminator.
Fixes: #68770.
Currently we check `Subtarget->hasReferenceTypes()` to decide whether to
run `RefTypeMem2Local` pass:
6133878227/llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp (L491-L495)
This works fine when `-mattr=+reference-types` is given in the command
line (of `llc` or of `wasm-ld` in case of LTO). This also works fine if
the backend is called by Clang, because Clang's feature set will be
passed to the backend when creating a `TargetMachine`:
ac791888bb/clang/lib/CodeGen/BackendUtil.cpp (L549-L550)ac791888bb/clang/lib/CodeGen/BackendUtil.cpp (L561-L562)
But if the backend compilation is called by `llc`, a `TargetMachine` is
created here:
bf1ad1d267/llvm/tools/llc/llc.cpp (L554-L555)
And if the backend is called by `wasm-ld`'s LTO, a `TargetMachine` is
created here:
ac791888bb/llvm/lib/LTO/LTOBackend.cpp (L513)
At this point, in the both places, the created `TargetMachine` only has
access to target features given by the command line with `-mattr=` and
doesn't have access to bitcode functions' `target-features` attribute.
We later gather the target features used by functions and store that
info in the `TargetMachine` in `CoalesceFeaturesAndStripAtomics`,
ac791888bb/llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp (L202-L206)
but this runs in the pass pipeline driven by the pass manager, so this
has not run by the time we check `Subtarget->hasReferenceTypes()` in
`WebAssemblyPassConfig::addISelPrepare`. So currently `RefTypeMem2Local`
would not run on those functions with
`"target-features"="+reference-types"` attributes if the backend is
called by `llc` or `wasm-ld`.
So this makes `RefTypeMem2Local` pass run unconditionally, and checks
`target-featurs` function attribute to decide whether to run the pass on
each function. This allows the pass to run with `wasm-ld` + LTO and
`llc`, even if `-mattr=+reference-types` is not explicitly given in the
command line again, as long as `+reference-types` is in the function's
`target-features` attribute.
This also covers the case we give the target features by the command
line like `llc -mattr=+reference-types` and not in the bitcode
function's attribute, because attributes given in the command line will
be stored in the function's attributes anyway:
bd28889732/llvm/lib/CodeGen/CommandFlags.cpp (L673-L674)bd28889732/llvm/lib/CodeGen/CommandFlags.cpp (L732-L733)
With this PR,
- `lto0.test_externref_emjs`
- `thinlto0.test_externref_emjs`,
- `lto0.test_externref_emjs_dynlink`,
- `thinlto0.test_externref_emjs_dynlnk`
pass. These currently fail but don't get checked in the CI. I think they
used to pass but started to fail after #83196, because we used to run
mem2reg even with `-O0` before that.
(`ltoN` (N > 0) tests are not affected because they run mem2reg anyway
so they don't need `RefTypeMem2Local`)
Multivalue feature of WebAssembly has been standardized for several
years now. I think it makes sense to be able to enable it in the feature
section by default for our clang/llvm-produced binaries so that the
multivalue feature can be used as necessary when necessary within our
toolchain and also when running other optimizers (e.g. wasm-opt) after
the LLVM code generation.
But some WebAssembly toolchains, such as Emscripten, do not provide both
mulvalue-returning and not-multivalue-returning versions of libraries.
Also allowing the uses of multivalue in the features section does not
necessarily mean we generate them whenever we can to the fullest, which
is a different code generation / optimization option.
So this makes the lowering of multivalue returns conditional on the use
of 'experimental-mv' target ABI. This ABI is turned off by default and
turned on by passing `-Xclang -target-abi -Xclang experimental-mv` to
`clang`, or `-target-abi experimental-mv` to `clang -cc1` or `llc`.
But the purpose of this PR is not tying the multivalue lowering to this
specific 'experimental-mv'. 'experimental-mv' is just one multivalue ABI
we currently have, and it is still experimental, meaning it is not very
well optimized or tuned for performance. (e.g. it does not have the
limitation of the max number of multivalue-lowered values, which can be
detrimental to performance.) We may change the name of this ABI, or
improve it, or add a new multivalue ABI in the future. Also I heard that
WASI is planning to add their multivalue ABI soon. So the plan is,
whenever any one of multivalue ABIs is enabled, we enable the lowering
of multivalue returns in the backend. We currently have only
'experimental-mv' in the repo so we only check for that in this PR.
Related past discussions:
#82714https://github.com/WebAssembly/tool-conventions/pull/223#issuecomment-2008298652
Remove `llvm.threadlocal.address` intrinsic usage when disabling TLS.
This fixes errors revealed by the stricter IR verification introduced in
PR #87841.
This reverts commit 52431fdb1a.
The PR assumed `__threwValue` couldn't be 0, but it could be when the
thrown thing is not a longjmp but an exception, so that `if` check was
actually necessary.
Previously we expected lane constants to be in the range of signed
values for each lane size, but the included test case produced large
unsigned values that fall outside that range. Allow instruction
selection to proceed in this case rather than failing.
Fixes#63817.
When reference-types feature is enabled, forcing mem2reg unconditionally
even in `-O0` has some problems described in #81575. This uses
RefTypeMem2Local pass added in #81965 instead. This also removes
`IsForced` parameter added in
890146b192
given that we don't need it anymore.
This may still hurt debug info related to reference type variables a
little during the backend transformation given that they are not stored
in memory anymore, but reference type variables are presumably rare and
it would be still a lot less damage than forcing mem2reg on the whole
program. Also this fixes the EH problem described in #81575.
Fixes#81575.
This reverts commit 6e6bf9f817.
It turned out the multivalue feature had active outside users and it
could cause some disruptions to them, so I'd like to investigate more
about the workarounds before doing this.
This adds `WebAssemblyRefTypeMem2Local` pass, which changes the address
spaces of reference type `alloca`s to `addrspace(1)`. This in turn
changes the address spaces of all `load` and `store` instructions that
use the `alloca`s.
`addrspace(1)` is `WASM_ADDRESS_SPACE_VAR`, and loads and stores to this
address space become `local.get`s and `local.set`s, thanks to the Wasm
local IR support added in
82f92e35c6.
In a follow-up PR, I am planning to replace the usage of mem2reg pass
with this to solve the reference type `alloca` problems described in
#81575.
We plan to enable multivalue in the features section soon (#80923) for
other reasons, such as the feature having been standardized for many
years and other features being developed (e.g. EH) depending on it. This
is separate from enabling Clang experimental multivalue ABI (`-Xclang
-target-abi -Xclang experimental-mv`), but it turned out we generate
some multivalue code in the backend as well if it is enabled in the
features section.
Given that our backend multivalue generation still has not been much
used nor tested, and enabling the feature in the features section can be
a separate decision from how much multialue (including none) we decide
to generate for now, I'd like to temporarily disable the actual
generation of multivalue in our backend. To do that, this adds an
internal flag `-wasm-emit-multivalue` that defaults to false. All our
existing multivalue tests can use this to test multivalue code. This
flag can be removed later when we are confident the multivalue
generation is well tested.
In WebAssembly, we have `WASM_SYMBOL_NO_STRIP` symbol flag to mark the
referenced content as retained. However, the flag is not enough to
express retained data that is not referenced by any symbol. This patch
adds a new segment flag`WASM_SEG_FLAG_RETAIN` to support "private"
linkage data that is retained by llvm.used.
This kind of data that is not referenced but must be retained is usually
used with encapsulation symbols (__start/__stop). Swift runtime uses
this technique and depends on the fact "all metadata sections in live
objects are retained", which was not guaranteed with `--gc-sections`
before this patch.
This is a revised version of https://reviews.llvm.org/D126950 (has been
reverted) based on @MaskRay's comments
`DemoteCatchSwitchPHIOnly` option in `WinEHPrepare` pass was added in
99d60e0dab,
because Wasm EH uses `WinEHPrepare`, but it doesn't need to demote all
PHIs. PHIs in `catchswitch` BBs have to be removed (= demoted) because
`catchswitch`s are removed in ISel and `catchswitch` BBs are removed as
well, so they can't have other instructions.
But because Wasm EH doesn't use funclets, so PHIs in `catchpad` or
`cleanuppad` BBs don't need to be demoted. That was the reason
`DemoteCatchSwitchPHIOnly` option was added, in order not to demote more
instructions unnecessarily.
The problem is it should have been set to `true` for Wasm EH. (Its
default value is `false` for WinEH) And I mistakenly set it to `false`
and wasn't aware about this for more than 5 years. This was not the end
of the world; it just means we've been demoting more instructions than
we should, possibly huting code size. In practice I think it would've
had hardly any effect in real performance given that the occurrence of
PHIs in `catchpad` or `cleanuppad` BBs are not very frequent and many
people run other optimizers like Binaryen anyway.
If we have a `SETCC (SETCC), 0, NE` and ZeroOrOneBooleanContent, we can remove
the outer setcc as it will produce the same value as the inner. This can be
generalized to anything where the top bits are known to be 0, as the value will
remain as 1 or 0.
When promoted value, it is meaningless to copy value from reg to another
reg with the same type.
This PR add additional check for this cases to reduce the code size.
Fixes: #80053.
In `CoalesceFeaturesAndStripAtomics`, feature string is converted to FeatureBitset and back to feature string. It will lose information about explicit diasbled features.
We previously scanned the whole BB for `DBG_VALUE` instruction even when
the program doesn't have debug info, i.e., the function doesn't have a
subprogram associated with it, which can make compilation unnecessarily
slow. This disables `DebugValueManager` when a `DISubprogram` doesn't
exist for a function.
This only reduces unnecessary work in non-debug mode and does not change
output, so it's hard to add a test to test this behavior.
Test changes were necessary because their `DISubprogram`s were not
correctly linked with the functions, so with this PR the compiler
incorrectly assumed the functions didn't have a subprogram and the tests
started to fail.
Fixes https://github.com/emscripten-core/emscripten/issues/21048.
This patch fixes WebAssembly's FastISel pass to correctly consider
signext/zeroext parameter flags at function declaration.
Previously, the flags at call sites were only considered during code
generation, which caused an interesting bug report #63388 .
This is problematic especially because in WebAssembly's ABI, either
signext or zeroext can be tagged to a function argument, and it must be
correctly reflected in the generated code. Unit test
https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/WebAssembly/signext-zeroext.ll
shows that `i8 zeroext %t` and `i8 signext %t`'s code gen are different.