As with the other update scripts this takes the output of
-passes=print<gisel-value-tracking> and inserts the results into an
existing mir file. This means that the input is a lot like
update_analysis_test_checks.py, and the output needs to insert into a
mir file similarly to update_mir_test_checks.py. The code used to do the
inserting has been moved to common, to allow it to be reused. Otherwise
it tries to reuse the existing infrastructure, and
update_givaluetracking_test_checks is kept relatively short.
Before:
lld-link: error: Stream Error: The stream is too short to perform the requested operation.
lld-link: error: failed to write PDB file ./unit_tests.exe.pdb
After:
lld-link: error: the public (2127832912 bytes) and global (2200532960 bytes) symbols
are too large to fit in a PDB file; the maximum total is 4294967295 bytes.
lld-link: error: failed to write PDB file ./unit_tests.exe.pdb
Flang at Ofast usually produces executables that consume more stack that
other Fortran compilers.
This is in part because the alloca created from temporary heap
allocation by the StackArray pass are created at the function scope
level without lifetimes, and LLVM does not/is not able to merge alloca
that do not have overlapping lifetimes.
This patch adds an option to generate LLVM lifetime in the StackArray
pass at the previous heap allocation/free using the LLVM dialect
operation for it.
Reverts llvm/llvm-project#138143
The PR for the BufferizationState is temporarily reverted due to API incompatibilities that have been initially missed during the update and were not catched by PR checks.
This reverts commit 793bb6b257.
The recommitted version contains a fix to make sure only the original
phis are processed in convertPhisToBlends nu collecting them in a vector
first. This fixes a crash when no mask is needed, because there is only
a single incoming value.
Original message:
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform. It mostly ports the existing logic directly.
There are a number of follow-ups planned in the near future to
further improve on the implementation:
* Edge and block masks are cached in VPPredicator, but the block masks
are still made available to VPRecipeBuilder, so they can be accessed
during recipe construction. As a follow-up, this should be replaced by
adding mask operands to all VPInstructions that need them and use that
during recipe construction.
* The mask caching in a map also means that this map needs updating each
time a new recipe replaces a VPInstruction; this would also be handled
by adding mask operands.
PR: https://github.com/llvm/llvm-project/pull/128420
This PR is a follow-up on #138125, and adds a bufferization state class providing information about the IR. The information currently consists of a cached list of symbol tables, which aims to solve the quadratic scaling of the bufferization task with respect to the number of symbols. The PR breaks API compatibility: the `bufferize` method of the `BufferizableOpInterface` has been enriched with a reference to a `BufferizationState` object.
The bufferization state must be kept in a valid state by the interface implementations. For example, if an operation with the `Symbol` trait is inserted or replaced, its parent `SymbolTable` must be updated accordingly (see, for example, the bufferization of `arith::ConstantOp`, where the symbol table of the module gets the new global symbol inserted). Similarly, the invalidation of a symbol table must be performed if an operation with the `SymbolTable` trait is removed (this can be performed using the `invalidateSymbolTable` method, introduced in #138014).
This reapplies https://github.com/llvm/llvm-project/pull/138892, which
was reverted in
5fb9dca14a
due to failures on windows.
Windows loads modules from the Process class, and it does that quite
early, and it kinda makes sense which is why I'm moving the clearing
code even earlier.
The original commit message was:
Minidump files contain explicit information about load addresses of
modules, so it can load them itself. This works on other platforms, but
fails on darwin because DynamicLoaderDarwin nukes the loaded module list
on initialization (which happens after the core file plugin has done its
work).
This used to work until
https://github.com/llvm/llvm-project/pull/109477, which enabled the
dynamic loader
plugins for minidump files in order to get them to provide access to
TLS.
Clearing the load list makes sense, but I think we could do it earlier
in the process, so that both Process and DynamicLoader plugins get a
chance to load modules. This patch does that by calling the function
early in the launch/attach/load core flows.
This fixes TestDynamicValue.py:test_from_core_file on darwin.
The previous approach broke the instantiation convention for templated
substitutions, as we were attempting to instantiate the initializer
even when it was still dependent.
We deferred variable template instantiation until the end of the TU.
However, type deduction requires the initializer immediately,
similar to how constant evaluation does.
Fixes https://github.com/llvm/llvm-project/issues/140773Fixes#135032Fixes#134526
Reapplies https://github.com/llvm/llvm-project/pull/138122
Revert the Target.getSpecifier implementation
(38c3ad36be) and update SystemZAsmBackend
instead.
Many targets with %lo/%hi style specifiers (SPARC, MIPS, PowerPC,
RISC-V) do not force relocations for these specifiers.
Additionally, with the introduction of the addReloc hook,
shouldForceRelocation is not that necessary and should probably be
phased out.
On Cygwin, UNIX sockets involve a handshake between connect and accept
to enable SO_PEERCRED/getpeereid handling. This necessitates accept
being called before connect can return, but at least the tests in
llvm/unittests/Support/raw_socket_stream_test do both on the same thread
(first connect and then accept), resulting in a deadlock. Add a call to
both places sockets are created that turns off the handshake (and
SO_PEERCRED/getpeereid support).
References:
* cec8a6680e/winsup/cygwin/fhandler/socket_local.cc (L1462-L1471)
* https://inbox.sourceware.org/cygwin/Z_UERXFI1g-1v3p2@calimero.vinschen.de/T/#u
In #138329, _GNU_SOURCE was added for Cygwin, but when building Clang
standalone against an installed LLVM this definition was not picked up,
resulting in undefined strnlen. Follow the documentation in
https://llvm.org/docs/CMake.html#embedding-llvm-in-your-project and add
the LLVM_DEFINITIONS in standalone projects' cmakes.
lookupTarget takes StringRef and internally creates an instance of
std::string with the StringRef as part of constructing Triple, so we
don't need to create temporary instances of std::string on our own.
This adds support for lowering deinterleave and interleave intrinsics
for factors 3 5 and 7 into target specific memory intrinsics.
Notably this doesn't add support for handling higher factors constructed
from interleaving interleave intrinsics, e.g. factor 6 from interleave3
+ interleave2.
I initially tried this but it became very complex very quickly. For
example, because there's now multiple factors involved
interleaveLeafValues is no longer symmetric between interleaving and
deinterleaving. There's then also two ways of representing a factor 6
deinterleave: It can both be done as either 1 deinterleave3 and 3
deinterleave2s OR 1 deinterleave2 and 3 deinterleave3s.
I'm not sure the complexity of supporting arbitrary factors is warranted
given how we only need to support a small number of factors currently:
SVE only needs factors 2,3,4 whilst RVV only needs 2,3,4,5,6,7,8.
My preference would be to just add a interleave6 and deinterleave6
intrinsic to avoid all this ambiguity, but I'll defer this discussion to
a later patch.
lookupTarget takes StringRef and internally creates an instance of
std::string with the StringRef as part of constructing Triple, so we
don't need to create a temporary instance of std::string on our own.
Not adding them was leading to a crash when trying to expand these
pseudo instructions.
I've also fixed the register class types for the Xqcibi instructions in
these pseudo instructions which was incorrect and was exposed by the
machine verifier while running the test case added in this patch.
Fixes#140697
I'm fairly certain that the options in this CL are benign, as I don't
believe they affect the AST.
* RTTI - shouldn't affect the AST, should only affect codegen
* Trivial var init - also should only affect codegen
* Stack protector - also codegen
* Exceptions - Since exceptions do allow new things in the AST, but I'm
pretty sure that they can differ in parent and child safely, I marked it
as compatible instead.
I welcome any input from someone more familiar with this than me, as I
might be wrong.
similar to #140570
getting this error:
exit status 1
ld.lld: error: section '.text' address (0x8074) is smaller than image
base (0x10000); specify --image-base
We can use the offset from the original store instead of creating
a new undef offset.
We didn't check if the offset was undef already so we really shouldn't
drop it if it isn't.
This fixes an issue in the waves/EU range calculation wherein, if the
`amdgpu-waves-per-eu` attribute exists and is valid, the entire
attribute may be spuriously and completely ignored if workgroup sizes
and LDS usage restrict the maximum achievable occupancy below the
subtarget maximum. In such cases, we should still honor the requested
minimum number of waves/EU, even if the requested maximum is higher than
the actually achievable maximum (but still within subtarget
specification).
As such, the added unit test `empty_at_least_2_lds_limited`'s waves/EU
range should be [2,4] after this patch, when it is currently [1,4] (i.e,
as if `amdgpu-waves-per-eu` was not specified at all).
Before e377dc4 the default maximum waves/EU was always set to the
subtarget maximum, trivially avoiding the issue.
This reverts commit 90daed32a8 and 4cfbe55781.
These changes exposed cyclic dependencies when LLVM is configured with
modules `-DLLVM_ENABLE_MODULES=ON`.
The GreenDragon CI bots are currently passing because the installed
Xcode is a bit old, and doesn't have the watchpoint handling
bug that was fixed April with this test being added.
But on other CI running newer Xcode debugservers, this test will
fail. Skip this test if we're using an out of tree debugserver.
Padding the mask using 0 elements doesn't work for scalable vectors. Use
VP_LOAD and change the VL instead.
This fixes crash for Zve32x. Test file was split since i64 isn't a valid
element type for Zve32x.
Fixes#140198.
Usage errors in `LTOBackend.cpp` were previously, misleadingly, reported
as internal crashes.
This PR updates `LTOBackend.cpp` to use `reportFatalUsageError` for
reporting usage-related issues.
LLVM Issue: https://github.com/llvm/llvm-project/issues/140953
Internal Tracker: TOOLCHAIN-17744
add `GenericFloatingPointPredicateUtils` in order to generalize
effects of floating point comparisons on `KnownFPClass` for both IR and
MIR.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>