In the propagate mode, NaNs compare equal to each other, so when there are several NaNs the index of the first one needs to be returned. This commit changes the index update condition to check that the current index is not that of a NaN.
The commit also simplifies the argmax NaN-ignore lowering to only use OGT. This prevents any update in the presence of a NaN. The only case where the index of a NaN is returned is when all values are NaN, and this is covered by the fact that the initial index value is 0, so performing no updates results in 0 being returned.
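For intuition, a minimal C++ sketch of the OGT reasoning (the helper below is hypothetical, not the actual lowering):
```cpp
#include <cstddef>
#include <limits>
#include <vector>

// C++'s `>` on floats matches the OGT predicate: it is false whenever either
// operand is NaN, so a NaN element never triggers an update. The index starts
// at 0, so if every element is NaN no update fires and 0 is returned -- the
// only case in which a NaN's index escapes.
std::size_t argmaxIgnoringNan(const std::vector<float> &values) {
  std::size_t bestIdx = 0;
  float bestVal = -std::numeric_limits<float>::infinity();
  for (std::size_t i = 0; i < values.size(); ++i)
    if (values[i] > bestVal) { // OGT: false if values[i] is NaN
      bestVal = values[i];
      bestIdx = i;
    }
  return bestIdx;
}
```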
Make sure the process is stopped when computing the symbol context. Both
Adrian and Felipe reported a handful of crashes in GetSymbolContext
called from Statusline::Redraw on the default event thread.
Given that we're handling a StackFrameSP, it's not clear to me how that
could have gotten invalidated, but Jim points out that it doesn't make
sense to compute the symbol context for the frame when the process isn't
stopped.
Depends on #135455
I traced the issue reported by Caroline and Pavel in #134757 back to the
call to ProcessRunLock::TrySetRunning. When that fails, we get a
somewhat misleading error message:
> process resume at entry point failed: Resume request failed - process
still running.
This is incorrect: the problem was not that the process was in a running
state, but rather that the RunLock was being held by another thread
(i.e. the Statusline). TrySetRunning would return false in both cases
and the call site only accounted for the former.
Besides the odd semantics, the current implementation is inherently racy and, I believe, incorrect. If someone is holding the RunLock, the resume call should block rather than give up, and, with the lock held, switch the running state and report the old running state.
This patch removes ProcessRunLock::TrySetRunning and updates all callers
to use ProcessRunLock::SetRunning instead. To support that,
ProcessRunLock::SetRunning (and ProcessRunLock::SetStopped, for
consistency) now report whether the process was stopped or running
respectively. Previously, both methods returned true unconditionally.
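For illustration, a hedged sketch of the intended semantics (a hypothetical class, not LLDB's actual ProcessRunLock):
```cpp
#include <mutex>

// SetRunning/SetStopped block until the lock is available, flip the state,
// and report the previous state -- rather than a TrySetRunning that fails both
// when the process is running and when another thread merely holds the lock.
class RunStateLock {
  std::mutex m_mutex;
  bool m_running = false;

public:
  // Returns true if the process was stopped before this call.
  bool SetRunning() {
    std::lock_guard<std::mutex> guard(m_mutex); // block, don't give up
    bool was_stopped = !m_running;
    m_running = true;
    return was_stopped;
  }

  // Returns true if the process was running before this call.
  bool SetStopped() {
    std::lock_guard<std::mutex> guard(m_mutex);
    bool was_running = m_running;
    m_running = false;
    return was_running;
  }
};
```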
The old code has been around pretty much forever, and there's nothing in the git history to indicate that this was done purposely to solve a particular issue. I've tested this on both Linux and macOS and confirmed that this solves the statusline issue.
A big thank you to Jim for reviewing my proposed solution offline and
trying to poke holes in it.
The premerge pipeline currently creates an artifacts directory with some statistics that gets uploaded on the Buildkite side for later inspection. This patch adds support for this on the GitHub side by using the upload-artifact action.
Reviewers: Keenuts, lnihlen, mizvekov, tstellar, Endilll
Reviewed By: mizvekov
Pull Request: https://github.com/llvm/llvm-project/pull/135538
InstCombine can transform ADD+GEP into GEP+GEP, but those rewrites do not currently trigger when the ADD is a disjoint OR (which happens to be the canonical form for certain ADD operations). Add lit tests to show that we are lacking such rewrites.
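For context, a small C++ illustration of why a disjoint OR is equivalent to an ADD (illustrative only, not InstCombine code):
```cpp
#include <cassert>
#include <cstdint>

int main() {
  uint64_t base = 0x1000; // low bits known to be zero (e.g. an aligned base)
  uint64_t idx = 0x7;     // fits entirely within those low bits
  assert((base & idx) == 0);            // the operands are disjoint...
  assert((base | idx) == (base + idx)); // ...so there are no carries and OR == ADD
  return 0;
}
```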
Also add a test case showing that we do not preserve "inbounds nuw", "nusw nuw" and "nuw" when doing such transforms, even when the ADD/OR is known to be NUW.
Reapplies #134068.
The first patch was missing a check to prevent attempts to pair SVE
fill/spill with other Neon load/store instructions, which could happen
specifically if the Neon instruction was unscaled.
Previously, `cc1as` did not consider the features that can be included from a target's FPU. This could lead to situations where assembly files could not be compiled because cc1as did not know whether a feature was supported.
With this change, all the features for the FPU will be passed to `cc1as` as `-target-feature` lines. This enables `+nosimd` to be functional (being worked on in #130623) and fixes a regression introduced in 8fa0f0efce, so armv7s-apple-darwin targets can utilise VFPv4 correctly.
---------
Co-authored-by: Martin Storsjö <martin@martin.st>
This patch addresses three problems when promoting allocas to vectors:
- Element types with size < 1 byte in allocas with a vector type caused
divisions by zero.
- Element types whose size doesn't match their AllocSize hit an assertion.
- Access types whose size doesn't match their AllocSize hit an assertion.
With this patch, we do not attempt to promote affected allocas to vectors. In principle, we could handle these cases in PromoteAlloca, e.g., by truncating and extending elements from/to their allocation size. It's unclear, however, whether we ever encounter such cases in practice, so that doesn't seem worth the added complexity.
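For intuition, a minimal hypothetical sketch of the first failure mode and the bail-out (not the actual PromoteAlloca code):
```cpp
#include <cstdint>

// With a sub-byte element type such as i1, the store size in bytes is 0, so
// dividing the alloca size by it would divide by zero; this sketch simply
// refuses to vectorize in that case (and when the element size doesn't evenly
// cover the alloca), mirroring the "bail out instead of promoting" approach.
bool isSafeToVectorize(uint64_t allocaSizeInBytes, uint64_t elementSizeInBits) {
  uint64_t elementSizeInBytes = elementSizeInBits / 8; // 0 for i1
  if (elementSizeInBytes == 0)
    return false; // would otherwise divide by zero when computing #elements
  if (allocaSizeInBytes % elementSizeInBytes != 0)
    return false; // element size doesn't evenly cover the alloca
  return true;
}
```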
For SWDEV-511252
With https://github.com/llvm/llvm-project/pull/112852, we claimed that llvm.minnum and llvm.maxnum should treat +0.0 as greater than -0.0, while libc doesn't require this of fmin(3)/fmax(3).
To make llvm.minnum/llvm.maxnum easy to use, we define builtin functions for them, including
__builtin_elementwise_minnum
__builtin_elementwise_maxnum
Both of them support _Float16, __bf16, float, double, and long double.
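A possible usage example, assuming a Clang build that includes these builtins:
```cpp
// minnum/maxnum treat +0.0 as greater than -0.0, unlike what libc guarantees
// for fmin(3)/fmax(3).
float minnum_f(float a, float b) { return __builtin_elementwise_minnum(a, b); }
double maxnum_d(double a, double b) { return __builtin_elementwise_maxnum(a, b); }

// e.g. minnum_f(+0.0f, -0.0f) yields -0.0f, and maxnum_d(+0.0, -0.0) yields +0.0.
```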
We add comment markers and print enum names instead of numbers.
For required extensions, we print the feature list instead of raw
bits.
This recommits d0cf5cd which was reverted by 21ff45d.
This reverts commit d0cf5cd5f9.
Error: "declaration of ‘clang::RISCV::RequiredExtensions
{anonymous}::SemaRecord::RequiredExtensions’ changes meaning of
‘RequiredExtensions’ [-fpermissive]"
Among the targets that use fixupNeedsRelaxationAdvanced (introduced by https://reviews.llvm.org/D8217), only Hexagon needs the `MCRelaxableFragment` parameter (commit 86f218e7ec) to get the instruction packet (MCInst with sub-instruction operands).
As fixupNeedsRelaxationAdvanced follows mayNeedRelaxation, we can store
the MCInst in mayNeedRelaxation and eliminate the MCRelaxableFragment
parameter.
Follow-up to 7c83b7ef17 that eliminates
the MCRelaxableFragment parameter from fixupNeedsRelaxation.
This commit improves the `EnumProp` class, making it wrap around an `EnumInfo` just like `EnumAttr` does. This EnumProp also has logic for converting to/from an integer attribute and for being read and written as bitcode.
The following variants of `EnumProp` are provided:
- `EnumPropWithAttrForm` - an EnumProp that can be constructed from (and will be converted to, if `storeInCustomAttribute` is true) a custom attribute, like an `EnumAttr`, instead of a plain integer. This is meant for backwards compatibility with code that uses enum attributes.
- `NamedEnumProp` - adds a "`mnemonic` `<` $enum `>`" syntax around the enum, replicating a common pattern seen in MLIR printers and allowing for reduced ambiguity.
- `NamedEnumPropWithAttrForm` - combines both of these extensions.
(Sadly, bytecode auto-upgrade is hampered by the lack of the ability to
optionally parse an attribute.)
Depends on #132148
If a hot callsite function is not inlined in the 1st build, inlining the hot callsite in the pre-link stage of the SPGO 2nd build may lead to the function sample not being found in the profile file at link stage, losing some profile info.
ThinLTO has already considered and dealt with this by setting HotCallSiteThreshold to 0 to stop the inlining. This patch just adds the same processing for FullLTO.
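A rough sketch of the idea, with hypothetical names and an illustrative default value (not the actual SampleProfile code):
```cpp
// In the pre-link stage of the 2nd (profile-use) build, suppress inlining of
// hot call sites so that their function samples can still be matched at link
// time. ThinLTO already does this; the patch extends it to FullLTO.
int computeHotCallSiteThreshold(bool isPreLink, bool isLTO) {
  const int DefaultHotCallSiteThreshold = 3000; // illustrative value only
  if (isPreLink && isLTO)
    return 0; // a threshold of 0 stops hot-callsite inlining
  return DefaultHotCallSiteThreshold;
}
```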
Follow-up to commits 5710759eb3
and 634f9a9815
- Integrate `evaluateFixup` into `recordRelocation` and inline code
within `MCAssembler::layout`, removing `handleFixup`.
- Update `fixupNeedsRelaxation` to bypass `shouldForceRelocation` when calling `evaluateFixup`, eliminating the `WasForced` workaround for RISC-V linker relaxation (https://reviews.llvm.org/D46350).
This prepares for the upcoming change to simplify relocation recording
in MCAssembler.
While both MCAssembler::fixupNeedsRelaxation and
MCAssembler::handleFixup call evaluateFixup and use
shouldForceRelocation, the shouldForceRelocation logic is not supposed
to be needed by MCAssembler::fixupNeedsRelaxation.
The ARM special cases for interworking branches
(https://reviews.llvm.org/D33436 and https://reviews.llvm.org/D33898)
break the assumption. Switch to fixupNeedsRelaxationAdvanced and
explicitly test the conditions.
When a frame is inlined, LLDB will display its name in backtraces as
follows:
```
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.3
* frame #0: 0x0000000100000398 a.out`func() [inlined] baz(x=10) at inline.cpp:1:42
frame #1: 0x0000000100000398 a.out`func() [inlined] bar() at inline.cpp:2:37
frame #2: 0x0000000100000398 a.out`func() at inline.cpp:4:15
frame #3: 0x00000001000003c0 a.out`main at inline.cpp:7:5
frame #4: 0x000000026eb29ab8 dyld`start + 6812
```
The longer the names get, the more confusing this becomes, because the first function name that appears is that of the parent frame. My assumption (which may need some more surveying) is that for the majority of cases we only care about the actual frame name (not the parent's). So this patch removes all the special logic that prints the parent frame.
Another quirk of the current format is that the inlined frame name does
not abide by the `${function.name-XXX}` format variables. We always just
print the raw demangled name. With this patch, we would format the
inlined frame name according to the `frame-format` setting (see the
test-cases).
If we really want to have the `parentFrame [inlined] inlinedFrame` format, we could expose it through a new `frame-format` variable (e.g., `${function.inlined-at-name}`) and let the user decide where to place things.
This parameter eliminates a redundant computation for VK_ABS8 in X86 and
reduces reliance on shouldForceRelocation in relaxation decisions.
Note: `local: jmp local@plt` relaxes JMP. This behavior depends on
fixupNeedsRelaxation calling shouldForceRelocation, which might change
in the future.
Both the `CPlusPlusLanguage` plugins and the Swift language plugin already assume that `sc != nullptr`. And all `FormatEntity` callsites of `GetFunctionDisplayName` already check for nullptr before passing `sc`.
This patch makes this pre-condition explicit by changing the parameter to `const SymbolContext &`. This will help with some upcoming changes in this area.
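A hedged sketch of the general pattern, using hypothetical types rather than LLDB's actual signatures:
```cpp
#include <string>

// Taking the symbol context by const reference encodes the "never null"
// pre-condition in the type system, so the callee no longer needs a check.
struct SymbolContextLike {
  std::string function_name;
};

// Before: `const SymbolContextLike *sc` -- every caller checked for nullptr.
// After:
std::string GetDisplayName(const SymbolContextLike &sc) {
  return sc.function_name; // no null check needed
}
```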
Removed the calls to `sizeOp` after replacing `SliceOp`:
```
// Remove const_shape size op when it no longer has use point.
Operation *sizeConstShape = sliceOp.getSize().getDefiningOp();
```
It turns out that, as part of canonicalization, trivially dead ops are removed anyway, so the above piece of code isn't actually needed.
SVE operations such as predicated loads are canonicalized to LLVM masked loads; doing the same for ptrue(all), canonicalizing it to splat(1), creates further optimization opportunities for generic LLVM IR passes.