The test crc8.le.tc16 is a valid CRC algorithm, but isn't recognized as
such due to a buggy arePHIsIntertwined, which is asymmetric in its
PHINode arguments. There is also a fundamental correctness issue: the
core functionality is to match an XOR that's a recurrence in both PHI
nodes, ignoring casts, but the user of the XOR is never checked. Rewrite
and rename the function.
crc8.le.tc16 is still not recognized as a valid CRC algorithm, due to an
incorrect check for loop iterations exceeding the bitwidth of the
result: in reality, it should not exceed the bitwidth of LHSAux, but we
leave this fix to a follow-up.
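For illustration, a minimal C++ sketch of the shape of such a loop (hedged: this is not the test case verbatim, and the reflected polynomial 0x8C is an assumed one). Both `crc` and `data` become loop PHIs, and the XOR combining them is the recurrence that must be matched symmetrically; the trip count (16) exceeds the bitwidth of the 8-bit result but not that of the 16-bit auxiliary operand (LHSAux):
```
#include <cstdint>

// Bit-reflected ("little-endian") CRC-8 over 16 data bits. The loop-carried
// values crc and data become the two intertwined PHI nodes.
uint8_t crc8_le_tc16(uint16_t data, uint8_t crc) {
  for (int i = 0; i < 16; ++i) {
    bool lsb = (crc ^ static_cast<uint8_t>(data)) & 1; // the matched XOR
    crc = (crc >> 1) ^ (lsb ? 0x8C : 0); // 0x8C: assumed reflected polynomial
    data >>= 1;
  }
  return crc;
}
```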
Co-authored-by: Piotr Fusik <p.fusik@samsung.com>
Related: #139231
This patch fixes a crash in the affine-loop-fusion pass when
`getInnermostCommonLoop` is called with an empty list of operations.
The function expects at least one op to analyze, and passing an empty
array of ops causes an assertion failure. This change ensures the pass
checks for an empty op array before calling `getInnermostCommonLoop`.
Summary:
This patch uses the actual allocator interface to implement
`aligned_alloc`. We do this by simply rounding up the amount allocated.
Because of how index calculation works, any offset within an allocated
pointer still maps to the same chunk, so we can simply adjust the pointer
internally and freeing works all the same.
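For illustration, a minimal sketch of the pointer adjustment (the function name is illustrative, not the libc's): once the request is rounded up so an aligned pointer fits in the chunk, any interior pointer bumped to the next alignment boundary still maps back to the same chunk, so it can be freed as usual.
```
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bump a pointer inside an over-allocated chunk to the next multiple of
// alignment; the result stays within the chunk the allocator handed out.
void *align_within_chunk(void *raw, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0); // power of two
  uintptr_t p = reinterpret_cast<uintptr_t>(raw);
  uintptr_t aligned = (p + alignment - 1) & ~(uintptr_t)(alignment - 1);
  return reinterpret_cast<void *>(aligned);
}
```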
This patch is a follow up to https://github.com/llvm/llvm-project/pull/146088 and changes the padding value in the linalg vectorizer from `0` to `ub.poison` in `vector.transfer_read`s created for extracting slices or when vectorizing a generic.
Signed-off-by: Fabian Mora <fabian.mora-cordero@amd.com>
An HLASM source file must end with the END instruction. It is implemented
by adding a new function to the target streamer. This change also turns
SystemZHLASMSAsmString.h into a proper header file, and only uses the
SystemZTargetHLASMStreamer when HLASM output is generated.
This PR resolves https://github.com/llvm/llvm-project/issues/144513
The modification includes five patterns:
1. vselect Cond, 0, 0 → 0
2. vselect Cond, -1, 0 → bitcast Cond
3. vselect Cond, -1, x → or Cond, x
4. vselect Cond, x, 0 → and Cond, x
5. vselect Cond, 000..., X → andn Cond, X
Patterns 1-4 have been migrated to DAGCombine; pattern 5 is still in the x86 code.
The reason is that the andn instruction cannot be used directly in
DAGCombine; only and+xor can, which introduces optimization-order
issues. For example, when lowering select Cond, 0, x →
(~Cond) & x, the x86 backend will first check whether the Cond node of
(~Cond) is a setcc node. If so, it will modify the comparison operator
of the condition, so the x86 backend cannot complete the optimization
into andn. In short, I think it is a better choice to keep the pattern of
vselect Cond, 000..., X instead of and+xor in DAGCombine.
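For reference, the identities behind patterns 2-5 can be sketched in scalar C++, treating Cond as an all-ones/all-zeros lane mask (a hedged illustration of the algebra, not the DAGCombine code):
```
#include <cstdint>

// cond is assumed to be 0x00000000 (false) or 0xFFFFFFFF (true) per lane.
uint32_t sel_m1_0(uint32_t cond)             { return cond; }      // pattern 2
uint32_t sel_m1_x(uint32_t cond, uint32_t x) { return cond | x; }  // pattern 3
uint32_t sel_x_0 (uint32_t cond, uint32_t x) { return cond & x; }  // pattern 4
uint32_t sel_0_x (uint32_t cond, uint32_t x) { return ~cond & x; } // pattern 5 (andn)
```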
Regarding the commits: the first contains the code changes and x86 tests
(note 1), and the second contains tests for other backends (note 2).
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
Despite the error message for preempted jobs containing the words
"cancelled", these are considered workflow "failures" by github.
This is important, because if we fail to distinguish between "failed"
and "cancelled" jobs, the restarter will fight to restart jobs a user
intentionally cancelled (either by pressing the "cancel" button, or by
pushing an update to a PR).
This reverts commit 3ea7fc7339. This also
reverts earlier attempts to solve this problem by matching the messages
to detect manual cancellations.
This change also removes ldionne's test workflow, as it's hard to keep
it correctly in sync.
This change does not attempt to address the maintainability or
testability of this script, which continues to be an issue. If asked to
address these issues, my plan is to write the script in python (which
most people are more familiar with), and turn this action into a "docker
action" using a container with the python action and dependencies built
into it. Let me know if that's a direction we're interested in heading.
This patch adds back the needed AutoConvert.h header and removes the
unneeded MVS include guard, to prevent this header from being removed
in the future.
The actual `unordered_map` tests live in
`data-formatter-stl/generic/unordered`. The tests here are only testing
`std::unordered_map::iterator`. This patch renames the directory
accordingly. This is in preparation for moving all of the STL tests into
the `generic` directory.
In most contexts the pointer type is implied by the operation
and should be propagated; getPointerTy is for niche cases where
there is a synthesized value.
GenericKernelTy has a pointer to the name that was used to create it.
However, the name passed in as an argument may not outlive the kernel.
Instead, GenericKernelTy now contains a std::string and copies the
name into it.
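A minimal sketch of the bug and the fix (the member and constructor shapes are illustrative, not the actual class):
```
#include <string>

struct GenericKernelTy {
  // Before: `const char *Name;` borrowed the caller's storage and could
  // dangle once the argument string was destroyed.
  std::string Name; // after: owns a copy of the name
  explicit GenericKernelTy(const char *N) : Name(N) {}
};
```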
This is a workaround for
https://github.com/llvm/llvm-project/issues/82050 that skips the `DllMain` symbol if it is seen in an import library. When this situation occurs, a warning will also be displayed after this commit. The warning can be silenced with `/ignore:exporteddllmain`.
- Update the main README to reflect the current project status
- Rework the main API generation documentation. General fixes/tidying,
but also spell out explicitly how to make API changes at the top of the
document since this is what most people will care about.
---------
Co-authored-by: Martin Grant <martingrant@outlook.com>
Add handling for FPFastMathMode in SPIR-V shaders. This is a first pass
that simply does a direct translation when the proper extension is
available. This will unblock work for HLSL. However, it is not a full
solution. The default math mode for SPIR-V is determined by the API. When
targeting Vulkan, many of the fast math options are assumed. We should do
something particular when targeting Vulkan.
We will also need to handle the hlsl "precise" keyword correctly when
FPFastMathMode is not available.
Unblocks https://github.com/llvm/llvm-project/issues/140739, but we are
keeping it open to track the remaining issues mentioned above.
This patch allows urem by a constant to be expanded more efficiently to
avoid the need for expensive udiv instructions. This is part of the
resolution to issue #118090.
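For illustration, one well-known way to compute a remainder by a constant without a division is a multiply by a magic constant (a hedged sketch of the general technique, e.g. Lemire's fastmod, not necessarily the exact expansion this patch emits; `__uint128_t` is a GCC/Clang extension):
```
#include <cstdint>

// x % 7 without udiv: multiply by ceil(2^64 / 7), keep the low 64 bits
// (the "fractional" part of x/7), then scale back up by 7.
uint32_t urem7(uint32_t x) {
  const uint64_t M = UINT64_MAX / 7 + 1; // ceil(2^64 / 7)
  uint64_t lowbits = M * x;
  return (uint32_t)(((__uint128_t)lowbits * 7) >> 64);
}
```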
#144648 was reverted because it failed the new sanitizer test
`munmap_clear_shadow.c` in iOS's CI.
The issue can be fixed by disabling the test on the platforms where it
is incompatible.
Specifically, we disable the test on FreeBSD, Apple, NetBSD,
Solaris, and Haiku, where `ReleaseMemoryPagesToOS` executes
`madvise(beg, end, MADV_FREE)`, which tags the relevant pages as 'FREE'
and does not release them immediately.
Consider IR such as this:
```
for.body:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
  %accum = phi i32 [ 0, %entry ], [ %add, %for.body ]
  %gep.a = getelementptr i8, ptr %a, i64 %iv
  %load.a = load i8, ptr %gep.a, align 1
  %ext.a = zext i8 %load.a to i32
  %add = add i32 %ext.a, %accum
  %iv.next = add i64 %iv, 1
  %exitcond.not = icmp eq i64 %iv.next, 1025
  br i1 %exitcond.not, label %for.exit, label %for.body
```
Conceptually we can vectorise this using partial reductions too,
although the current loop vectoriser implementation requires the
accumulation of a multiply. For AArch64 this is easily done with
a udot or sdot with an identity operand, i.e. a vector of (i16 1).
In order to do this I had to teach getScaledReductions that the
accumulated value may come from a unary op, hence there is only
one extension to consider. Similarly, I updated the vplan and
AArch64 TTI cost model to understand the possible unary op.
---------
Co-authored-by: Matt Devereau <matthew.devereau@arm.com>
Summary:
This avoids a ptrtoint by just using the clang builtin. This is
clang-specific, but only clang can compile GPU code anyway, so I do not
bother with a fallback.
This patch fixes a possible data race between the main and event handler
threads. The terminated event can be sent from the `Disconnect` function
or from the event handler, so several orderings of events are possible.
We must check events twice because, without receiving an exited event,
`exit_status` will be None. Since we don't know the order of events (for
example, the terminated event may arrive before the exited event), we
check events by filter. This is correct because the terminated event is
sent only once (guarded by `llvm::call_once`).
This patch was moved from
[145010](https://github.com/llvm/llvm-project/pull/145010) and is based
on an idea from this
[comment](https://github.com/llvm/llvm-project/pull/145010#discussion_r2159637210).
This patch is part of a series that adds origin-tracking to the debugify
source location coverage checks, allowing us to report symbolized stack
traces of the point where missing source locations appear.
This patch adds a pair of new functions in `signals.h` that can be used
to collect and symbolize stack traces respectively. This has major
implementation overlap with the existing stack trace
collection/symbolizing methods, but the existing functions are
specialized for dumping a stack trace to stderr when LLVM crashes, while
these new functions are meant to be called repeatedly during the
execution of the program, and therefore we need a separate set of
functions.
Code generation predicates like HasSVE2_or_SME implemented a strict
divide between streaming and non-streaming which meant some SME
instructions were not available unless a matching SVE feature was
enabled.
Always try to fold freeze(op(....)) -> op(freeze(),freeze(),freeze(),...).
This patch proposes we drop the opt-in limit for opcodes that are allowed to push a freeze through the op to freeze all its operands, through the tree towards the roots.
I'm struggling to find a strong reason for this limit apart from the DAG freeze handling being immature for so long - as we've improved coverage in canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison it looks like the regressions are not as severe.
Hopefully this will help some of the regression issues in #143102 etc.
From #143177. This combines the summaries for the pre- and post-C++11
`std::string` as well as `std::wstring`. In all cases, the data pointer
is reachable through `_M_dataplus._M_p`. It has the correct type (i.e.
`char*`/`wchar_t*`) and it's null terminated, so LLDB knows how to
format it as expected when using `GetSummaryAsCString`.
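For reference, a simplified sketch of the part of the libstdc++ layout the summary relies on (`_M_dataplus` and `_M_p` are the real member names; the surrounding struct is heavily reduced):
```
// Both the pre- and post-C++11 ABI expose the character data this way.
template <typename CharT>
struct GnuBasicStringLayout {
  struct AllocHider {
    CharT *_M_p; // null-terminated data of the correct character type
  } _M_dataplus;
  // size/capacity members follow and differ between the two ABIs
};
```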
A reverse interleave access is essentially composed of multiple
load/store operations with the same negative stride, and their addresses
are based on the last lane address of member 0 in the interleaved group.
Currently, we already have VPVectorEndPointerRecipe for computing the
last lane address of consecutive reverse memory accesses. This patch
extends VPVectorEndPointerRecipe to support constant stride and extracts
the reverse interleave group address adjustment from
VPInterleaveRecipe::execute, replacing it with a
VPVectorEndPointerRecipe.
The final goal is to support interleaved accesses with EVL tail folding.
Given that VPInterleaveRecipe is large and tightly coupled (it combines
both load and store, and embeds operations like the reverse pointer
adjustment (GEP), widened load/store, deinterleave/interleave, and
reversal), breaking it down into smaller, dedicated recipes may allow
VPlanTransforms::tryAddExplicitVectorLength to lower them into EVL-aware
form more effectively.
One foreseeable challenge is that
VPlanTransforms::convertToConcreteRecipes currently runs after
tryAddExplicitVectorLength, so decomposing VPInterleaveRecipe will
likely need to happen earlier in the pipeline to be effective.
As discussed in PR #142353, the current testsuite of the `clang` Python
bindings has several issues:
- If `libclang.so` cannot be loaded into `python` to run the testsuite,
the whole `ninja check-all` aborts.
- The result of running the testsuite isn't reported like the `lit`-based
tests, rendering them almost invisible.
- The testsuite is disabled in a non-obvious way (`RUN_PYTHON_TESTS`) in
`tests/CMakeLists.txt`, which again doesn't show up in the test results.
All these issues can be avoided by integrating the Python bindings tests
with `lit`, which is what this patch does:
- The actual test lives in `clang/test/bindings/python/bindings.sh` and
is run by `lit`.
- The current `clang/bindings/python/tests` directory (minus the
now-superfluous `CMakeLists.txt`) is moved into the same directory.
- The check if `libclang` is loadable (originally from PR #142353) is
now handled via a new `lit` feature, `libclang-loadable`.
- The various ways to disable the tests have been turned into `XFAIL`s
as appropriate. This isn't complete and hasn't been completely tested yet.
Tested on `sparc-sun-solaris2.11`, `sparcv9-sun-solaris2.11`,
`i386-pc-solaris2.11`, `amd64-pc-solaris2.11`, `i686-pc-linux-gnu`, and
`x86_64-pc-linux-gnu`.
Co-authored-by: Rainer Orth <ro@gcc.gnu.org>
Support TLSDESC to initial-exec or local-exec optimizations. Introduce a
new hook RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC and use the existing
R_RELAX_TLS_GD_TO_IE_ABS to support TLSDESC => IE, while using the
existing R_RELAX_TLS_GD_TO_LE to support TLSDESC => LE.
In the normal or medium code model, there are two forms of code sequence:
* pcalau12i $a0, %desc_pc_hi20(sym_desc)
* addi.d $a0, $a0, %desc_pc_lo12(sym_desc)
* ld.d $ra, $a0, %desc_ld(sym_desc)
* jirl $ra, $ra, %desc_call(sym_desc)
------
* pcaddi $a0, %desc_pcrel_20(sym_desc)
* ld.d $ra, $a0, %desc_ld(sym_desc)
* jirl $ra, $ra, %desc_call(sym_desc)
Convert to IE:
* pcalau12i $a0, %ie_pc_hi20(sym_ie)
* ld.[wd] $a0, $a0, %ie_pc_lo12(sym_ie)
Convert to LE:
* lu12i.w $a0, %le_hi20(sym_le) # le_hi20 != 0, otherwise NOP
* ori $a0, src, %le_lo12(sym_le) # le_hi20 != 0, src = $a0, otherwise src = $zero
For simplicity, in both tlsdescToIe and tlsdescToLe we always convert
the preceding instructions to NOPs, because both forms of the code
sequence (corresponding to the relocation combinations
R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
R_LARCH_TLS_DESC_PCREL20_S2) follow the same process.
TODO: When relaxation is enabled, redundant NOPs can be removed. This
will be implemented in a future patch.
Note: the different forms of TLSDESC code sequences should not appear
interleaved in the normal, medium, or extreme code model; compilers do
not generate such code and lld does not support it. This is thanks to
the guard in PostRASchedulerList.cpp in llvm:
```
Calls are not scheduling boundaries before register allocation,
but post-ra we don't gain anything by scheduling across calls
since we don't need to worry about register pressure.
```
Add the `dead_on_return` attribute, which is meant to be taken advantage
of by the frontend, and states that the memory pointed to by the argument
is dead upon function return. As with `byval`, it is intended to be
used for passing aggregates by value. The difference lies in the ABI:
`byval` implies that the pointer is explicitly passed as an argument to
the callee (during codegen the copy is emitted as per the byval contract),
whereas a `dead_on_return`-marked argument implies that the copy
already exists in the IR, is located at a specific stack offset within
the caller, and that this memory will not be read by the caller once the
callee returns; if it is read before being written again, the value is poison.
RFC: https://discourse.llvm.org/t/rfc-add-dead-on-return-attribute/86871.
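For illustration, a hedged C++ sketch of the ABI situation the attribute describes, assuming a target that passes large aggregates indirectly without `byval` (the lowering in the comment is illustrative):
```
struct Big { long Payload[16]; };

void callee(Big B); // on such targets: callee(ptr dead_on_return %B)

void caller(const Big &Src) {
  Big Tmp = Src; // the copy already exists in the caller's frame/IR
  callee(Tmp);   // Tmp's memory is never read again after the call
}
```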
Erasing/replacing an op, which is also the current insertion point,
invalidates the insertion point. Explicitly set the insertion point, so
that `copy` does not crash after the One-Shot Dialect Conversion
refactoring. (`ConversionPatternRewriter` will start behaving more like
a "normal" rewriter.)
Fold icmp between a chain of geps and its base pointer. Previously only
a single gep was supported.
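In C terms, the fold now handles cases like this hedged illustration (the names are illustrative):
```
#include <cstddef>

bool chainEqBase(char *Base, size_t A, size_t B) {
  char *P = Base + A; // first gep
  char *Q = P + B;    // second gep in the chain
  return Q == Base;   // foldable to A + B == 0, absent wrapping
}
```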
This will be extended to handle the case of two gep chains with a common
base in a follow-up.
This helps to avoid regressions after #137297.
Just like 1d-tensors, reshapes of 0d-tensors (aka scalars) are always
no-ops, as they only have one possible layout. This PR adds logic to
the `fold` implementation to optimize these away, as is currently
implemented for 1d tensors.
This reverts commit 1108cf6419.
Caused a regression for a weird but interesting case (STT_SECTION symbol
as group signature). We no longer define `sec`:
```
.section sec,"ax"
.section .foo,"axG",@progbits,sec
nop
```
Fixes #146581