This removes the `ol_impl_result_t` helper class, replacing it with
`llvm::Error`. In addition, some internal functions that returned
`ol_errc_t` now return `llvm::Error` (with a fancy message).
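A minimal sketch of the new convention, assuming a hypothetical internal function and error message (the `llvm::Error` calls are the real API):
```cpp
#include "llvm/Support/Error.h"

// Hypothetical example: instead of returning an ol_errc_t code, internal
// functions now return llvm::Error carrying a descriptive message.
llvm::Error olDoThing(bool DeviceFound) {
  if (!DeviceFound)
    return llvm::createStringError(llvm::inconvertibleErrorCode(),
                                   "no device found for the given platform");
  return llvm::Error::success();
}
```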
- DXILDataScalarization should not just be limited to global data
- Add a scalarization for alloca
- Add a ReversePostOrderTraversal of each function, iterating over its
basic blocks and running the DataScalarizerVisitor (see the sketch after
this list).
- Fixes #140143
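A rough sketch of the traversal in the last bullet, assuming the visitor exposes an `InstVisitor`-style `visit`; names are approximate:
```cpp
#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/Function.h"

// Walk blocks in reverse post-order so definitions are visited before uses;
// make_early_inc_range tolerates the visitor replacing instructions.
void visitFunction(llvm::Function &F, DataScalarizerVisitor &Visitor) {
  llvm::ReversePostOrderTraversal<llvm::Function *> RPOT(&F);
  for (llvm::BasicBlock *BB : RPOT)
    for (llvm::Instruction &I : llvm::make_early_inc_range(*BB))
      Visitor.visit(I);
}
```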
Adds a resource name argument to resource class constructors and to the builtin functions that initialize resource handles,
`__builtin_hlsl_resource_handlefrombinding` and `__builtin_hlsl_resource_handlefromimplicitbinding`.
Part 1/4 of https://github.com/llvm/llvm-project/issues/105059
This is still leftover from the days when the libc++ and libstdc++
formatters were both written in Python and lived in separate categories.
Since then, we have grouped the libstdc++ and libc++ formatters into the same category.
This patch removes references to the obsolete `gnu-libstdc++` category
from the docs (and a test).
See [this
thread](https://github.com/llvm/llvm-project/pull/140761#discussion_r2102386080)
for more context.
This breaks in the case where there are unreachable blocks after an
entry block with no successors, which don't have a `BBInfo`, causing
crashes.
`BBInfo` doesn't exist for unreachable blocks, see
https://reviews.llvm.org/D27280.
Fixes #135828.
Recently, I was working on an issue that generated a large number of
Coredumps, and every time, in both LLDB and GDB, the signal was just `SIGSEGV`.
This was frustrating because we would expect a `SIGSEGV` to have an
address, or ideally even bounds. After some digging, I found the
`si_code` was consistently -6. With some help from
[@cdown](https://github.com/cdown), we found that neither LLDB nor GDB
supports the si_codes sent from [user
space](https://github.com/torvalds/linux/blob/master/include/uapi/asm-generic/siginfo.h#L185).
Excerpted from the sigaction man page:
```
For a regular signal, the following list shows the values which
can be placed in si_code for any signal, along with the reason
that the signal was generated.
```
To address this, I added all of the si_codes for every Linux signal. Now,
for the Coredump that triggered this whole investigation, we get an
accurate and very informative summary.
![summary](https://github.com/user-attachments/assets/5149f781-ef21-4491-a077-8fac862fbc20)
Additionally, following @labath's suggestion to move this to the platform
and leverage the existing `getSiginfo()` call on the thread, we can now
inspect the siginfo struct itself via `thread siginfo`, giving us another
step towards GDB parity on ELF cores.
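For illustration only (not part of the patch): a user-space-sent `SIGSEGV` arrives with a negative si_code, e.g. `SI_TKILL` (-6) when delivered via tkill()/tgkill(), which is what the new summaries now decode. A minimal Linux sketch:
```cpp
#include <signal.h>
#include <cstdio>
#include <cstdlib>

// Print the si_code that accompanies a caught signal. On Linux, raise()
// delivers via tgkill(), so the SIGSEGV below arrives with si_code
// SI_TKILL (-6) rather than a fault code like SEGV_MAPERR.
static void handler(int Sig, siginfo_t *Info, void *) {
  std::printf("signal %d, si_code %d\n", Sig, Info->si_code);
  std::_Exit(0);
}

int main() {
  struct sigaction SA = {};
  SA.sa_sigaction = handler;
  SA.sa_flags = SA_SIGINFO;
  sigaction(SIGSEGV, &SA, nullptr);
  raise(SIGSEGV); // user-space sender -> negative si_code
}
```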
This patch fixes:
clang/utils/TableGen/ClangBuiltinTemplatesEmitter.cpp:25:8: error:
'llvm::StringSet' may not intend to support class template argument
deduction [-Werror,-Wctad-maybe-unsupported]
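Presumably the fix spells out the (defaulted) template argument list so no class template argument deduction is needed; a minimal sketch:
```cpp
#include "llvm/ADT/StringSet.h"

// llvm::StringSet Set;   // warns: -Wctad-maybe-unsupported (relies on CTAD)
llvm::StringSet<> Set;    // OK: explicit, defaulted template argument list
```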
New contributors can just indicate that they are working on the issue
without requesting assignment.
That should reduce the burden of assigned issues that are not actually
being worked on, and of new contributors waiting for a maintainer to
assign them the issue.
---------
Co-authored-by: Danny Mösch <danny.moesch@icloud.com>
This patch moves scalable vectorization tests into an existing generic
vectorization test file:
* vectorization-scalable.mlir --> merged into vectorization.mlir
Rationale:
* Most tests in vectorization-scalable.mlir are variants of existing
tests in vectorization.mlir. Keeping them together improves
maintainability.
* Consolidating tests makes it easier to spot gaps in coverage for
regular vectorization.
* In the Vector dialect, we don't separate tests for scalable vectors;
this change aligns Linalg with that convention.
Notable changes beyond moving tests:
* Updated one of the two matrix-vector multiplication tests to use
`linalg.matvec` instead of `linalg.generic`. CHECK lines remain
unchanged.
* Simplified the lone `linalg.index` test by removing an unnecessary
`tensor.extract`. Also removed canonicalization patterns from the
TD sequence for consistency with other tests.
This patch contributes to the implementation of #141025 — please refer
to that ticket for full context.
Not explicitly defining the default case for the ShallowCopy* functions
does not meet gcc's requirements for actually instantiating the
templates, leading to build errors that show up with gcc but not with clang.
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
This addresses a TODO where previously scalarizeBinopOrCmp
conservatively bailed if one of the operands was a load.
getVectorInstrCost was updated to take Values in
https://reviews.llvm.org/D140498, so we can pass in the scalar value to
be inserted, which should give an accurate cost for a gather.
To prevent regressions on x86, this tries to constant fold NewVecC up
front so we can pass it into TTI and get a more accurate cost.
We want to remove this restriction on RISC-V since this is always
profitable whether or not the scalar is a load.
In `applyAdrpAddLdr()` we make a transformation that is identical to the
one in `applyAdrpAdd()`, so let's reuse that code. Also refactor
`forEachHint()` to use `ArrayRef` more and move some lines around for
consistency.
Move VZEXT_MOVL nodes up through shift nodes.
We should be trying harder to move VZEXT_MOVL towards any associated SCALAR_TO_VECTOR nodes to make use of MOVD/Q implicit zeroing of upper elements.
Fixes #141475
The `GCNScheduleDAGMILive`'s `RescheduleRegions` bitvector is only used
by the rematerialization stage (`PreRARematStage`). Its presence in the
scheduler's state forces us to maintain its value throughout scheduling
even though it is of no use to the iterative scheduling process itself,
which instead relies on each stage's `initGCNRegion` hook to determine
whether the current region should be rescheduled.
This moves the bitvector to the `PreRARematStage`, which uses it to
store the set of regions that must be rescheduled between stage
initialization and region initialization.
This NFC also swaps a call to `GCNRegPressure::getArchVGPRNum(false)`
for a call to `GCNRegPressure::getArchVGPRNum()`, which is equivalent
but simpler in this context, and makes
`GCNSchedStage::finalizeGCNRegion` use its own API to advance to the
next region.
asin, acos, atan, and atan2 were being lowered to libm calls instead of
LLVM intrinsics. Add the conversion patterns to handle these intrinsics
and update tests to expect this.
Update initial construction to connect the Plan's entry to the scalar
preheader. This moves a small part of the
skeleton creation out of ILV and will also enable replacing
VPInstruction::ResumePhi with regular VPPhi recipes.
Resume phis need 2 incoming values to start with, the second being the
bypass value from the scalar preheader (and used to replicate the incoming
value for other bypass blocks). Adding the extra edge ensures that the
incoming values for resume phis match the incoming blocks.
PR: https://github.com/llvm/llvm-project/pull/140132
This is an NFC change.
Added "-mattr=-real-true16" to a few gfx12 tests in preparation for the
upcoming GFX12 true16 code change. Set these tests to use the fake16
flow, since true16 mode is not yet fully functional for GISEL.
Fixes errors about duplicate PHI edges when the input had duplicates
with constexprs in them. The constexpr translation makes new basic
blocks, causing the verifier to complain about duplicate entries in PHI
nodes.
When determining whether an escape source may alias with a noalias
argument, only take provenance captures into account. If only the
address of the argument was captured, an access through the escape
source is not legal.
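A hypothetical C++-level illustration of the distinction (not from the patch): if only the address of a `noalias` argument escapes, accesses through the escape source remain illegal, so the argument still cannot alias it:
```cpp
#include <cstdint>

uintptr_t EscapedAddr; // escape source: captures an address, not provenance

void store(int *__restrict A) {
  EscapedAddr = reinterpret_cast<uintptr_t>(A); // address-only capture
  *A = 42; // a pointer forged from EscapedAddr may not be used to access *A,
           // so alias analysis can still treat A as noalias here
}
```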
This patch updates `CombineContractBroadcastMask` to inherit from
`MaskableOpRewritePattern`, enabling it to handle masked
`vector.contract` operations. The pattern rewrites:
```mlir
%a_bc = vector.broadcast %a
%res = vector.contract %a_bc, %b, ...
```
into:
```mlir
// Move the broadcast into vector.contract (by updating the indexing
// maps)
%res = vector.contract %a, %b, ...
```
The main challenge is supporting cases where the pattern drops a leading
unit dimension. For example:
```mlir
func.func @contract_broadcast_unit_dim_reduction_masked(
%arg0 : vector<8x4xi32>,
%arg1 : vector<8x4xi32>,
%arg2 : vector<8x8xi32>,
%mask: vector<1x8x8x4xi1>) -> vector<8x8xi32> {
%0 = vector.broadcast %arg0 : vector<8x4xi32> to vector<1x8x4xi32>
%1 = vector.broadcast %arg1 : vector<8x4xi32> to vector<1x8x4xi32>
%result = vector.mask %mask {
vector.contract {
indexing_maps = [#map0, #map1, #map2],
iterator_types = ["reduction", "parallel", "parallel", "reduction"],
kind = #vector.kind<add>
} %0, %1, %arg2 : vector<1x8x4xi32>, vector<1x8x4xi32> into vector<8x8xi32>
} : vector<1x8x8x4xi1> -> vector<8x8xi32>
return %result : vector<8x8xi32>
}
```
Here, the leading unit dimension is dropped. To handle this, the mask is
cast to the correct shape using a `vector.shape_cast`:
```mlir
func.func @contract_broadcast_unit_dim_reduction_masked(
%arg0: vector<8x4xi32>,
%arg1: vector<8x4xi32>,
%arg2: vector<8x8xi32>,
%arg3: vector<1x8x8x4xi1>) -> vector<8x8xi32> {
%mask_sc = vector.shape_cast %arg3 : vector<1x8x8x4xi1> to vector<8x8x4xi1>
%res = vector.mask %mask_sc {
vector.contract {
indexing_maps = [#map, #map1, #map2],
iterator_types = ["parallel", "parallel", "reduction"],
kind = #vector.kind<add>
} %arg0, %arg1, %arg2 : vector<8x4xi32>, vector<8x4xi32> into vector<8x8xi32>
} : vector<8x8x4xi1> -> vector<8x8xi32>
return %res : vector<8x8xi32>
}
```
While this isn't ideal - since it introduces a `vector.shape_cast` that
must be cleaned up later - it reflects the best we can do once the input
reaches `CombineContractBroadcastMask`. A more robust solution may
involve simplifying the input earlier; I am leaving that as a TODO to
explore further. Posting this now to unblock downstream work.
LIMITATIONS
Currently, this pattern assumes:
* Only leading dimensions are dropped in the mask.
* All dropped dimensions must be unit-sized.
This refactor was motivated by two bugs identified in out-of-tree
builds:
1. Some implementations of the VisitMembersFunction type (often used to
implement special loading semantics, e.g. -all_load or -ObjC) were assuming
that buffers for archive members were null-terminated, which they are not in
general. This was triggering occasional assertions.
2. Archives may include multiple members with the same file name, e.g.
when constructed by appending files with the same name:
```
% llvm-ar crs libfoo.a foo.o
% llvm-ar q libfoo.a foo.o
% llvm-ar t libfoo.a foo.o
foo.o
```
While confusing, these members may be safe to link (provided that they're
individually valid and don't define duplicate symbols). In ORC however, the
archive member name may be used to construct an ORC initializer symbol,
which must also be unique. In that case the duplicate member names lead to a
duplicate definition error even if the members define unrelated symbols.
In addition to these bugs, StaticLibraryDefinitionGenerator had grown a
collection of all member buffers (ObjectFilesMap), a BumpPtrAllocator
that was redundantly storing synthesized archive member names (these are
copied into the MemoryBuffers created for each Object, but were never
freed in the allocator), and a set of COFF-specific import files.
To fix the bugs above and simplify StaticLibraryDefinitionGenerator this
patch makes the following changes:
1. StaticLibraryDefinitionGenerator::VisitMembersFunction is generalized
to take a reference to the containing archive, and the index of the
member within the archive. It now returns an Expected<bool> indicating
whether the member visited should be treated as loadable, not loadable,
or as invalidating the entire archive.
2. A static StaticLibraryDefinitionGenerator::createMemberBuffer method
is added which creates MemoryBuffers with unique names of the form
`<archive-name>[<index>](<member-name>)` (see the sketch after this
list). This defers construction of member names until they're loaded,
allowing the BumpPtrAllocator (with its redundant name storage) to be
removed.
3. The ObjectFilesMap (symbol name -> memory-buffer-ref) is replaced
with a SymbolToMemberIndexMap (symbol name -> index) which should be
smaller and faster to construct.
4. The 'loadability' result from the VisitMembersFunction is now taken
into consideration when building the SymbolToMemberIndexMap so that
members that have already been loaded / filtered out can be skipped, and
do not take up any space going forward.
5. The COFF ImportedDynamicLibraries member is moved out into the
COFFImportFileScanner utility, which can be used as a
VisitMembersFunction.
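A hypothetical sketch of the naming scheme from point 2 (the real createMemberBuffer lives inside StaticLibraryDefinitionGenerator and may differ in detail):
```cpp
#include "llvm/ADT/Twine.h"
#include "llvm/Support/MemoryBuffer.h"
#include <memory>

// Build "<archive-name>[<index>](<member-name>)" and copy the member's
// contents into a MemoryBuffer carrying that unique identifier.
std::unique_ptr<llvm::MemoryBuffer>
makeMemberBuffer(llvm::StringRef Archive, unsigned Index,
                 llvm::StringRef Member, llvm::StringRef Contents) {
  std::string Name =
      (Archive + "[" + llvm::Twine(Index) + "](" + Member + ")").str();
  return llvm::MemoryBuffer::getMemBufferCopy(Contents, Name);
}
```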
This fixes the bugs described above and should lower memory consumption
slightly, especially for archives with many files and/or symbols where
most files are eventually loaded.
Adds support for operand promotion and splitting/widening the result
of the ISD::GET_ACTIVE_LANE_MASK node.
For AArch64, shouldExpandGetActiveLaneMask now returns false for more
types which we know can be legalised.
Currently we generate an incorrect suggestion for shared/unique pointers
to arrays; for instance ([Godbolt](https://godbolt.org/z/Tens1reGP)):
```c++
#include <memory>
void test_shared_ptr_to_array() {
std::shared_ptr<int[]> i;
auto s = sizeof(*i.get());
}
```
```
<source>:5:20: warning: redundant get() call on smart pointer [readability-redundant-smartptr-get]
5 | auto s = sizeof(*i.get());
| ^~~~~~~
| i
1 warning generated.
```
`sizeof(*i)` is incorrect, though, because the array specialization of
`std::shared/unique_ptr` does not have an `operator*()`. Therefore, I
have disabled this check for smart pointers to arrays for now; future
work could, of course, improve on this by suggesting, say,
`sizeof(i[0])` in the above example (see the sketch below).
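For reference, a sketch of why the array specialization needs different treatment; `operator[]` is provided while `operator*` is not:
```cpp
#include <memory>

void test_array_smart_ptr() {
  std::shared_ptr<int[]> i(new int[4]);
  auto s = sizeof(i[0]);  // OK: the array specialization provides operator[]
  // auto t = sizeof(*i); // ill-formed: no operator* when T is an array type
  (void)s;
}
```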