We already check f16/bf16 vector type support via `checkRVVTypeSupport`, so there is no need to add required features for plain f16/bf16 intrinsics that do not use actual instructions from zvfhmin/zvfbfmin.
Currently, if the user enables interleaving during vectorisation of uncountable early-exit loops via the `interleave_count` pragma together with the `enable-early-exit-vectorization` option, the result miscompiles. There is ongoing work to fix this, but for now it seems safer to ignore the hint until it is supported.
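For illustration, a minimal sketch of the kind of loop affected (the function and names are mine, not from the patch): an uncountable early-exit search loop with a forced interleave count.
```
// Hypothetical example: an early-exit search loop whose interleave hint
// is now ignored (rather than honored and miscompiled) when
// enable-early-exit-vectorization is set.
int findFirst(const int *Data, int N, int Key) {
#pragma clang loop interleave_count(4)
  for (int I = 0; I < N; ++I)
    if (Data[I] == Key) // uncountable early exit
      return I;
  return -1;
}
```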
---------
Co-authored-by: Paul Walker <paul.walker@arm.com>
Since using std::nullopt outside the context of std::optional is something of an abuse and unintuitive to newcomers, this patch deprecates the constructor. All known uses within the LLVM codebase have been migrated to other constructors.
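A minimal sketch of the pattern (the class name is illustrative, not the one in the patch):
```
#include <optional>

// Hypothetical class with a constructor taking std::nullopt_t; marking
// it deprecated steers users toward the equivalent default constructor.
struct Value {
  Value() = default;
  [[deprecated("use the default constructor instead")]]
  Value(std::nullopt_t) {}
};

Value A;               // preferred
Value B(std::nullopt); // now warns: deprecated
```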
FirstActiveLane is currently not handled correctly during unrolling, which causes miscompiles when vectorizing early-exit loops with interleaving forced.
This patch updates handling of FirstActiveLane to be analogous to
computing final reduction results: during unrolling, the created copies
for its original operand are added as additional operands, and
FirstActiveLane will always produce the index of the first active lane
across all unrolled iterations.
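A standalone model of the intended semantics (my sketch, not VPlan code): with unroll factor UF, the recipe takes one mask operand per unrolled copy and returns the first active lane index across all of them.
```
#include <cstdint>
#include <vector>

// Model: Masks[Part][Lane] is the lane mask of unrolled copy `Part`.
// Returns the first active lane index across all parts, or
// Masks.size() * VF if no lane is active.
uint64_t firstActiveLane(const std::vector<std::vector<bool>> &Masks,
                         uint64_t VF) {
  for (uint64_t Part = 0; Part < Masks.size(); ++Part)
    for (uint64_t Lane = 0; Lane < VF; ++Lane)
      if (Masks[Part][Lane])
        return Part * VF + Lane;
  return Masks.size() * VF;
}
```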
Note that some of the generated code is still incorrect, as we also need
to handle ExtractElement with FirstActiveLane operands. I will share
patches for those soon as well.
PR: https://github.com/llvm/llvm-project/pull/145394
In the pre-legalizer combiner, the UseVectorTruncate match-apply optimization has a bug: when the destination types do not match the vector element type of the G_UNMERGE_VALUES instruction, the resulting collapsed truncate does not preserve the original functional behavior. This commit introduces a simple type check to ensure that the destination types match the vector element type.
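A hedged sketch of the added guard (the helper name and exact placement are assumptions; the combine itself is elided):
```
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
using namespace llvm;

// For G_UNMERGE_VALUES the source vector is the last operand and the
// destinations are all operands before it. Only collapse into a
// truncate when every destination has the source's element type.
static bool destsMatchElementType(const MachineInstr &Unmerge,
                                  const MachineRegisterInfo &MRI) {
  unsigned NumOps = Unmerge.getNumOperands();
  LLT SrcTy = MRI.getType(Unmerge.getOperand(NumOps - 1).getReg());
  LLT EltTy = SrcTy.getElementType();
  for (unsigned I = 0; I != NumOps - 1; ++I)
    if (MRI.getType(Unmerge.getOperand(I).getReg()) != EltTy)
      return false;
  return true;
}
```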
Given the following example:
```
module {
  func.func @main(%arg0: tensor<1x1x1x4x1xf32>, %arg1: tensor<1x1x4xf32>) -> tensor<1x1x1x4x1xf32> {
    %pack = linalg.pack %arg1 outer_dims_perm = [1, 2, 0] inner_dims_pos = [2, 0] inner_tiles = [4, 1] into %arg0 : tensor<1x1x4xf32> -> tensor<1x1x1x4x1xf32>
    return %pack : tensor<1x1x1x4x1xf32>
  }
}
```
We would generate an invalid transpose operation because the calculated permutation would be `[0, 2, 0]`, which is semantically incorrect: a permutation must contain unique integers corresponding to the source tensor dimensions. The following change modifies how we calculate the permutation array and ensures that the dimension indices in it are unique.
The above example then translates to a transpose with permutation `[1, 2, 0]`, following the rule that `inner_dims_pos` is appended to the permutation array and the preceding indices are filled with the remaining dimensions, as sketched below.
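A standalone model of that rule (container types simplified from the actual MLIR implementation):
```
#include <cstdint>
#include <unordered_set>
#include <vector>

// Append inner_dims_pos at the end and fill the leading slots with the
// remaining dimensions in order, so each index appears exactly once.
std::vector<int64_t> computePerm(int64_t Rank,
                                 const std::vector<int64_t> &InnerDimsPos) {
  std::unordered_set<int64_t> Used(InnerDimsPos.begin(), InnerDimsPos.end());
  std::vector<int64_t> Perm;
  for (int64_t D = 0; D < Rank; ++D)
    if (!Used.count(D))
      Perm.push_back(D); // remaining dims keep their relative order
  Perm.insert(Perm.end(), InnerDimsPos.begin(), InnerDimsPos.end());
  return Perm;
}
// computePerm(3, {2, 0}) == {1, 2, 0}, matching the example above.
```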
In TSan, every `k` bytes of application memory (where `k = 8`) maps to a
single shadow/meta cell. This design leads to two distinct outcomes when
calculating the end of a shadow range using `MemToShadow(addr_end)`,
depending on the alignment of `addr_end`:
- **Exclusive End:** If `addr_end` is aligned (`addr_end % k == 0`),
`MemToShadow(addr_end)` points to the first shadow cell *past* the
intended range. This address is an exclusive boundary marker, not a cell
to be operated on.
- **Inclusive End:** If `addr_end` is not aligned (`addr_end % k != 0`),
`MemToShadow(addr_end)` points to the last shadow cell that *is* part of
the range (i.e., the same cell as `MemToShadow(addr_end - 1)`).
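The two cases can be illustrated with a standalone model (the mapping is simplified; the real `MemToShadow` also scales and offsets):
```
#include <cstdint>
#include <cstdio>

constexpr uint64_t k = 8; // app bytes per shadow cell

// Simplified stand-in: every k app bytes share one shadow cell index.
uint64_t memToShadowCell(uint64_t Addr) { return Addr / k; }

int main() {
  // addr_end = 0x1010 is aligned: its cell (0x202) is one past the cell
  // of the last in-range byte 0x100F (0x201) -> exclusive boundary.
  printf("%llx %llx\n", (unsigned long long)memToShadowCell(0x1010),
         (unsigned long long)memToShadowCell(0x100F));
  // addr_end = 0x1013 is unaligned: same cell as addr_end - 1 -> inclusive.
  printf("%llx %llx\n", (unsigned long long)memToShadowCell(0x1013),
         (unsigned long long)memToShadowCell(0x1012));
}
```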
Different TSan functions have different expectations for whether the
shadow end should be inclusive or exclusive. However, these expectations
are not always explicitly enforced, which can lead to subtle bugs or
reliance on unstated invariants.
The core of this patch is to ensure that functions ONLY requiring an
**exclusive shadow end** behave correctly.
1. Enforcing Existing Invariants:
For functions like `MetaMap::MoveMemory` and `MapShadow`, the assumption is that the end address is always `k`-aligned. While this holds in the current codebase (due to external, implicit conditions), the invariant is not guaranteed by the functions themselves. We add explicit assertions to make this requirement clear and to catch any future changes that might violate it.
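A minimal model of the invariant those assertions encode (not the exact TSan code):
```
#include <cassert>
#include <cstdint>

constexpr uint64_t kShadowCell = 8; // app bytes per shadow cell

// An exclusive shadow end is only meaningful when the application end
// address is kShadowCell-aligned; assert it rather than assume it.
void requireExclusiveEnd(uint64_t AddrEnd) {
  assert(AddrEnd % kShadowCell == 0 &&
         "end address must be kShadowCell-aligned");
}
```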
2. Fixing Latent Bugs:
In other cases, unaligned end addresses are possible, representing a
latent bug. This was the case in `UnmapShadow`. The `size` of a memory
region being unmapped is not always a multiple of `k`. When this
happens, `UnmapShadow` would fail to clear the final (tail) portion of
the shadow memory.
This patch fixes `UnmapShadow` by rounding up the `size` to the next
multiple of `k` before clearing the shadow memory. This is safe because
the underlying OS `unmap` operation is page-granular, and the page size
is guaranteed to be a multiple of `k`.
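A sketch of the round-up (standalone model; TSan has its own helpers for this):
```
#include <cstdint>

constexpr uint64_t kShadowCell = 8;

// Round size up to the next multiple of kShadowCell so the tail cells
// of the shadow range are cleared as well.
uint64_t shadowClearSize(uint64_t Size) {
  return (Size + kShadowCell - 1) & ~(kShadowCell - 1);
}
```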
Notably, this fix makes `UnmapShadow` consistent with its inverse
operation, `MemoryRangeImitateWriteOrResetRange`, which already performs
a similar size round-up.
In summary, this PR:
- **Adds assertions** to `MetaMap::MoveMemory` and `MapShadow` to enforce their implicit requirement for `k`-aligned end addresses.
- **Fixes a latent bug** in `UnmapShadow` by rounding up the size to ensure the entire shadow range is cleared. Two new test cases have been added to cover this scenario.
- **Removes a redundant assertion** in `__tsan_java_move`.
- **Fixes an incorrect shadow end calculation** introduced in commit 4052de6. The previous logic, while fixing an overestimation issue, did not properly account for `kShadowCell` alignment and could lead to underestimation.
ArrayRef now has a new constructor that takes a parameter whose type
has data() and size(). This patch migrates:
ArrayRef<T>(X.data(), X.size())
to:
ArrayRef<T>(X)
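For example (a minimal sketch):
```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
using namespace llvm;

void example() {
  SmallVector<int> X = {1, 2, 3};
  ArrayRef<int> Before(X.data(), X.size()); // old form
  ArrayRef<int> After(X);                   // new converting constructor
  (void)Before;
  (void)After;
}
```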
The disjoint OR (https://github.com/llvm/llvm-project/pull/72583) of two '1's is poison, hence MSan ought to consider the result uninitialized, rather than initialized (a false negative), as the existing instrumentation does by ignoring disjointness. This patch adds a flag, `-msan-precise-disjoint-or`, which defaults to false (the legacy behavior). A future patch will default this flag to true.
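A standalone illustration of the violated precondition (corresponding to `or disjoint i8 1, 1` in IR):
```
#include <cstdint>
#include <cstdio>

// `disjoint` asserts the operands share no set bits (making `or`
// equivalent to addition); two 1s violate that, so the IR result is
// poison.
int main() {
  uint8_t A = 1, B = 1;
  bool Disjoint = (A & B) == 0; // false: bit 0 overlaps
  printf("operands disjoint: %s\n", Disjoint ? "yes" : "no");
}
```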
Updates the test from https://github.com/llvm/llvm-project/pull/145982
We introduced VariantKinds after MCSymbolRefExpr::VariantKind and then
deprecated the VariantKind naming in favor of AtSpecifier (#133214).
Rename the function and type to use the recommended convention.
This is a follow-up to commit 24c860547e ("AMDGPU/MC: Fix emitting
absolute expressions (#136789)").
In some downstream work, we end up with an MCTargetExpr that is a
maximum (AGVK_Max) in an instruction operand. getMachineOpValueCommon
recognizes the absolute nature of the expression and doesn't emit a
fixup. getLitEncoding needs to be aligned with this decision, else we
end up with a 0 immediate without a corresponding fixup.
Note that evaluateAsAbsolute checks for MCConstantExpr as a fast path,
so this accepts strictly more cases than before.
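A hedged sketch of the adjusted check (the surrounding encoder logic is elided and the helper name is mine):
```
#include "llvm/MC/MCExpr.h"
#include <optional>
using namespace llvm;

// Accept any expression that folds to an absolute value, not just
// MCConstantExpr, mirroring getMachineOpValueCommon's decision not to
// emit a fixup for absolute expressions.
static std::optional<int64_t> foldToImm(const MCExpr &Expr) {
  int64_t Val;
  if (Expr.evaluateAsAbsolute(Val))
    return Val; // encode as an inline literal; no fixup needed
  return std::nullopt;
}
```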
I've tried several ways to write a test for this without success. The
challenge is that there is no upstream way to generate this kind of
expression in an instruction operand natively, and trying to create one
via inline assembly fails because the assembly parser evaluates the
expression to a constant during parsing.
It was assuming that for any location M.N, N was always less than the number of breakpoint locations. But if you rebuild the target and rerun multiple times, the section backing one of the locations may no longer be valid; we then remove the location but do not reuse its ID. So you can have a breakpoint whose only location is 1.3, and the num_locations check would report that as an invalid location.
This commit adds operations to orc::MemoryAccess for reading basic types
(uint8_t, uint16_t, uint32_t, uint64_t, pointers, buffers, and strings)
from executor memory.
The InProcessMemoryAccess and EPCGenericMemoryAccess implementations are
updated to support the new operations.
c72c0b298c fixed a race condition in Target::GetExecutableModule. The patch originally added a lock_guard, but I suggested using the locking ModuleList::Modules() helper instead. That suggestion didn't account for the fallback path, which still accesses the ModuleList without holding the lock. This patch fixes the remaining issue.
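A standalone model of the fixed pattern (not lldb's actual classes):
```
#include <memory>
#include <mutex>
#include <vector>

// Both the primary lookup and the fallback scan must run under the same
// lock that guards the module list.
struct Module { bool IsExecutable = false; };
using ModuleSP = std::shared_ptr<Module>;

struct ModuleList {
  std::recursive_mutex Mutex;
  std::vector<ModuleSP> Modules;
};

ModuleSP getExecutableModule(ModuleList &Images) {
  std::lock_guard<std::recursive_mutex> Guard(Images.Mutex);
  for (const ModuleSP &M : Images.Modules) // fallback stays under the lock
    if (M && M->IsExecutable)
      return M;
  return nullptr;
}
```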
The prioritized string table builder was introduced in 9cc9efc. This patch gives the section name the highest priority, placing it at the start of the string table and avoiding the issue described in 4d2eda2.
This patch adjusts the cost model to account for the ability of the
AMDGPU optimizer to group together i8 values into i32 values.
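For illustration, the kind of grouping the optimizer performs (my example, not from the patch):
```
#include <cstdint>

// Four i8 values grouped into one i32, so per-element costs can be
// amortized over the packed word.
uint32_t packBytes(uint8_t A, uint8_t B, uint8_t C, uint8_t D) {
  return uint32_t(A) | (uint32_t(B) << 8) | (uint32_t(C) << 16) |
         (uint32_t(D) << 24);
}
```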
Co-authored-by: Erich Keane <ekeane@nvidia.com>