This patch adds support to the haswell sub-architecture (x86_64h) to
scripted processes.
rdar://147208252
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
While the source code isn't supposed to change during a build, in some
environments it does. This adds an option that disables caching of stat
failures, meaning that source files can be added to the build during
scanning.
This adds a `-no-cache-negative-stats` option to clang-scan-deps to
enable this behavior. There are no tests for clang-scan-deps as there's
no reliable way to do so from it. A unit test has been added that
modifies the filesystem between scans to test it.
When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.
Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.
The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:
CFI enabled: +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]
The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.
This optimization is implemented for AArch64 and X86_64 only.
lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:
```
N Min Max Median Avg Stddev
x 512 1.2264546 1.3481076 1.2970261 1.2965788 0.018620888
+ 512 1.2561196 1.3839965 1.3214632 1.3209327 0.019443971
Difference at 95.0% confidence
0.0243538 +/- 0.00233202
1.87831% +/- 0.179859%
(Student's t, pooled s = 0.0190369)
```
[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057
Pull Request: https://github.com/llvm/llvm-project/pull/138366
CPA stands for Checked Pointer Arithmetic and is part of the 2023 MTE
architecture extensions for A-profile.
The new CPA instructions perform regular pointer arithmetic (such as
base register + offset) but check for overflow in the most significant
bits of the result, enhancing security by detecting address tampering.
In this patch we intend to capture the semantics of pointer arithmetic
when it is not folded into loads/stores, then generate the appropriate
scalar CPA instructions. In order to preserve pointer arithmetic
semantics through the backend, we use the PTRADD SelectionDAG node type.
Use backend option `-aarch64-use-featcpa-codegen=true` to enable CPA
CodeGen (for a target with CPA enabled).
The story of this PR is that initially it introduced the PTRADD
SelectionDAG node and the respective visitPTRADD() function, adapted
from the CHERI/Morello LLVM tree. The original authors are
@davidchisnall, @jrtc27, @arichardson.
After a while, @ritter-x2a took the part of the code that was
target-independent and merged it separately in #140017. This PR thus
remains as the AArch64-part only.
Mode details about the CPA extension can be found at:
-
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2023
- https://developer.arm.com/documentation/ddi0602/2023-09/ (e.g ADDPT
instruction)
This PR follows #79569.
It does not address vector FEAT_CPA instructions.
This change speeds up fragment matching for large BOLTed binaries where
all fragments of global parent functions are put under `bolt-pseudo.o`
file symbol:
- before: iterating over symbols under `bolt-pseudo.o` only to fail
to find a parent,
- after: bail out immediately and use a global parent by name.
Test Plan: NFC, updated register-fragments-bolt-symbols.s
`BoltAddressTranslation::getFallthroughsInTrace` iterates over address
translation map entries and therefore has direct access to both original
and translated offsets. Return the translated offsets in fall-throughs
list to avoid duplicate address translation inside `doTrace`.
Test Plan: NFC
fixes#144608
- there is a getPointerOperandIndex function so we don't need to iterate
the operands trying to find the pointer. This resulted in a small
cleanup to visitStoreInst and visitLoadInst.
- The meat of this change was in visitGetElementPtrInst to account for
allocas and not bail when we don't find a global.
Going mostly by the comment here - but it says "vscale is not
necessarily a power-of-2". Both in tree targets have vscale as a power
of two, and we have an existing TTI hook for that.
This patch adds the `PtrLikeTypeInterface` type interface to identify
pointer-like types. This interface is defined as:
```
A ptr-like type represents an object storing a memory address. This object
is constituted by:
- A memory address called the base pointer. This pointer is treated as a
bag of bits without any assumed structure. The bit-width of the base
pointer must be a compile-time constant. However, the bit-width may remain
opaque or unavailable during transformations that do not depend on the
base pointer. Finally, it is considered indivisible in the sense that as
a `PtrLikeTypeInterface` value, it has no metadata.
- Optional metadata about the pointer. For example, the size of the memory
region associated with the pointer.
Furthermore, all ptr-like types have two properties:
- The memory space associated with the address held by the pointer.
- An optional element type. If the element type is not specified, the
pointer is considered opaque.
```
This patch adds this interface to `!ptr.ptr` and the `memref` type.
Furthermore, this patch adds necessary ops and type to handle casting
between `!ptr.ptr` and ptr-like types.
First, it defines the `!ptr.ptr_metadata` type. An opaque type to
represent the metadata of a ptr-like type. The rationale behind adding
this type, is that at high-level the metadata of a type like `memref`
cannot be specified, as its structure is tied to its lowering.
The `ptr.get_metadata` operation was added to extract the opaque pointer
metadata. The concrete structure of the metadata is only known when the
op is lowered.
Finally, this patch adds the `ptr.from_ptr` and `ptr.to_ptr` operations.
Allowing to cast back and forth between `!ptr.ptr` and ptr-like types.
```mlir
func.func @func(%mr: memref<f32, #ptr.generic_space>) -> memref<f32, #ptr.generic_space> {
%ptr = ptr.to_ptr %mr : memref<f32, #ptr.generic_space> -> !ptr.ptr<#ptr.generic_space>
%mda = ptr.get_metadata %mr : memref<f32, #ptr.generic_space>
%res = ptr.from_ptr %ptr metadata %mda : !ptr.ptr<#ptr.generic_space> -> memref<f32, #ptr.generic_space>
return %res : memref<f32, #ptr.generic_space>
}
```
It's future work to replace and remove the `bare-ptr-convention` through
the use of these ops.
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
Newly introduced Atomic.cpp fails to compile on its own, but somehow
compiles fine in the build. Maybe it's because PCH, but it needs to be
fixed nevertheless.
Fixes#141136
- Implement `visitExtractElementInst` and `visitInsertElementInst` in
`DXILDataScalarizerVisitor` to scalarize `extractelement` and
`insertelement` instructions whose index operand is not a `ConstantInt`
by converting the vector to an array and then loading from the array
- Rename the `replaceVectorWithArray` helper function to
`equivalentArrayTypeFromVector`, relocate the function toward the top of
the file, and remove the unused `Ctx` parameter
This tries to be a bit smarter for the OLD behaviour of CMP0116, to glob
more relevant directories looking for possible dependencies.
The changes are:
- Remove some duplication of lines in the `tablegen` function.
- Put CURRENT_SOURCE_DIR into `tblgen_includes` (at the front)
- Glob all directories in `tblgen_includes`
- Give up on `local_tds` which was wrong when using tablegen to compile
a file in a different directory (as TargetParser does)
- Use `EXTRA_INCLUDES` in TargetParser `tablegen` calls.
This is still an under-approximation of what might be included, at least
comparing the RISCVTargetParserDef.inc.d (after building
`target_parser_gen`), and the list of deps in the ninja file when
explicitly setting CMP0116 to OLD.
Fixes#144639
Changes:
* Decouple layout propagation from subgroup distribution and move it to
an independent pass.
* Refine layout assignment to handle control-flow ops correctly (scf.for, scf.while).
* Refine test cases.
In Aya/Rust, BPF map definitions are nested in two nested types:
* A struct representing the map type (e.g., `HashMap`, `RingBuf`) that
provides methods for interacting with the map type (e.g. `HashMap::get`,
`RingBuf::reserve`).
* An `UnsafeCell`, which informs the Rust compiler that the type is
thread-safe and can be safely mutated even as a global variable. The
kernel guarantees map operation safety.
This leads to a type hierarchy like:
```rust
pub struct HashMap<K, V, const M: usize, const F: usize = 0>(
core::cell::UnsafeCell<HashMapDef<K, V, M, F>>,
);
const BPF_MAP_TYPE_HASH: usize = 1;
pub struct HashMapDef<K, V, const M: usize, const F: usize = 0> {
r#type: *const [i32; BPF_MAP_TYPE_HASH],
key: *const K,
value: *const V,
max_entries: *const [i32; M],
map_flags: *const [i32; F],
}
```
Then used in the BPF program code as a global variable:
```rust
#[link_section = ".maps"]
static HASH_MAP: HashMap<u32, u32, 1337> = HashMap::new();
```
Which is an equivalent of the following BPF map definition in C:
```c
#define BPF_MAP_TYPE_HASH 1
struct {
int (*type)[BPF_MAP_TYPE_HASH];
typeof(int) *key;
typeof(int) *value;
int (*max_entries)[1337];
} map_1 __attribute__((section(".maps")));
```
Accessing the actual map definition requires traversing:
```
HASH_MAP -> __0 -> value
```
Previously, the BPF backend only visited the pointee types of the
outermost struct, and didn’t descend into inner wrappers. This caused
issues when the key/value types were custom structs:
```rust
// Define custom structs for key and values.
pub struct MyKey(u32);
pub struct MyValue(u32);
#[link_section = ".maps"]
#[export_name = "HASH_MAP"]
pub static HASH_MAP: HashMap<MyKey, MyValue, 10> = HashMap::new();
```
These types weren’t fully visited and appeared in BTF as forward
declarations:
```
#30: <FWD> 'MyKey' kind:struct
#31: <FWD> 'MyValue' kind:struct
```
The fix is to enhance `visitMapDefType` to recursively visit inner
composite members. If a member is a composite type (likely a wrapper),
it is now also visited using `visitMapDefType`, ensuring that the
pointee types of the innermost stuct members, like `MyKey` and
`MyValue`, are fully resolved in BTF.
With this fix, the correct BTF entries are emitted:
```
#6: <STRUCT> 'MyKey' sz:4 n:1
#00 '__0' off:0 --> [7]
#7: <INT> 'u32' bits:32 off:0
#8: <PTR> --> [9]
#9: <STRUCT> 'MyValue' sz:4 n:1
#00 '__0' off:0 --> [7]
```
Fixes: #143361
This patch adds an off-by-default flag which, when enabled via `-mllvm -msan-poison-undef-vectors=true`, fixes a false negative in MSan (partially-undefined constant fixed-length vectors). It is currently off by default since, by fixing the false positive, code/tests that previously passed MSan may start failing. The default will be changed in a future patch.
Prior to this patch, MSan computes that partially-undefined constant fixed-length vectors are fully initialized, which leads to false negatives; moreover, benign vector rewriting could theoretically flip MSan's shadow computation from initialized to uninitialized or vice-versa (*). `-msan-poison-undef-vectors=true` calculates the shadow precisely: for each element of the vector, the corresponding shadow is fully uninitialized if the element is undefined/poisoned, otherwise it is fully initialized.
Updates the test from https://github.com/llvm/llvm-project/pull/143823
(*) For example:
```
%x = insertelement <2 x i64> <i64 0, i64 poison>, i64 42, i64 0
%y = insertelement <2 x i64> <i64 poison, i64 poison>, i64 42, i64 0
```
%x and %y are equivalent but, prior to this patch, MSan incorrectly computes the shadow of %x as <0, 0> rather than <0, -1>.
Define the __dmr1024 type used to manipulate the new DMR registers
introduced by the Dense Math Facility (DMF) on PowerPC, and add six
Clang builtins that correspond to the integer outer-product accumulate
to ACC PowerPC instructions:
* __builtin_mma_dmxvi8gerx4
* __builtin_mma_pmdmxvi8gerx4
* __builtin_mma_dmxvi8gerx4pp
* __builtin_mma_pmdmxvi8gerx4pp
* __builtin_mma_dmxvi8gerx4spp
* __builtin_mma_pmdmxvi8gerx4spp.
isComplete previously meant different things for different conversion
directions.
Refactored bytes_processed to bytes_stored which now consistently
increments on every push and decrements on pop making both directions
more consistent with each other
This patch disables unexpected_disabled_cpp17.verify.cpp under clang
modules builds because it changes diagnostics criteria post #143423,
causing the test to fail.
This patch follows a similar style to 853059a150.
This was found when working on trying to land #144033.
Allow the function definition line to match with and without attrbute
set number.
This fixes build break after PR144534:
https://lab.llvm.org/buildbot/#/builders/157/builds/31331
Also move the test to the OpenMP subdirectory where it should have
been from the beginning.
Address NFC mismatches caused by running perf2bolt from under the
wrapper script:
https://lab.llvm.org/buildbot/#/builders/92/builds/20938
> <stdin>:2:64: note: possible intended match here
>
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/llvm-bolt.old:
-spe is available only on AArch64.
Test Plan:
ninja check-bolt
…types usi… (#144676)"
This reverts commit 68471d29ee.
IntegralAP contains a union:
union {
uint64_t *Memory = nullptr;
uint64_t Val;
};
On 64bit systems, both Memory and Val have the same size. However, on 32
bit system, Val is 64bit and Memory only 32bit. Which means the default
initializer for Memory will only zero half of Val. We fixed this by
zero-initializing Val explicitly in the IntegralAP(unsigned BitWidth)
constructor.
See also the discussion in
https://github.com/llvm/llvm-project/pull/144246
When reducing v4f64/v8f32 non-lane crossing X86ISD::VPERMV nodes, we use X86ISD::VPERMILPV nodes for 128-bits, but these are only available for fp types.
Fixes#145046