This is the first one in a series of PRs adding the requirements for
#58654
This PR adds `ByteAddressBuffer`, `RWByteAddressBuffer ` and
`RasterizerOrderedByteAddressBuffer ` definitions as well as their
handle lowering to `dx.RawBuffer`.
closes#58654
---------
Co-authored-by: Joao Saffran <jderezende@microsoft.com>
This patch adds a helper function to replace an idiom like:
CallStackId CSId = hashCallStack(CallStack)
MemProfData.CallStacks.try_emplace(CSId, CallStack);
// Do something with CSId.
This adds support for these instructions and also tests getOperandInfo
for these instructions as well. I think the VL on the using add
instruction can be optimized further, once we add support for optimizing
non-vlmax.
This allows shared libraries instrumented with RTSan to be initialized.
This approach directly mirrors the approach in Tsan, Asan and many of
the other sanitizers
This fixes
https://discourse.llvm.org/t/fixed-register-being-spill-and-restored-in-clang/83058.
We need to do it in `MachineRegisterInfo::getCalleeSavedRegs` instead of
`RISCVRegisterInfo::getCalleeSavedRegs` since the MF argument of
`TargetRegisterInfo:::getCalleeSavedRegs` is `const`, so we can't call
`MF->getRegInfo().disableCalleeSavedRegister` there.
So to put it in `MachineRegisterInfo::getCalleeSavedRegs`, we move
`isRegisterReservedByUser` into `TargetSubtargetInfo`.
fd3907ccb5 introduced a check for system
offload-tblgen executable when doing a standalone build. This check is
bogus, since offload-tblgen is built as part of offload and not some
other preinstalled component. The path is also overwritten below, so the
check only causes tests to be disabled unnecessarily.
This PR adds all the missing semantics for the Linear clause based on
the OpenMP 5.2 restrictions. The restriction details are mentioned
below.
OpenMP 5.2:
5.4.6 linear Clause restrictions
- A linear-modifier may be specified as ref or uval only on a declare
simd directive.
- If linear-modifier is not ref, all list items must be of type integer.
- If linear-modifier is ref or uval, all list items must be dummy
arguments without the VALUE attribute.
- List items must not be Cray pointers or variables that have the
POINTER attribute. Cray pointer support has been deprecated.
- If linear-modifier is ref, list items must be polymorphic variables,
assumed-shape arrays, or variables with the ALLOCATABLE attribute.
- A common block name must not appear in a linear clause.
- The list-item cannot appear more than once
4.4.4 ordered Clause restriction
- If n is explicitly specified, a linear clause must not be specified on
the same directive.
5.11 aligned Clause restriction
- Each list item must have C_PTR or Cray pointer type or have the
POINTER or ALLOCATABLE attribute. Cray pointer support has been
deprecated.
…lauses
Currently we only do semantic checks for REDUCTION. There are two other
clauses, IN_REDUCTION, and TASK_REDUCTION which will also need those
checks. Implement a function that checks the common list-item
requirements for all those clauses.
SVE2.2 introduces instructions with predicated forms with zeroing of
the inactive lanes. This allows in some cases to save a `movprfx` or
a `mov` instruction when emitting code for `_x` or `_z` variants of
intrinsics.
This patch adds support for emitting the zeroing forms of certain
`FCVTZS`, and `FCVTZU` instructions.
Previously, this hit an `llvm_unreachable()` assertion as the type of
`vec_t` did not exactly match `__SVInt8_t`, as it was wrapped in a
typedef.
Comparing the canonical types instead allows the types to match
correctly and avoids the crash.
Fixes#107609
If the shuffle mask uses only indices from the second shuffle operand,
processShuffleMasks function misses it currently, which prevents correct
cost estimation in this corner case. To fix this, need to raise the
limit to 2 * VF rather than just VF and adjust processing
correspondingly. Will allow future improvements for 2 sources
permutations.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/118972
Add the `__memprof_default_options_str` variable, initialized via the
`-memprof-runtime-default-options` LLVM flag, to hold the default options string
for memprof. This allows us to set these options during compile time in
the clang invocation.
Also update the docs to describe the various ways to set these options.
Similar to 'worker', the 'vector' clause has some rules that needed to
be applied on its argument legality that for combined constructs need to
look at the current construct, not the 'effective' parent construct.
Additionally, it has some interaction with `vector_length` that needed
to be encoded as well. This patch implements it.
getSExtValue assumes the result fits in 64 bits, but this may not be the
case for indcutions with wider types. Instead, directly perform the
compare on the APInt for the ConstantInt.
Fixes https://github.com/llvm/llvm-project/issues/118850.
If the root shuffle node was a VPERMV3 node, then we can always replace it with a new VPERMV3 node - it doesn't matter if other variable shuffles in the chain had multiple uses.
Otherwise, builds with `-DFLANG_CUF_RUNTIME` hits:
```
runtime/CUDA/descriptor.cpp:44:24: error: invalid use of incomplete type 'const class Fortran::runtime::Descriptor'
44 | std::size_t count{src->SizeInBytes()};
```
The main goal is to fold away wave64 code when compiled for wave32.
If we have out of bounds indexing, these will now clamp down to
a low bit which may CSE with the operations on the low half of the
wave.
Add test for existing loop unroll behaviour.
Current behaviour is the single loop with fmul gets runtime unrolled by
count of 4, with the loop remainder unrolled as the 3 for.body9.us.prol
sections. This is quite a lot of compare and branch, negating the
benefits of the low overhead loop mechanism.
Add integral types to ClangIR. These are the first ClangIR types, so the
change includes some infrastructure for managing ClangIR types.
So that the integral types can be used somewhere, generate ClangIR for
global variables using the new `cir.global` op. As with the current
support for functions, global variables are just a stub at the moment.
The only properties that global variables have are a name and a type.
Add a new ClangIR code gen test global-var-simple.cpp, which defines
global variables with most of the integral types.
(Part of upstreaming the ClangIR incubator project into LLVM.)
Split some headers into headers for public and private declarations in
preparation for #110217. Moving the runtime-private headers in
runtime-private include directory will occur in #110298.
* Do not use `sizeof(Descriptor)` in the compiler. The size of the
descriptor is target-dependent while `sizeof(Descriptor)` is the size of
the Descriptor for the host platform which might be too small when
cross-compiling to a different platform. Another problem is that the
emitted assembly ((cross-)compiling to the same target) is not identical
between Flang's running on different systems. Moving the declaration of
`class Descriptor` out of the included header will also reduce the
amount of #included sources.
* Do not use `sizeof(ArrayConstructorVector)` and
`alignof(ArrayConstructorVector)` in the compiler. Same reason as with
`Descriptor`.
* Compute the descriptor's extra flags without instantiating a
Descriptor. `Fortran::runtime::Descriptor` is defined in the runtime
source, but not the compiler source.
* Move `InquiryKeywordHashDecode` into runtime-private header. The
function is defined in the runtime sources and trying to call it in the
compiler would lead to a link-error.
* Move allocator-kind magic numbers into common header. They are the
only declarations out of `allocator-registry.h` in the compiler as well.
This does not make Flang cross-compile ready yet, the main goal is to
avoid transitive header dependencies from Flang to clang-rt. There are
more assumptions that host platform is the same as the target platform.
because:
- This brings Clang in line with GCC for which this is the default for ARM
- It frees up a register, so performance increase, especially on Thumb/6-M
- It will decrease code size
The existing exceptions tests for `vector<T>` have several issues: some
tests did not throw exceptions at all, making them not useful for
exception-safety testing, and some tests did not throw exceptions at the
intended points, failing to serve their expected purpose. This PR fixes
those tests for vector's constructors. Morever, this PR extracted common
classes and utilities into a separate header file, and renamed those
classes using more descriptive names.
Put _LIBCPP_NODEBUG on the new allocator trait aliases introduced in
https://github.com/llvm/llvm-project/pull/115654. This prevents a large
increase in the gdb_index size that was introduced by that PR.