Support the trivial "header"/source switch for module interfaces.
I initially thought the naming are bad and we should rename it. But
later I feel it is better to split patches as much as possible.
From the codes it looks like there are problems. e.g., `isHeaderFile`.
But let's try to fix them in different patches.
Many of the test files had an inconsistent formatting. This patch ran
clang-format over them using the project's .clang-format file, with
column limit = 0, to prevent test directives from being split over
multiple lines.
We use the term "interchangeable instructions" to refer to different
operators that have the same meaning (e.g., `add x, 0` is equivalent to
`mul x, 1`).
Non-constant values are not supported, as they may incur high costs with
little benefit.
---------
Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
The last version of this patch had memory leaks due to using the
BumpPtrAllocator for data types that required destructors to run to
release heap memory (e.g. via std::vector and std::string). This version
avoids that by using smart pointers, and dropping support for
BumpPtrAllocator.
We should refactor this code to use the BumpPtrAllocator again, but that
can be addressed in future patches, since those are more invasive
changes that need to refactor many of the core data types to avoid
owning allocations.
Adds Support for the Mustache Templating Language. See specs here:
https://mustache.github.io/mustache.5.html This patch implements
support+tests for majority of the features of the language including:
- Variables
- Comments
- Lambdas
- Sections
This meant as a library to support places where we have to generate
HTML, such as in clang-doc.
Co-authored-by: Peter Chou <peter.chou@mail.utoronto.ca>
guard_acquire is a helper function used to implement TSan's
__cxa_guard_acquire and pthread_once interceptors.
https://reviews.llvm.org/D54664 introduced optional hooks to support
cooperative multi-threading. It worked by marking the entire
guard_acquire call as a potentially blocking region.
In principle, only the contended case needs to be a potentially blocking
region. This didn't matter for __cxa_guard_acquire because the compiler
emits an inline fast path before calling __cxa_guard_acquire. That is,
once we call __cxa_guard_acquire at all, we know we're in the contended
case.
https://reviews.llvm.org/D107359 then unified the __cxa_guard_acquire
and pthread_once interceptors, adding the hooks to pthread_once.
However, unlike __cxa_guard_acquire, pthread_once callers are not
expected to have an inline fast path. The fast path is inside the
function.
As a result, TSan unnecessarily calls into the cooperative
multi-threading engine on every pthread_once call, despite applications
generally expecting pthread_once to be fast after initialization. Fix
this by deferring the hooks to the contended case inside guard_acquire.
Summary:
Currently we use `TestBigEndian` in CMake to determine endianness. This
doesn't work on all platforms and is deprecated since CMake 3.20.
Instead of using CMake, we can just use the GNU/Clang preprocessor
definitions.
The only difficulty is MSVC, mostly because they don't support the same
macros. But, as far as I'm aware, MSVC / Windows targets are always
little endian, and if not we can just override it for that specific
target in the future.
This generalizes https://github.com/llvm/llvm-project/pull/131975 to non-32-bit Linux (i.e., 64-bit Linux).
This works around an edge case in 64-bit Linux, whereby the memory layout is incompatible if the stack size is unlimited AND ASLR entropy is 31+ bits (see https://github.com/google/sanitizers/issues/856#issuecomment-2747076811).
More generally, this "re-exec without ASLR if layout is incompatible" is a hammer that can work around most shadow mapping issues, without incurring the overhead of using a dynamic shadow.
We previously made an implementation error when adding half overloads
for HLSL library functionality. The half type is always defined in HLSL
and half intrinsics should not be conditionally included.
When native 16-bit types are disabled half is a unique 32-bit float type
with lesser promotion rank than float.
Apply pattern #81782 to intrinsics added in #95999.
Closes#132793
Moving builder classes into separate files
`HLSLBuiltinTypeDeclBuilder.cpp`/`.h`, changing a some
`HLSLExternalSemaSource` methods to private and removing unused methods.
This is a prep work before we start adding more builtin types and
methods, like textures, resource constructors or matrices. For example
constructors could make use of the `BuiltinTypeMethodBuilder`, but this
helper class was defined in `HLSLExternalSemaSource.cpp` after the
method that creates a constructor. Rather than reshuffling the code one
big source file I am moving the builders into a separate cpp & header
file and placing the helper classes declarations up top.
Currently the new header only exposes `BuiltinTypeDeclBuilder` to
`HLSLExternalSemaSource`. In the future but we might decide to expose
more helper classes as needed.
Handle dense resource attributes in the transpose TOSA folder.
Currently their interface does not align with the rest of the
`ElementsAttr` when it comes to data accessing hence the special
handling.
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
This is similar to PR #132107 but for tests for sys/epoll.h functions.
ErrnoCheckingTest ensures that errno is properly reset at the beginning
of the test case, and is validated at the end of it, so that the manual
code such as the one proposed in PR #131650 would not be necessary.
I was looking at the value range of AcquireAtCycle / ReleaseAtCycle, and
I noticed that while the TableGen error messages said AcquireAtCycle has
to be less than ReleaseAtCycle, in reality they are actually allowed to
be the same. This patch fixes it and add more test cases.
- Add missing int16 extension for concat operator
- Remove int16 extension for cast operator
- Add pro_int and pro_fp profiles for const_shape operator
Signed-off-by: Jerry Ge <jerry.ge@arm.com>
DXC and the DXIL validator expect resources in a DX container to be
specifically ordered CBuffers, Samplers, SRVs, and then UAVs. Match this
behaviour so that we can pass the validator.
Fixes#130232.
This commit follows up 0191307b by auto-upgrading !"align" metadata on
return values to stackalign. This allows us to remove all logic to check
the metadata from NVPTXUtilities.
This gets the consumer fusion method in sync with the corresponding
producer fusion method `tileAndFuseProducerOfSlice`. Not taking this as
input required use of complicated analysis to retrieve the surrounding
loops which are very fragile. Just like the producer fusion method, the
loops need to be taken in as an argument, with typically the loops being
created by the tiling methods.
Some utilities are added to check that the loops passed in are perfectly
nested (in the case of an `scf.for` loop nest.
This is change 1 of N to simplify the implementation of tile and fuse
consumers.
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
When a developer copy/pastes a failing command line into their
shell to rerun it, they have to manually delete the "RUN: at line
N:" prefix. To make life easier for such developers, let's make it
possible to copy/paste a command without needing to modify it while
still showing the line number in the output by moving the line number
to a comment at the end of the command line.
Reviewers: jroelofs, MaskRay
Reviewed By: jroelofs, MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/132485
Create a new operation `DSOLocalEquivalentOp`, following the steps of
other constants.
This is similar in a way to `AddressOfOp` but with specific semantics:
only support functions and function aliases (no globals) and extern_weak
linkage is not allowed.
An alternative approach is to use a new `UnitAttr` in `AddressOfOp` and
check that attribute to enforce specific semantics in the verifiers. The
drawback is going against what other constants do and having to add more
attributes in the future when we introduce `no_cfi`, `blockaddress`,
etc.
While here, improve the error message for other missing constants.
If we cannot find any lds DMA instruction that is aliased by some load
from lds, we will still insert vmcnt(0). This is overly cautious since
handling inter-thread dependences is normally managed by the memory
model instead of the waitcnt pass, so this change updates the behavior
to be more inline with how other types of memory events are handled.
Make sure we run each configuration once with the bytecode interpreter
and once with the current one. Add a triple to the one that was
previously without.
Previously `MCSchedModel::getReciprocalThroughput` ignored
`AcquireAtCycle` completey, this patch fixes it by using the largest
`(ReleaseAtCycle - AcquireAtCycle) / NumUnits` as inverse throughput.
Here are some technical explanations:
https://myhsu.xyz/llvm-sched-interval-throughput
---------
Co-authored-by: Julien Villette <julien.villette@sipearl.com>
Sometimes, there may be no matched anchors but the functions still
match. e.g. if the function’s template typename changes, all the
callsites that use the type are mismatched and the caller function that
contains those callsite are mismatched. Introduce a check to match the
functions if their demangled base names are the same.
Follow up to e4284a7c70 "[AMDGPU] 4-align SGPR triples".
Previously TTMP triples like ttmp[3:5] were aligned on a 3-TTMP boundary
which has no basis in hardware.
Aligning them on a 4-TTMP boundary matches what we do for SGPRs, which
reduces the number of extra register classes synthesized by TableGen,
bringing the total number down from 653 to 615.