'num_gangs', 'num_workers', and 'vector_length' are similar to
eachother, and are all the same implementation as for compute
constructs, so this patch just enables them and adds the necessary
testing to ensure they work correctly. These will get more complicated
when they get combined with 'gang', 'worker', 'vector' and 'reduction',
but those restrictions will be implemented when those clauses are
enabled.
RelExpr enumerators are named `R_*`, which can be confused with ELF
relocation type names. Rename the target-specific ones to `RE_*` to
avoid confusion.
For consistency, the target-independent ones can be renamed as well, but
that's not urgent. The relocation processing mechanism with RelExpr has
non-trivial overhead compared with mold's approach, and we might make
more code into Arch/*.cpp files and decrease the enumerators.
Pull Request: https://github.com/llvm/llvm-project/pull/118424
This PR allows out-of-tree dialects to write Python dialect modules
using nanobind instead of pybind11.
It may make sense to migrate in-tree dialects and some of the ODS Python
infrastructure to nanobind, but that is a topic for a future change.
This PR makes the following changes:
* adds nanobind to the CMake and Bazel build systems. We also add
robin_map to the Bazel build, which is a dependency of nanobind.
* adds a PYTHON_BINDING_LIBRARY option to various CMake functions, such
as declare_mlir_python_extension, allowing users to select a Python
binding library.
* creates a fork of mlir/include/mlir/Bindings/Python/PybindAdaptors.h
named NanobindAdaptors.h. This plays the same role, using nanobind
instead of pybind11.
* splits CollectDiagnosticsToStringScope out of PybindAdaptors.h and
into a new header mlir/include/mlir/Bindings/Python/Diagnostics.h, since
it is code that is no way related to pybind11 or for that matter,
Python.
* changed the standalone Python extension example to have both pybind11
and nanobind variants.
* changed mlir/python/mlir/dialects/python_test.py to have both pybind11
and nanobind variants.
Notes:
* A slightly unfortunate thing that I needed to do in the CMake
integration was to use FindPython in addition to FindPython3, since
nanobind's CMake integration expects the Python_ names for variables.
Perhaps there's a better way to do this.
Downstream builders are having issues with this local include. Use a
fuller
path that's more standard throughout the codebase.
Also some of the comments in the bazel overlay are stale. Remove them.
Reported-by: Brooks Moses <bmoses@google.com>
This is another attempt to reland the changes from #108413
The previous two attempts introduced regressions and were reverted. This
PR has been more thoroughly tested with various configurations so
shouldn't cause any problems this time. If anyone is aware of any likely
remaining problems then please let me know.
The changes are identical other than the fixes contained in the last 5
commits.
---
### New API
Previous discussions at the LLVM/Offload meeting have brought up the
need for a new API for exposing the functionality of the plugins. This
change introduces a very small subset of a new API, which is primarily
for testing the offload tooling and demonstrating how a new API can fit
into the existing code base without being too disruptive. Exact designs
for these entry points and future additions can be worked out over time.
The new API does however introduce the bare minimum functionality to
implement device discovery for Unified Runtime and SYCL. This means that
the `urinfo` and `sycl-ls` tools can be used on top of Offload. A
(rough) implementation of a Unified Runtime adapter (aka plugin) for
Offload is available
[here](https://github.com/callumfare/unified-runtime/tree/offload_adapter).
Our intention is to maintain this and use it to implement and test
Offload API changes with SYCL.
### Demoing the new API
```sh
# From the runtime build directory
$ ninja LibomptUnitTests
$ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests
```
### Open questions and future work
* Only some of the available device info is exposed, and not all the
possible device queries needed for SYCL are implemented by the plugins.
A sensible next step would be to refactor and extend the existing device
info queries in the plugins. The existing info queries are all strings,
but the new API introduces the ability to return any arbitrary type.
* It may be sensible at some point for the plugins to implement the new
API directly, and the higher level code on top of it could be made
generic, but this is more of a long-term possibility.
A performance issue was descibed in #114521
**Root Cause**: The function getExpansionLocOfMacro is responsible for
finding the expansion location of macros. When dealing with macro
parameters, it recursively calls itself to check the expansion of macro
arguments. This recursive logic redundantly checks previous macro
expansions, leading to significant performance degradation when macros
are heavily nested.
Solution
**Modification**: Track already processed macros during recursion.
Implementation Details:
Introduced a data structure to record processed macros.
Before each recursive call, check if the macro has already been
processed to avoid redundant calculations.
**Testing**:
1. refer to #114521
Finder->addMatcher(expr(isExpandedFromMacro("NULL")).bind("E"), this);
run clang-tidy on freecad/src/Mod/Path/App/AreaPyImp.cpp
clang-tidy src/Mod/Path/App/AreaPyImp.cpp -checks=-*,testchecker
-p=build/compile_commands.json
checker runs normally
2. check-clang-unit pass
This PR adds more realistic cost estimates for these reduction
intrinsics
- `llvm.vector.reduce.umax`
- `llvm.vector.reduce.umin`
- `llvm.vector.reduce.smax`
- `llvm.vector.reduce.smin`
- `llvm.vector.reduce.fadd`
- `llvm.vector.reduce.fmul`
- `llvm.vector.reduce.fmax`
- `llvm.vector.reduce.fmin`
- `llvm.vector.reduce.fmaximum`
- `llvm.vector.reduce.fminimum`
- `llvm.vector.reduce.mul
`
The pre-existing cost estimates for `llvm.vector.reduce.add` are moved
to `getArithmeticReductionCosts` to reduce complexity in
`getVectorIntrinsicInstrCost` and enable other passes, like the SLP
vectorizer, to benefit from these updated calculations.
These are not expected to provide noticable performance improvements and
are rather provided for the sake of completeness and correctness. This
PR is in draft mode pending benchmark confirmation of this.
This also provides and/or updates cost tests for all of these
intrinsics.
This PR was co-authored by me and @JonPsson1 .
Use references instead of pointers for most state, initialize it all in
the constructor, and common up some of the initialization between the
legacy and new pass manager paths.
This PR fixes:
* emission of OpNames (added newly inserted internal intrinsics and
basic blocks)
* emission of function attributes (SRet is added)
* implementation of SPV_INTEL_optnone so that it emits OptNoneINTEL
Function Control flag, and add implementation of the SPV_EXT_optnone
SPIR-V extension.
This change doesn't introduce any functional differences but aligns the
implementation more closely with LLVM's representation. Previously, the
code generated a lookup table to map MLIR enums to LLVM enums due to the
lack of one-to-one correspondence. With this refactoring, the generated
code now casts directly from one enum to another.
This PR resolved the following issues:
(1) There are rare but possible cases when there are bitcasts between
pointers intertwined in a sophisticated way with loads, stores, function
calls and other instructions that are part of type deduction. In this
case we must account for inserted bitcasts between pointers rather than
just ignore them.
(2) Null pointers have the same constant representation but different
types. Type info from Intrinsic::spv_track_constant() refers to the
opaque (untyped) pointer, so that each MF/v-reg pair would fall into the
same Const record in Duplicate Tracker and would be represented by a
single OpConstantNull instruction, unless we use precise pointee type
info. We must be able to distinguish one constant (null) pointer from
another to avoid generating invalid code with inconsistent types of
operands.
Check for nusw instead of inbounds when decomposing GEPs.
In this particular case, we can also look through multiple nusw
flags, because we will ultimately be working in the unsigned
constraint system.
Introduce a general recipe to generate a scalar phi. Lower
VPCanonicalIVPHIRecipe and VPEVLBasedIVRecipe to VPScalarIVPHIrecipe
before plan execution, avoiding the need for duplicated ::execute
implementations. There are other cases that could benefit, including
in-loop reduction phis and pointer induction phis.
Builds on a similar idea as
https://github.com/llvm/llvm-project/pull/82270.
PR: https://github.com/llvm/llvm-project/pull/114305
.. in the global namespace
The problem was the interaction of #116989 with an optimization in
GetTypesWithQuery. The optimization was only correct for non-exact
matches, but that didn't matter before this PR due to the "second layer
of defense". After that was removed, the query started returning more
types than it should.
Unlike the previous version
(https://github.com/llvm/llvm-project/pull/114978), this patch also
removes an unnecessary assert that causes Clang to crash when compiling
such tests. (clang/lib/AST/DeclCXX.cpp)
https://lab.llvm.org/buildbot/#/builders/52/builds/4021
```c++
template <class T>
class X {
public:
X() = default;
virtual ~X() = default;
virtual int foo(int x, int y, T &entry) = 0;
void bar() {
struct Y : public X<T> {
Y() : X() {}
int foo(int, int, T &) override {
return 42;
}
};
}
};
```
the assertions:
```c++
llvm-project/clang/lib/AST/DeclCXX.cpp:2508: void clang::CXXMethodDecl::addOverriddenMethod(const CXXMethodDecl *): Assertion `!MD->getParent()->isDependentContext() && "Can't add an overridden method to a class template!"' failed.
```
I believe that this assert is unnecessary and contradicts the logic of
this patch. After its removal, Clang was successfully built using
itself, and all tests passed.
Introduce llvm::CmpPredicate, an abstraction over a floating-point
predicate, and a pack of an integer predicate with samesign information,
in order to ease extending large portions of the codebase that take a
CmpInst::Predicate to respect the samesign flag.
We have chosen to demonstrate the utility of this new abstraction by
migrating parts of ValueTracking, InstructionSimplify, and InstCombine
from CmpInst::Predicate to llvm::CmpPredicate. There should be no
functional changes, as we don't perform any extra optimizations with
samesign in this patch, or use CmpPredicate::getMatching.
The design approach taken by this patch allows for unaudited callers of
APIs that take a llvm::CmpPredicate to silently drop the samesign
information; it does not pose a correctness issue, and allows us to
migrate the codebase piece-wise.
Unsigned icmp of gep nuw folds to unsigned icmp of offsets. Unsigned
icmp of gep nusw nuw folds to unsigned samesign icmp of offsets.
Proofs: https://alive2.llvm.org/ce/z/VEwQY8
Fixes an issue introduced by
9fb2db1e1f [flang] Retain spaces when preprocessing fixed-form source
Where flang -fc1 fails to parse preprocessor output because it now
remains in fixed form.
Re-land of #116636
Adds a new address spaces: hlsl_private. Variables with such address
space will be emitted with a Private storage class.
This is useful for variables global to a SPIR-V module, since up to now,
they were still emitted with a Function storage class, which is wrong.
---------
Signed-off-by: Nathan Gauër <brioche@google.com>
* Adds tests for strided accesses.
* Adds tests for reverse loops.
As part of this I've moved one of the negative tests from
load-deref-pred-align.ll into a new file
(load-deref-pred-neg-off.ll) because the pointer type had a
size of 16 bits and I realised it's probably not sensible for
allocas that are >16 bits in size!
Currently FastISel triggers a fallback if there is an unreachable
terminator and the TrapUnreachable option is enabled (the ISD::TRAP
selection does not actually work).
Add handling for NoTrapAfterNoReturn, in which case we don't actually
need to emit a trap. The test is just there to make sure there is no
FastISel fallback (which is why I'm not testing the case without
noreturn). We have other tests that check the actual unreachable codegen
variations.
This reverts commit 2526d5b168, reapplying
ba14dac481 after fixing the conflict with
#117532. The change is that Function::GetAddressRanges now recomputes
the returned value instead of returning the member. This means it now
returns a value instead of a reference type.
When merging STG instructions used for AArch64 stack tagging, we were
stopping on reaching a load or store instruction, but not calls, so it
was possible for an STG to be moved past a call to memcpy.
This test case (reduced from fuzzer-generated C code) was the result of
StackColoring merging allocas A and B into one stack slot, and
StackSafetyAnalysis proving that B does not need tagging, so we end up
with tagged and untagged objects in the same stack slot. The tagged
object (A) is live first, so it is important that it's memory is
restored to the background tag before it gets reused to hold B.
Reverts llvm/llvm-project#110898
This change has caused a cyclic module dependency `fatal error: cyclic
dependency in module 'LLVM_Utils': LLVM_Utils ->
LLVM_Config_ABI_Breaking -> LLVM_Utils`. Reverting for now until we the
right fix.