In addBranchWeightToMiddleTerminator we attempt to add branch weights to
the middle block terminator. We pessimistically assume vscale=1, whereas
we can improve the estimate by using the value of vscale used for
tuning.
This will convert loads of constant strings to immediate values. Put
this behind a flag that is enabled by default so that we can toggle it
if need be.
CTTZ/CTLZ_ZERO_UNDEF nodes can only create poison if the source value is zero - so check with isKnownNeverZero
Pulled out of #146361 and reapplied now that #146490 has landed.
Following on from #118638, this handles widened induction variables with
EVL tail folding by setting the VF operand to be EVL, calculated in the
vector body.
We need to do this for correctness since with EVL tail folding the
number of elements processed in the penultimate iteration may not be VF,
but the runtime EVL, and we need to take this into account when updating
the backedge value.
- Because the VF may now not be a live-in we need to move the insertion
point to just after the VF's definition
- We also need to avoid truncating it when it's the same size as the
step type. Previously this wasn't a problem for live-ins.
- Also because the VF may be smaller than the IV type, since the EVL is
always i32, we may need to zext it.
On -march=rva23u64 -O3 we get 87.1% more loops vectorized on TSVC, and
42.8% more loops vectorized on SPEC CPU 2017
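To illustrate why the backedge value has to be advanced by EVL rather
than VF, here is a minimal standalone sketch. It is not vectorizer code:
`computeEVL` is a hypothetical policy chosen only so that the penultimate
iteration processes fewer than VF elements, and `visitedIndices` is an
illustrative helper that records which induction values the lanes see.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical EVL policy (an assumption for this sketch): when fewer than
// 2*VF elements remain, split them roughly evenly across the last two
// iterations, so the penultimate iteration may process fewer than VF.
std::uint32_t computeEVL(std::uint64_t remaining, std::uint32_t vf) {
  assert(vf > 0 && "VF must be non-zero");
  if (remaining >= 2ull * vf)
    return vf;
  if (remaining > vf)
    return static_cast<std::uint32_t>((remaining + 1) / 2);
  return static_cast<std::uint32_t>(remaining);
}

// Records the induction values seen by the active lanes. The first lane of
// the widened IV is advanced on the backedge by either EVL or VF.
std::vector<std::uint64_t> visitedIndices(std::uint64_t n, std::uint32_t vf,
                                          bool stepByEVL) {
  std::vector<std::uint64_t> visited;
  std::uint64_t iv = 0;   // value in lane 0 of the widened IV
  std::uint64_t done = 0; // elements processed so far
  while (done < n) {
    std::uint32_t evl = computeEVL(n - done, vf);
    for (std::uint32_t lane = 0; lane < evl; ++lane)
      visited.push_back(iv + lane);
    done += evl;
    // The 32-bit EVL is implicitly zero-extended to the 64-bit IV type here.
    iv += stepByEVL ? evl : vf;
  }
  return visited;
}
```

With n = 10 and VF = 4 under this policy, stepping by EVL visits 0 through
9, while stepping by VF skips index 7 and runs one past the end.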
When declaring multiple arrays of 1 ExaByte in a struct, the offset can
exceed 2EB, causing incorrect struct size reporting (only 1EB). This fix
ensures an error is thrown, preventing the generation of incorrect
assembly. #60272
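A minimal sketch of the kind of check involved, assuming the offset is
accumulated field by field. `advanceOffset` and its `maxSize` parameter
are hypothetical names for illustration and the limit is left to the
caller; this is not the actual code touched by the fix.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: returns the offset after placing a field of
// `fieldSize` bytes at `offset`, or std::nullopt if the result would
// exceed `maxSize`. The comparison is arranged so it cannot wrap.
std::optional<std::uint64_t> advanceOffset(std::uint64_t offset,
                                           std::uint64_t fieldSize,
                                           std::uint64_t maxSize) {
  assert(offset <= maxSize && "current offset already out of range");
  if (fieldSize > maxSize - offset)
    return std::nullopt; // caller should report an error here
  return offset + fieldSize;
}
```

A caller would emit a diagnostic when the helper returns `std::nullopt`
instead of continuing with a wrapped offset.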
This keeps getting forgotten (e.g. #66603) - so make a point of adding
it here to make it clear instead of relying on the implicit default of
returning true.
Previously, references to regions and successors were incorrectly disallowed outside the top-level assembly form. This change enables the use of bound regions and successors as variables in custom directives.
Although nice to have to prove the freeze can be moved, this can fail
immediately after freeze(op(...)) -> op(freeze(),freeze(),...) creation
if any of the new freeze nodes now prevents value tracking from seeing
through to the source values (e.g. shift amounts/element indices are in
bounds etc.).
This will allow us to remove the isGuaranteedNotToBeUndefOrPoison checks
inside canCreateUndefOrPoison that were discussed on #146361
Firstly, this commit requires that all types are signless in the strict
mode of the validation pass. This is because signless types on
operations are required by the TOSA specification. The "strict" mode in
the validation pass is the final check for TOSA conformance to the
specification, which can often be used for conversion to other formats.
In addition, a conversion pass `--tosa-convert-integer-type-to-signless`
is provided to allow a user to convert all integer types to signless.
The intention is that this pass can be run before the validation pass.
Following use of this pass, input/output information should be carried
independently by the user.
When compiling with `-march=armv9-a+nosve` we found that Clang still
defines the `__ARM_FEATURE_SVE2` macro, which is explicitly set in
`setArchFeatures` when compiling for armv9-a.
After some experimenting, I found that the list of features passed into
`AArch64TargetInfo::handleTargetFeatures` has already been expanded: it
takes `+no[feature]` into account, and architectures like `armv9-a` have
already been expanded into their features.
From that I conclude that `setArchFeatures` is no longer required.
Verify that the alignments specified by clang TargetInfo match the
alignments specified by LLVM data layout, which will hopefully prevent
accidental mismatches in the future.
This currently contains opt-outs for a number of existing mismatches.
I'm also skipping the verification if options like `-malign-double` are
used, or if the language mandates sizes/alignments that differ from C.
The verification happens in CodeGen, as we can't have an IR dependency
in Basic.
In OpenMP Version 5.1, the tile and unroll directives were added. When
using these directives, it is possible to nest them within other OpenMP
Loop Constructs. This patch enables the semantics to allow for this
behaviour on these specific directives. Any nested loops will be stored
within the initial Loop Construct until reaching the DoConstruct itself.
Relevant tests have been added, and previous behaviour has been retained
with no changes.
See also, #110008
For the checks whether certain intrinsics are used, work with intrinsic
IDs instead of intrinsic names.
This also exposes that some of the checks were incorrect, because the
intrinsics were overloaded. There is no efficient way to determine
whether these are used. This change explicitly documents which
intrinsics are not checked for this reason.
As the data layout a few lines further up specifies, the int, long and
pointer alignment should be 16 instead of the default of 32.
The long long alignment is also incorrect, but that would require a
change to the data layout as well.
Comparison with GCC, which consistently uses 2 byte alignment:
https://gcc.godbolt.org/z/K3x6a7dEf
At least based on some spot checks, the changes to bit field layout also
make us match GCC now.
This was found by https://github.com/llvm/llvm-project/pull/144720.
They are left over from our previous attempt at DWARF64. The new attempt
is not using them, and they also don't have equivalents in the llvm
DWARFDataExtractor class.
We added the LNLP sched model recently. The PFM counter binding names
need to match the CPU string; llvm-exegesis won't produce results
without correct naming.
Co-authored-by: mattarde <mattarde@intel.com>
reading, and one bug in the new RegisterContextUnifiedCore class.
The PR I landed a few days ago to allow Mach-O corefiles to augment
their registers with additional per-thread registers in metadata exposed
a few bugs in the x86_64 corefile reader when running under different CI
environments. It also showed a bug in my RegisterContextUnifiedCore
class where I wasn't properly handling lookups of unknown registers
(e.g. the LLDB_GENERIC_RA when debugging an Intel target).
The Mach-O x86_64 corefile support would say that it had fpu & exc
registers available in every corefile, regardless of whether they were
actually present. It would only read the bytes for the first register
flavor in the LC_THREAD, the GPRs, but it read them incorrectly, so
sometimes you got more register context than you'd expect. The LC_THREAD
register context specifies a flavor and the number of uint32_t words;
the ObjectFileMachO method would read that number of uint64_t's,
exceeding the GPR register space, but it was followed by FPU and then
EXC register space so it didn't crash. If you had a corefile with GPR
and EXC register bytes, it would be written into the GPR and then FPU
register areas, with zeroes filling out the rest of the context.
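The layout described above (a flavor, a count of uint32_t words, then
that many words, repeated per flavor) can be walked as in the sketch
below. This is a standalone illustration, not the ObjectFileMachO code:
`ThreadState` and `parseThreadStates` are hypothetical names, and the
payload is assumed to be already byte-swapped into host order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One register flavor from a thread command payload (illustrative type).
struct ThreadState {
  std::uint32_t flavor;
  std::vector<std::uint32_t> words;
};

// Walks [flavor, count, count x uint32_t]... records. The count is in
// 32-bit words, so each record spans count * 4 bytes, not count * 8.
std::vector<ThreadState>
parseThreadStates(const std::vector<std::uint32_t> &payload) {
  std::vector<ThreadState> states;
  std::size_t pos = 0;
  while (pos + 2 <= payload.size()) {
    std::uint32_t flavor = payload[pos];
    std::uint32_t count = payload[pos + 1];
    pos += 2;
    if (count > payload.size() - pos)
      break; // truncated record: stop rather than read past the end
    auto first = payload.begin() + static_cast<std::ptrdiff_t>(pos);
    auto last = first + static_cast<std::ptrdiff_t>(count);
    states.push_back({flavor, std::vector<std::uint32_t>(first, last)});
    pos += count;
  }
  assert(pos <= payload.size());
  return states;
}
```

Only flavors that actually appear in the payload end up in the result, so
a reader built this way can report absent register sets as unavailable.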
The code following `llvm_unreachable` is optimized out in Release builds. As a result, `Embedder::create` does not seem to return `nullptr`, causing the `CreateInvalidMode` test to break. Hence remove `llvm_unreachable`.
This patch fixes:
llvm/unittests/Analysis/FunctionPropertiesAnalysisTest.cpp:132:12:
error: moving a local object in a return statement prevents copy
elision [-Werror,-Wpessimizing-move]
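For reference, a minimal standalone example of the pattern behind this
warning. `makeNames` is a hypothetical function, not the code in the
unit test.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> makeNames() {
  std::vector<std::string> names = {"a", "b"};
  assert(!names.empty());
  // return std::move(names); // would trigger -Wpessimizing-move
  // Returning the local by name keeps it eligible for copy elision, and
  // it is implicitly moved if elision does not happen.
  return names;
}
```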
This change simplifies the API by removing the error handling complexity.
- Changed `Embedder::create()` to return `std::unique_ptr<Embedder>` directly instead of `Expected<std::unique_ptr<Embedder>>`
- Updated documentation and tests to reflect the new API
- Added death test for invalid IR2Vec kind in debug mode
- In release mode, simply returns nullptr for invalid kinds instead of creating an error
(Tracking issue - #141817)
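A rough sketch of the resulting factory shape, using simplified stand-in
types. `Kind`, `SymbolicEmbedder`, `createEmbedder` and
`createDefaultEmbedder` are hypothetical names for illustration and do
not mirror the real IR2Vec declarations. The debug-mode assertion for
invalid kinds mentioned above is left out so the sketch behaves the same
in every build mode.

```cpp
#include <cassert>
#include <memory>

enum class Kind { Symbolic, Invalid }; // stand-in for the IR2Vec kind enum

struct Embedder {
  virtual ~Embedder() = default;
};
struct SymbolicEmbedder : Embedder {};

// Returns the embedder directly; an unsupported kind yields nullptr
// instead of an error object wrapped in Expected<>.
std::unique_ptr<Embedder> createEmbedder(Kind kind) {
  switch (kind) {
  case Kind::Symbolic:
    return std::make_unique<SymbolicEmbedder>();
  default:
    return nullptr;
  }
}

// Example caller: a kind known to be valid can be checked with an assert.
std::unique_ptr<Embedder> createDefaultEmbedder() {
  auto embedder = createEmbedder(Kind::Symbolic);
  assert(embedder && "Symbolic kind must be constructible");
  return embedder;
}
```

Callers that may pass an arbitrary kind check the returned pointer for
null rather than unwrapping an error.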