`TotalRootEntryCount` captures how many times that root was entered, regardless of whether a profile was also collected (profile collection for a given root happens on only one thread at a time).
We don't do this in compiler_rt because the goal there is to flush out the data as fast as possible, so traversing and multiplying vectors is punted to the profile user.
We really just need to do this when flattening the profile so that the values across roots and flat profiles match. We could do it earlier, too - like when loading the profile - but it seems beneficial (at least for debugging) to keep the counter values the same as the loaded ones. We can revisit this later.
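As a rough illustration of the "multiplying vectors" step, here is a minimal sketch. It assumes every counter under a root is multiplied by a single scale factor derived from `TotalRootEntryCount`; the helper name `scaleCounters` and the saturating behavior are assumptions for illustration, not the actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper, not the LLVM implementation: multiply every counter
// collected under a root by Scale, saturating instead of wrapping around.
inline void scaleCounters(std::vector<uint64_t> &Counters, uint64_t Scale) {
  constexpr uint64_t Max = std::numeric_limits<uint64_t>::max();
  for (uint64_t &C : Counters) {
    // If C * Scale would overflow 64 bits, clamp to the maximum value.
    C = (Scale != 0 && C > Max / Scale) ? Max : C * Scale;
  }
}
```

In this sketch the loaded counter vector is only modified when `scaleCounters` is called, which matches the idea of deferring the scaling until flattening.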
Update Sema checking to always do an HLSL Array RValue cast when
dealing with HLSL constant array types
Instead of comparing canonical types, compare canonical unqualified
types
Add a test to show it is possible to assign an array from a cbuffer.
Closes #133767
- Add command line option `num-to-skip-size` to parameterize the size of
`NumToSkip` bytes in the decoder table. Default value will be 2, and
targets that need larger size can use 3.
- Keep all existing targets, except AArch64, on size 2, and change
AArch64 to use size 3 since it runs into the "disassembler decoding table
too large" error with size 2.
- Additional fixes on top of earlier revert: mark `decodeNumToSkip` as
static (not necessary anymore as the generated code is now in anonymous
namespace, but doing it for consistency) and incorporate Bazel build
changes from https://github.com/llvm/llvm-project/pull/136212
- Following is a rough reduction in size for the decoder tables by
switching to size 2.
```
Target      Old Size  New Size  % Reduction
===========================================
AArch64       153254    153254         0.00
AMDGPU        471566    412805        12.46
ARC             5724      5061        11.58
ARM            84936     73831        13.07
AVR             1497      1306        12.76
BPF             2172      1927        11.28
CSKY           10064      8692        13.63
Hexagon        47967     41965        12.51
Lanai           1108       982        11.37
LoongArch      24446     21621        11.56
MSP430          4200      3716        11.52
Mips           36330     31415        13.53
PPC            31897     28098        11.91
RISCV          37979     32790        13.66
Sparc           8331      7252        12.95
SystemZ        36722     32248        12.18
VE             48296     42873        11.23
XCore           2590      2316        10.58
Xtensa          3827      3316        13.35
```
Fixes #79893
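To make the size trade-off concrete, the following is a minimal sketch of reading a `NumToSkip` field whose width is a parameter. It assumes the field is stored little-endian; `readNumToSkip` and `maxSkip` are hypothetical names for illustration and are not the generated `decodeNumToSkip`.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: read a little-endian NumToSkip field of Size bytes
// (2 or 3) and advance the table pointer past it.
inline uint32_t readNumToSkip(const uint8_t *&Ptr, unsigned Size) {
  assert((Size == 2 || Size == 3) && "sketch handles 2- or 3-byte fields");
  uint32_t Value = 0;
  for (unsigned I = 0; I < Size; ++I)
    Value |= static_cast<uint32_t>(*Ptr++) << (8 * I);
  return Value;
}

// Largest forward skip that a field of Size bytes can encode.
inline uint32_t maxSkip(unsigned Size) { return (1u << (8 * Size)) - 1; }
```

Under these assumptions a 2-byte field can express skips up to 65535 and a 3-byte field up to 16777215, which is why a target with a very large table would need the wider field while the others save one byte per entry.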
---
This PR addresses the issue of _attributes_ being incorrectly allowed on
`extern template` declarations
```cpp
[[deprecated]] extern template struct S<int>;
```
Fixes #135654
In #128613 we added safeguards to prevent the lowering of just any
intrinsic in the backend. We used `DiagnosticInfoUnsupported` to do
this.
We found that `opt` calls the diagnostic's print function, whereas clang
uses the diagnostic message directly.
Printing only the message in the clang path means we miss valuable
debugging information like function name and function type when
LLVMContext was only needed to call `getBestLocationFromDebugLoc`.
There are a few potential fixes:
1. Write a custom DiagnosticInfoUnsupported so we can change the Message
just for DirectX. Too heavy-handed, so rejected.
2. Add the function name to the Message in DirectX code. Very simple
one-line change. The downside is that when using opt you see the function
name twice, but it makes the clang-dxc bugs more actionable.
3. Change CodeGenAction.cpp to always use the print function and not the
message directly. The downside is that a bunch of inaccurate information
shows up in the message if you don't specify `-debug-info-kind=standalone`.
4. Add some bookkeeping to know which function called the intrinsic, and
keep a map of these so we can pass the calling function to
`DiagnosticInfoUnsupported` instead of the intrinsic. This would only be
useful if we had debug info so we could distinguish different uses of
the intrinsic by line/column number. We would also need to change from
iterating on every function to doing something like a LazyCallGraph,
which is a nonstarter.
5. Pick a different means of emitting a diagnostic error, because other
uses of `DiagnosticInfoUnsupported` error when we are in the body of a
function, not when we see one being used as in the intrinsic case.
This PR went with a combination of options 2 and 5. It's a small code
change that also only impacts the DirectX backend.
This PR fixes
https://github.com/llvm/llvm-project/issues/122728
This patch addresses the signed/zero extension of poison by using a
poison value of the extended type instead of a constant zero of the
extended type.
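The intended folding behavior can be sketched with a toy value model. This is an illustration only: the `Value` struct and the `foldZExt8To32`/`foldSExt8To32` helpers are hypothetical and model an 8-bit to 32-bit extension, not the code touched by the patch.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a constant: either poison or a concrete bit pattern.
struct Value {
  bool IsPoison;
  uint32_t Bits;
};

inline Value poison() { return Value{true, 0}; }

// Zero-extend an 8-bit constant to 32 bits; poison stays poison.
inline Value foldZExt8To32(Value V) {
  if (V.IsPoison)
    return poison(); // poison of the extended type, not a constant zero
  return Value{false, V.Bits & 0xFFu};
}

// Sign-extend an 8-bit constant to 32 bits; poison stays poison.
inline Value foldSExt8To32(Value V) {
  if (V.IsPoison)
    return poison();
  uint32_t Low = V.Bits & 0xFFu;
  // Replicate bit 7 into the upper 24 bits.
  return Value{false, (Low & 0x80u) ? (Low | 0xFFFFFF00u) : Low};
}
```

Returning poison rather than zero keeps the fold from turning a poison input into a well-defined value.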
This pattern used to create an `llvm.func` op, then check additional
requirements and return "failure". This commit moves the checks before
the creation of the replacement op, so that no rollback is necessary
when one of the checks fails.
Note: This is in preparation of the One-Shot Dialect Conversion
refactoring, which removes the rollback functionality.
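The "check first, then create" ordering can be sketched with toy stand-ins. Everything here is hypothetical (`ToyModule`, `ToyFunc`, and the two example checks are not MLIR classes or the pattern's real requirements); the point is only that failure is returned before anything is created.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for illustration; these are not MLIR classes.
struct ToyModule {
  std::vector<std::string> Ops;
};

struct ToyFunc {
  std::string Name;
  unsigned NumResults;
  bool HasUnsupportedAttr;
};

// Returns false (the analogue of failure()) without touching the module.
inline bool convertFunc(const ToyFunc &F, ToyModule &M) {
  // Run every requirement check before creating anything.
  if (F.NumResults > 1)
    return false;
  if (F.HasUnsupportedAttr)
    return false;
  // Only now create the replacement op; nothing needs to be rolled back.
  M.Ops.push_back("llvm.func @" + F.Name);
  return true;
}
```

Because the module is untouched on the failure path, the caller never has to undo a partially applied rewrite.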
This fixes a crash reported at
https://github.com/llvm/llvm-project/pull/114250#issuecomment-2813686061
If the vector type isn't legal at all, e.g. bfloat with +zvfbfmin,
then the legalized type will be scalarized. So use getScalarType()
instead of getVectorElementType() when checking for f16/bf16.
This patch removes UB from some tests for MachinePipeliner. It fixes
the following cases:
- Branching on an `undef` value.
- Using `undef`/`null` as a pointer operand of a load/store.
There are other pipeliner tests that contain the same UB, but for
now, this patch fixes the cases that were particularly unstable while I
was developing the pipeliner.
Given how the costs are currently used, this likely does not change
much yet. As with other cost functions, the CostKind should be passed
into and through the function.
I'm trying to put together an LLVM-built toolchain (including LLVM libc)
targeting UEFI, but currently I get an error saying "Unknown target".
This PR enables compiling compiler-rt for UEFI.
Very simply extends the bitfield sema checks for assignment to fields
with a preferred type specified to consider the preferred type if the
decl storage type is not explicitly an enum type.
This does mean that if the preferred and explicit types have different
storage requirements we may not warn in all possible cases, but that's a
scenario for which the warnings are much more complex and confusing.
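A minimal sketch of the kind of declaration involved, using Clang's `preferred_type` attribute. The enum, struct, and helper names are hypothetical; whether a given compiler diagnoses the assignment depends on its version and on which warnings are enabled, and compilers that do not know the attribute ignore it.

```cpp
#include <cassert>

// Hypothetical enum with five enumerators, so it needs three bits.
enum Kind { K0, K1, K2, K3, K4 };

struct Flags {
  // The storage type is unsigned, not an enum; the attribute names the
  // type the field is intended to hold.
  [[clang::preferred_type(Kind)]] unsigned KindBits : 2;
};

// Assigning through a parameter keeps the stored value non-constant.
inline void setKind(Flags &F, Kind K) { F.KindBits = K; }
```

Here the two-bit field cannot represent `K4`, so storing it wraps to 0. This is the sort of enum-to-bit-field assignment the extended check is described as considering via the preferred type.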
As reported in #135665, C++20 parenthesized initializer list expressions
are not handled correctly and were causing crashes. This commit attempts
to fix the issue by handling parenthesized initializer lists alongside
existing initializer lists.
Fixes #135665.
This commit moves code around: The helper functions/classes are moved
into `MemRefToLLVM.cpp`. This simplifies the code a bit: fewer
templatized functions, fewer function calls, fewer lines of code.
This commit also moves checks in `matchAndRewrite` to the beginning of
the functions, such that patterns bail out (`return failure()`) before
starting to modify any IR. (Apart from that, this change is NFC.) This
is in preparation of the One-Shot Dialect Conversion refactoring, which
will disallow pattern rollbacks.
These functions are called from lowering patterns. All IR modifications
in a pattern must be performed through the provided rewriter, but these
functions used to instantiate a new `OpBuilder`, bypassing the provided
rewriter.
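Why bypassing the rewriter matters can be sketched with toy classes. These are hypothetical stand-ins (`ToyBuilder`, `ToyRewriter`, `emitHelper`), not the MLIR `OpBuilder` or rewriter APIs; the sketch only assumes that the rewriter tracks what is created through it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for illustration; these are not the MLIR classes.
struct ToyBuilder {
  std::vector<std::string> &Block;
  explicit ToyBuilder(std::vector<std::string> &B) : Block(B) {}
  virtual ~ToyBuilder() = default;
  virtual void create(const std::string &Op) { Block.push_back(Op); }
};

// The "rewriter" additionally records every op created through it.
struct ToyRewriter : ToyBuilder {
  std::vector<std::string> Created;
  using ToyBuilder::ToyBuilder;
  void create(const std::string &Op) override {
    Created.push_back(Op);
    ToyBuilder::create(Op);
  }
};

// Helper that uses the caller's builder instead of constructing its own.
inline void emitHelper(ToyBuilder &B) { B.create("toy.op"); }
```

Passing the rewriter into `emitHelper` keeps its record complete, while a helper that constructs a fresh `ToyBuilder` would add ops the rewriter never sees.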