Otherwise, if the source tree is embedded in another project that has a
.clang-format-ignore, some clang-format tests fail because they pick up
that .clang-format-ignore.
Previously we had one loop over the DAG for immediates and registers and
another loop over the destination operands for mapping from the source.
Now we have a single loop over the destination operands that handles immediates,
registers, and named operands. A helper method is added so we can handle
operands and sub-operands specified by a sub-dag.
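Sketched below is roughly what the new structure looks like; the types
are invented stand-ins for the real TableGen machinery, not the actual
code.
```
#include <vector>

// Invented stand-ins for the destination operand kinds; illustration only.
struct DestOperand {
  enum Kind { Imm, Reg, Named, SubDag } K;
  std::vector<DestOperand> SubOps; // populated only for SubDag
};

// Helper so operands and sub-operands specified by a sub-dag share one
// code path; a named operand may now appear inside a sub-dag.
static void mapOperand(const DestOperand &Op) {
  switch (Op.K) {
  case DestOperand::Imm:   /* emit the immediate value */         break;
  case DestOperand::Reg:   /* emit the fixed register */          break;
  case DestOperand::Named: /* copy the operand from the source */ break;
  case DestOperand::SubDag:
    for (const DestOperand &Sub : Op.SubOps) // recurse into sub-operands
      mapOperand(Sub);
    break;
  }
}

// A single loop over the destination operands replaces the two old loops.
static void mapAllOperands(const std::vector<DestOperand> &DestOps) {
  for (const DestOperand &Op : DestOps)
    mapOperand(Op);
}
```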
My goal is to allow a named operand to appear in a sub-dag, which
wasn't supported before. This will allow the destination instruction to
have an operand with sub-operands when the source does not have
sub-operands.
For RISC-V, I'm looking into using an operand with sub-operands to
represent a reg+offset memory address. I need to be able to lower a
pseudo instruction that only has a register operand to an instruction
that has a reg+offset operand. The offset will be filled in with 0
during expansion and the register will be copied from the source.
The expansion would look like this:
```
def PseudoCALLIndirect : Pseudo<(outs), (ins GPRJALR:$rs1),
                                [(riscv_call GPRJALR:$rs1)]>,
                         PseudoInstExpansion<(JALR X1, (ops GPR:$rs1, 0))>;
```
Implicit bindings will cause very confusing crashes in the backend at
present, so this is intended at least partially as a stopgap until we
get them implemented (see #110722).
However, I do think that this is useful in the longer term as an
off-by-default warning as well, since it is quite easy to miss a binding
or two when using explicit bindings, and the results can be surprisingly
hard to debug. I've filed #135907 to track turning this into an
off-by-default warning or removing it eventually, as we see fit.
Add a verifier check for the Slice op to make sure input1 and the
output have the same rank.
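A hedged sketch of the check (not the exact patch; the generated
accessors `getInput1()`/`getOutput()` are assumptions):
```
#include "mlir/Dialect/Tosa/IR/TosaOps.h"
using namespace mlir;

// Sketch of the rank check; accessor names are assumptions.
LogicalResult tosa::SliceOp::verify() {
  auto inputTy = dyn_cast<RankedTensorType>(getInput1().getType());
  auto outputTy = dyn_cast<RankedTensorType>(getOutput().getType());
  if (!inputTy || !outputTy)
    return success(); // unranked operands are not checked here
  if (inputTy.getRank() != outputTy.getRank())
    return emitOpError("expect input1 and output to have the same rank");
  return success();
}
```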
Added a test in verifier.mlir.
Also moved the existing slice verifier tests from invalid.mlir to
verifier.mlir.
Signed-off-by: Tai Ly <tai.ly@arm.com>
Fixed-order recurrence phis cannot be forced to be scalar; at the
moment they will always be widened.
Make sure we don't add them to ForcedScalars; otherwise the legacy cost
model will compute incorrect costs.
This fixes an assertion reported with
https://github.com/llvm/llvm-project/pull/129645.
VPInterleavedAccessInfo has a defined destructor freeing memory, but no
explicitly defined copy constructor or copy assignment operator. These
are not used, so this patch marks them as deleted to avoid use of the
implicitly defined implementations.
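The pattern in minimal form (the class body here is an illustrative
stand-in, not the actual layout):
```
// Rule-of-three fix: the destructor frees owned memory, so the
// implicitly defined copy operations would double-free; delete them.
class VPInterleavedAccessInfo {
public:
  ~VPInterleavedAccessInfo(); // frees owned memory

  VPInterleavedAccessInfo(const VPInterleavedAccessInfo &) = delete;
  VPInterleavedAccessInfo &
  operator=(const VPInterleavedAccessInfo &) = delete;
};
```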
The function base name can be very long, which overflows the buffer and
leads to a crash. Update to extend the max size.
Also changed to use heap allocation (`std::vector<char>`) to avoid
stack overflow.
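A hedged sketch of the shape of the fix (names and the size computation
are assumptions):
```
#include <algorithm>
#include <string>
#include <vector>

// Build the (possibly very long) name in a heap-allocated buffer rather
// than a fixed-size char array on the stack, truncating at MaxSize.
std::string buildName(const std::string &BaseName, std::size_t MaxSize) {
  std::vector<char> Buf(std::min(BaseName.size(), MaxSize));
  std::copy(BaseName.begin(), BaseName.begin() + Buf.size(), Buf.begin());
  return std::string(Buf.begin(), Buf.end());
}
```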
- Add a command line option `num-to-skip-size` to parameterize the size
of the `NumToSkip` bytes in the decoder table. The default value will be
2, and targets that need a larger size can use 3.
- Keep all existing targets except AArch64 at size 2, and change AArch64
to use size 3, since it runs into the "disassembler decoding table too
large" error with size 2.
- The following is a rough reduction in size for the decoder tables from
switching to size 2:
```
Target       Old Size   New Size   % Reduction
==============================================
AArch64        153254     153254          0.00
AMDGPU         471566     412805         12.46
ARC              5724       5061         11.58
ARM             84936      73831         13.07
AVR              1497       1306         12.76
BPF              2172       1927         11.28
CSKY            10064       8692         13.63
Hexagon         47967      41965         12.51
Lanai            1108        982         11.37
LoongArch       24446      21621         11.56
MSP430           4200       3716         11.52
Mips            36330      31415         13.53
PPC             31897      28098         11.91
RISCV           37979      32790         13.66
Sparc            8331       7252         12.95
SystemZ         36722      32248         12.18
VE              48296      42873         11.23
XCore            2590       2316         10.58
Xtensa           3827       3316         13.35
```
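For illustration, consuming a variable-width `NumToSkip` field could
look like the following sketch (names invented; not the generated code):
```
#include <cstdint>

// Decode a little-endian NumToSkip field whose byte width (2 or 3) is a
// per-target parameter; only targets that overflow 2 bytes pay for 3.
static unsigned readNumToSkip(const uint8_t *&Ptr, unsigned NumToSkipSize) {
  unsigned NumToSkip = *Ptr++;
  NumToSkip |= static_cast<unsigned>(*Ptr++) << 8;
  if (NumToSkipSize == 3)
    NumToSkip |= static_cast<unsigned>(*Ptr++) << 16;
  return NumToSkip;
}
```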
This moves the utility that propagates counter values such that we can reuse it elsewhere. Specifically, in a subsequent patch, it'll be used to guide ICP: we need to prioritize promoting indirect calls that dominate larger portions of the dynamic instruction count. We can compare them based on the dynamic count of IR instructions, and we can get that early with this counter propagation logic.
The patch is mostly a move of the existing logic, with a pimpl-style implementation to hide all the current complexity.
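The pimpl shape in a generic sketch (names invented, not the patch's
API):
```
#include <memory>

// Header side: only a forward-declared Impl is visible; the complexity
// lives in the .cpp file where Impl is defined.
class CounterPropagator {
  struct Impl;             // defined out of line
  std::unique_ptr<Impl> I;

public:
  CounterPropagator();
  ~CounterPropagator();    // out of line, where Impl is complete
  void propagate();        // delegates to I
};
```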
We need to pre-cache the last instruction to avoid unexpected changes
in last-instruction detection during vectorization: adding the new
vector instructions introduces new uses and may affect the analysis.
Running `clang-dxc` with textual output was emitting various spurious
warnings (if `dxv` wasn't on your path) or errors (if it was). Avoid
these by not attempting to run this tool when it doesn't make sense to
do so.
Fixes #135874.
- _Float16 is now accepted by Clang.
- The half IR type is fully handled by the backend.
- These values are passed in FP registers and converted to/from float
around each operation (see the sketch after this list).
- Compiler-rt conversion functions are now built for s390x, including
the previously missing extendhfdf2, which was added.
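In source terms, the promotion semantics look like this (conceptual
sketch, not the exact codegen):
```
// Each _Float16 operation is extended to float, performed, and
// truncated back to half; values travel in FP registers.
_Float16 halfAdd(_Float16 A, _Float16 B) {
  return A + B; // i.e. (_Float16)((float)A + (float)B)
}
```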
Fixes #50374
At the time of instrumentation (and instrumentation lowering), `noreturn` is not applied uniformly. Rather than running the `FunctionAttrs` pass, we just need to use `llvm::canReturn`, exposed in PR #135650.
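A minimal sketch of the substitution, assuming the `llvm::canReturn`
helper from PR #135650 (the exact header is an assumption):
```
#include "llvm/Analysis/CFG.h" // assumed home of llvm::canReturn
#include "llvm/IR/Function.h"

// Instead of trusting the noreturn attribute (not applied uniformly at
// instrumentation time), ask whether the function can return at all.
static bool treatAsNoReturn(const llvm::Function &F) {
  return !llvm::canReturn(F);
}
```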
This adds basic support for populating record types. In order to keep
the change small, everything non-essential was deferred to a later
change set. Only non-recursive structures are handled. Structure
padding is not yet implemented. Bitfields are not supported. No attempt
is made to handle ABI requirements for passing structure arguments.
We emit a macro definition only in the module defining it. But this
means that if another module has an identifier with the same name as
the macro, the users of such a module won't be able to use the macro
anymore.
Fix this by storing that an identifier has a macro definition that's not
in the current module (`MacroDirectivesOffset == 0`). This way
`IdentifierLookupVisitor` knows not to stop at the first module with the
identifier but to keep checking included modules for the actual macro
definition.
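An illustrative reproducer (module and macro names invented):
```
// a.h, in module A:
//   #define LIMIT 64
// b.h, in module B:
//   int LIMIT; // identifier with the same name as A's macro
//
// User code:
#include "b.h"  // brings in the identifier LIMIT
#include "a.h"  // defines the macro LIMIT
int Arr[LIMIT]; // previously could fail: identifier lookup stopped at
                // module B and never found A's macro definition
```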
Fixes issue #32040.
rdar://30258278
The address space of a source value for an implicit cast isn't really
relevant when emitting conversion warnings. Since the lvalue->rvalue
cast effectively removes the address space, it doesn't factor into the
warning, but it does create visual noise in the diagnostics.
This is a small quality-of-life fixup to get in as HLSL adopts more
address space annotations.
In addition to the new folder, I've also added a test for
broadcast(splat) -> splat, which I think was missing.
Signed-off-by: James Newling <james.newling@gmail.com>
This implements the same overload resolution behavior as GCC,
as described in https://wg21.link/p3606 (sections 1-2, not 3).
If, during overload resolution, a non-template candidate is always
picked because each argument is a perfect match (i.e., the source and
target types are the same), we do not perform deduction for any template
candidate that might exist.
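For example (illustrative):
```
void f(int);                  // non-template candidate
template <class T> void f(T); // template candidate

void test() {
  f(42); // int -> int is a perfect match for f(int), so deduction for
         // the template candidate is never performed
}
```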
The goal is to be able to merge #122423 without being too disruptive.
This change means that the selection of the best viable candidate and
template argument deduction become interleaved.
To avoid rewriting half of Clang, we store in `OverloadCandidateSet`
enough information to deduce template candidates from
`OverloadCandidateSet::BestViableFunction`. This means that any object
used by the template candidates must outlive the call to
`Add*Template*Candidate`.
This two-phase resolution is not performed for some initializations, as
there are cases where template candidates are a better match per the
standard. It is also bypassed for code completion.
The change has a nice impact on compile times:
https://llvm-compile-time-tracker.com/compare.php?from=edc22c64e527171041876f26a491bb1d03d905d5&to=8170b860bd4b70917005796c05a9be013a95abb2&stat=instructions%3Au
Fixes #62096
Fixes #74581
Fixes #53454
This PR removes one OpRewritePattern `shape_cast(shape_cast(x)) -> x`
that is already handled by `ShapeCastOp::fold`.
Note that this might affect downstream users who indirectly call
`populateShapeCastFoldingPatterns(RewritePatternSet &patterns,
PatternBenefit)` and then use `patterns` with a `GreedyRewriteConfig
config` that has `config.fold = false` (the only user I've checked is
IREE, which never uses `config.fold = false`).
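A sketch of the downstream shape that would be affected (header paths
and the surrounding function are assumptions):
```
#include "mlir/Dialect/Vector/Transforms/VectorRewritePatterns.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

// With folding disabled, ShapeCastOp::fold never runs, so such a user
// relied on the now-removed shape_cast(shape_cast(x)) -> x pattern.
void runExample(mlir::Operation *op, mlir::MLIRContext *ctx) {
  mlir::RewritePatternSet patterns(ctx);
  mlir::vector::populateShapeCastFoldingPatterns(patterns);
  mlir::GreedyRewriteConfig config;
  config.fold = false; // patterns only, no folders
  (void)mlir::applyPatternsAndFoldGreedily(op, std::move(patterns), config);
}
```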
Example seen in the 'real world':
```
%0 = vector.broadcast %arg0 : vector<1xi8> to vector<1x8xi8>
%1 = vector.transpose %0, [1, 0] : vector<1x8xi8> to vector<8x1xi8>
```
This PR adds a canonicalizer that rewrites the above as
```
%1 = vector.broadcast %arg0 : vector<1xi8> to vector<8x1xi8>
```
It works by determining whether the transpose only shuffles contiguous
broadcast dimensions.