This change exposes a low-level helper that is used to implement
`forEachArgumentWithParamType` but can also be used without matchers,
e.g. if performance is a concern.
Commit f5ee10538b introduced a copy of the
implementation of the `forEachArgumentWithParamType` matcher that was
needed for optimizing performance of `-Wunsafe-buffer-usage`.
This change shares the code between the two so that we do not repeat
ourselves and any bugfixes or changes will be picked up by both
implementations in the future.
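As a rough illustration of why a callback-based helper lets a matcher and a hand-written analysis share one loop, here is a self-contained sketch. The types `Arg`, `FunctionDecl`, `Call` and the function `forEachArgWithParamTypeSketch` are hypothetical stand-ins, not Clang's actual AST classes or the helper this change exposes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins for Clang AST nodes.
struct Arg { std::string spelling; };
struct FunctionDecl { std::vector<std::string> paramTypes; };
struct Call {
  const FunctionDecl *callee = nullptr;
  std::vector<Arg> args;
};

// Hypothetical low-level helper: calls onParamAndArg(paramType, arg) for each
// argument that has a matching declared parameter. Both a matcher and a
// direct (non-matcher) caller could be built on top of this single loop.
void forEachArgWithParamTypeSketch(
    const Call &call,
    const std::function<void(const std::string &, const Arg &)> &onParamAndArg) {
  if (!call.callee)
    return;
  const auto &params = call.callee->paramTypes;
  // Extra variadic arguments have no declared parameter, so stop at the
  // shorter of the two lists.
  std::size_t n = std::min(params.size(), call.args.size());
  for (std::size_t i = 0; i < n; ++i)
    onParamAndArg(params[i], call.args[i]);
}
```

The design point is that the traversal lives in one place; any bugfix to how arguments are paired with parameters is picked up by every caller.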
Fix #134356.
We accidentally skipped checking derived-to-base conversions because
deduction did not strip sugar in the relevant code. This caused
deduction failures when a parameter type had an attribute.
Improved the "options" sections of various checks:
1. Added the `Options` keyword as a delimiter between the "body" and
"options" parts of the docs.
2. Added default values where they were absent.
3. Changed double-tick to single-tick in default values.
---------
Co-authored-by: EugeneZelenko <eugene.zelenko@gmail.com>
Details: detailed_command_telemetry (bool) and command_id (int) could
already be freed when the dispatcher's dtor runs, so we should copy
them into the lambda by value, since they are cheap to copy.
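A minimal sketch of the pattern, assuming a dispatcher that runs a callback from its destructor. `Dispatcher`, `TelemetryRecord`, and `makeCallback` are hypothetical names for illustration, not the actual LLDB types. Because the two scalars are captured by value, the callback no longer depends on the lifetime of the variables it was created from.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical dispatcher that invokes a callback when it is destroyed.
class Dispatcher {
public:
  explicit Dispatcher(std::function<void()> onDestroy)
      : onDestroy_(std::move(onDestroy)) {}
  ~Dispatcher() {
    if (onDestroy_)
      onDestroy_();
  }

private:
  std::function<void()> onDestroy_;
};

// Hypothetical sink for the values the callback reports.
struct TelemetryRecord {
  bool detailed = false;
  int commandId = -1;
};

// The bool and int are captured by value: copying them is cheap, and the
// lambda keeps its own copies even after the originals go out of scope.
// Only the sink is captured by reference, so it must outlive the dispatcher.
std::function<void()> makeCallback(bool detailedTelemetry, int commandId,
                                   TelemetryRecord &out) {
  return [detailedTelemetry, commandId, &out]() {
    out.detailed = detailedTelemetry;
    out.commandId = commandId;
  };
}
```

Capturing by reference (`[&]`) instead would leave the lambda holding dangling references once the originating scope ends, which is the bug described above.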
- Moved existing llvm/test/CodeGen/X86/powi.ll file to
llvm/test/CodeGen/X86/powi-const.ll.
- Added new testcases for powi into llvm/test/CodeGen/X86/powi.ll.
The original code is essentially performing isel during legalisation
with the AArch64 specific nodes offering no additional value compared to
ISD::SETCC.
This patch updates flang to follow clang's behavior when processing the
`-mcode-object-version` option.
It is now used to populate an LLVM module flag called
`amdhsa_code_object_version` expected by the backend and also updates
the driver to add the `--amdhsa-code-object-version` option to the
frontend invocation for device compilation of AMDGPU targets.
Recently, we have added a set of complex intrinsics on
the TMA, tcgen05, and Cvt family of instructions.
This patch captures the key learnings from our experience
so far and documents them as guidelines for future design.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
This PR fixes a bug in a canonicalization pattern (`shape.shape_of`:
shape of reshape).
```
// Before
func.func @f(%arg0: tensor<?x1xf32>, %arg1: tensor<3xi32>) -> tensor<3xindex> {
%reshape = tensor.reshape %arg0(%arg1) : (tensor<?x1xf32>, tensor<3xi32>) -> tensor<?x1x1xf32>
%0 = shape.shape_of %reshape : tensor<?x1x1xf32> -> tensor<3xindex>
return %0 : tensor<3xindex>
}
// This will error out as follows:
error: 'tensor.cast' op operand type 'tensor<3xi32>' and result type 'tensor<3xindex>' are cast incompatible
%0 = shape.shape_of %reshape : tensor<?x1x1xf32> -> tensor<3xindex>
^
note: see current operation: %0 = "tensor.cast"(%arg1) : (tensor<3xi32>) -> tensor<3xindex>
```
```
// After
func.func @f(%arg0: tensor<?x1xf32>, %arg1: tensor<3xi32>) -> tensor<3xindex> {
%0 = arith.index_cast %arg1 : tensor<3xi32> to tensor<3xindex>
return %0 : tensor<3xindex>
}
```
See file canonicalize.mlir in the change list for an example.
For context, this bug was found while running a test on Keras 3: the
canonicalizer errors out due to an invalid `tensor.cast` operation when
the batch size is dynamic, because the pattern tries to cast a
tensor<3xi32> operand directly to tensor<3xindex>.
This change is related to a previous PR:
https://github.com/llvm/llvm-project/pull/98531
---------
Co-authored-by: Alaa Ali <alaaali@ah-alaaali-l.dhcp.mathworks.com>
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
createCast in MergeFunctions did not consider ArrayTypes, which results
in the creation of a bitcast between ArrayTypes in the thunk function,
leading to an assertion failure in the provided test case.
The version of createCast in GlobalMergeFunctions does handle
ArrayTypes, so this common code has been factored out into the
IRBuilder.
Sometimes a non-array delete is treated as delete[] when the input
pointer is a pointer to an array. With vector deleting destructor
support, we now generate a virtual destructor call instead of a simple
loop over the elements. This patch adjusts the code path that generates
the virtual call to handle the pointer-to-array case.
Add the Data Inspection Language (DIL) implementation pieces for
handling plain local and global variable names.
See https://discourse.llvm.org/t/rfc-data-inspection-language/69893 for
information about DIL.
This change includes the basic AST, Lexer, Parser and Evaluator pieces,
as well as some tests.
Follow-up to #132003, in particular, see
https://github.com/llvm/llvm-project/pull/132003#issuecomment-2739701936.
This PR extends reduction support for `loop` directives. Consider the
following scenario:
```fortran
subroutine bar
implicit none
integer :: x, i
!$omp teams loop reduction(+: x)
DO i = 1, 5
call foo()
END DO
end subroutine
```
Note the following:
* According to the spec, the `reduction` clause will be attached to
`loop` during earlier stages in the compiler.
* Additionally, `loop` cannot be mapped to `distribute parallel for` due
to the call to a foreign function inside the loop's body.
* Therefore, `loop` must be mapped to `distribute`.
* However, `distribute` does not have `reduction` clauses.
* As a result, we have to move the `reduction`s from the `loop` to its
parent `teams` directive, which is what is done by this PR.
2b11c7de4a introduced
`llvm/include/llvm/MC/MCAsmLexer.h` and made `AsmLexer` inherit from
`MCAsmLexer`, likely to allow target-specific parsers to depend solely
on `MCAsmLexer`. However, this separation now seems unnecessary and
confusing.
`MCAsmLexer` defines virtual functions with `AsmLexer` as its only
implementation, and `AsmLexer` itself has few extra public methods.
To simplify the codebase, this change merges MCAsmLexer.{h,cpp} into
AsmLexer.{h,cpp}. MCAsmLexer.h is temporarily kept as a forwarder.
Note: I doubt that a downstream lexer handling an assembly syntax
significantly different from the standard GNU Assembler syntax would
want to inherit from `MCAsmLexer`. Instead, it's more likely they'd
extend `AsmLexer` by adding new states and modifying its internal logic,
as seen with variables for MASM, M68k, and HLASM.
This patch fixes the following two issues with `createCmpJE` for
AArch64:
1. Avoids overwriting the value of the input register RegNo by using XZR
as the destination register: `subs xzr, RegNo, #Imm`, which is
equivalent to a simple `cmp RegNo, #Imm`.
2. The immediate operand to the Bcc instruction must be EQ instead of
#Imm.
This patch also adds a new function, `createCmpJNE`, and unit tests for
both `createCmpJE` and `createCmpJNE` for X86 and AArch64.
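To make the two fixes concrete, here is a text-level sketch of the AArch64 sequence. The real BOLT code builds `MCInst` objects rather than strings; `emitCmpJccSketch` is a hypothetical helper used only to show the instruction shapes, and it assumes `imm` fits the unsigned 12-bit immediate field of SUBS.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical string-based sketch; not the BOLT MCPlusBuilder API.
// Assumes 0 <= imm < 4096 so it fits the SUBS immediate field.
std::vector<std::string> emitCmpJccSketch(const std::string &reg, int64_t imm,
                                          const std::string &target,
                                          bool jumpIfEqual) {
  return {
      // Fix 1: the result goes to xzr and is discarded, so `reg` keeps its
      // value. This is exactly the `cmp reg, #imm` alias.
      "subs xzr, " + reg + ", #" + std::to_string(imm),
      // Fix 2: the conditional branch takes a condition code (eq for JE,
      // ne for JNE), not the compared immediate.
      std::string(jumpIfEqual ? "b.eq " : "b.ne ") + target,
  };
}
```

For example, `emitCmpJccSketch("x1", 5, ".Ltarget", true)` yields `subs xzr, x1, #5` followed by `b.eq .Ltarget`; passing `false` yields the JNE variant with `b.ne`.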
This patch adds support for parsing symbols in the Xqcibi branch
immediate instructions. While the 32-bit branch instructions use the
same instruction format and relocation as the existing branch
instructions in RISCV, the 48-bit ones use the `InstFormatQC_EB`
instruction format and the `R_RISCV_QC_E_BRANCH` relocation that is
defined in `BinaryFormat/ELFRelocs/RISCV_nonstandard.def`.
Vendor relocation support will be added in a later patch.
The test for utimes added in #134167 might fail if the file for one test
hasn't been cleaned up by the OS before the second test starts. This
patch makes the tests use different files.
Whether the SDK supports builtin modules is a property of the SDK
itself, and really has nothing to do with the target. This was already
worked around for Mac Catalyst, but there are some other, more esoteric,
non-obvious target-to-SDK mappings that aren't handled. Have the SDK
parse its OS out of CanonicalName and use that instead of the target to
determine if builtin modules are supported.
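A minimal sketch of the parsing step, assuming a canonical name of the form `<os><version>` such as `macosx15.2` or `iphonesimulator18.0`. Both that format and the function name `sdkOSFromCanonicalName` are assumptions for illustration; the actual Clang code may parse it differently. It only shows deriving the OS from the SDK itself instead of from the target triple.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the leading run of non-digit characters of an
// SDK canonical name, e.g. "macosx15.2" -> "macosx".
std::string sdkOSFromCanonicalName(const std::string &canonicalName) {
  std::size_t i = 0;
  while (i < canonicalName.size() &&
         !std::isdigit(static_cast<unsigned char>(canonicalName[i])))
    ++i;
  return canonicalName.substr(0, i);
}
```

Keying the builtin-modules decision on this value means an unusual target-to-SDK pairing cannot change the answer, since only the SDK's own metadata is consulted.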
PTX source files are expected to contain only ASCII text
(https://docs.nvidia.com/cuda/parallel-thread-execution/#source-format) and no null terminators.
`ptxas` has so far not enforced this but is moving towards doing so.
This revealed a problem in the MLIR path: when emitting PTX directly, the null terminator was
being written to the output file. This change adds the null terminator only on the assembly
output path used for JIT, instead of in the output of `moduleToObject`.
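The rule can be sketched as "terminate only when handing the buffer to a C-string consumer." `PtxOutputKind` and `finalizePtx` are hypothetical names, not the MLIR API. We also assume the JIT path passes the PTX to something like the CUDA driver's `cuModuleLoadData`, which expects a NUL-terminated image.

```cpp
#include <cassert>
#include <string>

// Hypothetical distinction between the two output paths.
enum class PtxOutputKind { TextFile, JitAssembly };

// PTX written to a file stays pure ASCII with no trailing NUL; only the
// buffer handed to a JIT loader gets the terminator appended.
std::string finalizePtx(std::string ptx, PtxOutputKind kind) {
  if (kind == PtxOutputKind::JitAssembly)
    ptx.push_back('\0');
  return ptx;
}
```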
debugserver isn't saving and restoring the SVE/SME register state around
inferior function calls.
Making arbitrary function calls while in Streaming SVE mode is generally
a poor idea, because a NEON instruction can be hit and crash the
expression execution (which is how I missed this), but such calls should
be handled correctly if the user knows it is safe to do so.
Re-landing this change after fixing an incorrect behavior on systems
without SME support.
rdar://146886210
We should have had a release note in LLVM 20 about implementing P2165R4
since that is technically an ABI and API break for zip_view. We don't
expect anyone to actually hit the ABI issue, but we've come across some
(fairly small) breakage due to the API change, so this should at least
be mentioned in the release notes.
This PR implements the nonstandard intrinsic `TIME`.
In addition to running the unit tests, I also double checked that the
example code works by manually compiling and running it.
Resolves #99221
Key points: for the SPIRV backend, it decomposes into a `dot` followed
by an `add`.
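The decomposition can be illustrated with the scalar semantics `dot2add(a, b, acc) = a.x*b.x + a.y*b.y + acc`. In HLSL the vector operands are `half2` and the accumulator and result are `float`; this sketch uses `float` for the half-precision inputs as a simplifying assumption, and `dot2addSketch` is a hypothetical name.

```cpp
#include <array>
#include <cassert>

// Hypothetical scalar model of dot2add: a 2-element dot product followed by
// an add of the accumulator. float stands in for half here (an assumption).
float dot2addSketch(const std::array<float, 2> &a, const std::array<float, 2> &b,
                    float acc) {
  float dot = a[0] * b[0] + a[1] * b[1];  // the `dot` step
  return dot + acc;                       // the `add` step
}
```

For example, with `a = {1, 2}`, `b = {3, 4}` and `acc = 0.5`, the dot step gives 11 and the add step gives 11.5.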
- [x] Implement dot2add clang builtin,
- [x] Link dot2add clang builtin with hlsl_intrinsics.h
- [x] Add sema checks for dot2add to CheckHLSLBuiltinFunctionCall in
SemaHLSL.cpp
- [x] Add codegen for dot2add to EmitHLSLBuiltinExpr in CGBuiltin.cpp
- [x] Add codegen tests to clang/test/CodeGenHLSL/builtins/dot2add.hlsl
- [x] Add sema tests to clang/test/SemaHLSL/BuiltIns/dot2add-errors.hlsl
- [x] Create the int_dx_dot2add intrinsic in IntrinsicsDirectX.td
- [x] Create the DXILOpMapping of int_dx_dot2add to 162 in DXIL.td
- [x] Create the dot2add.ll and dot2add_errors.ll tests in
llvm/test/CodeGen/DirectX/
After 3801bf6164, SPIRVAnalysis needs to
include SPIRV.h provided by SPIRVCodegen, but the CodeGen target already
depends on Analysis, so that would cause a circular dependency.
Analysis is a subdirectory of CodeGen so it makes sense as a part of the
main CodeGen target too.