Remove these intrinsics which can be better represented by load
instructions with `!invariant.load` metadata:
- llvm.nvvm.ldg.global.i
- llvm.nvvm.ldg.global.f
- llvm.nvvm.ldg.global.p
Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use
`private` as the default address space. This is wrong for cases where
the `generic` address space is available, and is corrected via this
patch. In general, this AS map abuse is a bad hack and we should re-work
it altogether, but at least after this patch we will stop being
incorrect for e.g. OpenCL 2.0.
The whole struct is specificed in the __bdos. The calculation of the
whole size of the structure can be done in two ways:
1) sizeof(struct S) + count * sizeof(typeof(fam))
2) offsetof(struct S, fam) + count * sizeof(typeof(fam))
The first will add any remaining whitespace that might exist after
allocation while the second method is more precise, but not quite
expected from programmers. See [1] for a discussion of the topic.
GCC isn't (currently) able to calculate __bdos on a pointer to the whole
structure. Therefore, because of the above issue, we'll choose to match
what GCC does for consistency's sake.
[1] https://lore.kernel.org/lkml/ZvV6X5FPBBW7CO1f@archlinux/
Co-authored-by: Eli Friedman <efriedma@quicinc.com>
Commit 84ee629bc5 ("clang: Remove some pointer bitcasts (#112324)",
2024-10-15) triggered some "Call parameter type does not match function
signature!" errors when using the OpenCL pipe builtin functions under
the spir triple, due to a missing addrspacecast.
This would have been caught by the pipe_builtin.cl test if that had used
the `spir-unknown-unknown` triple, so extend the test to use that
triple too.
- create a clang built-in in Builtins.td
- add semantic checking in SemaHLSL.cpp
- link the WaveReadLaneAt api in hlsl_intrinsics.h
- add lowering to spirv backend op GroupNonUniformShuffle
with Scope = 2 (Group) in SPIRVInstructionSelector.cpp
- add WaveReadLaneAt intrinsic to IntrinsicsDirectX.td and mapping
to DXIL.td
- add tests for HLSL intrinsic lowering to spirv intrinsic in
WaveReadLaneAt.hlsl
- add tests for sema checks in WaveReadLaneAt-errors.hlsl
- add spir-v backend tests in WaveReadLaneAt.ll
- add test to show scalar dxil lowering functionality
- note that this doesn't include support for the scalarizer to
handle WaveReadLaneAt will be added in a future pr
This is the first part #70104
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
- add degrees builtin
- link degrees api in hlsl_intrinsics.h
- add degrees intrinsic to IntrinsicsDirectX.td
- add degrees intrinsic to IntrinsicsSPIRV.td
- add lowering from clang builtin to dx/spv intrinsics in CGBuiltin.cpp
- add semantic checks to SemaHLSL.cpp
- add expansion of directx intrinsic to llvm fmul for DirectX in
DXILIntrinsicExpansion.cpp
- add mapping to spir-v intrinsic in SPIRVInstructionSelector.cpp
- add test coverage:
- degrees.hlsl -> check hlsl lowering to dx/spv degrees intrinsics
- degrees-errors.hlsl/half-float-only-errors -> check semantic warnings
- hlsl-intrinsics/degrees.ll -> check lowering of spir-v degrees
intrinsic to SPIR-V backend
- DirectX/degrees.ll -> check expansion and scalarization of directx
degrees intrinsic to fmul
Resolves#99104
- add additional lowering for directx backend in CGBuiltin.cpp
- add directx intrinsic to IntrinsicsDirectX.td
- add semantic check of arguments in SemaHLSL.cpp
- add mapping to DXIL op in DXIL.td
- add testing of semantics in WaveGetLaneIndex-errors.hlsl
- add testing of dxil lowering in WaveGetLaneIndex.ll
Resolves#70105
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
- Add HLSL frontend for atan2
- Add clang Builtin, map to new llvm.atan2
- SemaChecking restrict to floating point and 2 args
- SemaHLSL restrict to float or half.
- Add to clang ReleaseNotes.rst and LanguageExtensions.rst
- Add half-float-only-errors2.hlsl for 2 arg intrinsics, and update half-float-only-errors.hlsl with scalar case for consistency
- Remove fmod-errors.hlsl and pow-errors.hlsl now covered in half-float-only-errors2.hlsl
Part 3 for Implement the atan2 HLSL Function #70096.
This change add the elementwise fmod builtin to support HLSL function
'fmod' in clang for #99118
Builtins.td - add the fmod builtin
CGBuiltin.cpp - lower the builtin to llvm FRem instruction
hlsl_intrinsics.h - add the fmod api
SemaChecking.cpp - add type checks for builtin
SemaHLSL.cpp - add HLSL type checks for builtin
clang/docs/LanguageExtensions.rst - add the builtin in *Elementwise
Builtins*
clang/docs/ReleaseNotes.rst - announce the builtin
I missed a codepath during PR108008 so SME2/SVE2p1 builtins are
converting their struct return type into a large vector, which is
causing unnecessary casting via memory.
On some targets, an FP libcall with argument types such as long double
will be lowered to pass arguments indirectly via pointers. When this is
the case we should not mark the libcall with "int" TBAA as it may lead
to incorrect optimizations.
Currently, this can be seen for long doubles on x86_64-w64-mingw32. The
`load x86_fp80` after the call is (incorrectly) marked with "int" TBAA
(overwriting the previous metadata for "long double").
Nothing seems to break due to this currently as the metadata is being
incorrectly placed on the load and not the call. But if the metadata
is moved to the call (which this patch ensures), LLVM will optimize out
the setup for the arguments.
Noticed while working on #109160
I've left out the sub_sat intrinsics for now - not sure about the history behind them using Intrinsic::wasm_sub_sat_* instead of Intrinsic::*sub_sat
Now that we have the C/C++ `__builtin_elementwise_popcount` intrinsic (#108121) - remove custom target intrinsics that just immediately map to Intrinsic::ctpop and use the generic intrinsic directly.
Add new elementwise popcount builtin to support HLSL function
'countbits'.
elementwise popcount only accepts integer types.
Add hlsl intrinsic 'countbits'
Closes#99094
There's currently no code path that can reach this crash, but:
```
Instruction *Inst = cast<llvm::Instruction>(Call.getScalarVal());
```
fails if the call returns `void`. This could happen if a builtin for
something like `void sincos(double, double*, double*)` is added to
clang.
Instead, use the `llvm::CallBase` returned from `EmitCall()` to set the
TBAA metadata, which should exist no matter the return type.
This implements our original design now that LLVM is comfortable with
structs and arrays of scalable vector types. All SVE ACLE intrinsics
already use struct types so the effect of this change is purely the
types used for alloca and function parameters.
There should be no C/C++ user visible change with this patch.
partially fixes#70078
### Changes
- Implemented `sign` clang builtin
- Linked `sign` clang builtin with `hlsl_intrinsics.h`
- Added sema checks for `sign` to `CheckHLSLBuiltinFunctionCall` in
`SemaChecking.cpp`
- Add codegen for `sign` to `EmitHLSLBuiltinExpr` in `CGBuiltin.cpp`
- Add codegen tests to `clang/test/CodeGenHLSL/builtins/sign.hlsl`
- Add sema tests to `clang/test/SemaHLSL/BuiltIns/sign-errors.hlsl`
### Related PRs
- https://github.com/llvm/llvm-project/pull/101987
- https://github.com/llvm/llvm-project/pull/101988
### Discussion
- Should there be a `usign` intrinsic that handles the unsigned cases?
This patch implements the intrinsics of the form
floatNxM_t vamin[q]_fN(floatNxM_t vn, floatNxM_t vm);
floatNxM_t vamax[q]_fN(floatNxM_t vn, floatNxM_t vm);
as defined in https://github.com/ARM-software/acle/pull/324
---------
Co-authored-by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>
[P2641R4](https://wg21.link/P2641R4)
This new builtin function is declared `consteval`. Support for
`-fexperimental-new-constant-interpreter` will be added in a later
patch.
---------
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
This commits add the WaveIsFirstLane() hlsl intrinsinc. This intrinsic
uses the convergence intrinsincs for the SPIR-V backend. On the DXIL
side, I'm not sure what the strategy is for convergence, so I
implemented that like in DXC: a normal builtin function.
Signed-off-by: Nathan Gauër <brioche@google.com>
Summary:
This patch proposes new llvm types for RISCV vector tuples represented
as `TargetExtType` which contains both `LMUL` and `NF`(num_fields)
information and keep it all the way down to `selectionDAG` to match the
corresponding `MVT`(support in the following patch).
Detail:
Currently we have built-in C types for RISCV vector tuple type, e.g.
`vint32m1x2_t`, however it's is represented as structure of scalable
vector types, i.e. `{<vscale x 2 x i32>, <vscale x 2 x i32>}`. It loses
the information for num_fields(NF) as struct is flattened during
`selectionDAG`, thus it makes it not possible to handle inline assembly
of vector tuple type, it also makes the calling convention of vector
tuple types handing not strait forward and hard to realize the
allocation code, i.e. `RVVArgDispatcher`.
The llvm IR for the example above is then represented as
`target("riscv.vector.tuple", <vscale x 8 x i8>, 2)` in which the first
type parameter is the equivalent size scalable vecotr of i8 element
type, the following integer parameter is the `NF` of the tuple.
The new RISCV specific vector insert/extract intrinsics are also added
as `llvm.riscv.vector.insert` and `llvm.riscv.vector.extract` to handle
tuple type subvector insertion/extraction since the generic ones only
operates on `VectorType` but not `TargetExtType`.
There are total of 32 llvm types added for each `VREGS * NF <= 8`, where
`VREGS` is the vector registers needed for each `LMUL` and `NF` is
num_fields.
The name of types are:
```
target("riscv.vector.tuple", <vscale x 1 x i8>, 2) // LMUL = mf8, NF = 2
target("riscv.vector.tuple", <vscale x 1 x i8>, 3) // LMUL = mf8, NF = 3
target("riscv.vector.tuple", <vscale x 1 x i8>, 4) // LMUL = mf8, NF = 4
target("riscv.vector.tuple", <vscale x 1 x i8>, 5) // LMUL = mf8, NF = 5
target("riscv.vector.tuple", <vscale x 1 x i8>, 6) // LMUL = mf8, NF = 6
target("riscv.vector.tuple", <vscale x 1 x i8>, 7) // LMUL = mf8, NF = 7
target("riscv.vector.tuple", <vscale x 1 x i8>, 8) // LMUL = mf8, NF = 8
target("riscv.vector.tuple", <vscale x 2 x i8>, 2) // LMUL = mf4, NF = 2
target("riscv.vector.tuple", <vscale x 2 x i8>, 3) // LMUL = mf4, NF = 3
target("riscv.vector.tuple", <vscale x 2 x i8>, 4) // LMUL = mf4, NF = 4
target("riscv.vector.tuple", <vscale x 2 x i8>, 5) // LMUL = mf4, NF = 5
target("riscv.vector.tuple", <vscale x 2 x i8>, 6) // LMUL = mf4, NF = 6
target("riscv.vector.tuple", <vscale x 2 x i8>, 7) // LMUL = mf4, NF = 7
target("riscv.vector.tuple", <vscale x 2 x i8>, 8) // LMUL = mf4, NF = 8
target("riscv.vector.tuple", <vscale x 4 x i8>, 2) // LMUL = mf2, NF = 2
target("riscv.vector.tuple", <vscale x 4 x i8>, 3) // LMUL = mf2, NF = 3
target("riscv.vector.tuple", <vscale x 4 x i8>, 4) // LMUL = mf2, NF = 4
target("riscv.vector.tuple", <vscale x 4 x i8>, 5) // LMUL = mf2, NF = 5
target("riscv.vector.tuple", <vscale x 4 x i8>, 6) // LMUL = mf2, NF = 6
target("riscv.vector.tuple", <vscale x 4 x i8>, 7) // LMUL = mf2, NF = 7
target("riscv.vector.tuple", <vscale x 4 x i8>, 8) // LMUL = mf2, NF = 8
target("riscv.vector.tuple", <vscale x 8 x i8>, 2) // LMUL = m1, NF = 2
target("riscv.vector.tuple", <vscale x 8 x i8>, 3) // LMUL = m1, NF = 3
target("riscv.vector.tuple", <vscale x 8 x i8>, 4) // LMUL = m1, NF = 4
target("riscv.vector.tuple", <vscale x 8 x i8>, 5) // LMUL = m1, NF = 5
target("riscv.vector.tuple", <vscale x 8 x i8>, 6) // LMUL = m1, NF = 6
target("riscv.vector.tuple", <vscale x 8 x i8>, 7) // LMUL = m1, NF = 7
target("riscv.vector.tuple", <vscale x 8 x i8>, 8) // LMUL = m1, NF = 8
target("riscv.vector.tuple", <vscale x 16 x i8>, 2) // LMUL = m2, NF = 2
target("riscv.vector.tuple", <vscale x 16 x i8>, 3) // LMUL = m2, NF = 3
target("riscv.vector.tuple", <vscale x 16 x i8>, 4) // LMUL = m2, NF = 4
target("riscv.vector.tuple", <vscale x 32 x i8>, 2) // LMUL = m4, NF = 2
```
RFC:
https://discourse.llvm.org/t/rfc-support-riscv-vector-tuple-type-in-llvm/80005
Getting this to work required a few additional changes:
- Add builtins for any instructions that can't be done with plain C
currently.
- Add support for the saturating version of fp_to_<s,i>_I16x8. Other
vector sizes supported this already.
- Support bitcast of f16x8 to v128. Needed to return a __f16x8 as
v128_t.