The WebAssembly committee has decided on the names `memory.size` and
`memory.grow` for the memory intrinsics, so update the clang builtin
functions to follow those names, keeping both sets of old names in place
for compatibility.
llvm-svn: 333712
This patch replaces all packed (and scalar without rounding
mode) fused intrinsics with fmadd/fmaddsub variations.
Then fmadd/fmaddsub are lowered to native IR.
Patch by tkrupa
Reviewers: craig.topper, sroland, spatel, RKSimon
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D47444
llvm-svn: 333555
These intrinsics are used by MSVC's header files on AArch64 Windows as
well as AArch32, so we should support them for both targets. I've
factored them out of CodeGenFunction::EmitARMBuiltinExpr into separate
functions that EmitAArch64BuiltinExpr can call as well.
Reviewers: javed.absar, mstorsjo
Reviewed By: mstorsjo
Subscribers: kristof.beyls, cfe-commits
Differential Revision: https://reviews.llvm.org/D47476
llvm-svn: 333513
The clang builtins have the same semantics as the stdlib functions.
The stdlib functions are defined in section 7.20.6.1 of the C standard with:
"If the result cannot be represented, the behavior is undefined."
That lets us mark the negation with 'nsw' because "sub i32 0, INT_MIN" would
be UB/poison.
Differential Revision: https://reviews.llvm.org/D47202
llvm-svn: 333038
Because the intrinsics in the headers are implemented as macros, we can't just use a select builtin and pternlog builtin. This would require one of the macro arguments to be used twice. Depending on what was passed to the macro we could expand an expression twice leading to weird behavior. We could maybe declare our local variable in the macro, but that would need to worry about name collisions.
To avoid that just generate IR directly in CGBuiltin.cpp.
Differential Revision: https://reviews.llvm.org/D47125
llvm-svn: 332891
1. added restrictions to memory scope, order and volatile parameters
2. added custom processing for these builtins - currently is not used code,
needed to switch off GCCBuiltin link to the builtins (ongoing change to llvm
tree)
3. builtins renamed as requested
Differential Revision: https://reviews.llvm.org/D43281
llvm-svn: 332848
This is unnecessary for AVX512VL supporting CPUs like SKX. We can just emit a 128-bit masked load/store here no matter what. The backend will widen it to 512-bits on KNL CPUs.
Fixes the frontend portion of PR37386. Need to fix the backend to optimize the new sequences well.
llvm-svn: 331958
Previously we emitted something like
rotl(x, n) {
n &= bitwidth-1;
return n != 0 ? ((x << n) | (x >> (bitwidth - n)) : x;
}
We use a select to avoid the undefined behavior on the (bitwidth - n) shift.
The middle and backend don't really recognize this as a rotate and end up emitting a cmov or control flow because of the select.
A better pattern is (x << (n & mask)) | (x << (-n & mask)) where mask is bitwidth - 1.
Fixes the main complaint in PR37387. There's still some work to be done if the user writes that sequence directly on a short or char where type promotion rules can prevent it from being recognized. The builtin is emitting direct IR with unpromoted types so that isn't a problem for it.
Differential Revision: https://reviews.llvm.org/D46656
llvm-svn: 331943
This is similar to the LLVM change https://reviews.llvm.org/D46290.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46320
llvm-svn: 331834
The ACLE spec which describes these intrinsics hasn't been published yet, but
this is based on the final draft which will be published soon, and these have
already been implemented by GCC.
Differential revision: https://reviews.llvm.org/D46109
llvm-svn: 331039
SPIR-V encodes the read_only and write_only access qualifiers of pipes,
so separate LLVM IR types are required to target SPIR-V. Other backends
may also find this useful.
These new types are `opencl.pipe_ro_t` and `opencl.pipe_wo_t`, which
replace `opencl.pipe_t`.
This replaces __get_pipe_num_packets(...) and __get_pipe_max_packets(...)
which took a read_only pipe with separate versions for read_only and
write_only pipes, namely:
* __get_pipe_num_packets_ro(...)
* __get_pipe_num_packets_wo(...)
* __get_pipe_max_packets_ro(...)
* __get_pipe_max_packets_wo(...)
These separate versions exist to avoid needing a bitcast to one of the
two qualified pipe types.
Patch by Stuart Brady.
Differential Revision: https://reviews.llvm.org/D46015
llvm-svn: 331026
This is the patch that lowers x86 intrinsics to native IR
in order to enable optimizations.
Patch by tkrupa
Differential Revision: https://reviews.llvm.org/D44786
llvm-svn: 330323
Summary:
A clang builtin for xray typed events. Differs from
__xray_customevent(...) by the presence of a type tag that is vended by
compiler-rt in typical usage. This allows xray handlers to expand logged
events with their type description and plugins to process traced events
based on type.
This change depends on D45633 for the intrinsic definition.
Reviewers: dberris, pelikan, rnk, eizan
Subscribers: cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D45716
llvm-svn: 330220
Summary:
This change addresses http://llvm.org/PR36926 by allowing users to pick
which instrumentation bundles to use, when instrumenting with XRay. In
particular, the flag `-fxray-instrumentation-bundle=` has four valid
values:
- `all`: the default, emits all instrumentation kinds
- `none`: equivalent to -fnoxray-instrument
- `function`: emits the entry/exit instrumentation
- `custom`: emits the custom event instrumentation
These can be combined either as comma-separated values, or as
repeated flag values.
Reviewers: echristo, kpw, eizan, pelikan
Reviewed By: pelikan
Subscribers: mgorny, cfe-commits
Differential Revision: https://reviews.llvm.org/D44970
llvm-svn: 329985
I believe all the pieces are now in place in the backend to make this work correctly. We can either mask the input to 32 bits for pmuludg or shl/ashr for pmuldq and use a regular mul instruction. The backend should combine this to PMULUDQ/PMULDQ and then SimplifyDemandedBits will remove the and/shifts.
Differential Revision: https://reviews.llvm.org/D45421
llvm-svn: 329605
Found via codespell -q 3 -I ../clang-whitelist.txt
Where whitelist consists of:
archtype
cas
classs
checkk
compres
definit
frome
iff
inteval
ith
lod
methode
nd
optin
ot
pres
statics
te
thru
Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
files that have dubious fixes reverted.)
Differential revision: https://reviews.llvm.org/D44188
llvm-svn: 329399
A recent addition to Coroutines TS (https://wg21.link/p0913) adds a pre-defined
coroutine noop_coroutine that does nothing. To implement this feature, we implemented
an llvm.coro.noop intrinsic that returns a coroutine handle to a coroutine that
does nothing when resumed or destroyed.
This patch adds a builtin __builtin_coro_noop() that maps to llvm.coro.noop intrinsic.
Related llvm change: https://reviews.llvm.org/D45114
llvm-svn: 328993
The conversion of operatios to bitcode helps to eliminate an additional
store in certain cases. We used to lower these load intrinsics in DAG to
DAG conversion by which time, the "Dead Store Elimination" pass is
already run. There is an associated LLVM patch.
Patch by Sumanth Gundapaneni.
llvm-svn: 328776
These instructions have been around for a long time, but we
haven't supported intrinsics for them. The "new" vesrions use
the CSx register for the start of the buffer instead of the K
field in the Mx register.
There is a related llvm patch.
Patch by Brendon Cahoon.
llvm-svn: 328725
Putting back the code in commit r327189 that was reverted in r322737. The code is being committed in three stages and this one is the last stage: 1) r327455 fp16 feature flags, 2) r327836 pass half type or i16 based on FullFP16, and 3) the code here which the front-end fp16 vector intrinsic for ARM.
Differential revision https://reviews.llvm.org/D43650
llvm-svn: 328277
This is needed for the upcoming implementation of the
new 8x32x16 and 32x8x16 variants of WMMA instructions
introduced in CUDA 9.1.
Differential Revision: https://reviews.llvm.org/D44719
llvm-svn: 328158
Summary:
Libc++'s default allocator uses `__builtin_operator_new` and `__builtin_operator_delete` in order to allow the calls to new/delete to be ellided. However, libc++ now needs to support over-aligned types in the default allocator. In order to support this without disabling the existing optimization Clang needs to support calling the aligned new overloads from the builtins.
See llvm.org/PR22634 for more information about the libc++ bug.
This patch changes `__builtin_operator_new`/`__builtin_operator_delete` to call any usual `operator new`/`operator delete` function. It does this by performing overload resolution with the arguments passed to the builtin to determine which allocation function to call. If the selected function is not a usual allocation function a diagnostic is issued.
One open issue is if the `align_val_t` overloads should be considered "usual" when `LangOpts::AlignedAllocation` is disabled.
In order to allow libc++ to detect this new behavior the value for `__has_builtin(__builtin_operator_new)` has been updated to `201802`.
Reviewers: rsmith, majnemer, aaron.ballman, erik.pilkington, bogner, ahatanak
Reviewed By: rsmith
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D43047
llvm-svn: 328134
This way we can support address-space specific variants without explicitly
encoding the space in the name of the intrinsic. Less intrinsics to deal with ->
less boilerplate.
Added a bit of tablegen magic to match/replace an intrinsics with a pointer
argument in particular address space with the space-specific instruction
variant.
Updated tests to use non-default address spaces.
Differential Revision: https://reviews.llvm.org/D43268
llvm-svn: 328006
For generating NEON intrinsics, this determines the NEON data type, and whether
it should be a half type or an i16 type. I.e., we always pass a half type for
AArch64, this hasn't changed, but now also for ARM but only when FullFP16 is
enabled, and i16 otherwise.
This is intended to be non-functional change, but together with the backend
work in D44538 which adds support for f16 vectors, this enables adding the
AArch32 FP16 (vector) intrinsics.
Differential Revision: https://reviews.llvm.org/D44561
llvm-svn: 327836
The MSVC runtime library does not provide a definition of wmemcmp,
so we need an inline implementation.
Differential Revision: https://reviews.llvm.org/D42441
llvm-svn: 323362
Summary:
kunpck intrinsics were removed in favor of native IR a few months ago. The implementation lowers them as by operation on the integer types passed to the intrinsic and then just shifting, masking, and oring them together. A special X86 DAG combine was added to recognize this patter and turn it into a concat_vector operation.
I think it makes more sense to keep the IR implementation closer to vector operations on vXi1. Given that we expect these builtins to be used around other builtins that operate on k-registers which we try to represent in IR with vXi1. InstCombine should be able to get rid of the bitcasts between integers and vXi1 leaving only the vector operations.
Reviewers: RKSimon, spatel, zvi, jina.nahias
Reviewed By: RKSimon
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D42016
llvm-svn: 322461
These just overloads for _Float128. They're supported by GCC 7 and used
by glibc. APFloat support is already there so just add the overloads.
__builtin_copysignf128
__builtin_fabsf128
__builtin_huge_valf128
__builtin_inff128
__builtin_nanf128
__builtin_nansf128
This is the same support that GCC has, according to the documentation,
but limited to _Float128.
llvm-svn: 321948
r320902 fixed the IRGen for some types of checked multiplications. It
did not handle unsigned overflow correctly in the case where the signed
operand is negative (PR35750).
Eli pointed out that on overflow, the result must be equal to the unique
value that is equivalent to the mathematically-correct result modulo two
raised to the k power, where k is the number of bits in the result type.
This patch fixes the specialized IRGen from r320902 accordingly.
Testing: Apart from check-clang, I modified the test harness from
r320902 to validate the results of all multiplications -- not just the
ones which don't overflow:
https://gist.github.com/vedantk/3eb9c88f82e5c32f2e590555b4af5081
llvm.org/PR35750, rdar://34963321
Differential Revision: https://reviews.llvm.org/D41717
llvm-svn: 321771