16594 Commits

Author SHA1 Message Date
smanna12
bbe1b06fbb [NFC][CLANG] Fix static analyzer bugs about unnecessary object copies with auto keyword (#75082)
Reported by Static Analyzer Tool:

In EmitAssemblyHelper::RunOptimizationPipeline(): using the auto
keyword without an & copies an object of type std::function.

 /// List of pass builder callbacks ("CodeGenOptions.h").
std::vector<std::function<void(llvm::PassBuilder &)>>
PassBuilderCallbacks;
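
A minimal sketch of the copy being avoided, using only the standard library. `CountingCallback`, `makeCallbacks`, `runByValue`, and `runByReference` are hypothetical names for illustration and are not Clang code; the copy counter exists only to make the difference between `auto` and `const auto &` observable.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Counts copies of the hypothetical callable below.
static int NumCopies = 0;

// Hypothetical callable; its copy constructor records every copy.
struct CountingCallback {
  CountingCallback() = default;
  CountingCallback(const CountingCallback &) { ++NumCopies; }
  void operator()(int &Value) const { ++Value; }
};

using Callback = std::function<void(int &)>;

// Builds a list of N callbacks, similar in shape to PassBuilderCallbacks.
std::vector<Callback> makeCallbacks(int N) {
  std::vector<Callback> Callbacks;
  Callbacks.reserve(N);
  for (int I = 0; I < N; ++I)
    Callbacks.emplace_back(CountingCallback{});
  return Callbacks;
}

// `auto` deduces a value type, so each std::function (and the callable it
// owns) is copied on every iteration.
int runByValue(const std::vector<Callback> &Callbacks) {
  int Value = 0;
  for (auto CB : Callbacks)
    CB(Value);
  return Value;
}

// `const auto &` binds to the existing element, so nothing is copied.
int runByReference(const std::vector<Callback> &Callbacks) {
  int Value = 0;
  for (const auto &CB : Callbacks)
    CB(Value);
  return Value;
}
```

Both loops produce the same result; only the by-value loop increases `NumCopies`.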
2023-12-22 20:39:22 -06:00
Dinar Temirbulatov
809f2f3d7d [AArch64][SME2] Add builtins for FDOT, BFDOT, SUDOT, USDOT, SDOT, UDOT. (#75737)
Add SME2 DOT builtins.
2023-12-21 19:41:24 +00:00
Tomas Matheson
7bd17212ef Re-land "[AArch64] Codegen support for FEAT_PAuthLR" (#75947)
This reverts commit 9f0f558742.

Fix expensive checks failure by properly marking register def for ADR.
2023-12-21 18:32:55 +00:00
Dinar Temirbulatov
77c5c44b01 [AArch64][SME2] Add SME2 MLA/MLS builtins. (#75584)
Add SME2 MLA/MLS builtins.
2023-12-21 16:42:24 +00:00
Tomas Matheson
9f0f558742 Revert "[AArch64] Codegen support for FEAT_PAuthLR"
This reverts commit 5992ce90b8.

Buildbot failures with expensive checks enabled.
2023-12-21 16:25:55 +00:00
Tomas Matheson
5992ce90b8 [AArch64] Codegen support for FEAT_PAuthLR
- Adds a new +pc option to -mbranch-protection that will enable
  the use of PC as a diversifier in PAC branch protection code.

- When +pauth-lr is enabled (-march=armv9.5a+pauth-lr) in combination
  with -mbranch-protection=pac-ret+pc, the new 9.5-a instructions
  (pacibsppc, retaasppc, etc) are used.

Documentation for the relevant instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/

Co-authored-by: Lucas Prates <lucas.prates@arm.com>
2023-12-21 14:18:33 +00:00
Dimitry Andric
2c27013fa9 [clang] Add getClangVendor() and use it in CodeGenModule.cpp (#75935)
In 9a38a72f1d `ProductId` was assigned from the stringified value of
`CLANG_VENDOR`, if that macro was defined. However, `CLANG_VENDOR` is
supposed to be a string, as it is defined (optionally) as such in the
top-level clang `CMakeLists.txt`.

Furthermore, `CLANG_VENDOR` is only passed as a build-time define when
compiling `Version.cpp`, so add a `getClangVendor()` function to
`Version.h`, and use it in `CodeGenModule.cpp`, instead of relying on
the macro.
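
A small sketch of why stringifying a macro that is already a string literal goes wrong. `MY_VENDOR`, `STRINGIFY`, `vendorViaStringify`, and `getVendor` are hypothetical names chosen for this illustration; they are not the identifiers used in Clang.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a vendor macro that is already a string literal,
// as a build system might define it on the command line.
#define MY_VENDOR "Acme "

// Classic two-level stringification helpers.
#define STRINGIFY_IMPL(X) #X
#define STRINGIFY(X) STRINGIFY_IMPL(X)

// Stringifying a string literal keeps its quotes as part of the data.
std::string vendorViaStringify() { return STRINGIFY(MY_VENDOR); }

// An accessor function returns the string unchanged, and callers no longer
// depend on the macro being defined in their own translation unit.
std::string getVendor() { return MY_VENDOR; }
```

Here `vendorViaStringify()` yields the text with embedded double quotes, while `getVendor()` yields the plain vendor string.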

Fixes: 9a38a72f1d
2023-12-20 20:09:39 +01:00
Dimitry Andric
5c1a41f8ad Revert "[clang] Add getClangVendor() and use it in CodeGenModule.cpp (#75935)"
This reverts commit 9055519103, due to an
incorrectly chosen commit message.
2023-12-20 20:07:22 +01:00
Dimitry Andric
9055519103 [clang] Add getClangVendor() and use it in CodeGenModule.cpp (#75935)
In 9a38a72f1d `ProductId` was assigned from the stringified value of
`CLANG_VENDOR`, if that macro was defined. However, `CLANG_VENDOR` is
supposed to be a string, as it is defined (optionally) as such in the
top-level clang `CMakeLists.txt`.

Move the addition of `-DCLANG_VENDOR` to the compiler flags from
`clang/lib/Basic/CMakeLists.txt` to the top-level `CMakeLists.txt`, so
it is consistent across the whole clang codebase. Then remove the
stringification from `CodeGenModule.cpp`, to make it work correctly.

Fixes: 9a38a72f1d
2023-12-20 20:03:19 +01:00
Fangrui Song
207cbbd710 DiagnosticHandler: refactor error checking (#75889)
In LLVMContext::diagnose, set `HasErrors` for `DS_Error` so that all
derived `DiagnosticHandler` have correct `HasErrors` information.

An alternative is to set `HasErrors` in
`DiagnosticHandler::handleDiagnostics`, but all derived
`handleDiagnostics` would have to call the base function.
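
The design choice can be sketched with simplified stand-in classes. `Handler`, `Context`, `CollectingHandler`, `Diagnostic`, and `Severity` are hypothetical types for illustration only, not the LLVM classes; the point is that the flag is set in one non-virtual entry point, so overriding handlers cannot forget it.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Severity { Error, Warning };

struct Diagnostic {
  Severity Sev;
  std::string Message;
};

// Hypothetical base handler: derived classes override handleDiagnostics.
class Handler {
public:
  bool HasErrors = false;
  virtual ~Handler() = default;
  virtual bool handleDiagnostics(const Diagnostic &) { return false; }
};

// Hypothetical context: records errors centrally before dispatching.
class Context {
  Handler &H;

public:
  explicit Context(Handler &H) : H(H) {}

  void diagnose(const Diagnostic &D) {
    if (D.Sev == Severity::Error)
      H.HasErrors = true; // set here, so every derived handler gets it
    H.handleDiagnostics(D);
  }
};

// A derived handler that never calls the base implementation.
class CollectingHandler : public Handler {
public:
  std::vector<std::string> Messages;

  bool handleDiagnostics(const Diagnostic &D) override {
    Messages.push_back(D.Message);
    return true;
  }
};
```

With the alternative design, `CollectingHandler::handleDiagnostics` would have to call the base function to keep `HasErrors` correct.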
2023-12-19 21:51:26 -08:00
Nikita Popov
a3d2d34e84 [Clang] Use poison as base for vector literals
When constructing vectors from elements, use poison instead of
undef as the base value. These literals always initialize all
elements (padding the remainder with zero), so that the choice
of base value does not affect semantics.
2023-12-19 11:53:18 +01:00
Bill Wendling
cca4d6cfd2 Revert counted_by attribute feature (#75857)
There are many issues that popped up with the counted_by feature. The
patch #73730 has grown too large and approval is blocking Linux testing.

Includes reverts of:
commit 769bc11f68 ("[Clang] Implement the 'counted_by' attribute
(#68750)")
commit bc09ec6962 ("[CodeGen] Revamp counted_by calculations
(#70606)")
commit 1a09cfb2f3 ("[Clang] counted_by attr can apply only to C99
flexible array members (#72347)")
commit a76adfb992 ("[NFC][Clang] Refactor code to calculate flexible
array member size (#72790)")
commit d8447c78ab ("[Clang] Correct handling of negative and
out-of-bounds indices (#71877)")
Partial commit b31cd07de5 ("[Clang] Regenerate test checks (NFC)")

Closes #73168
Closes #75173
2023-12-18 15:16:09 -08:00
Paul Kirth
d1e2b96b60 [clang][fatlto] Don't set ThinLTO module flag with FatLTO (#75079)
Since FatLTO now uses the UnifiedLTO pipeline, we should not set the
ThinLTO module flag to true, since it may cause an assertion failure.
See https://github.com/llvm/llvm-project/issues/70703 for context.
2023-12-18 13:03:13 -08:00
Justin Bogner
4f54d71501 [HLSL][DirectX] Move handling of resource element types into the frontend
Rather than shepherding a type name all the way to the backend as a
string and attempting to parse it, get the element type out of the AST
and store that in the resource annotation metadata directly.

Pull Request: https://github.com/llvm/llvm-project/pull/75674
2023-12-18 11:43:52 -07:00
Fangrui Song
96aca7c517 [LTO] Improve diagnostics handling when parsing module-level inline assembly (#75726)
Non-LTO compiles set the buffer name to "<inline asm>"
(`AsmPrinter::addInlineAsmDiagBuffer`) and pass diagnostics to
`ClangDiagnosticHandler` (through the `MCContext` handler in
`MachineModuleInfoWrapperPass::doInitialization`) to ensure that
the exit code is 1 in the presence of errors. In contrast, LTO compiles
spuriously succeed even if error messages are printed.

```
% cat a.c
void _start() {}
asm("unknown instruction");
% clang -c a.c
<inline asm>:1:1: error: invalid instruction mnemonic 'unknown'
    1 | unknown instruction
      | ^
1 error generated.
% clang -c -flto a.c; echo $?  # -flto=thin is the same
error: invalid instruction mnemonic 'unknown'
unknown instruction
^~~~~~~
error: invalid instruction mnemonic 'unknown'
unknown instruction
^~~~~~~
0
```

`CollectAsmSymbols` parses inline assembly and is transitively called by
both `ModuleSummaryIndexAnalysis::run` and `WriteBitcodeToFile`, leading
to duplicate diagnostics.

This patch updates `CollectAsmSymbols` to be similar to non-LTO
compiles.
```
% clang -c -flto=thin a.c; echo $?
<inline asm>:1:1: error: invalid instruction mnemonic 'unknown'
    1 | unknown instruction
      | ^
1 errors generated.
1
```

The `HasErrors` check does not prevent duplicate warnings but assembler
warnings are very uncommon.
2023-12-18 09:46:58 -08:00
Gheorghe-Teodor Bercea
4ef6587715 [Clang][OpenMP] Fix mapping of structs to device (#75642)
Fix mapping of structs to device.

The following example fails:

```
#include <stdio.h>
#include <stdlib.h>

struct Descriptor {
  int *datum;
  long int x;
  int xi;
  long int arr[1][30];
};

int main() {
  Descriptor dat = Descriptor();
  dat.datum = (int *)malloc(sizeof(int)*10);
  dat.xi = 3;
  dat.arr[0][0] = 1;

  #pragma omp target enter data map(to: dat.datum[:10]) map(to: dat)

  #pragma omp target
  {
    dat.xi = 4;
    dat.datum[dat.arr[0][0]] = dat.xi;
  }

  #pragma omp target exit data map(from: dat)

 return 0;
}
```

This is a rework of the previous attempt:
https://github.com/llvm/llvm-project/pull/72410
2023-12-18 09:47:59 -05:00
Paul Walker
dea16ebd26 [LLVM][IR] Replace ConstantInt's specialisation of getType() with getIntegerType(). (#75217)
The specialisation will not be valid when ConstantInt gains native
support for vector types.

This is largely a mechanical change but with extra attention paid to constant
folding, InstCombineVectorOps.cpp, LoopFlatten.cpp and Verifier.cpp to
remove the need to call `getIntegerType()`.

Co-authored-by: Nikita Popov <github@npopov.com>
2023-12-18 11:58:42 +00:00
Simon Pilgrim
df3ddd78f6 CGBuiltin - fix gcc Wunused-variable warning. NFC.
2023-12-18 11:51:24 +00:00
Akira Hatanaka
31429e7a89 [CodeGen] Emit a more accurate alignment for non-temporal loads/stores (#75675)
Call EmitPointerWithAlignment to compute the alignment based on the
underlying lvalue's alignment when it's available.
2023-12-17 18:22:44 -08:00
Youngsuk Kim
f49e2b05bf [clang][CGCUDANV] Unify PointerType members of CGNVCUDARuntime (NFC) (#75668)
Unify 3 `Pointertype *` members which all refer to the same llvm type.

Opaque pointer clean-up effort.
2023-12-16 11:47:37 -05:00
Lei Huang
aaa3f72c1c [PowerPC] Emit libcall to frexpl for calls to frexp(ppcDoubleDouble) (#75226)
On Linux PPC, call the library function ``frexpl`` for calls to ``frexp()`` with input
of type PPCDoubleDouble.

Fixes bug: https://github.com/llvm/llvm-project/issues/64426
2023-12-15 17:23:16 -05:00
Zequan Wu
ab3430f891 [Profile] Add binary profile correlation for code coverage. (#69493)
## Motivation
Since we don't need the metadata sections at runtime, we can
offload them from memory at runtime. Initially, I explored [debug info
correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113),
which is used for PGO with value profiling disabled. However, it
currently only works with DWARF, and it would be hard to add such artificial
debug info for every function into CodeView, which is used on Windows.
So, offloading profile metadata sections at runtime seems to be a
platform-independent option.

## Design
The idea is to use new section names for profile name and data sections
and mark them as metadata sections. Under this mode, the new sections
are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime
and can be stripped away as a post-linking step. After the process
exits, the generated raw profiles will contain only headers + counters.
llvm-profdata can be used to correlate raw profiles with the unstripped
binary to generate an indexed profile.

## Data
For chromium base_unittests with code coverage on Linux, the binary size
overhead due to instrumentation is reduced from 64M to 38.8M (39.4%) and
the raw profile file size is reduced from 128M to 68M (46.9%).
```
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +14.6Mi  [NEW] +14.6Mi    __llvm_prf_data
  [NEW] +10.6Mi  [NEW] +10.6Mi    __llvm_prf_names
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
  [ = ]       0   +65% +1.23Ki    .relro_padding
   +62% +1.20Ki  [ = ]       0    [Unmapped]
   +13%    +448   +19%    +448    .init_array
  +8.8%    +192  [ = ]       0    [ELF Section Headers]
  +0.0%    +136  +0.0%     +80    [7 Others]
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.5%     +80  +1.2%     +64    .plt
  [ = ]       0 -99.2% -3.68Ki    [LOAD #5 [RW]]
  +195% +64.0Mi  +194% +64.0Mi    TOTAL
$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
   +13%    +448   +19%    +448    .init_array
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.2%     +64  +1.2%     +64    .plt
  +2.9%     +64  [ = ]       0    [ELF Section Headers]
  +0.0%     +40  +0.0%     +40    .data
  +1.2%     +32  +1.2%     +32    .got.plt
  +0.0%     +24  +0.0%      +8    [5 Others]
  [ = ]       0 -22.9%    -872    [LOAD #5 [RW]]
 -74.5% -1.44Ki  [ = ]       0    [Unmapped]
  [ = ]       0 -76.5% -1.45Ki    .relro_padding
  +118% +38.8Mi  +117% +38.8Mi    TOTAL
```

A few things to note:
1. llvm-profdata doesn't support filtering raw profiles by binary id yet,
so when a raw profile doesn't belong to the binary being digested by
llvm-profdata, merging will fail. Once this is implemented,
llvm-profdata should be able to merge only raw profiles with the same
binary id as the binary and discard the rest (with mismatched/missing
binary id). The workflow I have in mind is to have scripts invoke
llvm-profdata to get all binary ids for all raw profiles, and
selectively pass the raw profiles with a matching binary id, together
with the binary, to llvm-profdata for merging.
2. In COFF, the sections are currently still loaded into memory but not
used. I didn't address this in this patch because I noticed that `.lcovmap` and
`.lcovfunc` are loaded into memory. A separate patch will address it.
3. This should work with PGO when value profiling is disabled, as debug
info correlation currently does, though I haven't tested this yet.
2023-12-14 14:16:38 -05:00
Alan Phipps
8ecbb0404d Reland "[Coverage][llvm-cov] Enable MC/DC Support in LLVM Source-based Code Coverage (2/3)"
Part 2 of 3. This includes the Visualization and Evaluation components.

Differential Revision: https://reviews.llvm.org/D138847
2023-12-13 15:10:05 -06:00
Kazu Hirata
f3dcc2351c [clang] Use StringRef::{starts,ends}_with (NFC) (#75149)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-13 08:54:13 -08:00
CarolineConcatto
f2464ca317 [SVE2.1][Clang][LLVM]Int/FP reduce builtin in Clang and LLVM intrinsic (#69926)
This patch implements the builtins in Clang
and the LLVM-IR intrinsic for the following:

// Variants are also available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64,
// _f16, _f32, _f64
uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t sveorqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t svorqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t svminqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for _f32, _f64
float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn);
float16x8_t svminnmqv[_f16](svbool_t pg, svfloat16_t zn);

According to the PR#257[1]

The reduction instruction uses scalable vectors as input and fixed
vectors as output, therefore we changed SVEEmitter to emit fixed vector
types in case the neon header(arm_neon.h) is not present.

[1]https://github.com/ARM-software/acle/pull/257

Co-author: Dinar Temirbulatov <dinar.temirbulatov@arm.com>
2023-12-13 15:45:59 +00:00
Fangrui Song
831484efa0 [DebugInfo] Fix duplicate DIFile when main file is preprocessed (#75022)
When the main file is preprocessed and we change `MainFileName` to the
original source file name (e.g. `a.i => a.c`), the source manager does
not contain `a.c`, but we incorrectly associate the DIFile(a.c) with
md5(a.i). This causes CGDebugInfo::emitFunctionStart to create a
duplicate DIFile and leads to a spurious "inconsistent use of MD5
checksums" warning.

```
% cat a.c
void f() {}
% clang -c -g a.c  # no warning
% clang -E a.c -o a.i && clang -g -S a.i && clang -g -c a.s
a.s:9:2: warning: inconsistent use of MD5 checksums
        .file   1 "a.c"
        ^
% grep DIFile a.ll
!1 = !DIFile(filename: "a.c", directory: "/tmp/c", checksumkind: CSK_MD5, checksum: "c5b2e246df7d5f53e176b097a0641c3d")
!11 = !DIFile(filename: "a.c", directory: "/tmp/c")
% grep 'file.*a.c' a.s
        .file   "a.c"
        .file   0 "/tmp/c" "a.c" md5 0x2d14ea70fee15102033eb8d899914cce
        .file   1 "a.c"
```

Fix #56378 by disassociating md5(a.i) with a.c.
2023-12-12 10:13:42 -08:00
Artem Belevich
631c6e834c [CUDA] Add support for CUDA-12.3 and sm_90a (#74895)
2023-12-11 12:18:28 -08:00
Zahira Ammarguellat
b40c534656 [clang] Add support for -fcx-limited-range, #pragma CX_LIMITED_RANGE and -fcx-fortran-rules. (#70244)
This patch adds the #pragma CX_LIMITED_RANGE defined in the C
specification.
It also adds the options -f[no]cx-limited-range and
-f[no]cx-fortran-rules.
-fcx-limited-range enables algebraic formulas for complex multiplication
and division. This option is enabled with -ffast-math.
-fcx-fortran-rules enables algebraic formulas for complex multiplication
and enables Smith’s algorithm for complex division (SMITH, R. L.
Algorithm 116: Complex division. Commun. ACM 5, 8 (1962)).
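
For reference, the two families of formulas named above can be sketched as follows. This is an illustration of the textbook algebraic formulas and of Smith's algorithm, not the code Clang emits; `Complex`, `mulAlgebraic`, `divAlgebraic`, and `divSmith` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Complex {
  double Re;
  double Im;
};

// (a+bi)(c+di) = (ac - bd) + (ad + bc)i
Complex mulAlgebraic(Complex X, Complex Y) {
  return {X.Re * Y.Re - X.Im * Y.Im, X.Re * Y.Im + X.Im * Y.Re};
}

// (a+bi)/(c+di) = ((ac + bd) + (bc - ad)i) / (c^2 + d^2)
// The squared terms can overflow or underflow for extreme operands.
Complex divAlgebraic(Complex X, Complex Y) {
  double Den = Y.Re * Y.Re + Y.Im * Y.Im;
  return {(X.Re * Y.Re + X.Im * Y.Im) / Den,
          (X.Im * Y.Re - X.Re * Y.Im) / Den};
}

// Smith's algorithm: scale by the ratio of the smaller to the larger
// component of the divisor, which avoids forming c^2 + d^2 directly.
Complex divSmith(Complex X, Complex Y) {
  if (std::fabs(Y.Re) >= std::fabs(Y.Im)) {
    double R = Y.Im / Y.Re;
    double Den = Y.Re + Y.Im * R;
    return {(X.Re + X.Im * R) / Den, (X.Im - X.Re * R) / Den};
  }
  double R = Y.Re / Y.Im;
  double Den = Y.Re * R + Y.Im;
  return {(X.Re * R + X.Im) / Den, (X.Im * R - X.Re) / Den};
}
```

For ordinary operands both division routines agree; for a divisor such as 1e200 + 1e200i the squared terms in `divAlgebraic` exceed the range of `double`, while `divSmith` still returns a finite quotient.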

---------

Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
Co-authored-by: Joseph Huber <jhuber6@vols.utk.edu>
Co-authored-by: Guray Ozen <guray.ozen@gmail.com>
Co-authored-by: Nishant Patel <nishant.b.patel@intel.com>
Co-authored-by: Jessica Clarke <jrtc27@jrtc27.com>
Co-authored-by: Petr Hosek <phosek@google.com>
Co-authored-by: Joseph Huber <35342157+jhuber6@users.noreply.github.com>
Co-authored-by: Craig Topper <craig.topper@sifive.com>
Co-authored-by: Alexander Yermolovich <43973793+ayermolo@users.noreply.github.com>
Co-authored-by: Usama Hameed <u_hameed@apple.com>
Co-authored-by: Philip Reames <preames@rivosinc.com>
Co-authored-by: Evgenii Kudriashov <evgenii.kudriashov@intel.com>
Co-authored-by: Fangrui Song <i@maskray.me>
Co-authored-by: Aart Bik <39774503+aartbik@users.noreply.github.com>
Co-authored-by: Valentin Clement <clementval@gmail.com>
Co-authored-by: Youngsuk Kim <youngsuk.kim@hpe.com>
Co-authored-by: Arthur Eubanks <aeubanks@google.com>
Co-authored-by: Jan Svoboda <jan_svoboda@apple.com>
Co-authored-by: Walter Erquinigo <a20012251@gmail.com>
Co-authored-by: Eric <eric@efcs.ca>
Co-authored-by: Fazlay Rabbi <106703039+mdfazlay@users.noreply.github.com>
Co-authored-by: Pete Lawrence <plawrence@apple.com>
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Co-authored-by: Adrian Prantl <aprantl@apple.com>
Co-authored-by: Owen Pan <owenpiano@gmail.com>
Co-authored-by: LLVM GN Syncbot <llvmgnsyncbot@gmail.com>
Co-authored-by: Med Ismail Bennani <ismail@bennani.ma>
Co-authored-by: Congcong Cai <congcongcai0907@163.com>
Co-authored-by: Rik Huijzer <github@huijzer.xyz>
Co-authored-by: Wang Pengcheng <wangpengcheng.pp@bytedance.com>
Co-authored-by: Yuanfang Chen <tabloid.adroit@gmail.com>
Co-authored-by: Kazu Hirata <kazu@google.com>
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
Co-authored-by: Aiden Grossman <agrossman154@yahoo.com>
Co-authored-by: Rana Pratap Reddy <109514914+ranapratap55@users.noreply.github.com>
Co-authored-by: Yingwei Zheng <dtcxzyw2333@gmail.com>
Co-authored-by: Piotr Zegar <me@piotrzegar.pl>
Co-authored-by: KAWASHIMA Takahiro <t-kawashima@fujitsu.com>
Co-authored-by: Tobias Hieta <tobias@hieta.se>
Co-authored-by: Luke Lau <luke@igalia.com>
Co-authored-by: Shivam Gupta <shivam98.tkg@gmail.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
Co-authored-by: Yeting Kuo <46629943+yetingk@users.noreply.github.com>
Co-authored-by: Stanislav Mekhanoshin <rampitec@users.noreply.github.com>
Co-authored-by: David Spickett <david.spickett@linaro.org>
Co-authored-by: Matthew Devereau <matthew.devereau@arm.com>
Co-authored-by: Martin Storsjö <martin@martin.st>
Co-authored-by: Qiu Chaofan <qiucofan@cn.ibm.com>
Co-authored-by: Pierre van Houtryve <pierre.vanhoutryve@amd.com>
Co-authored-by: Mikael Holmen <mikael.holmen@ericsson.com>
Co-authored-by: Uday Bondhugula <uday@polymagelabs.com>
Co-authored-by: Nikita Popov <npopov@redhat.com>
Co-authored-by: Johannes Reifferscheid <jreiffers@google.com>
Co-authored-by: Benjamin Kramer <benny.kra@googlemail.com>
Co-authored-by: Oliver Stannard <oliver.stannard@arm.com>
Co-authored-by: Dmitry Vyukov <dvyukov@google.com>
Co-authored-by: Benjamin Maxwell <benjamin.maxwell@arm.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
Co-authored-by: Timm Bäder <tbaeder@redhat.com>
Co-authored-by: Sunil Kuravinakop <koops@hpe.com>
Co-authored-by: zhongyunde 00443407 <zhongyunde@huawei.com>
Co-authored-by: Christudasan Devadasan <Christudasan.Devadasan@amd.com>
Co-authored-by: bjacob <jacob.benoit.1@gmail.com>
Co-authored-by: Weining Lu <luweining@loongson.cn>
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@arm.com>
Co-authored-by: Jay Foad <jay.foad@amd.com>
Co-authored-by: Markus Mützel <markus.muetzel@gmx.de>
Co-authored-by: Erik Jonsson <erik.j.jonsson@ericsson.com>
Co-authored-by: Pete Steinfeld <47540744+psteinfeld@users.noreply.github.com>
Co-authored-by: Alexey Bataev <a.bataev@outlook.com>
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
Co-authored-by: Qizhi Hu <836744285@qq.com>
2023-12-11 10:03:27 -05:00
Mircea Trofin
1d608fc755 [NFC][InstrProf] Refactor InstrProfiling lowering pass (#74970)
Akin to other passes - refactored the name to `InstrProfilingLoweringPass` to better communicate what it does, and split the pass part and the transformation part to avoid needing to initialize object state during `::run`.

A subsequent PR will move `InstrLowering` to the .cpp file and rename it to `InstrLowerer`.
2023-12-10 18:03:08 -08:00
Kazu Hirata
2ec95c19a2 [clang] Use llvm::to_underlying (NFC)
2023-12-09 17:08:48 -08:00
Justin Bogner
7a13e410fd [DirectX] Move ROV info into HLSL metadata. NFC
Pull Request: https://github.com/llvm/llvm-project/pull/74896
2023-12-09 10:42:45 -08:00
Dinar Temirbulatov
49b27b150b [AArch64][SME2] Add builtins to cast svbool from/to svcount. (#74720)
Add builtin: 'svreinterpret_b' to cast from svcount_t to svbool_t.
Add builtin: 'svreinterpret_c'  to cast from svbool_t  to svcount_t.

Patch by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>
2023-12-08 16:38:29 +00:00
David Sherwood
c1cfa1757c [Clang] Emit TBAA info for enums in C (#73326)
When emitting TBAA information for enums in C code we currently just
treat the data as an 'omnipotent char'. However, with C strict aliasing
this means we fail to optimise certain cases. For example, in the
SPEC2017 xz benchmark there are structs that contain arrays of enums,
and clang pessimistically assumes that accesses to those enums could
alias with other struct members that have a different type.

According to

https://en.cppreference.com/w/c/language/enum

enums should be treated as 'int' types unless explicitly specified (C23)
or if 'int' would not be large enough to hold all the enumerated values.
In the latter case the compiler is free to choose a suitable integer
that would hold all such values.

When compiling C code this patch generates TBAA information for the enum
by using an equivalent integer of the size clang has already chosen for
the enum. I have ignored C++ for now because the rules are more complex.

New test added here:

  clang/test/CodeGen/tbaa.c
2023-12-08 12:58:39 +00:00
Mike Rice
0808be47b8 [NFC] Remove unneeded nullptr checks after cast<> (#74674)
Since VD is assigned from a cast<VarDecl> it cannot be a nullptr or it
would have asserted. Remove the subsequent checks to clear up any
misunderstanding.
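
A toy illustration of why the checks were redundant. The `Decl` and `VarDecl` types and the `checkedCast` helper below are hypothetical and only mimic the asserting behaviour described in the message; they are not the LLVM implementation.

```cpp
#include <cassert>

// Toy class hierarchy with a kind tag, loosely modelled on LLVM-style RTTI.
struct Decl {
  enum Kind { VarKind, FunctionKind };
  explicit Decl(Kind K) : TheKind(K) {}
  Kind getKind() const { return TheKind; }

private:
  Kind TheKind;
};

struct VarDecl : Decl {
  VarDecl() : Decl(VarKind) {}
  static bool classof(const Decl *D) { return D->getKind() == VarKind; }
};

// Hypothetical checked cast: it asserts on null or mismatched input instead
// of returning nullptr, so a result that is returned at all is non-null.
template <typename To, typename From> To *checkedCast(From *Val) {
  assert(Val && "checkedCast on a null pointer");
  assert(To::classof(Val) && "checkedCast to an incompatible type");
  return static_cast<To *>(Val);
}
```

Any `if (VD)` test after `VarDecl *VD = checkedCast<VarDecl>(D);` can never take the null branch in a build with assertions enabled, which is the situation the commit cleans up.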
2023-12-07 16:20:22 -08:00
Joseph Huber
97f3be2c5a [CUDA][HIP] Improve variable registration with the new driver (#73177)
Summary:
This patch adds support for registering texture / surface variables from
CUDA / HIP. Additionally, we now properly track the `extern` and `const`
flags that are also used in these runtime functions.

This does not implement the `managed` variables yet as those seem to
require some extra handling I'm not familiar with. The issue is that the
current offload entry isn't large enough to carry size and alignment
information along with an extra global.
2023-12-07 15:44:23 -06:00
Joseph Huber
4e80bc7d71 [Clang] Introduce scoped variants of GNU atomic functions (#72280)
Summary:
The standard GNU atomic operations are a very common way to target
hardware atomics on the device. With more heterogeneous devices being
introduced, the concept of memory scopes has been in the LLVM language
for a while via the `syncscope` modifier. For targets, such as the GPU,
this can change code generation depending on whether or not we only need
to be consistent with the memory ordering with the entire system, the
single GPU device, or lower.

Previously these scopes were only exported via the `opencl` and `hip`
variants of these functions. However, this made it difficult to use
outside of those languages and the semantics were different from the
standard GNU versions. This patch introduces a `__scoped_atomic` variant
for the common functions. There was some discussion over whether or not
these should be overloads of the existing ones, or simply new variants.
I leant towards new variants to be less disruptive.

The scope here can be one of the following

```
__MEMORY_SCOPE_SYSTEM // All devices and systems
__MEMORY_SCOPE_DEVICE // Just this device
__MEMORY_SCOPE_WRKGRP // A 'work-group' AKA CUDA block
__MEMORY_SCOPE_WVFRNT // A 'wavefront' AKA CUDA warp
__MEMORY_SCOPE_SINGLE // A single thread.
```
Naming consistency was attempted, but it is difficult to capture the full
spectrum with so many names. Suggestions appreciated.
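
A hedged usage sketch, assuming the new builtins mirror the GNU `__atomic_*` signatures with one extra trailing scope argument. `fetchAddDeviceScope` and `HAVE_SCOPED_ATOMICS` are hypothetical names; the fallback branch uses the ordinary GNU builtin so the snippet still compiles on compilers without the scoped variants.

```cpp
#include <cassert>

// Detect the scoped builtin where the compiler can report it.
#if defined(__has_builtin)
#if __has_builtin(__scoped_atomic_fetch_add)
#define HAVE_SCOPED_ATOMICS 1
#endif
#endif

// Atomically adds Delta to *Counter and returns the previous value.
int fetchAddDeviceScope(int *Counter, int Delta) {
#ifdef HAVE_SCOPED_ATOMICS
  // Scoped variant: only needs to be consistent within this device.
  return __scoped_atomic_fetch_add(Counter, Delta, __ATOMIC_RELAXED,
                                   __MEMORY_SCOPE_DEVICE);
#else
  // Assumed fallback: the unscoped GNU builtin (system-wide consistency).
  return __atomic_fetch_add(Counter, Delta, __ATOMIC_RELAXED);
#endif
}
```

On a CPU-only host the scope is not expected to change the generated code; the distinction matters on targets such as GPUs.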
2023-12-07 13:40:25 -06:00
jyu2-git
0113722d82 [OpenMP] Fix runtime problem due to wrong map size. (#74692)
Currently we fail to set the upper-boundary address for FinalArraySection
as the highest element in partial struct data.

Currently for:
#pragma omp target map(D.a) map(D.b[:2])
The size is:
  %a = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 0
  %b = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
  %arrayidx = getelementptr inbounds [2 x float], ptr %b, i64 0, i64 0
  %2 = getelementptr float, ptr %arrayidx, i32 1
  %3 = ptrtoint ptr %2 to i64
  %4 = ptrtoint ptr %a to i64
  %5 = sub i64 %3, %4
%6 = sdiv exact i64 %5, ptrtoint (ptr getelementptr (i8, ptr null, i32
1) to i64)

Here %2 is wrong: for (D.b[:2]) it is computed from the pointer to the first
element of the array section. It should be computed from the pointer to the
last element of the array section.
  
The fix is to emit the pointer to the last element of array section and
use this pointer as the highest element in partial struct data.

After change IR:
  %a = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 0
  %b = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
  %arrayidx = getelementptr inbounds [2 x float], ptr %b, i64 0, i64 0
  %b1 = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
  %arrayidx2 = getelementptr inbounds [2 x float], ptr %b1, i64 0, i64 1
  %1 = getelementptr float, ptr %arrayidx2, i32 1
  %2 = ptrtoint ptr %1 to i64
  %3 = ptrtoint ptr %a to i64
  %4 = sub i64 %2, %3
%5 = sdiv exact i64 %4, ptrtoint (ptr getelementptr (i8, ptr null, i32
1) to i64)
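
The size arithmetic in the two IR listings can be reproduced on the host. This sketch assumes a layout for `DataTy` (a `float` member `a` followed by `float b[2]`); the IR only shows the field indices, so the member types and the `mappedBytes` helper are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout: field 0 is `a`, field 1 is a two-element float array `b`.
struct DataTy {
  float a;
  float b[2];
};

// Bytes from the start of `a` to one past element `LastIdx` of `b`,
// mirroring the ptrtoint/sub sequence in the IR.
std::size_t mappedBytes(const DataTy &D, std::size_t LastIdx) {
  const char *Begin = reinterpret_cast<const char *>(&D.a);
  const char *End = reinterpret_cast<const char *>(&D.b[LastIdx] + 1);
  return static_cast<std::size_t>(End - Begin);
}
```

Using the first element (`LastIdx == 0`) covers only `a` and `b[0]`, which is the undersized mapping; using the last element of the section (`LastIdx == 1`) covers the whole of `a` and `b[:2]`.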
2023-12-07 09:38:56 -08:00
Michael Buch
4db54e6597 [clang][DebugInfo] Revert "emit definitions for constant-initialized static data-members" (#74580)
This commit reverts the changes in
https://github.com/llvm/llvm-project/pull/71780 and all of its follow-up
patches.

We got reports of the `.debug_names/.debug_gnu_pubnames/gdb_index/etc.`
sections growing by a non-trivial amount for some large projects. While
GCC emits definitions for static data member constants into the Names
index, they do so *only* for explicitly `constexpr` members. We were
indexing *all* constant-initialized const-static members, which is
likely where the significant size difference comes from. However, only
emitting explicitly `constexpr` variables into the index doesn't seem
like a good way forward, since from clang's perspective `const`-static
integrals are `constexpr` too, and that shouldn't be any different in
the debug-info component. Also, as new code moves to `constexpr` instead
of `const` static for constants, such a solution would just delay the
growth of the Names index.

To prevent the size regression we revert to not emitting definitions for
static data-members that have no location.

To support access to such constants from LLDB we'll most likely have to
make LLDB find the constants by looking at the containing class
first.
2023-12-06 22:13:54 +00:00
elizabethandrews
cee5b8777f [Clang] Fix linker error for function multiversioning (#71706)
Currently target_clones attribute results in a linker error when there
are no multi-versioned function declarations in the calling TU.

In the calling TU, the call is generated with the ‘normal’ assembly
name. This does not match any of the versions or the ifunc, since
version mangling includes a .versionstring, and the ifunc includes
.ifunc suffix. The linker error is not seen with GCC since the mangling
for the ifunc symbol in GCC is the ‘normal’ assembly name for function
i.e. no ifunc suffix.

This PR removes the .ifunc suffix to match GCC. It also adds alias with
the .ifunc suffix so as to ensure backward compatibility.

The changes exclude aarch64 target because the mangling for default
versions on aarch64 does not include a .default suffix and is the
'normal' assembly name, unlike other targets. It is not clear to me what
the correct behavior for this target is.

Old Phabricator review - https://reviews.llvm.org/D158666

---------

Co-authored-by: Tom Honermann <tom@honermann.net>
2023-12-05 18:11:53 -05:00
Eduard Zingerman
030b8cb156 [BPF] Attribute preserve_static_offset for structs
This commit adds a new BPF specific structure attribute
`__attribute__((preserve_static_offset))` and a pass to deal with it.

This attribute may be attached to a struct or union declaration, where
it notifies the compiler that this structure is a "context" structure.
The following limitations apply to context structures:
- runtime environment might patch access to the fields of this type by
  updating the field offset;

  BPF verifier limits access patterns allowed for certain data
  types. E.g. `struct __sk_buff` and `struct bpf_sock_ops`. For these
  types only `LD/ST <reg> <static-offset>` memory loads and stores are
  allowed.

  This is so because offsets of the fields of these structures do not
  match real offsets in the running kernel. During BPF program
  load/verification loads and stores to the fields of these types are
  rewritten so that offsets match real offsets. For this rewrite to
  happen static offsets have to be encoded in the instructions.

  See `kernel/bpf/verifier.c:convert_ctx_access` function in the Linux
  kernel source tree for details.

- runtime environment might disallow access to the field of the type
  through modified pointers.

  During BPF program verification a tag `PTR_TO_CTX` is tracked for
  register values. If a register with such a tag is modified, BPF
  programs are not allowed to read or write memory using that register.
  See the `kernel/bpf/verifier.c:check_mem_access` function in the Linux
  kernel source tree for details.

Access to the structure fields is translated to IR as a sequence:
- `(load (getelementptr %ptr %offset))` or
- `(store (getelementptr %ptr %offset))`

During instruction selection phase such sequences are translated as a
single load instruction with embedded offset, e.g. `LDW %ptr, %offset`,
which matches access pattern necessary for the restricted
set of types described above (when `%offset` is static).

Multiple optimizer passes might separate these instructions, this
includes:
- SimplifyCFGPass (sinking)
- InstCombine (sinking)
- GVN (hoisting)

The `preserve_static_offset` attribute marks structures for which the
following transformations happen:
- at the early IR processing stage:
  - `(load (getelementptr ...))` replaced by call to intrinsic
    `llvm.bpf.getelementptr.and.load`;
  - `(store (getelementptr ...))` replaced by call to intrinsic
    `llvm.bpf.getelementptr.and.store`;
- at the late IR processing stage this modification is undone.

Such handling prevents various optimizer passes from generating
sequences of instructions that would be rejected by BPF verifier.
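
As a rough illustration of the source-level pattern discussed above,
the sketch below marks a made-up structure with the attribute and reads
one field through an unmodified pointer. The names `demo_ctx`,
`DEMO_CTX_ATTR` and `read_mark` are hypothetical and do not come from
this change. The attribute is applied only when the compiler reports
support for it, so on other compilers the code builds as plain C.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Apply the attribute only where the compiler reports support for it;
   elsewhere the macro expands to nothing. */
#if defined(__has_attribute)
#  if __has_attribute(preserve_static_offset)
#    define DEMO_CTX_ATTR __attribute__((preserve_static_offset))
#  endif
#endif
#ifndef DEMO_CTX_ATTR
#  define DEMO_CTX_ATTR
#endif

/* Hypothetical context structure, not a real kernel type. */
struct demo_ctx {
  uint32_t len;
  uint32_t mark;
} DEMO_CTX_ATTR;

/* The field offset is a compile-time constant. */
static_assert(offsetof(struct demo_ctx, mark) == 4,
              "mark sits at a fixed offset");

/* Field read through the unmodified context pointer: conceptually a
   (load (getelementptr %ctx, 4)) pair that should stay together. */
uint32_t read_mark(const struct demo_ctx *ctx) {
  return ctx->mark;
}
```

With the attribute in effect, the intent is that the load of `mark`
keeps its constant offset next to the base pointer all the way to
instruction selection.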

`__attribute__((preserve_static_offset))` has priority over
`__attribute__((preserve_access_index))`. When the
`preserve_static_offset` attribute is present, preserve access index
transformations are not applied.

This addresses the issue reported by the following thread:

https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6

This is a second attempt to commit this change; the previous,
reverted commit is cb13e9286b.
The following items have been fixed:
- test case bpf-preserve-static-offset-bitfield.c now uses
  `-triple bpfel` to avoid different codegen for little/big endian
  targets.
- BPFPreserveStaticOffset.cpp:removePAICalls() modified to avoid
  use after free for `WorkList` elements `V`.

Differential Revision: https://reviews.llvm.org/D133361
2023-12-05 19:21:42 +02:00
Baodi
4ea268d831 [Clang][OpenMP] Fix private variables registration in simd (#74105)
Fix #69214 
In `emitOMPSimdRegion`, `EmitOMPPrivateLoopCounters` should be called
after `EmitOMPPrivateClause`.
Otherwise the private variables are registered too early, which
`EmitOMPPrivateClause` does not allow.
2023-12-05 09:16:45 -05:00
James Y Knight
4d4c30a37c Use Address for CGBuilder's CreateAtomicRMW and CreateAtomicCmpXchg. (#74349)
Update all callers to pass through the Address.

For the older builtins such as `__sync_*` and MSVC `_Interlocked*`,
natural alignment of the atomic access is _assumed_. This change
preserves that behavior. It will pass through greater-than-required
alignments, however.
2023-12-04 13:37:04 -05:00
Ulrich Weigand
c61eb44005 [SystemZ] Implement vector rotate in terms of funnel shift
Clang currently implements a set of vector rotate builtins
(__builtin_s390_verll*) in terms of platform-specific LLVM
intrinsics.  To simplify the IR (and allow for common code
optimizations if applicable), this patch removes those LLVM
intrinsics and implements the builtins in terms of the
platform-independent funnel shift intrinsics instead.
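
To make the equivalence concrete: a left funnel shift concatenates two
values, shifts the pair left, and keeps the high half, so passing the
same value for both operands yields a rotate left. The scalar sketch
below only illustrates that identity on 32-bit integers. The helper
names `fshl32` and `rotl32` are hypothetical; this is not the vector
builtin or the code emitted by this patch.

```c
#include <assert.h>   /* assert, for quick checks */
#include <stdint.h>

/* Funnel shift left on 32-bit lanes: take the high 32 bits of
   ((a:b) << (n mod 32)). */
uint32_t fshl32(uint32_t a, uint32_t b, uint32_t n) {
  n %= 32u;
  if (n == 0)
    return a;               /* avoid the undefined shift by 32 below */
  return (a << n) | (b >> (32u - n));
}

/* Rotate left is a funnel shift with both inputs equal. */
uint32_t rotl32(uint32_t x, uint32_t n) {
  return fshl32(x, x, n);
}
```

For example, `rotl32(0x80000001u, 1)` moves the top bit around to the
bottom and gives `0x00000003`.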

Also, fix the prototype of the __builtin_s390_verll*
builtins for full compatibility with GCC.
2023-12-04 16:52:00 +01:00
Youngsuk Kim
d43c081aef [clang][CGOpenMPRuntimeGPU] Merge consecutive AddrSpaceCasts (NFC) (#74279)
Merge consecutive AddrSpaceCasts into a single AddrSpaceCast.
2023-12-04 07:03:09 -05:00
Nathan Sidwell
1fa35f0b5d [clang] Avoid recalculating TBAA base type info (#73264)
As nullptr is a legitimate value, change the BaseTypeMetadataCache hash lookup/insertion to use find and
insert rather than the subscript operator. 

Also adjust getBaseTypeInfoHelper to do no insertion, but let getBaseTypeInfo do that.
2023-12-02 11:54:59 -05:00
Romaric Jodin
d56e0d07cc clang/OpenCL: set sqrt fp accuracy on call to Z4sqrt (#66651)
This reverts the previous implementation to avoid adding an inline
function to the OpenCL headers.
That implementation was breaking the clspv flow (google/clspv#1231),
while https://reviews.llvm.org/D156743 mentioned that just decorating
the call node with `!fpmath` was enough.
This PR implements that idea.
The test has been updated to match this implementation.
2023-12-01 16:34:44 +09:00
Paul Kirth
cfe1ece833 [clang][llvm][fatlto] Avoid cloning modules in FatLTO (#72180)
https://github.com/llvm/llvm-project/issues/70703 pointed out that
cloning LLVM modules could lead to miscompiles when using FatLTO.

This is due to an existing issue when cloning modules with labels (see
#55991 and #47769). Since this can lead to miscompilation, we can avoid
cloning the LLVM modules, which was desirable anyway.

This patch modifies the EmbedBitcodePass to no longer clone the module
or run an input pipeline over it. Further, it makes FatLTO always perform
UnifiedLTO, so we can still defer the Thin/Full LTO decision to
link-time. Lastly, it removes dead/obsolete code related to now-defunct
options that no longer work with the EmbedBitcodePass implementation.
2023-11-30 17:09:34 -08:00
Eduard Zingerman
2484469803 Revert "[BPF] Attribute preserve_static_offset for structs"
This reverts commit cb13e9286b.
Buildbot reports MSAN failures in tests added in this commit:
https://lab.llvm.org/buildbot/#/builders/5/builds/38806

Failing tests:
  LLVM :: CodeGen/BPF/preserve-static-offset/load-arr-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/load-ptr-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/load-struct-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/load-union-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/store-pai.ll
2023-11-30 22:29:45 +02:00
Youngsuk Kim
ff485a0e77 [clang] Remove no-op ptr-to-ptr bitcasts (NFC)
Opaque ptr cleanup effort (NFC).
2023-11-30 14:00:31 -06:00
Ivan R. Ivanov
065796bb92 [clang][OpenMP] Fix missing DI for __kmpc_global_thread_num (#73856)
Co-authored-by: Ivan Radanov Ivanov <ivanov2@llnl.gov>
2023-11-30 13:21:03 -06:00