Reported by Static Analyzer Tool:
In EmitAssemblyHelper::RunOptimizationPipeline(), using the `auto`
keyword without an `&` causes a copy of an object of type `std::function`:
```
/// List of pass builder callbacks ("CodeGenOptions.h").
std::vector<std::function<void(llvm::PassBuilder &)>>
    PassBuilderCallbacks;
```
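A minimal sketch of the fix, assuming the loop in RunOptimizationPipeline iterates over these callbacks with a `PassBuilder &PB` in scope (variable names are illustrative):
```
// Before: for (auto PassCallback : CodeGenOpts.PassBuilderCallbacks)
//   -- copies the std::function on every iteration.
// After: bind by const reference, so no copy is made.
for (const auto &PassCallback : CodeGenOpts.PassBuilderCallbacks)
  PassCallback(PB);
```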
- Adds a new +pc option to -mbranch-protection that will enable
the use of PC as a diversifier in PAC branch protection code.
- When +pauth-lr is enabled (-march=armv9.5a+pauth-lr) in combination
with -mbranch-protection=pac-ret+pc, the new Armv9.5-A instructions
(pacibsppc, retaasppc, etc.) are used.
Documentation for the relevant instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/
Co-authored-by: Lucas Prates <lucas.prates@arm.com>
In 9a38a72f1d `ProductId` was assigned from the stringified value of
`CLANG_VENDOR`, if that macro was defined. However, `CLANG_VENDOR` is
supposed to be a string, as it is defined (optionally) as such in the
top-level clang `CMakeLists.txt`.
Furthermore, `CLANG_VENDOR` is only passed as a build-time define when
compiling `Version.cpp`, so add a `getClangVendor()` function to
`Version.h`, and use it in `CodeGenModule.cpp`, instead of relying on
the macro.
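A minimal sketch of what such an accessor could look like (the exact signature in `Version.h` may differ):
```
// In clang/lib/Basic/Version.cpp, the only TU compiled with -DCLANG_VENDOR.
std::string getClangVendor() {
#ifdef CLANG_VENDOR
  return CLANG_VENDOR; // already a string literal per the top-level CMakeLists.txt
#else
  return "";
#endif
}
```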
Fixes: 9a38a72f1d
In 9a38a72f1d `ProductId` was assigned from the stringified value of
`CLANG_VENDOR`, if that macro was defined. However, `CLANG_VENDOR` is
supposed to be a string, as it is defined (optionally) as such in the
top-level clang `CMakeLists.txt`.
Move the addition of `-DCLANG_VENDOR` to the compiler flags from
`clang/lib/Basic/CMakeLists.txt` to the top-level `CMakeLists.txt`, so
it is consistent across the whole clang codebase. Then remove the
stringification from `CodeGenModule.cpp`, to make it work correctly.
Fixes: 9a38a72f1d
In LLVMContext::diagnose, set `HasErrors` for `DS_Error` so that all
derived `DiagnosticHandler` have correct `HasErrors` information.
An alternative is to set `HasErrors` in
`DiagnosticHandler::handleDiagnostics`, but all derived
`handleDiagnostics` would have to call the base function.
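A sketch of the chosen approach (member names as described above; the actual internals may differ):
```
void LLVMContext::diagnose(const DiagnosticInfo &DI) {
  // Record the error before dispatching, so every derived
  // DiagnosticHandler observes a consistent HasErrors flag.
  if (DI.getSeverity() == DS_Error)
    pImpl->DiagHandler->HasErrors = true;
  // ... dispatch to the installed handler as before ...
}
```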
When constructing vectors from elements, use poison instead of
undef as the base value. These literals always initialize all
elements (padding the remainder with zero), so that the choice
of base value does not affect semantics.
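For illustration, a sketch of the construction pattern with IRBuilder (surrounding names such as `VecTy`, `Elts`, and `Builder` are assumed):
```
// Start from poison; every lane is written below, so the base value is
// never observable.
llvm::Value *Vec = llvm::PoisonValue::get(VecTy);
for (unsigned I = 0; I != NumElts; ++I)
  Vec = Builder.CreateInsertElement(Vec, Elts[I], I);
```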
Many issues have popped up with the counted_by feature. Patch #73730 has
grown too large, and waiting on its approval is blocking Linux testing.
Includes reverts of:
commit 769bc11f68 ("[Clang] Implement the 'counted_by' attribute
(#68750)")
commit bc09ec6962 ("[CodeGen] Revamp counted_by calculations
(#70606)")
commit 1a09cfb2f3 ("[Clang] counted_by attr can apply only to C99
flexible array members (#72347)")
commit a76adfb992 ("[NFC][Clang] Refactor code to calculate flexible
array member size (#72790)")
commit d8447c78ab ("[Clang] Correct handling of negative and
out-of-bounds indices (#71877)")
Partial commit b31cd07de5 ("[Clang] Regenerate test checks (NFC)")
Closes #73168. Closes #75173.
Since FatLTO now uses the UnifiedLTO pipeline, we should not set the
ThinLTO module flag to true, since it may cause an assertion failure.
See https://github.com/llvm/llvm-project/issues/70703 for context.
Rather than shepherding a type name all the way to the backend as a
string and attempting to parse it, get the element type out of the AST
and store that in the resource annotation metadata directly.
Pull Request: https://github.com/llvm/llvm-project/pull/75674
Non-LTO compiles set the buffer name to "<inline asm>"
(`AsmPrinter::addInlineAsmDiagBuffer`) and pass diagnostics to
`ClangDiagnosticHandler` (through the `MCContext` handler in
`MachineModuleInfoWrapperPass::doInitialization`) to ensure that
the exit code is 1 in the presence of errors. In contrast, LTO compiles
spuriously succeed even if error messages are printed.
```
% cat a.c
void _start() {}
asm("unknown instruction");
% clang -c a.c
<inline asm>:1:1: error: invalid instruction mnemonic 'unknown'
1 | unknown instruction
| ^
1 error generated.
% clang -c -flto a.c; echo $? # -flto=thin is the same
error: invalid instruction mnemonic 'unknown'
unknown instruction
^~~~~~~
error: invalid instruction mnemonic 'unknown'
unknown instruction
^~~~~~~
0
```
`CollectAsmSymbols` parses inline assembly and is transitively called by
both `ModuleSummaryIndexAnalysis::run` and `WriteBitcodeToFile`, leading
to duplicate diagnostics.
This patch updates `CollectAsmSymbols` to be similar to non-LTO
compiles.
```
% clang -c -flto=thin a.c; echo $?
<inline asm>:1:1: error: invalid instruction mnemonic 'unknown'
1 | unknown instruction
| ^
1 error generated.
1
```
The `HasErrors` check does not prevent duplicate warnings, but assembler
warnings are very uncommon.
Fix mapping of structs to the device.
The following example fails:
```
#include <stdio.h>
#include <stdlib.h>
struct Descriptor {
  int *datum;
  long int x;
  int xi;
  long int arr[1][30];
};

int main() {
  Descriptor dat = Descriptor();
  dat.datum = (int *)malloc(sizeof(int) * 10);
  dat.xi = 3;
  dat.arr[0][0] = 1;
#pragma omp target enter data map(to: dat.datum[:10]) map(to: dat)
#pragma omp target
  {
    dat.xi = 4;
    dat.datum[dat.arr[0][0]] = dat.xi;
  }
#pragma omp target exit data map(from: dat)
  return 0;
}
```
This is a rework of the previous attempt:
https://github.com/llvm/llvm-project/pull/72410
The specialisation will not be valid when ConstantInt gains native
support for vector types.
This is largely a mechanical change but with extra attention paid to constant
folding, InstCombineVectorOps.cpp, LoopFlatten.cpp and Verifier.cpp to
remove the need to call `getIntegerType()`.
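For illustration, the pattern the mechanical part of the change applies (a hedged sketch based on the accessor named above):
```
// getType() now returns a plain Type*; callers that genuinely need an
// IntegerType ask for it explicitly instead of relying on a cast.
llvm::IntegerType *ITy = CI->getIntegerType();
// previously: cast<llvm::IntegerType>(CI->getType())
```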
Co-authored-by: Nikita Popov <github@npopov.com>
## Motivation
Since we don't need the metadata sections at runtime, we can somehow
offload them from memory at runtime. Initially, I explored [debug info
correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113),
which is used for PGO with value profiling disabled. However, it
currently only works with DWARF, and it would be hard to add such
artificial debug info for every function into CodeView, which is used on
Windows. So, offloading profile metadata sections at runtime seems to be
a platform-independent option.
## Design
The idea is to use new section names for profile name and data sections
and mark them as metadata sections. Under this mode, the new sections
are non-SHF_ALLOC in ELF, so they are not loaded into memory at runtime
and can be stripped away as a post-linking step. After the process
exits, the generated raw profiles will contain only headers + counters.
llvm-profdata can be used to correlate raw profiles with the unstripped
binary and generate an indexed profile.
## Data
For chromium base_unittests with code coverage on Linux, the binary size
overhead due to instrumentation is reduced from 64M to 38.8M (39.4%), and
the raw profile file size is reduced from 128M to 68M (46.9%).
```
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +14.6Mi [NEW] +14.6Mi __llvm_prf_data
[NEW] +10.6Mi [NEW] +10.6Mi __llvm_prf_names
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
[ = ] 0 +65% +1.23Ki .relro_padding
+62% +1.20Ki [ = ] 0 [Unmapped]
+13% +448 +19% +448 .init_array
+8.8% +192 [ = ] 0 [ELF Section Headers]
+0.0% +136 +0.0% +80 [7 Others]
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.5% +80 +1.2% +64 .plt
[ = ] 0 -99.2% -3.68Ki [LOAD #5 [RW]]
+195% +64.0Mi +194% +64.0Mi TOTAL
$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
+13% +448 +19% +448 .init_array
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.2% +64 +1.2% +64 .plt
+2.9% +64 [ = ] 0 [ELF Section Headers]
+0.0% +40 +0.0% +40 .data
+1.2% +32 +1.2% +32 .got.plt
+0.0% +24 +0.0% +8 [5 Others]
[ = ] 0 -22.9% -872 [LOAD #5 [RW]]
-74.5% -1.44Ki [ = ] 0 [Unmapped]
[ = ] 0 -76.5% -1.45Ki .relro_padding
+118% +38.8Mi +117% +38.8Mi TOTAL
```
A few things to note:
1. llvm-profdata doesn't support filtering raw profiles by binary id yet,
so when a raw profile doesn't belong to the binary being digested by
llvm-profdata, merging will fail. Once this is implemented,
llvm-profdata should be able to merge only the raw profiles with the
same binary id as the binary and discard the rest (those with a
mismatched/missing binary id). The workflow I have in mind is to have
scripts invoke llvm-profdata to get all binary ids for all raw profiles,
and selectively pass the raw profiles with a matching binary id,
together with the binary, to llvm-profdata for merging.
2. In COFF, the new sections are currently still loaded into memory but
not used. I didn't handle that in this patch because I noticed that
`.lcovmap` and `.lcovfunc` are loaded into memory as well. A separate
patch will address it.
3. This should work with PGO when value profiling is disabled, as debug
info correlation currently does, though I haven't tested this yet.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
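The mechanical rename, for illustration (the function name is made up):
```
#include "llvm/ADT/StringRef.h"

bool isIntrinsicName(llvm::StringRef Name) {
  // Previously: Name.startswith("llvm.")
  return Name.starts_with("llvm.");
}
```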
This patch implements the builtins in Clang
and the LLVM-IR intrinsic for the following:
```
// Variants are also available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64,
// _f16, _f32, _f64
uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t sveorqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t svorqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t svminqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for _f32, _f64
float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn);
float16x8_t svminnmqv[_f16](svbool_t pg, svfloat16_t zn);
```
According to PR #257 [1], the reduction instructions use scalable
vectors as input and fixed vectors as output, therefore we changed
SVEEmitter to emit fixed vector types in case the NEON header
(arm_neon.h) is not present.
[1] https://github.com/ARM-software/acle/pull/257
Co-authored-by: Dinar Temirbulatov <dinar.temirbulatov@arm.com>
When the main file is preprocessed and we change `MainFileName` to the
original source file name (e.g. `a.i => a.c`), the source manager does
not contain `a.c`, but we incorrectly associate the DIFile(a.c) with
md5(a.i). This causes CGDebugInfo::emitFunctionStart to create a
duplicate DIFile and leads to a spurious "inconsistent use of MD5
checksums" warning.
```
% cat a.c
void f() {}
% clang -c -g a.c # no warning
% clang -E a.c -o a.i && clang -g -S a.i && clang -g -c a.s
a.s:9:2: warning: inconsistent use of MD5 checksums
.file 1 "a.c"
^
% grep DIFile a.ll
!1 = !DIFile(filename: "a.c", directory: "/tmp/c", checksumkind: CSK_MD5, checksum: "c5b2e246df7d5f53e176b097a0641c3d")
!11 = !DIFile(filename: "a.c", directory: "/tmp/c")
% grep 'file.*a.c' a.s
.file "a.c"
.file 0 "/tmp/c" "a.c" md5 0x2d14ea70fee15102033eb8d899914cce
.file 1 "a.c"
```
Fix #56378 by disassociating md5(a.i) from a.c.
Akin to other passes, refactored the name to `InstrProfilingLoweringPass` to better communicate what it does, and split the pass part from the transformation part to avoid needing to initialize object state during `::run`.
A subsequent PR will move `InstrLowering` to the .cpp file and rename it to `InstrLowerer`.
Add builtin: 'svreinterpret_b' to cast from svcount_t to svbool_t.
Add builtin: 'svreinterpret_c' to cast from svbool_t to svcount_t.
Patch by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>
When emitting TBAA information for enums in C code we currently just
treat the data as an 'omnipotent char'. However, with C strict aliasing
this means we fail to optimise certain cases. For example, in the
SPEC2017 xz benchmark there are structs that contain arrays of enums,
and clang pessimistically assumes that accesses to those enums could
alias with other struct members that have a different type.
According to
https://en.cppreference.com/w/c/language/enum
enums should be treated as 'int' types unless explicitly specified (C23)
or if 'int' would not be large enough to hold all the enumerated values.
In the latter case the compiler is free to choose a suitable integer
that would hold all such values.
When compiling C code this patch generates TBAA information for the enum
by using an equivalent integer of the size clang has already chosen for
the enum. I have ignored C++ for now because the rules are more complex.
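A minimal sketch of the idea in CodeGenTBAA (the guard and member names are from memory and may not match the patch exactly):
```
// In C, look through the enum to the integer type the frontend chose for it.
if (const auto *ETy = llvm::dyn_cast<clang::EnumType>(Ty)) {
  if (!Features.CPlusPlus)
    return getTypeInfo(ETy->getDecl()->getIntegerType());
}
```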
New test added here:
clang/test/CodeGen/tbaa.c
Since VD is assigned from a `cast<VarDecl>`, it cannot be null, or the
cast would have asserted. Remove the subsequent null checks to clear up
any misunderstanding.
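For illustration (a sketch; the variable names are illustrative):
```
// cast<> asserts on a type mismatch and never returns null; dyn_cast<>
// is the variant that can return nullptr.
const VarDecl *VD = cast<VarDecl>(D);
// if (VD) ...   <- always true, so the check is dead code
```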
Summary:
This patch adds support for registering texture / surface variables from
CUDA / HIP. Additionally, we now properly track the `extern` and `const`
flags that are also used in these runtime functions.
This does not implement the `managed` variables yet as those seem to
require some extra handling I'm not familiar with. The issue is that the
current offload entry isn't large enough to carry size and alignment
information along with an extra global.
Summary:
The standard GNU atomic operations are a very common way to target
hardware atomics on the device. With more heterogenous devices being
introduced, the concept of memory scopes has been in the LLVM language
for a while via the `syncscope` modifier. For targets such as the GPU,
this can change code generation depending on whether we need to be
consistent with the memory ordering of the entire system, only the
single GPU device, or something narrower.
Previously these scopes were only exported via the `opencl` and `hip`
variants of these functions. However, this made it difficult to use
outside of those languages and the semantics were different from the
standard GNU versions. This patch introduces a `__scoped_atomic` variant
for the common functions. There was some discussion over whether or not
these should be overloads of the existing ones, or simply new variants.
I leant towards new variants to be less disruptive.
The scope here can be one of the following:
```
__MEMORY_SCOPE_SYSTEM // All devices and systems
__MEMORY_SCOPE_DEVICE // Just this device
__MEMORY_SCOPE_WRKGRP // A 'work-group' AKA CUDA block
__MEMORY_SCOPE_WVFRNT // A 'wavefront' AKA CUDA warp
__MEMORY_SCOPE_SINGLE // A single thread.
```
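A hedged usage sketch (builtin name and argument order as described above; treat as illustrative):
```
// Relaxed add that only needs to be coherent within a single GPU device.
int add_relaxed_device(int *Ptr, int Val) {
  return __scoped_atomic_fetch_add(Ptr, Val, __ATOMIC_RELAXED,
                                   __MEMORY_SCOPE_DEVICE);
}
```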
Naming consistency was attempted, but it is difficult to capture the
full spectrum with so many names. Suggestions appreciated.
Currently we fail to set up the boundary address for a
FinalArraySection as the highest element in partial struct data.
Currently, for:
```
#pragma omp target map(D.a) map(D.b[:2])
```
the size is computed as:
```
%a = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 0
%b = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
%arrayidx = getelementptr inbounds [2 x float], ptr %b, i64 0, i64 0
%2 = getelementptr float, ptr %arrayidx, i32 1
%3 = ptrtoint ptr %2 to i64
%4 = ptrtoint ptr %a to i64
%5 = sub i64 %3, %4
%6 = sdiv exact i64 %5, ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64)
```
Here %2 is wrong: for (D.b[:2]) it is a pointer to the first element of
the array section, but it should point to the last element of the array
section. The fix is to emit the pointer to the last element of the array
section and use that pointer as the highest element in partial struct
data.
IR after the change:
```
%a = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 0
%b = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
%arrayidx = getelementptr inbounds [2 x float], ptr %b, i64 0, i64 0
%b1 = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
%arrayidx2 = getelementptr inbounds [2 x float], ptr %b1, i64 0, i64 1
%1 = getelementptr float, ptr %arrayidx2, i32 1
%2 = ptrtoint ptr %1 to i64
%3 = ptrtoint ptr %a to i64
%4 = sub i64 %2, %3
%5 = sdiv exact i64 %4, ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64)
```
This commit reverts the changes in
https://github.com/llvm/llvm-project/pull/71780 and all of its follow-up
patches.
We got reports of the `.debug_names/.debug_gnu_pubnames/gdb_index/etc.`
sections growing by a non-trivial amount for some large projects. While
GCC emits definitions for static data member constants into the Names
index, they do so *only* for explicitly `constexpr` members. We were
indexing *all* constant-initialized const-static members, which is
likely where the significant size difference comes from. However, only
emitting explicitly `constexpr` variables into the index doesn't seem
like a good way forward, since from clang's perspective `const`-static
integrals are `constexpr` too, and that shouldn't be any different in
the debug-info component. Also, as new code moves to `constexpr` instead
of `const` static for constants, such a solution would just delay the
growth of the Names index.
To prevent the size regression we revert to not emitting definitions for
static data-members that have no location.
To support access to such constants from LLDB we'll most likely have to
make LLDB find the constants by looking at the containing class first.
Currently target_clones attribute results in a linker error when there
are no multi-versioned function declarations in the calling TU.
In the calling TU, the call is generated with the ‘normal’ assembly
name. This does not match any of the versions or the ifunc, since the
version mangling appends the version string and the ifunc symbol carries
an .ifunc suffix. The linker error is not seen with GCC, since the
mangling of the ifunc symbol in GCC is the ‘normal’ assembly name of the
function, i.e. no ifunc suffix.
This PR removes the .ifunc suffix to match GCC. It also adds an alias
with the .ifunc suffix so as to ensure backward compatibility.
The changes exclude the aarch64 target, because the mangling of default
versions on aarch64 does not include a .default suffix and is the
'normal' assembly name, unlike other targets. It is not clear to me what
the correct behavior for this target is.
Old Phabricator review - https://reviews.llvm.org/D158666
---------
Co-authored-by: Tom Honermann <tom@honermann.net>
This commit adds a new BPF-specific structure attribute
`__attribute__((preserve_static_offset))` and a pass to deal with it.
This attribute may be attached to a struct or union declaration, where
it notifies the compiler that this structure is a "context" structure.
The following limitations apply to context structures:
- runtime environment might patch access to the fields of this type by
updating the field offset;
BPF verifier limits access patterns allowed for certain data
types. E.g. `struct __sk_buff` and `struct bpf_sock_ops`. For these
types only `LD/ST <reg> <static-offset>` memory loads and stores are
allowed.
This is so because offsets of the fields of these structures do not
match real offsets in the running kernel. During BPF program
load/verification loads and stores to the fields of these types are
rewritten so that offsets match real offsets. For this rewrite to
happen static offsets have to be encoded in the instructions.
See `kernel/bpf/verifier.c:convert_ctx_access` function in the Linux
kernel source tree for details.
- The runtime environment might disallow access to a field of the type
through modified pointers.
During BPF program verification, a tag `PTR_TO_CTX` is tracked for
register values. If a register with such a tag is modified, BPF programs
are not allowed to read or write memory using that register. See the
`kernel/bpf/verifier.c:check_mem_access` function in the Linux kernel
source tree for details.
Access to the structure fields is translated to IR as a sequence:
- `(load (getelementptr %ptr %offset))` or
- `(store (getelementptr %ptr %offset))`
During the instruction selection phase, such sequences are translated
to a single load instruction with an embedded offset, e.g. `LDW %ptr,
%offset`, which matches the access pattern necessary for the restricted
set of types described above (when `%offset` is static).
Multiple optimizer passes might separate these instructions, including:
- SimplifyCFGPass (sinking)
- InstCombine (sinking)
- GVN (hoisting)
The `preserve_static_offset` attribute marks structures for which the
following transformations happen:
- at the early IR processing stage:
- `(load (getelementptr ...))` replaced by call to intrinsic
`llvm.bpf.getelementptr.and.load`;
- `(store (getelementptr ...))` replaced by call to intrinsic
`llvm.bpf.getelementptr.and.store`;
- at the late IR processing stage this modification is undone.
Such handling prevents various optimizer passes from generating
sequences of instructions that would be rejected by the BPF verifier.
`__attribute__((preserve_static_offset))` takes priority over
`__attribute__((preserve_access_index))`: when both attributes are
present, the preserve_access_index transformations are not applied.
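For illustration, a hedged sketch of attaching the attribute (the type and field names are made up):
```
// Loads and stores through fields of this struct keep statically
// encoded offsets through optimization.
struct __attribute__((preserve_static_offset)) ctx_like {
  int field;
};
```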
This addresses the issue reported by the following thread:
https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6
This is a second attempt to commit this change, previous reverted
commit is: cb13e9286b.
The following items had been fixed:
- test case bpf-preserve-static-offset-bitfield.c now uses
`-triple bpfel` to avoid different codegen for little/big endian
targets.
- BPFPreserveStaticOffset.cpp:removePAICalls() modified to avoid
use after free for `WorkList` elements `V`.
Differential Revision: https://reviews.llvm.org/D133361
Fix #69214.
In `emitOMPSimdRegion`, `EmitOMPPrivateLoopCounters` should be called
after `EmitOMPPrivateClause`.
If not, the private variables will be registered too early, which is not
allowed by `EmitOMPPrivateClause`.
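A sketch of the required ordering (signatures elided; these CodeGenFunction member names are assumed and the exact call sites may differ):
```
// Privatize the clause variables first, then register the privatized
// loop counters.
(void)CGF.EmitOMPPrivateClause(S, LoopScope);
CGF.EmitOMPPrivateLoopCounters(S, LoopScope);
```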
Update all callers to pass through the Address.
For the older builtins such as `__sync_*` and MSVC `_Interlocked*`,
natural alignment of the atomic access is _assumed_. This change
preserves that behavior. It will pass through greater-than-required
alignments, however.
Clang currently implements a set of vector rotate builtins
(__builtin_s390_verll*) in terms of platform-specific LLVM
intrinsics. To simplify the IR (and allow for common code
optimizations if applicable), this patch removes those LLVM
intrinsics and implements the builtins in terms of the
platform-independent funnel shift intrinsics instead.
Also, fix the prototype of the __builtin_s390_verll*
builtins for full compatibility with GCC.
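A minimal sketch of the lowering idea (the helper name is illustrative): a rotate left is the generic funnel-shift intrinsic with both value operands equal.
```
llvm::Value *emitRotateLeft(llvm::IRBuilderBase &B, llvm::Value *V,
                            llvm::Value *Amt) {
  // llvm.fshl(V, V, Amt) == rotate V left by Amt.
  return B.CreateIntrinsic(llvm::Intrinsic::fshl, {V->getType()},
                           {V, V, Amt});
}
```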
As nullptr is a legitimate value, change the BaseTypeMetadataCache
lookup/insertion to use find and insert rather than the subscript
operator. Also adjust getBaseTypeInfoHelper to not insert into the
cache; let getBaseTypeInfo do that instead.
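A sketch of the lookup pattern (container and helper names as above; surrounding code assumed), since operator[] would default-construct an entry and cannot distinguish a missing entry from a cached nullptr:
```
auto It = BaseTypeMetadataCache.find(Ty);
if (It != BaseTypeMetadataCache.end())
  return It->second; // may legitimately be nullptr
llvm::MDNode *N = getBaseTypeInfoHelper(Ty);
BaseTypeMetadataCache.insert({Ty, N});
return N;
```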
This reverts the previous implementation to avoid adding inline
functions in the OpenCL headers.
That approach was breaking the clspv flow (google/clspv#1231), while
https://reviews.llvm.org/D156743 mentioned that just decorating the call
node with `!fpmath` was enough.
This PR implements that idea.
The test has been updated with this implementation.
https://github.com/llvm/llvm-project/issues/70703 pointed out that
cloning LLVM modules could lead to miscompiles when using FatLTO.
This is due to an existing issue when cloning modules with labels (see
#55991 and #47769). Since this can lead to miscompilation, we can avoid
cloning the LLVM modules, which was desirable anyway.
This patch modifies the EmbedBitcodePass to no longer clone the module
or run an input pipeline over it. Further, it makes FatLTO always
perform UnifiedLTO, so we can still defer the Thin/Full LTO decision to
link time. Lastly, it removes dead/obsolete code related to now-defunct
options that no longer work with the EmbedBitcodePass implementation.