This will help distinguish release branch builds from development branch
builds, and is similar to GCC's version numbering policy.
Thus, the branch `releases/18.x` will start out numbered 18.1.0, instead
of 18.0.0.
Unchanged are other versioning policies:
- mainline will be numbered 18.0.0, 19.0.0, ...
- typical release branch releases will increment micro version, e.g.
18.1.1, 18.1.2, ....
- If an ABI break is required on the release branch, the minor version
will be incremented, e.g. to 18.2.0.
See the Discourse RFC:
https://discourse.llvm.org/t/rfc-name-the-first-release-from-a-branch-n-1-0-instead-of-n-0-0/75384
This patch removes the explicit llvm-objdump man page. By enabling
sphinx man page output with `-DLLVM_ENABLE_SPHINX=ON` and
`-DSPHINX_OUTPUT_MAN=ON`, we can generate man pages for all the llvm
binary utilities from the restructured text documentation. Having an
additional man page upstream increases fragementation and maintenance.
Atomic instructions (load / store/ atomicrwm / cmpxchg) are not
really undefined behavior if they lack natural alignment. They will
(with AtomicExpand pass enabled) be converted into libcalls.
Update the language reference to reflect this.
When llvm-objdump switched from cl:: to OptTable
(https://reviews.llvm.org/D100433), we dropped support for LLVM cl::
options. Some LLVM_DEBUG in `llvm/lib/Target/$target/MCDisassembler/`
files might be useful. Add -mllvm to allow dumping the information.
```
# -debug is available in an LLVM_ENABLE_ASSERTIONS=on build
llvm-objdump -d -mllvm -debug a.o > /dev/null
```
Link:
https://discourse.llvm.org/t/how-to-enable-debug-logs-in-llvm-objdump/75758
The RISC-V vector crypto extensions have been ratified. This patch
updates the Clang and LLVM support for these extensions to be
non-experimental, while leaving the C intrinsics as experimental since
the C intrinsics are not yet standardized.
Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
This is an experimental address space for strided buffers. These buffers
can have structs as elements and
a stride > 1.
These pointers allow the indexed access in units of stride, i.e., they
point at `buffer[index * stride]`.
Thus, we can use the `idxen` modifier for buffer loads.
We assign address space 9 to 192-bit buffer pointers which contain a
128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially,
they are fat buffer pointers with an additional 32-bit index.
Now we have default abi lp64 for rv64if and ilp32 for rv32if, which is
different with riscv-gnu-toolchain. In
8e9fb09a0c/configure (L3385)
when have f and not d, it prefers lp64f/ilp32f but no soft float. This
patch tries to make their behaviors consistent.
`--gap-fill <value>` fills the gaps between sections with a specified
8-bit value, instead of zero.
`--pad-to <address>` pads the output binary up to the specified load
address, using the 8-bit value from `--gap-fill` or zero.
These options are only supported for ELF input and binary output.
... by lowering them as lazy resolve-on-first-use symbol resolvers. Note that this is subtly different timing than on ELF platforms, where ifunc resolution happens at load time.
Since ld64 and ld-prime don't support all the cases we need for these, we lower them manually in the AsmPrinter.
## Motivation
Since we don't need the metadata sections at runtime, we can somehow
offload them from memory at runtime. Initially, I explored [debug info
correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113),
which is used for PGO with value profiling disabled. However, it
currently only works with DWARF and it's be hard to add such artificial
debug info for every function in to CodeView which is used on Windows.
So, offloading profile metadata sections at runtime seems to be a
platform independent option.
## Design
The idea is to use new section names for profile name and data sections
and mark them as metadata sections. Under this mode, the new sections
are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime
and can be stripped away as a post-linking step. After the process
exits, the generated raw profiles will contains only headers + counters.
llvm-profdata can be used correlate raw profiles with the unstripped
binary to generate indexed profile.
## Data
For chromium base_unittests with code coverage on linux, the binary size
overhead due to instrumentation reduced from 64M to 38.8M (39.4%) and
the raw profile files size reduce from 128M to 68M (46.9%)
```
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +14.6Mi [NEW] +14.6Mi __llvm_prf_data
[NEW] +10.6Mi [NEW] +10.6Mi __llvm_prf_names
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
[ = ] 0 +65% +1.23Ki .relro_padding
+62% +1.20Ki [ = ] 0 [Unmapped]
+13% +448 +19% +448 .init_array
+8.8% +192 [ = ] 0 [ELF Section Headers]
+0.0% +136 +0.0% +80 [7 Others]
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.5% +80 +1.2% +64 .plt
[ = ] 0 -99.2% -3.68Ki [LOAD #5 [RW]]
+195% +64.0Mi +194% +64.0Mi TOTAL
$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
+13% +448 +19% +448 .init_array
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.2% +64 +1.2% +64 .plt
+2.9% +64 [ = ] 0 [ELF Section Headers]
+0.0% +40 +0.0% +40 .data
+1.2% +32 +1.2% +32 .got.plt
+0.0% +24 +0.0% +8 [5 Others]
[ = ] 0 -22.9% -872 [LOAD #5 [RW]]
-74.5% -1.44Ki [ = ] 0 [Unmapped]
[ = ] 0 -76.5% -1.45Ki .relro_padding
+118% +38.8Mi +117% +38.8Mi TOTAL
```
A few things to note:
1. llvm-profdata doesn't support filter raw profiles by binary id yet,
so when a raw profile doesn't belongs to the binary being digested by
llvm-profdata, merging will fail. Once this is implemented,
llvm-profdata should be able to only merge raw profiles with the same
binary id as the binary and discard the rest (with mismatched/missing
binary id). The workflow I have in mind is to have scripts invoke
llvm-profdata to get all binary ids for all raw profiles, and
selectively choose the raw pnrofiles with matching binary id and the
binary to llvm-profdata for merging.
2. Note: In COFF, currently they are still loaded into memory but not
used. I didn't do it in this patch because I noticed that `.lcovmap` and
`.lcovfunc` are loaded into memory. A separate patch will address it.
3. This should works with PGO when value profiling is disabled as debug
info correlation currently doing, though I haven't tested this yet.
Add the `dead_on_unwind` attribute, which states that the caller will
not read from this argument if the call unwinds. This allows eliding
stores that could otherwise be visible on the unwind path, for example:
```
declare void @may_unwind()
define void @src(ptr noalias dead_on_unwind %out) {
store i32 0, ptr %out
call void @may_unwind()
store i32 1, ptr %out
ret void
}
define void @tgt(ptr noalias dead_on_unwind %out) {
call void @may_unwind()
store i32 1, ptr %out
ret void
}
```
The optimization is not valid without `dead_on_unwind`, because the `i32
0` value might be read if `@may_unwind` unwinds.
This attribute is primarily intended to be used on sret arguments. In
fact, I previously wanted to change the semantics of sret to include
this "no read after unwind" property (see D116998), but based on the
feedback there it is better to keep these attributes orthogonal (sret is
an ABI attribute, dead_on_unwind is an optimization attribute). This is
a reboot of that change with a separate attribute.
Remove the part about implicit conversion from an iterator to a pointer.
This part of the manual was written 14 years ago, in:
37027c30ec
There do exist a type casting operator in `ilist` then:
37027c30ec/llvm/include/llvm/ADT/ilist.h (L192-L194)
But it has been remove since 2016:
f197b1f78f
So I think it makes sense to remove this part to avoid mislead new
contributors.
This adds a link from the main docs page back to the README where
I have previously added a list of useful resources.
To that list, I've added a link to my recent llvm blog post.
These flags are usable on floating point arithmetic, as well as call,
select, and phi instructions whose resulting type is floating point, or
a vector of, or an array of, a valid type. Whether or not the flags are
valid for a given instruction can be checked with the new
LLVMCanValueUseFastMathFlags function.
These are exposed using a new LLVMFastMathFlags type, which is an alias
for unsigned. An anonymous enum defines the bit values for it.
Tests are added in echo.ll for select/phil/call, and the floating point
types in the new float_ops.ll bindings test.
Select and the floating point arithmetic instructions were not
implemented in llvm-c-test/echo.cpp, so they were added as well.
Neoverse N2 was incorrectly marked as an Armv8.5a core. This has been
changed to an Armv9.0a core. However, crypto options are not enabled
by default for Armv9 cores, so -mcpu=neoverse-n2+crypto is required
to enable crypto for this core.
Neoverse N2 Technical Reference Manual:
https://developer.arm.com/documentation/102099/0003/
Added the following functions for manipulating operand bundles, as well as
building ``call`` and ``invoke`` instructions that use operand bundles:
* LLVMBuildCallWithOperandBundles
* LLVMBuildInvokeWithOperandBundles
* LLVMCreateOperandBundle
* LLVMDisposeOperandBundle
* LLVMGetNumOperandBundles
* LLVMGetOperandBundleAtIndex
* LLVMGetNumOperandBundleArgs
* LLVMGetOperandBundleArgAtIndex
* LLVMGetOperandBundleTag
Fixes#71873.
Add LL parsing for `<N x ty> splat(ty <imm>)` that lowers onto
ConstantInt::get() for integer types and ConstantFP::get() for
floating-point types.
The intent is to extend ConstantInt/FP classes to support vector types
rather than redirecting to other constant classes as the get() methods
do today.
This patch gives IR writers the convenience of using the shorthand
today, thus allowing existing tests to be ported.
As [scottmcm
described](https://discourse.llvm.org/t/conv-c-and-conv-preservemost-mix-badly-on-windows-x64/73054),
the `preserve_most` calling convention, as currently implemented, is a
bad fit for Windows on x64. The intent of `preserve_most` is "to make
the code in the caller as unintrusive as possible", but `preserve_most`
causes the caller to spill and restore ten SIMD registers. It would be
preferable to make `preserve_most` treat the XMM registers however the C
calling convention does on the target operating system.
This is a breaking change, but the documentation indicates that
`preserve_most` is still experimental, so I believe that ABI
compatibility is not yet a requirement.
I've plumbed the LLVM DebugInfoD client into LLDB, and added automatic
downloading of DWP files to the SymbolFileDWARF.cpp plugin. If you have
DEBUGINFOD_URLS set to a space delimited set of web servers, LLDB will
try to use them as a last resort when searching for DWP files. If you do
*not* have that environment variable set, nothing should be changed.
There's also a setting, per @clayborg 's suggestion, that will override
the environment variable, or can be used instead of the environment
variable. The setting is why I also needed to add an API to the
llvm-debuginfod library
### Test Plan:
Suggestions are welcome here. I should probably have some positive and
negative tests, but I wanted to get the diff up for people who have a
clue what they're doing to rip it to pieces before spending too much
time validating the initial implementation.
---------
Co-authored-by: Kevin Frei <freik@meta.com>
Co-authored-by: Alex Langford <nirvashtzero@gmail.com>
Positive options: -mapx-features=<comma-separated-features>
Negative options: -mno-apx-features=<comma-separated-features>
-m[no-]apx-features is designed to be able to control separate APX
features.
Besides, we also support the flag -m[no-]apxf, which can be used like an
alias of -m[no-]apx-features=< all APX features covered by CPUID APX_F>
Behaviour when positive and negative options are used together:
For boolean flags, the last one wins
-mapxf -mno-apxf -> -mno-apxf
-mno-apxf -mapxf -> -mapxf
For flags that take a set as arguments, it sets the mask by order of the
flags
-mapx-features=egpr,ndd -mno-apx-features=egpr -> -egpr,+ndd
-mapx-features=egpr -mno-apx-features=egpr,ndd -> -egpr,-ndd
-mno-apx-features=egpr -mapx-features=egpr,ndd -> +egpr,+ndd
-mno-apx-features=egpr,ndd -mapx-features=egpr -> -ndd,+egpr
The design is aligned with gcc
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628905.html
AArch64 has supported the GHC calling convention for quite a few years
at this point (https://reviews.llvm.org/D6877), but the LangRef never
got updated noting that this was implemented on AArch64.
https://github.com/llvm/llvm-project/issues/70703 pointed out that
cloning LLVM modules could lead to miscompiles when using FatLTO.
This is due to an existing issue when cloning modules with labels (see
#55991 and #47769). Since this can lead to miscompilation, we can avoid
cloning the LLVM modules, which was desirable anyway.
This patch modifies the EmbedBitcodePass to no longer clone the module
or run an input pipeline over it. Further, it make FatLTO always perform
UnifiedLTO, so we can still defer the Thin/Full LTO decision to
link-time. Lastly, it removes dead/obsolete code related to now defunct
options that do not work with the EmbedBitcodePass implementation any
longer.
Clarify how the addend is used in _HI relocation types like
R_AMDGPU_ABS32_HI based on the current behaviour of the Mesa and AMDPAL
ELF loaders.
This affects Mesa and AMDPAL because they use REL relocation records, so
the addend for these types is the 32-bit literal value from the
instruction being relocated. AMDHSA is not affected because it uses RELA
relocation records which have a 64-bit addend.
This flag indicates that every bit is known to be zero in at least one
of the inputs. This allows the Or to be treated as an Add since there is
no possibility of a carry from any bit.
If the flag is present and this property does not hold, the result is
poison.
This makes it easier to reverse the InstCombine transform that turns Add
into Or.
This is inspired by a comment here
https://github.com/llvm/llvm-project/pull/71955#discussion_r1391614578
Discourse thread
https://discourse.llvm.org/t/rfc-add-or-disjoint-flag/75036