When ASan.MaxInlinePoisoningSize == 0 , it means that no shadow memory
operations should be made via inlined instrumentation code,
but only via calls to shadow setting functions. This change fixes one
violation of this, which happened when the function allocas count
was small, i.e. less than 5 - in the code modifying the shadow just
before ret instruction.
We now explicitly check ASan.MaxInlinePoisoningSize , and if it's 0 then
we disallow inlining. It is required for the instrumentation
emitting code suitable for handling by ABI implementation.
rdar://119513720
Co-authored-by: Mariusz Borsa <m_borsa@apple.com>
In #74514 and #74778 we marked various instrumentation-added sections as
large. This causes an extra PT_LOAD segment if using the small code
model. Since people using the small code model presumably aren't hitting
relocation limits, disable this when using the small code model to avoid
the extra segment.
This uses Module::getCodeModel() which isn't necessarily reliable since
it reads module metadata (which right now only the clang frontend sets),
but it would be nice to get to a point where we reliably put this sort
of information (e.g. PIC/code model/etc) in the IR. This requires
duplicating the existing tests since opt/llc currently don't set these
metadata. If we get to a point where they do set the code model metadata
based on command line arguments then we can deduplicate these tests.
## Motivation
Since we don't need the metadata sections at runtime, we can somehow
offload them from memory at runtime. Initially, I explored [debug info
correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113),
which is used for PGO with value profiling disabled. However, it
currently only works with DWARF and it's be hard to add such artificial
debug info for every function in to CodeView which is used on Windows.
So, offloading profile metadata sections at runtime seems to be a
platform independent option.
## Design
The idea is to use new section names for profile name and data sections
and mark them as metadata sections. Under this mode, the new sections
are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime
and can be stripped away as a post-linking step. After the process
exits, the generated raw profiles will contains only headers + counters.
llvm-profdata can be used correlate raw profiles with the unstripped
binary to generate indexed profile.
## Data
For chromium base_unittests with code coverage on linux, the binary size
overhead due to instrumentation reduced from 64M to 38.8M (39.4%) and
the raw profile files size reduce from 128M to 68M (46.9%)
```
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +14.6Mi [NEW] +14.6Mi __llvm_prf_data
[NEW] +10.6Mi [NEW] +10.6Mi __llvm_prf_names
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
[ = ] 0 +65% +1.23Ki .relro_padding
+62% +1.20Ki [ = ] 0 [Unmapped]
+13% +448 +19% +448 .init_array
+8.8% +192 [ = ] 0 [ELF Section Headers]
+0.0% +136 +0.0% +80 [7 Others]
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.5% +80 +1.2% +64 .plt
[ = ] 0 -99.2% -3.68Ki [LOAD #5 [RW]]
+195% +64.0Mi +194% +64.0Mi TOTAL
$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
+13% +448 +19% +448 .init_array
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.2% +64 +1.2% +64 .plt
+2.9% +64 [ = ] 0 [ELF Section Headers]
+0.0% +40 +0.0% +40 .data
+1.2% +32 +1.2% +32 .got.plt
+0.0% +24 +0.0% +8 [5 Others]
[ = ] 0 -22.9% -872 [LOAD #5 [RW]]
-74.5% -1.44Ki [ = ] 0 [Unmapped]
[ = ] 0 -76.5% -1.45Ki .relro_padding
+118% +38.8Mi +117% +38.8Mi TOTAL
```
A few things to note:
1. llvm-profdata doesn't support filter raw profiles by binary id yet,
so when a raw profile doesn't belongs to the binary being digested by
llvm-profdata, merging will fail. Once this is implemented,
llvm-profdata should be able to only merge raw profiles with the same
binary id as the binary and discard the rest (with mismatched/missing
binary id). The workflow I have in mind is to have scripts invoke
llvm-profdata to get all binary ids for all raw profiles, and
selectively choose the raw pnrofiles with matching binary id and the
binary to llvm-profdata for merging.
2. Note: In COFF, currently they are still loaded into memory but not
used. I didn't do it in this patch because I noticed that `.lcovmap` and
`.lcovfunc` are loaded into memory. A separate patch will address it.
3. This should works with PGO when value profiling is disabled as debug
info correlation currently doing, though I haven't tested this yet.
We only allow for assembly code in naked function, and PGO
instrumentation (esp. temporal instrumentation that introduces a
function call) can wreak havoc in this.
Fix#74573
Akin other passes - refactored the name to `InstrProfilingLoweringPass` to better communicate what it does, and split the pass part and the transformation part to avoid needing to initialize object state during `::run`.
A subsequent PR will move `InstrLowering` to the .cpp file and rename it to `InstrLowerer`.
We'd like to make various instrprof globals large to make them not
contribute to relocation pressure since there are no direct accesses
to them in the module.
Similar to what was done for asan_globals in #74514.
This affects the __llvm_prf_vals, __llvm_prf_vnds, and __llvm_prf_names
sections.
The reland fixes platform.ll.
We'd like to make various instrprof globals large to make them not
contribute to relocation pressure since there are no direct accesses
to them in the module.
Similar to what was done for asan_globals in #74514.
This affects the __llvm_prf_vals, __llvm_prf_vnds, and __llvm_prf_names
sections.
We'd like to make the asan_globals section large to make it not
contribute to relocation pressure since there are no direct PC32
references to it.
Following #74498, we can do that by marking the code model for the
global explicitly large.
Without this change, asan_globals gets placed between .data and .bss.
With this change, it gets placed after .bss.
Now MemProf can't do IR annotation right in the local linkage function
and global initial function __cxx_global_var_init. In llvm-profdata
which convert raw memory profile to memory profile, it uses function
name in dwarf to create GUID. But when llvm consumes memory profile, it
use `getIRPGOFuncName` or `getPGOFuncName` which returns local linkage
function as `FileName;FunctionName` or `FileName:FunctionName` to get
function name and create GUID. So profile creator's GUID is not same as
profile consumer.
So I think MemProf should be used with `unique-internal-linkage-names`
and don't use PGOFuncName.
__cxx_global_var_init is created later than where
UniqueInternalLinkageNames works. So I add uniq suffix to
__cxx_global_var_init additionally.
Co-authored-by: lifengxiang <lifengxiang.1025@bytedance.com>
If a suspend happens in the resume part (this can happen in the case of chained coroutines), and that's part of a loop, the pre-split CFG has the suspend block as an exit of that loop. PGO Counter Promotion will then try to commit the temporary counter to the global in that "exit" block (it also does that in the other loop exit BBs, which also includes
the "destroy" case). This interferes with symmetric transfer.
We don't need to commit the counter in the suspend case - it's not a loop exit from the perspective of the behavior of the program. The regular loop exit, together with the "destroy" case, completely cover any updates that may need to happen to the global counter.
If `Type::getPointerTo` is called solely to support an unnecessary
pointer-cast, remove the call entirely.
Otherwise, replace with IRB.getPtrTy().
Clean-up work towards removing method `Type::getPointerTo`.
This patch adds the ability to pass values for the command line options
of -max-inline-poisoning-size, -instrumentation-with-calls-threshold and
-asan-guard-against-version-mismatch through the AddressSanitizerOptions
struct. The motivation is to use these new options when using the pass
in Swift.
rdar://118470958
Caller puts argument shadow one by one into __msan_va_arg_tls, until it
reaches kParamTLSSize. After that it still increment OverflowOffset but
does not store the shadow.
Callee needs OverflowOffset to prepare a shadow for the entire overflow
area. It's done by creating "varargs shadow copy" for complete list of
args, copying available shadow from __msan_va_arg_tls, and clearing the
rest.
However callee does not know if the tail of __msan_va_arg_tls was not
able to fit an argument, and callee will copy tail shadow into "varargs
shadow copy", and later used as a shadow for an omitted argument.
So that unused tail of the __msan_va_arg_tls must be cleared if left
unused.
This allows us to enable compiler-rt/test/msan/vararg_shadow.cpp for
x86.
Reviewers: kstoimenov, thurstond
Reviewed By: thurstond
Pull Request: https://github.com/llvm/llvm-project/pull/72707
3bc439bdff implemented overflow copying in a different way.
It's lucky to pass this test, but may fails in a different way.
Reviewers: thurstond, iii-i
Reviewed By: thurstond
Pull Request: https://github.com/llvm/llvm-project/pull/72710
Currently, the ConstructorKind member variable in AddressSanitizerPass
gets overriden by the ClConstructorKind whether the option is passed
from the command line or not. This override should only happen if the
ClConstructorKind argument is passed from the command line. Otherwise,
the constructor should honor the argument passed to it. This patch makes
this fix.
rdar://118423755
Fixes a bug introduced by
commit f95b2f1acf ("Reland [InstrProf][compiler-rt] Enable MC/DC
Support in LLVM Source-based Code Coverage (1/3)")
The InstrProfiling pass was refactored when introducing support for
MC/DC such that the creation of the data variable was abstracted and
called only once per function from ::run(). Because ::run() only
iterated over functions there were not fully inlined, and because it
only created the data variable for the first intrinsic that it saw, data
variables corresponding to functions fully inlined into other
instrumented callers would end up without a data variable, resulting in
loss of coverage information. This patch does the following:
1.) Move the call of createDataVariable() to getOrCreateRegionCounters()
so that the creation of the data variable will happen indirectly either
from ::new() or during profile intrinsic lowering when it is needed.
This effectively restores the behavior prior to the refactor and ensures
that all data variables are created when needed (and not duplicated).
2.) Process all MC/DC bitmap parameter intrinsics in ::run() prior to
calling getOrCreateRegionCounters(). This ensures bitmap regions are
created for each function including functions that are fully inlined. It
also ensures that the bitmap region is created for each function prior
to the creation of the data variable because it is referenced by the
data variable. Again, duplication is prevented if the same parameter
intrinsic is inlined into multiple functions.
3.) No longer pass the MC/DC intrinsic to createDataVariable(). This
decouples the creation of the data variable from a specific MC/DC
intrinsic. Instead, with #2 above, store the number of bitmap bytes
required in the PerFunctionProfileData in the ProfileDataMap along with
the function's CounterRegion and BitmapRegion variables. This ties the
bitmap information directly to the function to which it belongs, and the
data variable created for that function can reference that.
Fixes a bug introduced by
commit f95b2f1acf ("Reland [InstrProf][compiler-rt] Enable MC/DC
Support in LLVM Source-based Code Coverage (1/3)")
createDataVariable() needs to check that a data variable wasn't already
created before creating it. Previously, this was done inadvertantly in
getOrCreateRegionCounters(), which checked that the RegionCounters was
not created multiple times before creating the counter section and the
data variable. When the creation of the data variable was abstracted
into its own function (createDataVariable()), there was no corresponding
check. This was failing on a case in which an instrumented function was
being inlined into multiple functions and a duplicate data variable was
created, which led to a segfault in emitNameData(). Test case added
based on the repro that also ensures a single data variable was created
in this case.
Cast StackSaveAreaPtr, GrRegSaveAreaPtr, VrRegSaveAreaPtr to pointers to
fix assertions in getShadowOriginPtrKernel().
Fixes: https://github.com/llvm/llvm-project/issues/69738
Patch by Mark Johnston.
Loosen up the matching so that a missing leaf debug frame in the profile
does not prevent matching an allocation context if we can match further
up the inlined call context. This relies on the pre-inliner, which was
already the default when performing normal PGO feedback along with the
MemProf feedback, but to ensure matching is not affected by the presence
of PGO, enable the pre-inliner for MemProf feedback as well.
The `skipPGO()` function was added in https://reviews.llvm.org/D137184.
Unfortunately, it also blocked functions from being annotated (PGOUse),
which I believe will cause confusion to users if a function has a
profile but it is not PGO'd.
The docs for `noprofile` and `skipprofile` only claim to block
instrumentation, not PGO optimization:
https://llvm.org/docs/LangRef.html
Detect when we are matching a memprof profile with no column numbers,
and in that case treat all column numbers as 0 when matching. The
profiled binary might have been built with -gno-column-info, for
example.
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.
Differential Revision: https://reviews.llvm.org/D138846