In extremely rare cases (estimated 1 in 3 million), minor allocations
that happen after the memory layout was checked in
InitializePlatformEarly() [1] may result in the memory layout
unexpectedly being incompatible in InitializePlatform(). We fix this by
adding another memory layout check (and opportunity to re-exec without
ASLR) in InitializePlatform().
To improve future debuggability, this patch also dumps the process map
if the memory layout is unexpectedly incompatible.
[1]
```
__sanitizer::InitializePlatformEarly();
__tsan::InitializePlatformEarly();
#if !SANITIZER_GO
InitializeAllocator(); // <-- ~8MB mmap'ed
ReplaceSystemMalloc();
#endif
if (common_flags()->detect_deadlocks)
ctx->dd = DDetector::Create(flags()); // <-- ~4MB mmap'ed
Processor *proc = ProcCreate(); // <-- ~1MB mmap'ed
ProcWire(proc, thr);
InitializeInterceptors(); <-- ~3MB mmap'ed
InitializePlatform();
```
My previous patch, "Re-exec TSan with no ASLR if memory layout is incompatible on Linux (#78351)" (0784b1eefa) hoisted the 'personality' call, to share the code between Android and non-Android Linux. Unfortunately, this eager call to 'personality' may trigger sandbox violations on non-Android Linux.
This patch fixes the issue by only calling 'personality' on non-Android Linux if the memory mapping is incompatible. This may still cause a sandbox violation, but only if it was going to abort anyway due to an incompatible memory mapping.
(The behavior on Android Linux is unchanged by this patch or the previous patch.)
In 0784b1eefa some code for re-execution was moved to
`ReExecIfNeeded()`, but also extended with a few Linux-only features.
This leads to compile errors on FreeBSD, or other non-Linux platforms:
compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp:247:25: error: use of
undeclared identifier 'personality'
247 | int old_personality = personality(0xffffffff);
| ^
compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp:249:54: error: use of
undeclared identifier 'ADDR_NO_RANDOMIZE'
249 | (old_personality != -1) && ((old_personality & ADDR_NO_RANDOMIZE)
== 0);
| ^
compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp:281:46: error: use of
undeclared identifier 'ADDR_NO_RANDOMIZE'
281 | CHECK_NE(personality(old_personality | ADDR_NO_RANDOMIZE), -1);
| ^
Surround the affected part with a `#if SANITIZER_LINUX` block for now.
TSan's shadow mappings only support 30-bits of ASLR entropy on x86
Linux, and it is not practical to support the maximum of 32-bits (due to pointer compression and the overhead of shadow mappings). Instead, this patch changes TSan to re-exec without ASLR if it encounters an
incompatible memory layout, as suggested by Dmitry in
https://github.com/google/sanitizers/issues/1716.
If ASLR is already disabled but the memory layout is still incompatible,
it will abort.
This patch involves a bit of refactoring, because the old code is:
1. InitializePlatformEarly()
2. InitializeAllocator()
3. InitializePlatform(): CheckAndProtect()
but it may already segfault during InitializeAllocator() if the memory
layout is incompatible, before we get a chance to check in
CheckAndProtect().
This patch adds CheckAndProtect() during InitializePlatformEarly(), before the allocator is initialized. Naturally, it is necessary to ensure that CheckAndProtect() does *not* allow the heap regions to be occupied here, hence we generalize CheckAndProtect() to optionally check the heap
regions. We keep the original behavior of CheckAndProtect() in InitializePlatform() as a last line of defense.
We need to be careful not to prematurely abort if ASLR is disabled but TSan was going to re-exec for other reasons (e.g., unlimited stack size); we implement this by moving all the re-exec logic into ReExecIfNeeded().
Implements for sv39 and sv48 VMA layout.
Userspace only has access to the bottom half of vma range. The top half
is used by kernel. There is no dedicated vsyscall or heap segment.
PIE program is allocated to start at TASK_SIZE/3*2. Maximum ASLR is
ARCH_MMAP_RND_BITS_MAX+PAGE_SHIFT=24+12=36 Loader, vdso and other
libraries are allocated below stack from the top.
Also change RestoreAddr to use 4 bits to accommodate MappingRiscv64_48
Reviewed by: MaskRay, dvyukov, asb, StephenFan, luismarques, jrtc27,
hiraditya, vitalybuka
Differential Revision: https://reviews.llvm.org/D145214
D145214 was reverted because one file was missing in the latest commit.
Luckily the file was there in the previous commit, probably the author
missed uploading that file with latest commit.
Co-authored-by: Alex Fan <alex.fan.q@gmail.com>
Implements for sv39 and sv48 VMA layout.
Userspace only has access to the bottom half of vma range. The top half is used by kernel.
There is no dedicated vsyscall or heap segment.
PIE program is allocated to start at TASK_SIZE/3*2. Maximum ASLR is ARCH_MMAP_RND_BITS_MAX+PAGE_SHIFT=24+12=36
Loader, vdso and other libraries are allocated below stack from the top.
Also change RestoreAddr to use 4 bits to accommodate MappingRiscv64_48
Reviewed by: MaskRay, dvyukov, asb, StephenFan, luismarques, jrtc27, hiraditya, vitalybuka
Differential Revision: https://reviews.llvm.org/D145214
Introducing xor key to derive unmangled sp is here to follow the way
that the glibc adds support for pointer mangling on loongarch in commit
1c9bc1b6e50293a1b7037a7bfbf835868a55baed.
Reviewed By: SixWeining, wangleiat, xen0n
Differential Revision: https://reviews.llvm.org/D146716
This patch enabled tsan for loongarch64 with 47-bit VMA layout. All
tests are passing.
Also adds assembly routines to enable setjmp/longjmp for loongarch64
on linux.
Reviewed By: dvyukov, SixWeining, #sanitizers
Differential Revision: https://reviews.llvm.org/D138489
The stack pointer is stored in the second slot in the jump buffer on
AArch64. Use the correct slot value to read this rather than the
following register.
Reviewed by: melver
Differential Revision: https://reviews.llvm.org/D125762
This change switches tsan to the new runtime which features:
- 2x smaller shadow memory (2x of app memory)
- faster fully vectorized race detection
- small fixed-size vector clocks (512b)
- fast vectorized vector clock operations
- unlimited number of alive threads/goroutimes
Depends on D112602.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112603
This change switches tsan to the new runtime which features:
- 2x smaller shadow memory (2x of app memory)
- faster fully vectorized race detection
- small fixed-size vector clocks (512b)
- fast vectorized vector clock operations
- unlimited number of alive threads/goroutimes
Depends on D112602.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112603
This change switches tsan to the new runtime which features:
- 2x smaller shadow memory (2x of app memory)
- faster fully vectorized race detection
- small fixed-size vector clocks (512b)
- fast vectorized vector clock operations
- unlimited number of alive threads/goroutimes
Depends on D112602.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112603
This change switches tsan to the new runtime which features:
- 2x smaller shadow memory (2x of app memory)
- faster fully vectorized race detection
- small fixed-size vector clocks (512b)
- fast vectorized vector clock operations
- unlimited number of alive threads/goroutimes
Differential Revision: https://reviews.llvm.org/D112603
This change switches tsan to the new runtime which features:
- 2x smaller shadow memory (2x of app memory)
- faster fully vectorized race detection
- small fixed-size vector clocks (512b)
- fast vectorized vector clock operations
- unlimited number of alive threads/goroutimes
Depends on D112602.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112603
This change switches tsan to the new runtime which features:
- 2x smaller shadow memory (2x of app memory)
- faster fully vectorized race detection
- small fixed-size vector clocks (512b)
- fast vectorized vector clock operations
- unlimited number of alive threads/goroutimes
Depends on D112602.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112603
stats_size argument is unnecessary in GetMemoryProfile and in the callback.
It just clutters code. The callback knowns how many stats to expect.
Depends on D112789.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112790
Print meaningful stack frames for stack/tls races
(instead of PC 1/2 that don't symbolize).
Imitate stack/tls writes after we create and initialize
the new thread, otherwise the races are not detected.
This is re-submit of the following reverted commits,
but without tests as they failed on a number of OSes/arches:
"tsan: fix and test detection of TLS races"
"tsan: fix tls_race3 test on darwin"
"tsan: print a meaningful frame for stack races"
Differential Revision: https://reviews.llvm.org/D111147
The trace tests crashed on darwin because of some thread
initialization issues (thread initialization is somewhat
different on darwin).
Instead of starting real threads, create a new ThreadState
in the main thread. This makes the tests more unit-testy
and hopefully won't crash on darwin (there is almost no
platform-specific code involved now).
This will also help with future trace tests that will need
more than 1 thread. Creating more than 1 real thread and
dispatching test actions across multiple threads in the
required deterministic order is painful.
Depends on D110539.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D110546
Currently detection of races with TLS/stack initialization
is broken because we imitate the write before thread initialization,
so it's modelled with a wrong thread/epoch.
Fix that and add a test.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D110538
Fix few remaining cases where we use u64 instead of the new RawShadow type.
Depends on D110265.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D110266
Remove unnecessary enum values in the memory profiler.
There is no point in spelling them, it can only lead to bugs
and larger diffs when values are added/removed.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D110263
Write uptime in real time seconds for every mem profile record.
Uptime is useful to make more sense out of the profile,
compare random lines, etc.
Depends on D110153.
Reviewed By: melver, vitalybuka
Differential Revision: https://reviews.llvm.org/D110154
We allocate things from the internal allocator,
it's useful to know how much it consumes.
Depends on D110150.
Reviewed By: melver, vitalybuka
Differential Revision: https://reviews.llvm.org/D110151
We currently query number of threads before reading /proc/self/smaps.
But reading /proc/self/smaps can take lots of time for huge processes
and it's retries several times with different buffer sizes.
Overall it can take tens of seconds. This can make number of threads
significantly inconsistent with the rest of the stats.
So query it after reading /proc/self/smaps.
Depends on D110149.
Reviewed By: melver, vitalybuka
Differential Revision: https://reviews.llvm.org/D110150
Include info about MBlock/SyncObj memory consumption in the memory profile.
Depends on D110148.
Reviewed By: melver, vitalybuka
Differential Revision: https://reviews.llvm.org/D110149
We account low and high ranges, but forgot abount the mid range.
Account mid range as well.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D110148
Currently there are 2 levels when selecting the active mapping:
the branchy ifdef tree + another ifdef tree in SelectMapping.
Moreover, there is an additional indirection for some platforms
via HAS_48_BIT_ADDRESS_SPACE define. This makes already complex
logic even more complex and almost impossible to read.
Remove one level of indirection and define the active mapping
in SelectMapping.
Depends on D107742.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D107743
Define all fields to 0 for all mappings.
This allows to write portable code and tests.
For all existing cases 0 values work out of the box
because we check if an address belongs to the range
and nothing belongs to [0, 0] range.
Depends on D107738.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D107739
Unify Go mapping naming with C++ naming to allow
writing portable code/tests that can work for both C++ and Go.
No functional changes.
Depends on D107737.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D107738
This was reverted by f176803ef1 due to
Ubuntu 16.04 x86-64 glibc 2.23 problems.
This commit additionally calls `__tls_get_addr({modid,0})` to work around the
dlpi_tls_data==NULL issues for glibc<2.25
(https://sourceware.org/bugzilla/show_bug.cgi?id=19826)
GetTls is the range of
* thread control block and optional TLS_PRE_TCB_SIZE
* static TLS blocks plus static TLS surplus
On glibc, lsan requires the range to include
`pthread::{specific_1stblock,specific}` so that allocations only referenced by
`pthread_setspecific` can be scanned.
This patch uses `dl_iterate_phdr` to collect TLS blocks. Find the one
with `dlpi_tls_modid==1` as one of the initially loaded module, then find
consecutive ranges. The boundaries give us addr and size.
This allows us to drop the glibc internal `_dl_get_tls_static_info` and
`InitTlsSize` entirely. Use the simplified method with non-Android Linux for
now, but in theory this can be used with *BSD and potentially other ELF OSes.
This simplification enables D99566 for TLS Variant I architectures.
See https://reviews.llvm.org/D93972#2480556 for analysis on GetTls usage
across various sanitizers.
Differential Revision: https://reviews.llvm.org/D98926