NFC checks have been failing starting with
https://lab.llvm.org/buildbot/#/builders/92/builds/8567.
The NFC testing wrapper (llvm-bolt-wrapper) replaces the call to `perf2bolt`
with `llvm-bolt --aggregate-only --ignore-build-id`.
`show-density` is automatically enabled only for perf2bolt, not for
`llvm-bolt --aggregate-only`. Add the flag to the test to work around
the issue.
Test Plan:
```
cd build
../llvm-project/bolt/utils/nfc-check-setup.py --switch-back --verbose
bin/llvm-lit -a tools/bolt/test/X86/pre-aggregated-perf.test
```
Reuse the definition of profile density from llvm-profgen (#92144):
- the density is computed in perf2bolt using raw samples (perf.data or
pre-aggregated data),
- function density is the ratio of dynamically executed function bytes
to the static function size in bytes,
- profile density:
- functions are sorted by density in decreasing order, accumulating
their respective sample counts,
- profile density is the smallest density covering 99% of total sample
count.
In other words, BOLT binary profile density is the minimum amount of
profile information per function (excluding functions in tail 1% sample
count) which is sufficient to optimize the binary well.
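As a rough illustration, here is a self-contained sketch of this computation on a simplified per-function record (assumed fields; the actual perf2bolt code operates on BinaryFunction data):
```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct FuncSample {
  uint64_t ExecutedBytes; // dynamically executed bytes of the function
  uint64_t Size;          // static function size in bytes
  uint64_t SampleCount;   // raw samples attributed to the function
};

// Profile density: the smallest function density that still covers 99% of
// the total sample count when functions are visited in decreasing density
// order (tail 1% of samples is excluded).
double computeProfileDensity(std::vector<FuncSample> Funcs) {
  auto Density = [](const FuncSample &F) {
    return F.Size ? double(F.ExecutedBytes) / double(F.Size) : 0.0;
  };
  std::sort(Funcs.begin(), Funcs.end(), [&](const auto &A, const auto &B) {
    return Density(A) > Density(B);
  });
  uint64_t Total = 0;
  for (const auto &F : Funcs)
    Total += F.SampleCount;
  uint64_t Accumulated = 0;
  double Result = 0.0;
  for (const auto &F : Funcs) {
    Accumulated += F.SampleCount;
    Result = Density(F); // density of the function that reaches 99% coverage
    if (Accumulated >= 0.99 * Total)
      break;
  }
  return Result;
}
```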
The density threshold of 60 was determined through experiments with
large binaries by reducing the sample count and checking resulting
profile density and performance. The threshold is conservative.
perf2bolt prints a warning if the density is below the threshold and
suggests increasing the sampling duration and/or frequency to reach the
target density, e.g.:
```
BOLT-WARNING: BOLT is estimated to optimize better with 2.8x more samples.
```
Test Plan: updated pre-aggregated-perf.test
Reviewers: maksfb, wlei-llvm, rafaelauler, ayermolo, dcci, WenleiHe
Reviewed By: WenleiHe, wlei-llvm
Pull Request: https://github.com/llvm/llvm-project/pull/101094
Align DataAggregator (the Linux perf and pre-aggregated profile reader)
with DataReader (the fdata profile reader) behavior: set
BF->RawBranchCount, which is used in the profile density computation
(#101094).
Reviewers: ayermolo, maksfb, dcci, rafaelauler, WenleiHe
Reviewed By: WenleiHe
Pull Request: https://github.com/llvm/llvm-project/pull/101093
When function splitting is used, BOLT may skip tentative code layout
estimation in some cases, such as:
- when there is no profile data for some blocks (i.e., cold blocks)
- when there are cold functions in lite mode
- when function skipping is used
However, when rewriting the binary we still need to compute PC-relative
distances between hot and cold basic blocks. Without cold layout
estimation, BOLT uses '0x0' as the address of the first cold block,
leading to incorrect estimations of any PC-relative addresses.
This affects large binaries, as the relaxStub method expands more
short-jump branches than necessary, since it wrongly believes they have
exceeded the branch distance boundary.
This increases code size with a sequence that is both larger and slower;
however, the performance regression is expected to be minimal since this
only affects cold code that is actually called.
Example of such an unnecessary relaxation:
from:
```armasm
b .Ltmp1234
```
to:
```armasm
adrp x16, .Ltmp1234
add x16, x16, :lo12:.Ltmp1234
br x16
```
Check the invoked tool name with `starts_with`.
Addresses the issue where `perf2bolt` invoked via a distro symlink such as
`perf2bolt-16` fails to run in perf2bolt mode and runs in llvm-bolt mode
instead.
The issue is mentioned in https://vondra.me/posts/playing-with-bolt-and-postgres/
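A minimal sketch of the prefix check (hypothetical helper, not the actual driver code); an exact string comparison would miss versioned symlinks like `perf2bolt-16`:
```cpp
#include <filesystem>
#include <string>

// Decide whether the binary was invoked as perf2bolt, possibly via a
// versioned distro symlink such as perf2bolt-20 (hypothetical helper).
bool isPerf2BoltInvocation(const char *Argv0) {
  const std::string Tool = std::filesystem::path(Argv0).filename().string();
  // An exact comparison (Tool == "perf2bolt") misses "perf2bolt-16";
  // a prefix check covers the symlink names as well (C++20 starts_with).
  return Tool.starts_with("perf2bolt");
}
```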
Test Plan:
```
ln -sf perf2bolt perf2bolt-20
perf2bolt-20 clang -p perf.data -o fdata.clang -w yaml.clang
...
PERF2BOLT: wrote 188593 objects and 0 memory objects to fdata.clang
```
Reviewers: ayermolo, rafaelauler, dcci, maksfb
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/111072
For a large binary with a 38 MB BAT section containing ~170k maps, this
reduces writeMaps time from 70s down to 1s.
The inefficiency was in the use of std::distance with std::map::iterator,
which doesn't provide random access. Use a sorted vector for lookups
instead.
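Roughly, the difference looks like this (hypothetical helpers, not the BAT writer itself): `std::distance` over `std::map` iterators walks node by node, while a sorted vector allows `std::lower_bound` plus constant-time iterator subtraction:
```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <vector>

// O(n) per query: map iterators are bidirectional, so std::distance
// has to walk from begin() to the element.
size_t indexOfSlow(const std::map<uint64_t, uint64_t> &Map, uint64_t Addr) {
  return std::distance(Map.begin(), Map.find(Addr));
}

// O(log n) per query: binary search over a vector sorted by address,
// followed by constant-time iterator subtraction.
size_t indexOfFast(const std::vector<uint64_t> &SortedAddrs, uint64_t Addr) {
  auto It = std::lower_bound(SortedAddrs.begin(), SortedAddrs.end(), Addr);
  return It - SortedAddrs.begin();
}
```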
Test Plan: NFC
Reviewers: maksfb, rafaelauler, dcci, ayermolo
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/112061
In a perfect profile, each positive-execution-count block in the
function’s CFG should be reachable from a positive-execution-count
function entry block through a positive-execution-count path. This new
pass checks how well the BOLT input profile satisfies this “CFG
continuity” property.
More specifically, for each of the hottest 1000 functions, the pass
calculates the function’s fraction of basic block execution counts that
is “unreachable”. It then reports the 95th percentile of the
distribution of the 1000 unreachable fractions in a single BOLT-INFO
line. The smaller the reported value is, the better the BOLT profile
satisfies the CFG continuity property.
The default value of 1000 above can be changed via the hidden BOLT
option `-num-functions-for-continuity-check=[N]`. If more detailed stats
are needed, `-v=1` can be added to the BOLT invocation: the hottest N
functions will be grouped into 5 equally-sized buckets, from the hottest
to the coldest; for each bucket, various summary statistics of the
distribution of the fractions and the raw unreachable execution counts
will be reported.
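A sketch of the per-function metric on a simplified CFG model (assumed types, not the actual pass): walk only positive-count edges from positive-count entry blocks and report the fraction of positive block counts that is never reached:
```cpp
#include <cstdint>
#include <vector>

struct Block {
  uint64_t Count = 0;               // profiled execution count
  bool IsEntry = false;             // function entry block
  std::vector<size_t> Successors;   // indices of successor blocks
  std::vector<uint64_t> SuccCounts; // edge counts, parallel to Successors
};

// Fraction of block execution counts not reachable from a positive-count
// entry block via a positive-count path (simplified model of the pass).
double unreachableFraction(const std::vector<Block> &CFG) {
  std::vector<bool> Reached(CFG.size(), false);
  std::vector<size_t> Worklist;
  for (size_t I = 0; I < CFG.size(); ++I)
    if (CFG[I].IsEntry && CFG[I].Count > 0) {
      Reached[I] = true;
      Worklist.push_back(I);
    }
  while (!Worklist.empty()) {
    size_t I = Worklist.back();
    Worklist.pop_back();
    for (size_t S = 0; S < CFG[I].Successors.size(); ++S)
      if (CFG[I].SuccCounts[S] > 0 && !Reached[CFG[I].Successors[S]]) {
        Reached[CFG[I].Successors[S]] = true;
        Worklist.push_back(CFG[I].Successors[S]);
      }
  }
  uint64_t Total = 0, Unreachable = 0;
  for (size_t I = 0; I < CFG.size(); ++I) {
    if (CFG[I].Count == 0)
      continue;
    Total += CFG[I].Count;
    if (!Reached[I])
      Unreachable += CFG[I].Count;
  }
  return Total ? double(Unreachable) / double(Total) : 0.0;
}
```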
abe0dd195a (#109553) changed the default
llvm-objdump output for consecutive zeros.
This broke two tests:
BOLT :: AArch64/constant_island_pie_update.s
BOLT :: AArch64/update-weak-reference-symbol.s
This fixes the test failures by adding -z to the llvm-objdump invocation
in the RUN lines.
While printing functions, expand the --print-only flag to also accept
section names. E.g., `--print-only=\.init` will only print functions from
the ".init" section.
While merging profiles, some fields in the input header, e.g.
HashFunction, could be left uninitialized, leading to an uninitialized
memory read (UMR). Initialize the merged header with the first input
header.
Fixes #109592
Don't call raw_string_ostream::flush(), which is essentially a no-op.
As specified in the docs, raw_string_ostream is always unbuffered.
(See 65b13610a5 for further reference.)
… segments in ELF binary.
The heuristic is improved by also taking into account that only
executable segments should contain instructions.
Fixes #109384.
(this is the part related to bolt, lld and mlir)
Without these explicit includes, removing other headers, which implicitly
include llvm-config.h, may have non-trivial side effects. For example,
`clangd` may report even `llvm-config.h` as "not used" in case it defines
a macro that is explicitly used with #ifdef. This is actually amplified
across different build configs, which use different sets of macros.
On the GitHub Actions runners, perf always fails with the error below,
so we need to skip the perf tests on platforms like this that have
limited access to the perf counters.
```
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
access to performance monitoring and observability operations for processes
without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 4:
-1: Allow use of (almost) all events by all users
Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
```
When clang is built with `-DCLANG_DEFAULT_PIE_ON_LINUX=OFF`, a number of
BOLT tests fail:
BOLT :: AArch64/build_id.c
BOLT :: AArch64/plt-call.test
BOLT :: X86/dwarf5-dwarf4-types-backward-forward-cross-reference.test
BOLT :: X86/dwarf5-locexpr-referrence.test
BOLT :: X86/internal-call-instrument.s
BOLT :: X86/linux-static-keys.s
BOLT :: X86/plt-call.test
Avoid this by explicitly adding `-fPIE` and `-pie` to the default flags
in tests, so we don't depend on the clang-side default.
ADR can reference a secondary entry point in the same function. If
that's the case, we can skip relaxing the instruction when it is in the
same fragment as its target.
Fixes #108290
Add probe inline tree information to YAML profile, at function level:
- function GUID,
- checksum,
- parent node id,
- call site in the parent.
This information is used for pseudo probe block matching (#99891).
The encoding adds/changes probe information in multiple levels of
YAML profile:
- BinaryProfile: add pseudo_probe_desc with GUIDs and Hashes, which
permits deduplication of data:
- many GUIDs are duplicate as the same callee is commonly inlined
into multiple callers,
- hashes are also very repetitive, especially for functions with
low block counts.
- FunctionProfile: add inline tree (see above). Top-level function
is included as root of function inline tree, which makes guid and
pseudo_probe_desc_hash fields redundant.
- BlockProfile: densely-encoded block probe information:
- probes reference their containing inline tree node,
- separate lists for block, call, indirect call probes,
- block probe encoding is specialized: ids are encoded as a bitset in a
uint64_t. If only the block probe with id=1 is present, it's encoded as
the implicit entry probe (id=0, omitted); see the sketch after this list.
- inline tree nodes with identical probes share probe description
where node indices are combined into a list.
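For illustration, a sketch of the specialized block-probe packing (field layout and limits are assumptions, not the exact YAML encoding):
```cpp
#include <cstdint>
#include <vector>

// Pack a node's block probe ids into a single word (sketch of the scheme
// described above; the range limit and sentinel are assumptions).
// Returns 0 for the common "only the entry probe, id=1" case, which the
// writer can then omit entirely.
uint64_t encodeBlockProbes(const std::vector<uint32_t> &Ids) {
  uint64_t Mask = 0;
  for (uint32_t Id : Ids) {
    if (Id == 0 || Id > 64) // ids outside the packable range need a fallback
      return ~0ULL;         // sentinel: caller falls back to an explicit list
    Mask |= 1ULL << (Id - 1);
  }
  if (Mask == 1) // only probe id=1: implicit entry, encoded as 0 and omitted
    return 0;
  return Mask;
}
```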
On top of #107970, a profile with the new probe encoding has the following
characteristics (profile for a large binary):
- Profile without probe information: 33MB, 3.8MB compressed (baseline).
- Profile with inline tree information: 92MB, 14MB compressed.
Profile processing time (YAML parsing, inference, attaching steps):
- profile without pseudo probes: 5s,
- profile with pseudo probes, without pseudo probe matching: 11s,
- with pseudo probe matching: 12.5s.
Test Plan: updated pseudoprobe-decoding-inline.test
Reviewers: wlei-llvm, ayermolo, rafaelauler, dcci, maksfb
Reviewed By: wlei-llvm, rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/107137
Implement selective probe parsing for profiled functions only when
emitting probe information to the YAML profile, as suggested in
https://github.com/llvm/llvm-project/pull/102904#pullrequestreview-2248714190
For a large binary, this reduces probe parsing:
- processing time from 10.5925s to 5.6295s,
- peak RSS from 10.54 to 7.98 GiB.
The flag currently controls writing of probe information in YAML
profile. #99891 adds a separate flag to use probe information for stale
profile matching. Thus `profile-use-pseudo-probes` becomes a misnomer
and `profile-write-pseudo-probes` better captures the intent.
Reviewers: maksfb, WenleiHe, ayermolo, rafaelauler, dcci
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/106364
Pseudo probe function records contain GUIDs assigned by the compiler
using the IR function name. Thus suffixes added later (e.g. `.llvm.`
for internal symbols, `.destroy`/`.resume` for coroutine fragments,
and `.cold`/`.warm` for split fragments) cause a GUID mismatch.
Address that by dropping those suffixes using `getCommonName`, which is
a parametrized form of `getLTOCommonName`.
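A rough sketch of that normalization (hypothetical helper; the in-tree code uses the parametrized `getCommonName` instead):
```cpp
#include <string_view>

// Drop compiler/BOLT-added suffixes so the name matches the one the GUID
// was computed from (hypothetical helper, simplified matching).
std::string_view dropSuffixes(std::string_view Name) {
  for (std::string_view Suffix :
       {".llvm.", ".destroy", ".resume", ".cold", ".warm"}) {
    size_t Pos = Name.find(Suffix);
    if (Pos != std::string_view::npos)
      Name = Name.substr(0, Pos);
  }
  return Name;
}
// dropSuffixes("foo.llvm.123456") == "foo"
// dropSuffixes("bar.cold") == "bar"
```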
This ensures forward compatibility, where old BOLT versions can consume
the profile created by newer versions with extra keys.
Test Plan: added yaml-unknown-keys.test
Replace unordered_map with a vector. Pre-parse the section to statically
allocate storage. Use BumpPtrAllocator for FuncName strings, keep
StringRef in FuncDesc.
Reduces peak RSS of pseudo probe parsing from 9.08 GiB to 8.89 GiB as
part of perf2bolt with a large binary.
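A sketch of the resulting storage scheme (assumed simplified `FuncDesc`; the real struct differs), using LLVM's `BumpPtrAllocator` and `StringSaver`:
```cpp
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Allocator.h"
#include "llvm/Support/StringSaver.h"
#include <cstdint>
#include <vector>

// Simplified FuncDesc: the name is a StringRef into bump-allocated storage
// rather than an owning std::string (assumed layout, not the real struct).
struct FuncDesc {
  uint64_t GUID = 0;
  uint64_t Hash = 0;
  llvm::StringRef FuncName; // points into the allocator below
};

struct FuncDescTable {
  llvm::BumpPtrAllocator Alloc;
  llvm::StringSaver Saver{Alloc};
  std::vector<FuncDesc> Descs; // replaces the previous unordered_map

  // Capacity comes from the pre-parsing pass described above.
  void reserve(size_t NumRecords) { Descs.reserve(NumRecords); }

  void add(uint64_t GUID, uint64_t Hash, llvm::StringRef Name) {
    // Copy the name once into the bump allocator; FuncDesc keeps a StringRef.
    Descs.push_back({GUID, Hash, Saver.save(Name)});
  }
};
```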
Test Plan:
```
bin/llvm-lit -sv test/tools/llvm-profgen
```
Reviewers: wlei-llvm, rafaelauler, dcci, maksfb, ayermolo
Reviewed By: wlei-llvm
Pull Request: https://github.com/llvm/llvm-project/pull/102905
Replace the map from addresses to list of probes with a flat vector
containing probe references sorted by their addresses.
Reduces pseudo probe parsing time from 9.56s to 8.59s and peak RSS from
9.66 GiB to 9.08 GiB as part of perf2bolt processing a large binary.
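The lookup pattern, sketched with assumed types (not the actual BOLT code): one flat vector sorted by address, queried with `std::equal_range`:
```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

struct ProbeRef {
  uint64_t Address;
  const void *Probe; // pointer to the decoded probe (placeholder type)
};

// All probes for the binary, sorted once by Address after decoding.
using ProbeIndex = std::vector<ProbeRef>;

// Return the [first, last) range of probes at a given address.
std::pair<ProbeIndex::const_iterator, ProbeIndex::const_iterator>
probesAt(const ProbeIndex &Probes, uint64_t Addr) {
  return std::equal_range(
      Probes.begin(), Probes.end(), ProbeRef{Addr, nullptr},
      [](const ProbeRef &A, const ProbeRef &B) {
        return A.Address < B.Address;
      });
}
```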
Test Plan:
```
bin/llvm-lit -sv test/tools/llvm-profgen
```
Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm
Reviewed By: wlei-llvm
Pull Request: https://github.com/llvm/llvm-project/pull/102904
Use #102774 to allocate storage for decoded probes (`PseudoProbeVec`)
and function records (`InlineTreeVec`).
Leverage that to also shrink the decoded data structures.
`MCDecodedPseudoProbe`:
- Drop Guid since it's accessible via `InlineTree`.
`MCDecodedPseudoProbeInlineTree`:
- Keep track of probes and inlinees using `ArrayRef`s, now that probes
and function records belonging to the same function are allocated
contiguously.
This reduces peak RSS from 13.7 GiB to 9.7 GiB and pseudo probe parsing
time (as part of perf2bolt) from 15.3s to 9.6s for a large binary with
400MiB .pseudo_probe section containing 43M probes and 25M function
records.
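A simplified sketch of the layout (assumed field names; plain pointer + size spans stand in for `ArrayRef` to keep the example self-contained):
```cpp
#include <cstddef>
#include <cstdint>

// Simplified decoded probe: no per-probe GUID; it is recovered from the
// owning inline tree node (assumed layout, not the real structures).
struct DecodedProbe {
  uint64_t Address;
  uint32_t Index;
};

// Inline tree node holding lightweight views into contiguous global
// storage instead of per-node vectors.
struct InlineTreeNode {
  uint64_t GUID;
  const DecodedProbe *Probes;     // slice of the global probe vector
  size_t NumProbes;
  const InlineTreeNode *Inlinees; // slice of the global node vector
  size_t NumInlinees;
};

// The GUID of a probe is the GUID of the node that owns it.
uint64_t probeGUID(const InlineTreeNode &Owner) { return Owner.GUID; }
```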
Depends on: #102774, #102787, #102788
Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm
Reviewed By: wlei-llvm
Pull Request: https://github.com/llvm/llvm-project/pull/102789
Pre-parse the pseudo probes section, counting the number of probes and
function records. These numbers are used in a follow-up diff to
pre-allocate vectors for decoded probes and inline tree nodes.
An additional benefit is avoiding error handling during parsing.
This pre-parsing is fast: for a 404MiB .pseudo_probe section with
43373881 probes and 25228770 function records, it only takes 0.68±0.01s.
The total time of buildAddress2ProbeMap is 21s.
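A sketch of the two-pass idea against a hypothetical record reader (the real .pseudo_probe format and parser are more involved):
```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Probe { uint64_t Address; };
struct FuncRecord { uint64_t GUID; };
struct Counts { size_t NumProbes = 0; size_t NumFuncRecords = 0; };

// First pass: walk the raw section and only count records (cheap).
// Reader is a hypothetical interface standing in for the section parser.
template <typename Reader> Counts countRecords(Reader &R) {
  Counts C;
  while (!R.atEnd()) {
    if (R.nextIsFuncRecord())
      ++C.NumFuncRecords;
    else
      ++C.NumProbes;
    R.skipRecord();
  }
  return C;
}

// Second pass: decode into vectors whose capacity is known up front, so no
// reallocation happens and no growth errors need handling while decoding.
template <typename Reader>
void decode(Reader &R, std::vector<Probe> &Probes,
            std::vector<FuncRecord> &Funcs) {
  const Counts C = countRecords(R);
  R.reset();
  Probes.reserve(C.NumProbes);
  Funcs.reserve(C.NumFuncRecords);
  while (!R.atEnd()) {
    if (R.nextIsFuncRecord())
      Funcs.push_back(R.readFuncRecord());
    else
      Probes.push_back(R.readProbe());
  }
}
```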
Reviewers: dcci, maksfb, rafaelauler, wlei-llvm, ayermolo
Reviewed By: wlei-llvm
Pull Request: https://github.com/llvm/llvm-project/pull/102774
This patch addresses compatibility issues with the lit internal shell by
removing the use of subshell execution (parentheses and subshell syntax)
in the `BOLT` tests. The lit internal shell does not support
parentheses, so the tests have been refactored to use separate command
invocations, with outputs redirected to temporary files where necessary.
This change is relevant for enabling the lit internal shell by default,
as outlined in [[RFC] Enabling the Lit Internal Shell by
Default](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179)
Fixes #102401
This PR improves how basic block execution counts are updated when using
the BOLT option `-infer-fall-throughs`. Previously, if a 0-count
fall-through edge was assigned a positive inferred count N, the
successor block's execution count would be incremented by N. Since the
successor's execution count is calculated using information besides the
inflow sum (such as the outflow sum), it is likely already correct, and
incrementing it by an additional N would be wrong. This PR instead updates
the successor's execution count to the max of its current count and N.
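The updated rule, sketched with assumed names:
```cpp
#include <algorithm>
#include <cstdint>

// Update a successor block's execution count after inferring a fall-through
// count of N for a previously 0-count edge (sketch of the rule above).
uint64_t updatedSuccessorCount(uint64_t CurrentCount, uint64_t InferredN) {
  // Old behavior: CurrentCount + InferredN, which double counts when the
  // successor's count was already derived from its outflow.
  // New behavior: keep the larger of the two.
  return std::max(CurrentCount, InferredN);
}
```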
CFI programs may have more saves than restores and this is completely
benign from BOLT's perspective. Reduce the verbosity and print the
warning only under `-v=1` and above.
This patch adds the `REQUIRES: shell` directive to the BOLT permission
test to ensure it only runs in environments with a full-featured
Unix-like shell. This change is necessary because the test relies on
advanced shell capabilities that are not supported by lit's internal
shell.
**Reasoning:** The BOLT permission test uses features like running
commands in the background with `&`, performing arithmetic operations,
and handling special number formats (octal). These features require a
more capable shell than what lit's internal shell provides. Without a
proper shell, the test could fail or behave unpredictably.
This change is relevant for enabling the lit internal shell by default,
as outlined in [[RFC] Enabling the Lit Internal Shell by
Default](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179)
This patch aborts BOLT execution if it finds an out-of-section (section
end) symbol in the GOT table. In order to handle such situations properly
in the future, we would need an arch-dependent way to analyze relocations
or their sequences, e.g., for ARM it would probably be ADRP + LDR analysis
in order to get the GOT entry address. Currently, this is also challenging
because GOT-related relocation symbols are replaced with __BOLT_got_zero.
Anyway, it seems to be quite a rare case, which appears to be related only
to static binaries. For the most part, it should be handled at the link
stage, since a static binary should not have a GOT table at all. The LLD
linker with relaxations enabled replaces GOT loads with direct references
to the target symbols, which eliminates the problem.
To detect such cases, this patch fixes a few things in BOLT:
1. For the end symbols, we now use the section provided by the ELF
binary. Previously they would be tied to a wrong section found by symbol
address.
2. The end symbols now have limited registration: we only add them to the
name->data GlobalSymbols map, since using the address->data BinaryDataMap
would likely be impossible due to the address duality of such symbols.
3. The outdated BD->getSection check (it currently returns a reference,
not a pointer) in postProcessSymbolTable is replaced by a getSize check in
order to allow zero-sized top-level symbols if they are located in
zero-sized sections. For the most part, such things would only be found
in tests, but I don't see a reason not to handle such cases.
4. Updated the section-end-sym test and removed the x86_64 requirement
since there is no reason for it (tested on aarch64 Linux).
The test was provided by peterwaller-arm (thank you) in #100096 and
slightly modified by me.
After porting BOLT to RISC-V, some of the relocation handling was broken
on both AArch64 and X86.
On AArch64, an example of broken relocations is GOT: while handling them,
we should replace the symbol with __BOLT_got_zero in order to reference
the GOT entry, not the symbol that the entry addresses. This is done
further down in the code, so it is too early to add the relocation here.
On X86, it is a mistake to add relocations without an addend. This is the
exact problem raised in #97937. Due to different code generation, I had to
use a gcc-generated YAML test, since I wasn't able to reproduce the
problem with clang.
Added tests for both architectures and made the problematic condition
RISC-V-specific.