fix/server-shutdown-asan
7 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
75b9ea05b8 |
refactor(server): split into service layer, add agentic protocol, adopt task_group (#437)
## Summary - **Restructure `src/server/` into subdirectories** (`service/`, `compiler/`, `worker/`, `workspace/`, `protocol/`) to separate concerns: transport/session management, compilation, worker orchestration, and persistent workspace state. - **Decouple MasterServer from transport**: MasterServer no longer holds a `JsonPeer&` reference or registers handlers itself. New `LSPClient` and `AgentClient` classes own their peer references and register protocol handlers, accessing MasterServer internals via `friend class`. - **Add agentic protocol**: A TCP-based side channel (`agentic/compileCommand`) that lets external tools (AI agents, build systems) query compile commands from a running clice server. Includes a CLI client mode (`--mode agentic --port N --path FILE`), server-side listener when `--port` is specified in pipe mode, and integration tests for happy path, fallback, concurrency, and connection-refused. - **Replace fire-and-forget `loop.schedule()` with `kota::task_group`**: Compiler compile tasks, Indexer background indexing + resource monitor, WorkerPool worker monitors, and socket accept loops now use structured concurrency. This eliminates manual `alive_count_`/generation counters and ensures all spawned tasks are joined on shutdown. - **Fix flaky integration test**: `CliceClient.initialize()` now always sets `cache_dir` to a workspace-local `.clice/` directory, preventing stale PCH artifacts from the global `~/.cache/clice/` from polluting test runs. ## Details **Compiler peer lifetime**: `Compiler` and `Indexer` previously took `JsonPeer&` in their constructors, coupling them to a single connection. They now store a `JsonPeer*` set via `set_peer()`, with null checks before sending diagnostics/progress. This supports the multi-connection model where agentic clients don't need diagnostics. **Socket mode single-LSP enforcement**: `accept_connections()` takes a `register_lsp` flag; when true, only the first connection gets an `LSPClient`. All connections get an `AgentClient`. This prevents multiple LSP sessions from racing on shared server state. **Structured shutdown**: `Compiler::stop()` cancels in-flight compile tasks and joins them. `WorkerPool::stop()` signals workers and joins the monitor task group. `Indexer` uses a `cancellation_source` to stop its resource monitor when a background indexing run completes. **Pin kotatsu**: Changed from `GIT_TAG main` + `GIT_SHALLOW TRUE` to an exact commit hash for reproducible builds. --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
e1202d2fa5 |
fix: prevent worker crashes from null ASTConsumer, invalid FileID, and missing PCH cache dir (#435)
## Summary Three pre-existing bugs cause worker processes to crash with SEGV or SIGABRT. On the main branch these crashes are silent (workers die, requests fail fast with "transport closed", tests still pass because null responses are accepted). However when combined with #432's worker respawn mechanism, the crash-respawn-crash cycle on low-core CI machines causes request timeouts and smoke test hangs. ### Fixes - **compilation.cpp**: `ProxyAction::CreateASTConsumer` now checks for null before passing to `MultiplexConsumer`. When the wrapped action's `CreateASTConsumer` fails (e.g. missing system headers during PCH generation), this previously caused a null pointer dereference, SEGV, ASAN kills the stateless worker. - **compilation_unit.cpp**: `file_path()` returns empty `StringRef` on invalid `FileID` instead of asserting. The assert fired when `IncludeGraph::from()` called `file_path(interested_file())` on an AST compiled with synthesized default commands (no compile_commands.json, clang++ -std=c++20 fallback, no system headers, invalid main file ID), SIGABRT, stateful worker crash. - **compiler.cpp**: `ensure_pch` now creates the PCH cache directory before sending the build request. Previously, when `load_workspace()` exited early (no compile_commands.json), the cache subdirectories were never created, causing every PCH write to fail with "No such file or directory". - **master_server.cpp/h**: `load_workspace()` changed from `kota::task<>` to plain `void` -- it contains only synchronous filesystem operations and no co_await, so the coroutine wrapper was unnecessary. Called directly instead of via `loop.schedule()`. ## Test plan - [x] Verified zero SEGV/SIGABRT/assertion crashes in worker stderr after fix - [x] rapid_edit.jsonl smoke test passes 3/3 runs consistently (34s each) - [x] Behavior matches main branch (both return 134 responses, 0 pending) - [x] Debug build with ASAN (detect_leaks=0) -- clean run, no sanitizer reports <!-- codesmith:footer --> --- <a href="https://app.blacksmith.sh/clice-io/codesmith/clice/pr/435"><picture><source media="(prefers-color-scheme: dark)" srcset="https://pr-comments-assets.blacksmith.sh/codesmith/view-in-codesmith-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://pr-comments-assets.blacksmith.sh/codesmith/view-in-codesmith-light.svg"><img alt="View in Codesmith" src="https://pr-comments-assets.blacksmith.sh/codesmith/view-in-codesmith-dark.svg"></picture></a> <sup>Codesmith can help with this PR — just tag <code>@codesmith</code> or enable autofix.</sup> - [ ] Autofix CI and bot reviews <!-- /codesmith:footer --> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Improved error handling for AST consumer creation with null checks and a clear failure path. * Safer file-path access that returns empty for invalid identifiers instead of asserting. * PCH cache handling now validates cache configuration, attempts directory creation, logs warnings, and aborts PCH builds on failure. * **Refactor** * Workspace loading changed from asynchronous to synchronous execution. <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
418e190fa0 |
chore(deps): migrate from eventide to kotatsu (#428)
## Summary - The `eventide` dep was renamed to [kotatsu](https://github.com/clice-io/kotatsu) with a broad rename of CMake identifiers, namespaces, header paths, and a few module reorgs (`serde` → `codec`, `reflection` → `meta`, `common` → `support`). Align clice to the new names. - CMake: FetchContent target, option prefix (`ETD_*` → `KOTA_*`, `ETD_SERDE_*` → `KOTA_CODEC_*`), target names (`eventide::{ipc::lsp,serde::toml,deco,zest}` → `kota::{ipc::lsp,codec::toml,deco,zest}`). - Namespaces: `eventide::` → `kota::`, `eventide::serde::` → `kota::codec::`, `eventide::refl::` → `kota::meta::`. The short `et` alias is dropped — all usages now spell `kota::` directly. - Headers: `eventide/*` → `kota/*`, including special cases `serde/serde/raw_value.h` → `codec/raw_value.h`, `ipc/json_codec.h` → `ipc/codec/json.h`, `common/meta.h` → `support/type_traits.h`, `common/ranges.h` → `support/ranges.h`. - Kotatsu split `JsonPeer` / `BincodePeer` out of `ipc/peer.h` into the codec-specific headers; added `kota/ipc/codec/{json,bincode}.h` includes where those types are used. - Depends on clice-io/kotatsu#110 (already merged) to prevent `-Wall -Wextra -Werror` from transitively propagating out of `kota::project_options`. ## Test plan - [x] `pixi run unit-test RelWithDebInfo` — 518/518 pass (9 skipped, unchanged from main) - [x] `pixi run integration-test RelWithDebInfo` — 119/119 pass - [x] `pixi run smoke-test RelWithDebInfo` — 2/2 pass - [x] `pixi run format` clean ## Notes - `tests/smoke/rapid_edit.jsonl` was intentionally left untouched: the embedded `#include "eventide/..."` strings are frozen snapshots of file contents the client sent at record time, not clice source. 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Updated internal dependencies from `eventide` to `kota`, including async runtime, IPC transport, serialization codec, and metadata libraries. * Updated build configuration and CMake variables to align with the new dependency. * **Refactor** * Migrated internal implementation to use `kota` namespace and APIs throughout the codebase. <!-- end of auto-generated comment: release notes by coderabbit.ai --> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
bb0b160a28 |
refactor(server): extract Indexer and Compiler from MasterServer (#403)
## Summary - **Extract `Indexer` class** — owns all index state (ProjectIndex, MergedIndex shards, OpenFileIndex) and query methods (definition, references, call/type hierarchy, workspace symbol search) - **Extract `Compiler` class** — owns document state, PCH/PCM cache, compile argument resolution, header context, `ensure_compiled`, and worker forwarding - **MasterServer is now a pure LSP handler registration layer** (~700 lines, down from ~3200) - **`MergedIndexShard`** wraps `index::MergedIndex` with a lazily-cached PositionMapper; `OpenFileIndex` gains matching `find_occurrence()`/`find_relations()` APIs — callers get pre-converted LSP ranges directly - **Indexer returns typed values** (`vector<Location>`, `vector<CallHierarchyIncomingCall>`, etc.) instead of pre-serialized JSON, fixing the references handler from JSON string surgery to simple vector concatenation - **Fix**: duplicate `workspace/symbol` loop in the original code ## Test plan - [x] 465 unit tests pass - [x] 113 integration tests pass - [x] 2/2 smoke tests pass - [x] `clang-format` applied 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Server-side C++ compilation orchestration (module & precompiled header builds) with LSP-integrated document handling. * **Improvements** * Deterministic, persistent, dependency-aware caching to avoid redundant rebuilds and speed up incremental work. * Better cross-file indexing and navigation, improved diagnostics and more reliable include/import-aware completions. * **Tests** * Unit tests updated to the unified worker query/build request shapes. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
31d9c609b6 |
fix: data race in stateful worker between Compile and DocumentUpdate (#389)
## Summary Fix two data races in the stateful worker that caused spurious "redefinition" errors during rapid edits, and remove a didChange workaround that is no longer needed after clice-io/eventide#95. ### stateful_worker.cpp **Compile handler**: move `params` → `doc` field copy **after** `strand.lock()`. Previously the copy happened before the lock, so a concurrent Compile request waiting on the strand could overwrite `doc.text` while `et::queue` was reading it on the thread pool: ``` T1: Compile A → doc.text = text_A → lock → et::queue reads doc.text T2: Compile B → doc.text = text_B → waits for strand (overwrites!) T3: et::queue sees text_B instead of text_A → PCH/text mismatch ``` **DocumentUpdate handler**: only mark `dirty`, stop modifying `doc.text`/`doc.version`. The event loop notification can fire while `et::queue` work is running on the thread pool — writing `doc.text` from one thread while reading it from another is a data race. ### master_server.cpp Remove the `{0,0}-{0,0}` range workaround for whole-document `didChange`. eventide's variant deserialization now correctly rejects `TextDocumentContentChangePartial` when the `range` field is absent (clice-io/eventide#95), so `TextDocumentContentChangeWholeDocument` is matched as intended. ### protocol.h Remove `text` field from `DocumentUpdateParams` — the worker no longer needs it since DocumentUpdate only sets the dirty flag. ### Integration tests (+312 lines) Extend test_staleness.py from 5 to 14 tests covering document lifecycle: - `didChange` body edit → recompilation with updated diagnostics - `didChange` preamble edit → PCH rebuild + clean recompilation - `didClose` + reopen → compiles fresh from disk - `didClose` → hover returns None - `didSave` header → dependent file recompiles - `didSave` module → CompileGraph dependents invalidated ## Test plan - [x] 422 unit tests pass (426 on CI with extra test suites) - [x] 14 integration tests pass locally - [x] Depends on clice-io/eventide#95 (merged) 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Smaller document-update notifications sent to background workers (only path and version). * **Bug Fixes** * Reduced races and unnecessary work between update and compile flows. * Prevented notifications from overwriting in-memory document text, improving state consistency. * Safer concurrent handling to avoid mid-request eviction of active documents. * **Tests** * Added integration tests for staleness, dependency propagation, and LSP lifecycle. * Updated unit tests to match revised update behavior. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
bc04845293 |
refactor(tests): CMake-based CDB, workspace fixture, test cleanup (#378)
## Summary - **CMake-based CDB generation for module tests**: Replace hand-written compile_commands.json with CMakeLists.txt (CMake 3.28 `FILE_SET CXX_MODULES`) in all 26 `tests/data/modules/*/` directories. CDB is generated on-the-fly via `cmake -G Ninja` during test setup. - **`@pytest.mark.workspace()` decorator**: Introduce a marker + fixture pattern so tests declare their workspace via decorator and receive a resolved `workspace` path. The fixture auto-generates CDB when a CMakeLists.txt is present. - **`CliceClient` helper methods**: Add `initialize()`, `open()`, `wait_diagnostics()`, and `open_and_wait()` to reduce boilerplate across all test files. - **Use `asyncio_mode = "auto"`**: Switch from `@pytest_asyncio.fixture` + `@pytest.mark.asyncio` to `@pytest.fixture` + auto mode for proper Pylance type inference on fixtures. - **Test cleanup**: Remove redundant section separators and docstrings, delete `tests/pyproject.toml` (config moved to `pytest.ini`). - **Format task**: Add `.cppm` to `format-cpp` glob pattern. - **CI fix**: Disable `CMAKE_CXX_SCAN_FOR_MODULES` and prefer pixi clang++ to fix macOS CI where CMake rejects module scanning. ## Test plan - [x] All 26 module test directories have CMakeLists.txt with FILE_SET CXX_MODULES - [x] generate_cdb() produces valid compile_commands.json with module flags - [x] Integration tests pass locally - [ ] CI passes on all platforms (Linux, macOS, Windows) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Tests** * Unified fixtures and client workflow: new init/open/wait helpers, workspace marker support, bounded diagnostics waiting, CMake-based compilation-database generation, and directory-backed temp-file workflows; enabled asyncio test mode. * **Chores** * Added many C++20 module test projects and test data; removed prior test pyproject in favor of pytest config; updated formatter to include .cppm files. * **Style** * Reformatted many module/source implementations to consistent multi-line function bodies. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
020c2cb3cc |
feat: implement multi-process LSP server architecture (#364)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |