feat(worldline): filesystem-timeline tracer with web UI #403
Draft
wan9chi wants to merge 36 commits into
Add as_os_str (Unix only) and to_cow_os_str (cross-platform) so callers can recover an OsStr from a deserialized NativePath without going through strip_path_prefix.
Adds the shared CallbackKind / CallbackRequest / CALLBACK_ACK wire types used by the synchronous round-trip between a traced process and the supervisor, plus the per-backend Payload endpoint fields (CallbackConf on Unix, callback_pipe_name + callback_mask on Windows). Both are absent / empty when no callback is registered.
Add FileEvent / FileEventKind / FileEventPath / BorrowedFile in a new
crate::callback module and Command::on_file_event(mask, callback) so a
consumer can register a callback that runs in the supervisor process.
Per-backend supervisor servers live in callback::{unix,windows}; the
Unix one wraps a Unix-domain socket and SCM_RIGHTS, the Windows one
wraps a named pipe with DuplicateHandle. SpawnError grows a
CallbackChannelCreation variant for binding failures.
No wiring into the OS-impl spawn() yet — that comes in follow-ups.
…tions The preload connects per-event to the supervisor's Unix-domain socket, passes the freshly opened (or about-to-be-closed) fd via SCM_RIGHTS, sends a length-prefixed CallbackRequest, and blocks reading a single ACK byte before letting the syscall return. The open hooks now call the real libc fn first, capture the result fd, keep the existing shared-memory event, and only then run the blocking post-open callback. New close/fclose hooks fire the pre-close callback while the fd is still valid; the fd's access mode is read with F_GETFL, and obviously-non-file paths (sockets, anon inodes) are filtered out before any round-trip. A thread-local reentrancy guard prevents the round-trip's own socket I/O from recursing. The callback channel is decoded from the EncodedPayload only when one is registered, so the no-callback path remains a single Option check.
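The thread-local reentrancy guard described above can be sketched with the standard library alone. This is a minimal illustration, not the preload's actual code: the names `IN_ROUND_TRIP`, `ReentrancyGuard`, and `with_guard` are hypothetical.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical flag: true while this thread is inside a callback round-trip.
    static IN_ROUND_TRIP: Cell<bool> = Cell::new(false);
}

/// RAII guard; the flag is cleared again when the guard is dropped.
pub struct ReentrancyGuard(());

impl ReentrancyGuard {
    /// Returns `None` when this thread is already inside a round-trip,
    /// so a hook that fires from the round-trip's own I/O can bail out early.
    pub fn enter() -> Option<Self> {
        IN_ROUND_TRIP.with(|flag| {
            if flag.replace(true) {
                None
            } else {
                Some(ReentrancyGuard(()))
            }
        })
    }
}

impl Drop for ReentrancyGuard {
    fn drop(&mut self) {
        IN_ROUND_TRIP.with(|flag| flag.set(false));
    }
}

/// Runs `round_trip` unless this thread is already inside one.
/// Returns whether the closure actually ran.
pub fn with_guard<F: FnOnce()>(round_trip: F) -> bool {
    match ReentrancyGuard::enter() {
        Some(_guard) => {
            round_trip();
            true
        }
        None => false,
    }
}
```

A hook would wrap its socket I/O in `with_guard`; any hook re-entered from inside that I/O sees `false` and forwards straight to the real function.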
Add NotifyResponse { Continue, ReturnFd { fd, cloexec } } and a small
HandlerResponse trait so a handler can ask the supervisor to install a
file descriptor into the target (via SECCOMP_IOCTL_NOTIF_ADDFD with the
SEND flag) and complete the syscall atomically with the new fd as the
result. Keeping HandlerResponse separate from SeccompNotifyHandler
means the impl_handler! macro stays untouched — handlers that always
continue rely on the default impl.
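As a rough sketch of that shape: the enum variants follow the commit message, while the method name `take_response` and the two example handlers are assumptions made for illustration only.

```rust
/// Reply a handler can request for one seccomp notification.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NotifyResponse {
    /// Let the target's syscall proceed unchanged.
    Continue,
    /// Install `fd` into the target and complete the syscall with it.
    ReturnFd { fd: i32, cloexec: bool },
}

/// Kept separate from the handler trait; the default impl always continues.
pub trait HandlerResponse {
    // Hypothetical method name; the real trait's method is not shown in the source.
    fn take_response(&mut self) -> NotifyResponse {
        NotifyResponse::Continue
    }
}

/// A handler that only records accesses relies on the default.
pub struct RecordOnly;
impl HandlerResponse for RecordOnly {}

/// A handler that opened a descriptor itself and wants it installed once.
pub struct InstallFd {
    pub pending: Option<i32>,
}

impl HandlerResponse for InstallFd {
    fn take_response(&mut self) -> NotifyResponse {
        match self.pending.take() {
            Some(fd) => NotifyResponse::ReturnFd { fd, cloexec: true },
            None => NotifyResponse::Continue,
        }
    }
}
```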
supervise() now delegates to a new supervise_with(init, syscalls)
which builds each per-connection handler via the init closure and
filters exactly the syscalls passed in, letting callers inject
per-spawn state and decide at runtime which syscalls to intercept.
Also exposes Caller::pid() and Fd::raw() — both needed by handlers
that want to look up /proc/<pid>/fdinfo/<fd> for the open mode of a
descriptor about to be closed.
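For illustration, the open mode can be recovered from the `flags:` line of an fdinfo file, which Linux prints in octal. The sketch below only parses text already read from `/proc/<pid>/fdinfo/<fd>`; `AccessMode`, `parse_access_mode`, and `fdinfo_path` are hypothetical names, and the low two bits are interpreted with the Linux `O_ACCMODE` values.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AccessMode {
    ReadOnly,
    WriteOnly,
    ReadWrite,
}

/// Builds the procfs path a handler would read, given the caller's pid and the fd.
pub fn fdinfo_path(pid: u32, fd: i32) -> String {
    format!("/proc/{pid}/fdinfo/{fd}")
}

/// Extracts the access mode from fdinfo text such as "pos:\t0\nflags:\t0100002\n".
pub fn parse_access_mode(fdinfo: &str) -> Option<AccessMode> {
    let field = fdinfo
        .lines()
        .find_map(|line| line.strip_prefix("flags:"))?;
    // The kernel prints the open flags as an octal number.
    let flags = u32::from_str_radix(field.trim(), 8).ok()?;
    // O_ACCMODE on Linux is 0o3: O_RDONLY = 0, O_WRONLY = 1, O_RDWR = 2.
    match flags & 0o3 {
        0 => Some(AccessMode::ReadOnly),
        1 => Some(AccessMode::WriteOnly),
        2 => Some(AccessMode::ReadWrite),
        _ => None,
    }
}
```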
The supervisor's SyscallHandler now optionally carries a FileCallback (injected via with_callback). When set, for each open*-syscall notification the supervisor opens the file itself with the target's flags, runs the callback on its own descriptor, and replies with a ReturnFd response so the kernel installs that descriptor into the target via ADDFD-SEND. For close notifications it reads the access mode out of /proc/<pid>/fdinfo/<fd>, opens a fresh read-only fd for the callback, then continues so the target performs the close. handle_open now takes an is_open_syscall flag so it only fires the post-open callback for real open* notifications — execve / stat / access route through handle_open purely for access recording and must not get an ADDFD response (execve does not return a descriptor). /dev/, /proc/ and /sys/ paths are skipped before any supervisor open to avoid spurious round-trips.
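The pseudo-filesystem skip could look roughly like the following. `should_skip` is a hypothetical name, and this sketch also skips the bare directories `/dev`, `/proc`, and `/sys` themselves, which the commit message does not state either way.

```rust
use std::path::Path;

/// Returns true for paths under the pseudo-filesystems that should never
/// trigger a supervisor-side open.
pub fn should_skip(path: &Path) -> bool {
    // Path::starts_with compares whole components, so "/device/x" is not
    // mistaken for something under "/dev".
    ["/dev", "/proc", "/sys"]
        .iter()
        .any(|root| path.starts_with(root))
}
```

Comparing components rather than raw string prefixes avoids false positives on look-alike directory names.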
On non-musl hosts, when Command::on_file_event is set, spawn now binds a UnixCallbackServer and threads its socket path + access-mode mask into the Payload so the preload knows where to round-trip. On Linux, supervise_with is invoked with the full syscall list (including close) and a per-spawn closure that hands the FileCallback to each SyscallHandler instance. With no callback registered, close is filtered out of the syscall list so per-close overhead stays at zero. The wait task drains the callback server (so every in-flight callback finishes) before locking the IPC channel.
The preload connects per-event to the supervisor's named pipe and sends [u32 len][CallbackRequest][u64 raw HANDLE], then blocks reading a single ACK byte. The raw handle is duplicated out of this process by the supervisor (DuplicateHandle); we just send its numeric value. NtCreateFile / NtOpenFile are restructured to call the real fn first (so file_handle is populated), keep the existing shared-memory event, and only then run the post-open callback. A new NtClose detour fires the pre-close callback while the handle is still valid. NtClose runs for every handle type, so we keep a lock-free DashMap of file handles whose open mode matched the mask. Only handles in that map get a close callback, and the stored mode is reported back as the event's access mode. A thread-local IN_CALLBACK guard suppresses the recursion that would otherwise happen when the round-trip's own CreateFileW / NtClose go back through the detours.
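The `[u32 len][CallbackRequest][u64 raw HANDLE]` layout can be sketched as a pair of framing helpers over any `Read`/`Write`. Several details here are assumptions: little-endian integers, `len` counting only the request bytes, and the `MAX_REQUEST_LEN` cap. The request is treated as opaque bytes and the ACK byte is left out.

```rust
use std::io::{self, Read, Write};

// Hypothetical sanity cap so a corrupt length cannot force a huge allocation.
const MAX_REQUEST_LEN: usize = 64 * 1024;

/// Writes one frame: length prefix, serialized request, then the raw handle value.
pub fn write_frame<W: Write>(w: &mut W, request: &[u8], raw_handle: u64) -> io::Result<()> {
    let len = u32::try_from(request.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "request too large"))?;
    w.write_all(&len.to_le_bytes())?;
    w.write_all(request)?;
    w.write_all(&raw_handle.to_le_bytes())
}

/// Reads one frame and returns the request bytes plus the sender's handle value.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<(Vec<u8>, u64)> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_le_bytes(len_buf) as usize;
    if len > MAX_REQUEST_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut request = vec![0u8; len];
    r.read_exact(&mut request)?;
    let mut handle_buf = [0u8; 8];
    r.read_exact(&mut handle_buf)?;
    Ok((request, u64::from_le_bytes(handle_buf)))
}
```

The handle travels as a plain number because it is only meaningful inside the sending process; the receiver still has to duplicate it before use.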
When Command::on_file_event is set, spawn now binds a WindowsCallbackServer on a uniquely named pipe, threads its name plus the access-mode mask into the Payload, and (after the child has been spawned suspended but before ResumeThread) hands the child's process handle to the server so DuplicateHandle on incoming callbacks can pull the target's raw HANDLE into this process. The wait task drains the callback server before locking the IPC channel so every in-flight callback completes.
read_verify opens the file, reads it, and asserts the content is non-empty — used by the seccomp blocking-callback test to prove the ADDFD-installed descriptor in the target is usable. read_verify_threads runs the same from four concurrent threads to exercise the callback path under concurrency.
file_callback.rs covers the preload backend on Linux glibc / macOS / Windows: blocking proof (target cannot progress while the callback runs), the supervisor can read the passed descriptor, the close callback fires before the close with a still-valid descriptor, the mask filters events, and registering no callback leaves access tracking unchanged. static_executable.rs adds two seccomp-backend cases (ADDFD round-trip + multi-threaded concurrent opens).
e29e264 to a81d817
- Gate fspy::callback::unix to non-musl: it's only used by the preload backend, which itself is excluded on musl, so leaving it in caused dead_code errors under cargo's -D warnings on musl.
- Drop a redundant [`FileEvent`](crate::FileEvent) explicit link target; FileEvent is re-exported at the crate root.
- Disambiguate the [`impl_handler`] intra-doc link in HandlerResponse's doc to the macro at the crate root.
…ndle_close/open_callback Skipping the thread-local reentrancy guard until the ctor has set the global client keeps these paths infallible during very early (libdyld / pre-ctor) opens and closes, where the thread-local accessor is the only non-trivial operation either path would do anyway. Also surface the actual stdout content in the cancellation test's assertion message — without it, a child that crashes before writing 'ready' shows up as a bare assertion failure with no clue about why.
On macOS, tempfile::tempdir() returns paths under /var/folders/..., which is a symlink to /private/var/folders/.... The open syscall sees the literal /var/ path, but the close handler resolves the fd via F_GETPATH which returns the canonical /private/var/ form. The tests filter callback events by 'starts_with(dir_path)'. Without canonicalization the close event's canonical path fails to match the non-canonical prefix, so the Closing event is dropped and the close callback test fails on macOS only.
Add the worldline crate skeleton (bin + lib) and register the loro dependency in the workspace. Includes a temporary Loro API probe that exercises LoroDoc/LoroText/export to confirm the dependency and method names compile on the pinned nightly toolchain; replaced by the real capture store in the next commit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add the core capture layer:
- A Loro-CRDT-backed Snapshotter storing each file version (text files as delta-stored LoroText, binary files as raw bytes), an ordered event log, and a raw terminal-output byte log. Each version commits the doc and snaps the state frontier for later time-travel reconstruction.
- The fspy write-open/write-close callback wiring that reads file content via the supervisor descriptor without disturbing the traced process's offset.
- IgnoreSet: default directory ignores (.git/node_modules/target/...) scoped to the working directory, plus user --ignore globs (wax).
A unit test exercises record/dedup/export/import/checkout round-trip.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
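One reading of "scoped to the working directory" is that the default names are matched only against the part of a path below the root, so a working directory that itself sits inside a `target/` folder is not ignored wholesale. The sketch below follows that reading; it is an assumption, not the crate's IgnoreSet. It leaves out the user `--ignore` globs handled via wax, and `DefaultIgnores` is a hypothetical name.

```rust
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Default directory ignores, evaluated relative to a root directory.
pub struct DefaultIgnores {
    root: PathBuf,
    names: Vec<&'static str>,
}

impl DefaultIgnores {
    pub fn new(root: impl Into<PathBuf>) -> Self {
        Self {
            root: root.into(),
            // Only the names listed in the commit message; the real list is longer.
            names: vec![".git", "node_modules", "target"],
        }
    }

    /// True when a component below the root matches a default name.
    pub fn is_ignored(&self, path: &Path) -> bool {
        // Paths outside the root are never matched by the defaults.
        let relative = match path.strip_prefix(&self.root) {
            Ok(relative) => relative,
            Err(_) => return false,
        };
        relative
            .components()
            .any(|c| self.names.iter().any(|name| c.as_os_str() == OsStr::new(name)))
    }
}
```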
Add the run module that spawns the program via fspy and captures its output:
- Transport is chosen automatically: an interactive Unix terminal gets a real PTY (openpty + raw mode + AsyncFd I/O pump + SIGWINCH), wiring three dups of the slave fd into fspy's stdio. Everything else (non-interactive stdio, and all of Windows) uses pipes.
- The write callback reads file content by path, not via the passed descriptor: on the preload backends the supervisor receives the traced process's own fd, which is write-only for a write open and thus unreadable.
Adds an off-musl capture integration test (run a child that writes files, assert the timeline + Loro snapshot round-trip). fspy can't run in the local sandbox, so this test is validated on CI.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
macOS reports a write-open's path as the open() argument (e.g. /tmp/x) but a write-close's path via F_GETPATH (e.g. /private/tmp/x). Canonicalizing the event path (and the ignore-set root) maps both to one Loro key, so a file's pre-write and post-write versions belong to the same timeline entry. Falls back to the raw path when canonicalization fails (deleted files). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
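The canonicalize-with-fallback step amounts to a one-liner over `std::fs::canonicalize`. `canonical_or_raw` is a hypothetical helper name used only for this sketch.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Resolves symlinks so that e.g. /tmp/x and /private/tmp/x map to one key.
/// Canonicalization fails for paths that no longer exist (deleted files),
/// in which case the raw path is kept.
pub fn canonical_or_raw(path: &Path) -> PathBuf {
    fs::canonicalize(path).unwrap_or_else(|_| path.to_path_buf())
}
```

Applying the same function to the ignore-set root keeps prefix checks consistent with the event paths.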
… capture
- serve: reconstruct a deduped timeline JSON from the Loro snapshot (checkout each event's frontier, content-address file blobs) and a minimal async HTTP/1.1 server (GET-only, localhost) for the embedded SPA, /api/data and /api/output. Browser is WASM-free; loro stays Rust-side.
- build.rs embeds the committed ui/dist via include_bytes!.
- capture: add a Final event that re-reads every seen file after the program exits, capturing writes whose close is never observable (shell redirections close the fd implicitly on child exit).
- CLI (clap) + main: --port/--ignore/--no-default-ignores/--no-open/--no-serve/--dump, trailing program + args.
- ui: Vue 3 + TypeScript + xterm.js app (timeline, per-version file viewer with line diff, terminal replay). dist committed and embedded; up-to-dateness checked via a toolchain-only source-hash test. Isolated from the root pnpm workspace and from oxfmt/typos.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
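The blob deduplication mentioned for `serve` can be illustrated with a small interning table. This is a sketch under assumptions: it keys on the full content rather than a digest so that it needs only the standard library, and `BlobTable` is a hypothetical name. The real implementation may well address blobs by hash.

```rust
use std::collections::HashMap;

/// Stores each distinct file content once and hands out stable ids.
#[derive(Default)]
pub struct BlobTable {
    blobs: Vec<Vec<u8>>,
    index: HashMap<Vec<u8>, usize>,
}

impl BlobTable {
    /// Returns the id for `content`, reusing an existing blob when identical.
    pub fn intern(&mut self, content: &[u8]) -> usize {
        if let Some(&id) = self.index.get(content) {
            return id;
        }
        let id = self.blobs.len();
        self.blobs.push(content.to_vec());
        self.index.insert(content.to_vec(), id);
        id
    }

    /// Number of distinct blobs stored.
    pub fn len(&self) -> usize {
        self.blobs.len()
    }

    /// Looks a blob up by id.
    pub fn get(&self, id: usize) -> Option<&[u8]> {
        self.blobs.get(id).map(Vec::as_slice)
    }
}
```

Timeline entries would then carry blob ids, so an unchanged file repeated across many events is serialized once.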
- A local HTTP-server test (no fspy) drives /api/data, /api/output, the embedded index, and a 404 over a real socket.
- An off-musl PTY test runs a child through the real PTY transport and asserts raw output and file writes are captured.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cargo-shear flagged `worldline = { path = ... }` in [workspace.dependencies]:
nothing depends on the worldline binary crate, so the entry is unused. It
remains a workspace member via the crates/* glob.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Caught by the typos CI check in 'Format and Check Deps'.
…musl
- The I/O pump parks a blocking `tokio::io::stdin` reader on the controlling terminal; dropping the runtime would wait on it forever, so the program hung after the child exited. Exit via `std::process::exit` (as `vt` does).
- Fall back to pipes on musl, whose PTY internals are concurrency-unsafe (`pty_terminal`), keeping the real-PTY path to glibc/macOS Unix.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- capture.rs now also validates the reconstructed timeline the web UI receives (final file states, the preserved intermediate version, and blob dedup).
- Drive the worldline binary through the e2e_snapshots PTY harness: a bin artifact dep exposes it as CARGO_BIN_FILE_WORLDLINE, the harness gains a 'worldline' program and a 'unix-non-musl' platform gate, and a new worldline_pty fixture asserts the child sees a real TTY (PTY redirection) and its output is forwarded.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move the worldline bin artifact dev-dependency into a [target.*] section (covering all targets via cfg(any(unix, windows))). cargo-autoinherit otherwise rewrites it to a plain `workspace = true` dependency, dropping the `artifact = "bin"` attribute and the CARGO_BIN_FILE_WORLDLINE the e2e harness needs. Target sections are left untouched (as preload_test_lib already is). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…acks Add `FileEvent::raw_fd` (and a `fd` field on the wire `CallbackRequest`) carrying the traced process's own descriptor number — a fd on Unix, a HANDLE value on Windows. With the pid it lets a consumer pair an Opened event with the Closing of the same descriptor. Both preload backends report it; the seccomp backend reports -1 (the target's fd is assigned by the kernel via ADDFD only after the open callback runs). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the open/close/final timeline with a flat list of writes. Each write is one open->close lifecycle of a descriptor opened for writing, pairing the open-callback snapshot (before) with the close-callback snapshot (after) of the same (pid, fd); the seccomp backend (raw_fd = -1) falls back to per-path pairing. Contents are stored per-path in Loro so repeated near-identical writes stay delta-compressed, and each write records before/after frontiers the server checks out to render a diff. The web UI now lists writes, each showing the file's before->after diff and the terminal output up to that write. dist + source hash rebuilt. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On macOS, libuv (and thus Node) closes descriptors via `close$NOCANCEL`,
a distinct libc symbol from `close`. The preload only interposed `close`,
so a write-close from Node was never observed: its `Closing` callback never
fired. For worldline this meant `fs.writeFileSync` wrote the file but the
run captured zero writes ("No file writes were captured"); Rust's std uses
plain `close`, which is why it worked and hid the gap.
Interpose `close$NOCANCEL` onto the same hook (forwarding through the regular
`close`, which closes the descriptor identically). Add two `#[ignore]` Node
regression tests, run in CI's `--ignored` step on every platform: one at the
fspy callback layer asserting Opened+Closing both fire for `fs.writeFileSync`,
and one at the worldline layer mirroring the `worldline node` scenario and
asserting the captured write's before/after content.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On glibc Linux, libuv (and therefore Node) closes every descriptor via `syscall(SYS_close, fd)` rather than the `close` libc symbol — confirmed by tracing: a `fs.writeFileSync` triggers zero `close()`/`__close_nocancel()` calls and routes every close through the `syscall` wrapper. The `close` symbol interposition therefore never observed Node's closes, so the `Closing` callback never fired and worldline captured no writes (the Linux analog of the macOS `close$NOCANCEL` gap; musl is unaffected because its seccomp backend catches the close syscall directly). The preload already interposes the libc `syscall` wrapper (for SYS_statx) and forwards Node's SYS_close through it — it just didn't act on it. Fire the close callback for SYS_close there, mirroring the existing handling. The `close_callback_fires_for_node_write` regression test now passes on Linux too. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A rewrite via `writeFileSync` (or any `O_TRUNC` open) empties the file at open, before worldline's open-time snapshot runs in the supervisor — so the snapshot read back empty and overwrote the file's real prior content. A second write to the same file then showed `"" -> "new"` instead of `"old" -> "new"`. Only adopt the open-time read as the `before` when it is non-empty (a non-truncating open, or a pre-existing file). An empty read for a path already tracked is treated as a truncating open, and the last-recorded content (the previous write's `after`) is kept as the `before`. Adds a node-free regression test covering repeated truncating writes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
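The rule for picking a write's `before` can be written down as a small pure function. `choose_before` is a hypothetical name, and the sketch covers only the rule stated in this commit message: a non-empty open-time read wins, otherwise the last recorded content is kept.

```rust
/// Picks the `before` content for a write.
///
/// `open_time_read` is what the supervisor read when the open callback ran;
/// `last_recorded` is the previous write's `after` for an already tracked path.
pub fn choose_before(open_time_read: &[u8], last_recorded: Option<&[u8]>) -> Vec<u8> {
    if !open_time_read.is_empty() {
        // Non-truncating open, or a pre-existing file seen for the first time.
        return open_time_read.to_vec();
    }
    match last_recorded {
        // Empty read on a tracked path: treat it as a truncating open.
        Some(previous) => previous.to_vec(),
        // Untracked path with no content: the file really did start empty.
        None => Vec::new(),
    }
}
```

With this rule a second `O_TRUNC` write to the same file reports `"old" -> "new"` rather than `"" -> "new"`.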
0dc1576 to 097677a
…ace and fd reuse
Hardening from an audit of the capture pipeline, fixing four ways a write's `before` content could be wrong:
- Rename-over and delete-and-recreate (the write-tmp-then-rename pattern is ubiquitous in build tools) resurrected the replaced file's stale content. The file's on-disk identity (inode+device, or volume+index on Windows) is now tracked, so an empty truncating-open read keeps the prior content only when it's the *same* file; a replaced path is treated as fresh (empty `before`).
- Under the seccomp backend (raw_fd == -1) every open of a path shared one slot, so a second overlapping write lost its `before`. Pending opens are now a LIFO stack per correlation key.
- A descriptor reused for another file (e.g. via dup2, which closes the old target without a close callback) could mispair a close with a leaked open of a different file. The close now validates the open's path and falls back to the file's prior content on a mismatch.
The in-place truncating-rewrite fix (keeping the prior content as `before`) is preserved. Adds store-level regression tests for each case.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
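The pending-open bookkeeping from this commit can be sketched as a map from correlation key to a LIFO stack, with the path check applied on close. All type and method names are hypothetical, `raw_fd` is modeled as an `i64` for illustration, and the file-identity tracking is omitted.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// How an open is matched to its close.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum CorrelationKey {
    /// Preload backends report the target's own descriptor number.
    Descriptor { pid: u32, fd: i64 },
    /// The seccomp backend reports raw_fd == -1, so pairing falls back to the path.
    Path(PathBuf),
}

impl CorrelationKey {
    pub fn new(pid: u32, raw_fd: i64, path: &Path) -> Self {
        if raw_fd == -1 {
            Self::Path(path.to_path_buf())
        } else {
            Self::Descriptor { pid, fd: raw_fd }
        }
    }
}

struct PendingOpen {
    path: PathBuf,
    before: Vec<u8>,
}

#[derive(Default)]
pub struct PendingOpens {
    stacks: HashMap<CorrelationKey, Vec<PendingOpen>>,
}

impl PendingOpens {
    /// Records an open; overlapping opens on one key stack up instead of overwriting.
    pub fn on_open(&mut self, key: CorrelationKey, path: &Path, before: Vec<u8>) {
        self.stacks.entry(key).or_default().push(PendingOpen {
            path: path.to_path_buf(),
            before,
        });
    }

    /// Pops the most recent open for `key`. Returns its `before` only when the
    /// recorded path matches; `None` tells the caller to use the prior content.
    pub fn on_close(&mut self, key: &CorrelationKey, path: &Path) -> Option<Vec<u8>> {
        let stack = self.stacks.get_mut(key)?;
        let open = stack.pop()?;
        if stack.is_empty() {
            self.stacks.remove(key);
        }
        (open.path == path).then_some(open.before)
    }
}
```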
097677a to 9aedfb9
volume_serial_number / file_index on Windows (used in the capture callback to compute a file's on-disk identity) are still gated behind the unstable 'windows_by_handle' feature (issue #63010).
The feature is now stable in the pinned nightly-2026-05-24 toolchain (rustc cross-checks via cargo-xwin compile cleanly without the gate), and declaring it triggers '-D unused-features'. The earlier 'use of unstable library feature' CI Windows failure must have been on a different toolchain refresh — retry with the gate dropped and see if real-Windows agrees with the cross-check.
… code)
- serve: `reconstruct` resolves each write's content via `LoroMap::get(path)` + `get_deep_value` instead of deep-cloning the entire map per lookup (was O(writes × all-file-content)).
- capture: read the content and the file identity from one opened handle (a single path walk + an `fstat`) rather than a separate `metadata` stat, on the callback hot path that blocks the traced process.
- capture: compute the fallback `before` frontier only when no matching open was observed, not on every close.
- ui: drop a dead file-list/file-row stylesheet block and a no-op `changed()` class binding (no `.event.changed` rule ever existed); rebuild the bundle.
- docs: correct the `.source-hash` path in the dist-up-to-date test header.
No behavior change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds worldline, a CLI for seeing what a program does to the filesystem.
`worldline <program> [args...]` runs the program, records how every file changes as it runs (before and after each write) alongside the program's terminal output, then opens a local web UI to scrub through the timeline, viewing the filesystem state, per-file diffs, and the terminal at each point.
Default ignores (.git, node_modules, target, …), customizable via --ignore.
Built on this branch's blocking open/close supervisor callbacks. CI is green on Linux, macOS (x64/arm64), Windows, and musl.
🤖 Generated with Claude Code