feat(worldline): filesystem-timeline tracer with web UI #403

Draft
wan9chi wants to merge 36 commits into main from claude/fspy-supervisor-callbacks-Uvauu

wan9chi (Member) commented May 28, 2026

Adds worldline, a CLI for seeing what a program does to the filesystem.

worldline <program> [args...] runs the program, records how every file changes as it runs (before and after each write) alongside the program's terminal output, then opens a local web UI to scrub through the timeline — viewing the filesystem state, per-file diffs, and the terminal at each point.

  • Works under both piped and interactive (real-terminal) usage.
  • Sensible default path ignores (.git, node_modules, target, …), customizable via --ignore.
  • The web UI ships embedded in the binary — nothing extra to install to view a run.

Built on this branch's blocking open/close supervisor callbacks. CI is green on Linux, macOS (x64/arm64), Windows, and musl.

🤖 Generated with Claude Code

claude added 11 commits May 15, 2026 10:30
Add as_os_str (Unix only) and to_cow_os_str (cross-platform) so callers
can recover an OsStr from a deserialized NativePath without going
through strip_path_prefix.

Adds the shared CallbackKind / CallbackRequest / CALLBACK_ACK wire
types used by the synchronous round-trip between a traced process and
the supervisor, plus the per-backend Payload endpoint fields
(CallbackConf on Unix, callback_pipe_name + callback_mask on Windows).
Both are absent / empty when no callback is registered.
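As an illustration, here is a minimal sketch of how a length-prefixed CallbackRequest frame and the one-byte ACK might be encoded. The field set, the byte layout, the `access_mode` encoding and the ACK value are all assumptions. This is not fspy's actual wire format.

```rust
use std::convert::TryInto;

/// Hypothetical stand-ins for the shared wire types; the real layout may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallbackKind {
    Opened = 0,
    Closing = 1,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CallbackRequest {
    pub kind: CallbackKind,
    /// Access-mode bits of the descriptor (read/write) as a small bitmask.
    pub access_mode: u8,
    pub path: Vec<u8>,
}

/// Assumed value of the single byte the supervisor writes to release the caller.
pub const CALLBACK_ACK: u8 = 0xAC;

impl CallbackRequest {
    /// Encodes as [u32 little-endian body length][kind][access_mode][path bytes].
    pub fn encode(&self) -> Vec<u8> {
        let mut body = Vec::with_capacity(2 + self.path.len());
        body.push(self.kind as u8);
        body.push(self.access_mode);
        body.extend_from_slice(&self.path);
        let mut frame = (body.len() as u32).to_le_bytes().to_vec();
        frame.extend_from_slice(&body);
        frame
    }

    /// Decodes one frame; returns None if it is truncated or malformed.
    pub fn decode(frame: &[u8]) -> Option<Self> {
        let len = u32::from_le_bytes(frame.get(..4)?.try_into().ok()?) as usize;
        let body = frame.get(4..4 + len)?;
        let kind = match *body.first()? {
            0 => CallbackKind::Opened,
            1 => CallbackKind::Closing,
            _ => return None,
        };
        let access_mode = *body.get(1)?;
        Some(CallbackRequest { kind, access_mode, path: body[2..].to_vec() })
    }
}
```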
Add FileEvent / FileEventKind / FileEventPath / BorrowedFile in a new
crate::callback module and Command::on_file_event(mask, callback) so a
consumer can register a callback that runs in the supervisor process.
Per-backend supervisor servers live in callback::{unix,windows}; the
Unix one wraps a Unix-domain socket and SCM_RIGHTS, the Windows one
wraps a named pipe with DuplicateHandle. SpawnError grows a
CallbackChannelCreation variant for binding failures.

No wiring into the OS-impl spawn() yet; that comes in follow-ups.

…tions

The preload connects per-event to the supervisor's Unix-domain socket,
passes the freshly opened (or about-to-be-closed) fd via SCM_RIGHTS,
sends a length-prefixed CallbackRequest, and blocks reading a single
ACK byte before letting the syscall return.
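The blocking handshake can be sketched with std alone. This version assumes `UnixStream::pair()` in place of the per-event connect. Passing the descriptor itself via SCM_RIGHTS needs ancillary-data APIs that stable std does not expose, so this hypothetical version sends only the request and then waits for the ACK byte.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::UnixStream;
use std::thread;

const ACK: u8 = 1; // hypothetical ACK byte

/// Traced side: send a length-prefixed request, then block until the ACK arrives.
fn round_trip(sock: &mut UnixStream, request: &[u8]) -> io::Result<()> {
    sock.write_all(&(request.len() as u32).to_le_bytes())?;
    sock.write_all(request)?;
    let mut ack = [0u8; 1];
    sock.read_exact(&mut ack)?; // the hooked syscall cannot return before this
    if ack[0] == ACK {
        Ok(())
    } else {
        Err(io::Error::new(io::ErrorKind::InvalidData, "unexpected ACK byte"))
    }
}

/// Supervisor side: read one request, run the callback, then release the caller.
fn serve_one<F: FnOnce(&[u8])>(sock: &mut UnixStream, callback: F) -> io::Result<()> {
    let mut len = [0u8; 4];
    sock.read_exact(&mut len)?;
    let mut request = vec![0u8; u32::from_le_bytes(len) as usize];
    sock.read_exact(&mut request)?;
    callback(&request); // the traced side is still blocked in read_exact here
    sock.write_all(&[ACK])
}

/// Runs one request through a socket pair and returns what the supervisor saw.
fn demo(request: &[u8]) -> io::Result<Vec<u8>> {
    let (mut traced, mut supervisor) = UnixStream::pair()?;
    let handle = thread::spawn(move || {
        let mut seen = Vec::new();
        serve_one(&mut supervisor, |req| seen.extend_from_slice(req)).map(|_| seen)
    });
    round_trip(&mut traced, request)?;
    handle.join().expect("supervisor thread panicked")
}
```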

The open hooks now call the real libc fn first, capture the result fd,
keep the existing shared-memory event, and only then run the blocking
post-open callback. New close/fclose hooks fire the pre-close callback
while the fd is still valid; the fd's access mode is read with
F_GETFL, and obviously non-file paths (sockets, anon inodes) are
filtered out before any round-trip. A thread-local reentrancy guard
prevents the round-trip's own socket I/O from recursing.
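Here is a minimal sketch of such a thread-local reentrancy guard, assuming a `Cell<bool>` flag. `with_reentrancy_guard` is a hypothetical helper, not the preload's actual code.

```rust
use std::cell::Cell;

thread_local! {
    // Set while this thread is inside a callback round-trip.
    static IN_CALLBACK: Cell<bool> = Cell::new(false);
}

/// Runs `f` unless this thread is already inside a callback. In that case the
/// hook skips the round-trip and gets None. A drop guard clears the flag on the
/// way out, including when `f` panics and unwinds.
fn with_reentrancy_guard<T>(f: impl FnOnce() -> T) -> Option<T> {
    struct Reset;
    impl Drop for Reset {
        fn drop(&mut self) {
            IN_CALLBACK.with(|flag| flag.set(false));
        }
    }
    if IN_CALLBACK.with(|flag| flag.replace(true)) {
        return None; // nested: the round-trip's own I/O re-entered a hook
    }
    let _reset = Reset;
    Some(f())
}
```

The nested call returns None without touching the flag, so only the outermost call resets it.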

The callback channel is decoded from the EncodedPayload only when one
is registered, so the no-callback path remains a single Option check.

Add NotifyResponse { Continue, ReturnFd { fd, cloexec } } and a small
HandlerResponse trait so a handler can ask the supervisor to install a
file descriptor into the target (via SECCOMP_IOCTL_NOTIF_ADDFD with the
SEND flag) and complete the syscall atomically with the new fd as the
result. Keeping HandlerResponse separate from SeccompNotifyHandler
means the impl_handler! macro stays untouched — handlers that always
continue rely on the default impl.
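Here is a sketch of the split described above, using hypothetical names and a hypothetical method signature. It has the response enum plus a separate trait whose default method returns `Continue`, so only handlers that need ADDFD override it.

```rust
/// Hypothetical mirror of the response a seccomp handler can return.
#[derive(Debug, PartialEq, Eq)]
pub enum NotifyResponse {
    /// Let the target's syscall run as-is.
    Continue,
    /// Install `fd` into the target (ADDFD with the SEND flag) and complete the
    /// syscall with the installed descriptor as its result.
    ReturnFd { fd: i32, cloexec: bool },
}

/// Kept apart from the handler trait; the default means handlers that always
/// continue need no extra code.
pub trait HandlerResponse {
    fn response(&mut self) -> NotifyResponse {
        NotifyResponse::Continue
    }
}

/// Records accesses only, so it relies on the default.
pub struct RecordOnly;
impl HandlerResponse for RecordOnly {}

/// Opened the file itself and asks for that descriptor to be handed back.
pub struct ReturnOwnFd {
    pub fd: i32,
    /// Mirrors whether the target asked for O_CLOEXEC.
    pub cloexec: bool,
}

impl HandlerResponse for ReturnOwnFd {
    fn response(&mut self) -> NotifyResponse {
        NotifyResponse::ReturnFd { fd: self.fd, cloexec: self.cloexec }
    }
}
```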

supervise() now delegates to a new supervise_with(init, syscalls)
which builds each per-connection handler via the init closure and
filters exactly the syscalls passed in, letting callers inject
per-spawn state and decide at runtime which syscalls to intercept.

Also exposes Caller::pid() and Fd::raw() — both needed by handlers
that want to look up /proc/<pid>/fdinfo/<fd> for the open mode of a
descriptor about to be closed.

The supervisor's SyscallHandler now optionally carries a FileCallback
(injected via with_callback). When set, for each open* syscall
notification the supervisor opens the file itself with the target's
flags, runs the callback on its own descriptor, and replies with a
ReturnFd response so the kernel installs that descriptor into the
target via ADDFD-SEND. For close notifications it reads the access
mode out of /proc/<pid>/fdinfo/<fd>, opens a fresh read-only fd for
the callback, then continues so the target performs the close.

handle_open now takes an is_open_syscall flag so it only fires the
post-open callback for real open* notifications — execve / stat /
access route through handle_open purely for access recording and must
not get an ADDFD response (execve does not return a descriptor).

/dev/, /proc/ and /sys/ paths are skipped before any supervisor open
to avoid spurious round-trips.
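Two pieces of this commit are easy to sketch in isolation. The first recovers the access mode from /proc/<pid>/fdinfo/<fd>, whose `flags:` field is an octal number per proc(5). The second is the pseudo-filesystem prefix check. Both function names are hypothetical.

```rust
/// Access mode of a descriptor, as encoded in the low bits of open(2) flags.
#[derive(Debug, PartialEq, Eq)]
pub enum AccessMode {
    ReadOnly,
    WriteOnly,
    ReadWrite,
}

/// Parses the octal `flags:` line of an fdinfo file and masks it with O_ACCMODE (0o3).
pub fn access_mode_from_fdinfo(fdinfo: &str) -> Option<AccessMode> {
    let field = fdinfo
        .lines()
        .find_map(|line| line.strip_prefix("flags:"))?
        .trim();
    let flags = u32::from_str_radix(field, 8).ok()?;
    match flags & 0o3 {
        0 => Some(AccessMode::ReadOnly),
        1 => Some(AccessMode::WriteOnly),
        2 => Some(AccessMode::ReadWrite),
        _ => None,
    }
}

/// Paths under pseudo-filesystems are skipped before the supervisor opens anything.
pub fn is_pseudo_fs_path(path: &str) -> bool {
    ["/dev/", "/proc/", "/sys/"].iter().any(|p| path.starts_with(*p))
}
```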
On non-musl hosts, when Command::on_file_event is set, spawn now binds
a UnixCallbackServer and threads its socket path + access-mode mask
into the Payload so the preload knows where to round-trip.

On Linux, supervise_with is invoked with the full syscall list
(including close) and a per-spawn closure that hands the FileCallback
to each SyscallHandler instance. With no callback registered, close is
filtered out of the syscall list so per-close overhead stays at zero.

The wait task drains the callback server (so every in-flight callback
finishes) before locking the IPC channel.

The preload connects per-event to the supervisor's named pipe and
sends [u32 len][CallbackRequest][u64 raw HANDLE], then blocks reading
a single ACK byte. The raw handle is duplicated out of this process
by the supervisor (DuplicateHandle); we just send its numeric value.
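Here is a sketch of that frame layout. It assumes little-endian integers and an already-serialized request body, and the helper names are hypothetical.

```rust
use std::convert::TryInto;

/// Builds [u32 len][request bytes][u64 raw HANDLE], little-endian.
pub fn encode_windows_frame(request: &[u8], raw_handle: u64) -> Vec<u8> {
    let mut frame = Vec::with_capacity(4 + request.len() + 8);
    frame.extend_from_slice(&(request.len() as u32).to_le_bytes());
    frame.extend_from_slice(request);
    // Only the numeric handle value travels; the supervisor duplicates the
    // handle out of the target process itself.
    frame.extend_from_slice(&raw_handle.to_le_bytes());
    frame
}

/// Splits a frame back into (request bytes, raw handle value).
pub fn decode_windows_frame(frame: &[u8]) -> Option<(&[u8], u64)> {
    let len = u32::from_le_bytes(frame.get(..4)?.try_into().ok()?) as usize;
    let request = frame.get(4..4 + len)?;
    let handle_bytes = frame.get(4 + len..4 + len + 8)?;
    Some((request, u64::from_le_bytes(handle_bytes.try_into().ok()?)))
}
```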

NtCreateFile / NtOpenFile are restructured to call the real fn first
(so file_handle is populated), keep the existing shared-memory event,
and only then run the post-open callback. A new NtClose detour fires
the pre-close callback while the handle is still valid.

NtClose runs for every handle type, so we keep a lock-free DashMap of
file handles whose open mode matched the mask. Only handles in that
map get a close callback, and the stored mode is reported back as the
event's access mode. A thread-local IN_CALLBACK guard suppresses the
recursion that would otherwise happen when the round-trip's own
CreateFileW / NtClose go back through the detours.
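The filtering can be sketched with std types. The sketch assumes access modes form a small bitmask and that "matched the mask" means any overlapping bit. A `Mutex<HashMap>` replaces the lock-free DashMap only to keep the example dependency-free.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Simplified access-mode bits; the mask is a union of these.
pub const READ: u8 = 0b01;
pub const WRITE: u8 = 0b10;

/// Handles whose open mode matched the callback mask (hypothetical type).
pub struct TrackedHandles {
    mask: u8,
    handles: Mutex<HashMap<u64, u8>>,
}

impl TrackedHandles {
    pub fn new(mask: u8) -> Self {
        TrackedHandles { mask, handles: Mutex::new(HashMap::new()) }
    }

    /// After a successful open: remember the handle only if its mode matches.
    pub fn on_open(&self, handle: u64, mode: u8) {
        if mode & self.mask != 0 {
            self.handles.lock().unwrap().insert(handle, mode);
        }
    }

    /// From the close detour, which sees every handle type: returns the stored
    /// open mode when a close callback should fire, otherwise None.
    pub fn on_close(&self, handle: u64) -> Option<u8> {
        self.handles.lock().unwrap().remove(&handle)
    }
}
```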
When Command::on_file_event is set, spawn now binds a
WindowsCallbackServer on a uniquely named pipe, threads its name plus
the access-mode mask into the Payload, and (after the child has been
spawned suspended but before ResumeThread) hands the child's process
handle to the server so DuplicateHandle on incoming callbacks can pull
the target's raw HANDLE into this process.

The wait task drains the callback server before locking the IPC
channel so every in-flight callback completes.

read_verify opens the file, reads it, and asserts the content is
non-empty — used by the seccomp blocking-callback test to prove the
ADDFD-installed descriptor in the target is usable.

read_verify_threads runs the same from four concurrent threads to
exercise the callback path under concurrency.
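Here is a hypothetical shape for these two helpers. It assumes each takes a path and panics on an empty read, as the description says. The real test-binary entry points may differ.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::thread;

/// Opens and reads the file, asserting the content is non-empty.
pub fn read_verify(path: &Path) -> io::Result<()> {
    let content = fs::read(path)?;
    assert!(!content.is_empty(), "{} read back empty", path.display());
    Ok(())
}

/// Runs read_verify from four threads at once.
pub fn read_verify_threads(path: &Path) -> io::Result<()> {
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let path = path.to_path_buf();
            thread::spawn(move || read_verify(&path))
        })
        .collect();
    for handle in handles {
        handle.join().expect("reader thread panicked")?;
    }
    Ok(())
}
```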
file_callback.rs covers the preload backend on Linux glibc / macOS /
Windows: blocking proof (target cannot progress while the callback
runs), the supervisor can read the passed descriptor, the close
callback fires before the close with a still-valid descriptor, the
mask filters events, and registering no callback leaves access
tracking unchanged.

static_executable.rs adds two seccomp-backend cases (ADDFD round-trip
+ multi-threaded concurrent opens).
wan9chi changed the title from "chore: ongoing experiment (CI only)" to "experiment" on May 28, 2026
wan9chi force-pushed the claude/fspy-supervisor-callbacks-Uvauu branch from e29e264 to a81d817 on May 28, 2026 03:27

socket-security Bot commented May 28, 2026

Review the following changes in direct dependencies.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
|---|---|---|---|---|---|---|
| Added | cargo/dashmap@6.2.1 | 100 | 100 | 93 | 100 | 100 |
| Updated | cargo/once_cell@1.21.3 → 1.21.4 | 100 | 100 | 93 | 100 | 100 |


claude and others added 12 commits May 28, 2026 03:41
- Gate fspy::callback::unix to non-musl: it's only used by the preload
  backend, which itself is excluded on musl, so leaving it in caused
  dead_code errors under cargo's -D warnings on musl.
- Drop a redundant [`FileEvent`](crate::FileEvent) explicit link
  target — FileEvent is re-exported at the crate root.
- Disambiguate the [`impl_handler`] intra-doc link in
  HandlerResponse's doc to the macro at the crate root.

…ndle_close/open_callback

Skipping the thread-local reentrancy guard until the ctor has set the
global client keeps these paths infallible during very early (libdyld /
pre-ctor) opens and closes, where the thread-local accessor is the only
non-trivial operation either path would do anyway.
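Here is a sketch of that ordering, assuming a `OnceLock` stands in for the global client the ctor sets. The cheap global check runs first, so a pre-ctor open or close never touches the thread-local. The names and the client type are hypothetical.

```rust
use std::cell::Cell;
use std::sync::OnceLock;

/// Stand-in for the preload's global client, set by the library constructor.
static CLIENT: OnceLock<String> = OnceLock::new();

thread_local! {
    static IN_CALLBACK: Cell<bool> = Cell::new(false);
}

/// Whether an open/close hook should attempt a callback round-trip.
pub fn should_round_trip() -> bool {
    if CLIENT.get().is_none() {
        return false; // too early: no client yet, and the thread-local is never touched
    }
    !IN_CALLBACK.with(|flag| flag.get())
}
```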

Also surface the actual stdout content in the cancellation test's
assertion message — without it, a child that crashes before writing
'ready' shows up as a bare assertion failure with no clue about why.

On macOS, tempfile::tempdir() returns paths under /var/folders/...,
which is a symlink to /private/var/folders/.... The open syscall sees
the literal /var/ path, but the close handler resolves the fd via
F_GETPATH, which returns the canonical /private/var/ form. The tests
filter callback events by 'starts_with(dir_path)'. Without
canonicalization, the close event's canonical path fails to match the
non-canonical prefix, so the Closing event is dropped and the close
callback test fails on macOS only.

Add the worldline crate skeleton (bin + lib) and register the loro
dependency in the workspace. Includes a temporary Loro API probe that
exercises LoroDoc/LoroText/export to confirm the dependency and method
names compile on the pinned nightly toolchain; replaced by the real
capture store in the next commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add the core capture layer:
- A Loro-CRDT-backed Snapshotter storing each file version (text files as
  delta-stored LoroText, binary files as raw bytes), an ordered event log,
  and a raw terminal-output byte log. Each version commits the doc and snaps
  the state frontier for later time-travel reconstruction.
- The fspy write-open/write-close callback wiring that reads file content via
  the supervisor descriptor without disturbing the traced process's offset.
- IgnoreSet: default directory ignores (.git/node_modules/target/...) scoped
  to the working directory, plus user --ignore globs (wax).

A unit test exercises record/dedup/export/import/checkout round-trip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
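Here is a minimal sketch of the default directory ignores. It assumes "scoped to the working directory" means only paths under that root are matched. The user --ignore globs, which use wax, are left out, and the directory list is an abbreviated, hypothetical subset.

```rust
use std::path::{Path, PathBuf};

/// Abbreviated subset of default ignored names (hypothetical list).
const DEFAULT_IGNORED_DIRS: &[&str] = &[".git", "node_modules", "target"];

pub struct IgnoreSet {
    root: PathBuf,
}

impl IgnoreSet {
    pub fn new(root: impl Into<PathBuf>) -> Self {
        IgnoreSet { root: root.into() }
    }

    /// Ignored if, relative to the root, any path component is a default-ignored
    /// name. Paths outside the root are never default-ignored.
    pub fn is_ignored(&self, path: &Path) -> bool {
        match path.strip_prefix(&self.root) {
            Ok(relative) => relative
                .components()
                .any(|c| DEFAULT_IGNORED_DIRS.iter().any(|d| c.as_os_str() == *d)),
            Err(_) => false,
        }
    }
}
```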
Add the run module that spawns the program via fspy and captures its
output:
- Transport is chosen automatically: an interactive Unix terminal gets a
  real PTY (openpty + raw mode + AsyncFd I/O pump + SIGWINCH), wiring three
  dups of the slave fd into fspy's stdio. Everything else (non-interactive
  stdio, and all of Windows) uses pipes.
- The write callback reads file content by path, not via the passed
  descriptor: on the preload backends the supervisor receives the traced
  process's own fd, which is write-only for a write open and thus unreadable.
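The transport choice can be sketched with `std::io::IsTerminal`, assuming "interactive" means both stdin and stdout are terminals. The musl exclusion comes from a later commit in this PR. `choose_transport` is a hypothetical name.

```rust
use std::io::{self, IsTerminal};

#[derive(Debug, PartialEq, Eq)]
pub enum Transport {
    /// Real pseudo-terminal: openpty, raw mode, I/O pump, SIGWINCH forwarding.
    Pty,
    /// Plain pipes for stdin/stdout/stderr.
    Pipes,
}

/// A PTY only for an interactive terminal on non-musl Unix; pipes everywhere
/// else, including all of Windows.
pub fn choose_transport(stdin_is_tty: bool, stdout_is_tty: bool) -> Transport {
    let pty_supported = cfg!(all(unix, not(target_env = "musl")));
    if pty_supported && stdin_is_tty && stdout_is_tty {
        Transport::Pty
    } else {
        Transport::Pipes
    }
}

/// Inspects this process's own stdio.
pub fn detect_transport() -> Transport {
    choose_transport(io::stdin().is_terminal(), io::stdout().is_terminal())
}
```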

Adds an off-musl capture integration test (run a child that writes files,
assert the timeline + Loro snapshot round-trip). fspy can't run in the local
sandbox, so this test is validated on CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

macOS reports a write-open's path as the open() argument (e.g. /tmp/x) but
a write-close's path via F_GETPATH (e.g. /private/tmp/x). Canonicalizing the
event path (and the ignore-set root) maps both to one Loro key, so a file's
pre-write and post-write versions belong to the same timeline entry. Falls
back to the raw path when canonicalization fails (deleted files).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
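The canonicalize-with-fallback rule fits in a few lines. `timeline_key` is a hypothetical name.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Maps an event path to its timeline key. It uses the canonical path when it
/// can be resolved, so /tmp/x and /private/tmp/x on macOS collapse to one key.
/// Otherwise it uses the raw path, e.g. for a file already deleted.
pub fn timeline_key(path: &Path) -> PathBuf {
    fs::canonicalize(path).unwrap_or_else(|_| path.to_path_buf())
}
```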
… capture

- serve: reconstruct a deduped timeline JSON from the Loro snapshot (checkout
  each event's frontier, content-address file blobs) and a minimal async
  HTTP/1.1 server (GET-only, localhost) for the embedded SPA, /api/data and
  /api/output. Browser is WASM-free; loro stays Rust-side.
- build.rs embeds the committed ui/dist via include_bytes!.
- capture: add a Final event that re-reads every seen file after the program
  exits, capturing writes whose close is never observable (shell redirections
  close the fd implicitly on child exit).
- CLI (clap) + main: --port/--ignore/--no-default-ignores/--no-open/
  --no-serve/--dump, trailing program + args.
- ui: Vue 3 + TypeScript + xterm.js app (timeline, per-version file viewer with
  line diff, terminal replay). dist committed and embedded; up-to-dateness
  checked via a toolchain-only source-hash test. Isolated from the root pnpm
  workspace and from oxfmt/typos.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
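Here is a sketch of blob deduplication for the reconstructed timeline JSON. To keep the example exact, it keys on the bytes themselves rather than on a hash. Whether the server hashes content, and with which function, is not stated here, so `BlobTable` is purely illustrative.

```rust
use std::collections::HashMap;

/// Interns file contents so identical blobs are stored once and referenced by id.
#[derive(Default)]
pub struct BlobTable {
    blobs: Vec<Vec<u8>>,
    index: HashMap<Vec<u8>, usize>,
}

impl BlobTable {
    /// Returns the id for `content`, storing it only the first time it is seen.
    pub fn intern(&mut self, content: &[u8]) -> usize {
        if let Some(&id) = self.index.get(content) {
            return id;
        }
        let id = self.blobs.len();
        self.blobs.push(content.to_vec());
        self.index.insert(content.to_vec(), id);
        id
    }

    pub fn len(&self) -> usize {
        self.blobs.len()
    }
}
```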
- A local HTTP-server test (no fspy) drives /api/data, /api/output, the
  embedded index, and a 404 over a real socket.
- An off-musl PTY test runs a child through the real PTY transport and asserts
  raw output and file writes are captured.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cargo-shear flagged `worldline = { path = ... }` in [workspace.dependencies]:
nothing depends on the worldline binary crate, so the entry is unused. It
remains a workspace member via the crates/* glob.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Caught by the typos CI check in 'Format and Check Deps'.
wan9chi changed the title from "experiment" to "feat(worldline): filesystem-timeline tracer with web UI" on Jun 1, 2026
wan9chi and others added 3 commits June 1, 2026 22:38
…musl

- The I/O pump parks a blocking `tokio::io::stdin` reader on the controlling
  terminal; dropping the runtime would wait on it forever, so the program hung
  after the child exited. Exit via `std::process::exit` (as `vt` does).
- Fall back to pipes on musl, whose PTY internals are concurrency-unsafe
  (`pty_terminal`), keeping the real-PTY path to glibc/macOS Unix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- capture.rs now also validates the reconstructed timeline the web UI receives
  (final file states, the preserved intermediate version, and blob dedup).
- Drive the worldline binary through the e2e_snapshots PTY harness: a bin
  artifact dep exposes it as CARGO_BIN_FILE_WORLDLINE, the harness gains a
  'worldline' program and a 'unix-non-musl' platform gate, and a new
  worldline_pty fixture asserts the child sees a real TTY (PTY redirection) and
  its output is forwarded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Move the worldline bin artifact dev-dependency into a [target.*] section
(covering all targets via cfg(any(unix, windows))). cargo-autoinherit
otherwise rewrites it to a plain `workspace = true` dependency, dropping the
`artifact = "bin"` attribute and the CARGO_BIN_FILE_WORLDLINE the e2e harness
needs. Target sections are left untouched (as preload_test_lib already is).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
wan9chi and others added 6 commits June 1, 2026 23:26
…acks

Add `FileEvent::raw_fd` (and a `fd` field on the wire `CallbackRequest`)
carrying the traced process's own descriptor number — a fd on Unix, a HANDLE
value on Windows. With the pid it lets a consumer pair an Opened event with
the Closing of the same descriptor. Both preload backends report it; the
seccomp backend reports -1 (the target's fd is assigned by the kernel via
ADDFD only after the open callback runs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the open/close/final timeline with a flat list of writes. Each write
is one open->close lifecycle of a descriptor opened for writing, pairing the
open-callback snapshot (before) with the close-callback snapshot (after) of
the same (pid, fd); the seccomp backend (raw_fd = -1) falls back to per-path
pairing. Contents are stored per-path in Loro so repeated near-identical
writes stay delta-compressed, and each write records before/after frontiers
the server checks out to render a diff.

The web UI now lists writes, each showing the file's before->after diff and
the terminal output up to that write. dist + source hash rebuilt.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
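Here is a sketch of the pairing rule. It assumes a hypothetical `CorrelationKey` that uses (pid, raw_fd) when the backend reports a descriptor, and the path when raw_fd is -1 (seccomp). Content is kept as plain byte vectors here instead of Loro versions and frontiers.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum CorrelationKey {
    /// Preload backends: the traced process's own fd or HANDLE value.
    Descriptor { pid: u32, raw_fd: i64 },
    /// Seccomp backend (raw_fd == -1): fall back to the path.
    Path(PathBuf),
}

pub fn correlation_key(pid: u32, raw_fd: i64, path: &str) -> CorrelationKey {
    if raw_fd == -1 {
        CorrelationKey::Path(PathBuf::from(path))
    } else {
        CorrelationKey::Descriptor { pid, raw_fd }
    }
}

/// One open->close lifecycle of a descriptor opened for writing.
#[derive(Debug, PartialEq, Eq)]
pub struct CapturedWrite {
    pub path: PathBuf,
    pub before: Vec<u8>,
    pub after: Vec<u8>,
}

#[derive(Default)]
pub struct Pairer {
    pending: HashMap<CorrelationKey, Vec<u8>>,
}

impl Pairer {
    /// Opened: remember the open-time snapshot under its key.
    pub fn opened(&mut self, pid: u32, raw_fd: i64, path: &str, before: Vec<u8>) {
        self.pending.insert(correlation_key(pid, raw_fd, path), before);
    }

    /// Closing: pair with the matching open, if any, into one write.
    pub fn closing(&mut self, pid: u32, raw_fd: i64, path: &str, after: Vec<u8>) -> Option<CapturedWrite> {
        let before = self.pending.remove(&correlation_key(pid, raw_fd, path))?;
        Some(CapturedWrite { path: PathBuf::from(path), before, after })
    }
}
```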
On macOS, libuv (and thus Node) closes descriptors via `close$NOCANCEL`,
a distinct libc symbol from `close`. The preload only interposed `close`,
so a write-close from Node was never observed: its `Closing` callback never
fired. For worldline this meant `fs.writeFileSync` wrote the file but the
run captured zero writes ("No file writes were captured"); Rust's std uses
plain `close`, which is why it worked and hid the gap.

Interpose `close$NOCANCEL` onto the same hook (forwarding through the regular
`close`, which closes the descriptor identically). Add two `#[ignore]` Node
regression tests, run in CI's `--ignored` step on every platform: one at the
fspy callback layer asserting Opened+Closing both fire for `fs.writeFileSync`,
and one at the worldline layer mirroring the `worldline node` scenario and
asserting the captured write's before/after content.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

On glibc Linux, libuv (and therefore Node) closes every descriptor via
`syscall(SYS_close, fd)` rather than the `close` libc symbol — confirmed by
tracing: a `fs.writeFileSync` triggers zero `close()`/`__close_nocancel()`
calls and routes every close through the `syscall` wrapper. The `close`
symbol interposition therefore never observed Node's closes, so the `Closing`
callback never fired and worldline captured no writes (the Linux analog of
the macOS `close$NOCANCEL` gap; musl is unaffected because its seccomp
backend catches the close syscall directly).

The preload already interposes the libc `syscall` wrapper (for SYS_statx) and
forwards Node's SYS_close through it — it just didn't act on it. Fire the
close callback for SYS_close there, mirroring the existing handling. The
`close_callback_fires_for_node_write` regression test now passes on Linux too.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
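Here is a sketch of the dispatch inside the `syscall` wrapper hook. It assumes the x86_64 Linux number for close (3). Real code would use the target's `libc::SYS_close` and keep the existing SYS_statx branch, which is omitted here. The enum and function names are hypothetical.

```rust
/// x86_64 Linux value only; other architectures use different numbers.
const SYS_CLOSE: i64 = 3;

#[derive(Debug, PartialEq, Eq)]
pub enum SyscallAction {
    /// Fire the pre-close callback for this fd, then forward the syscall.
    CloseCallback { fd: i32 },
    /// Forward to the real libc `syscall` untouched.
    Forward,
}

/// Decides what the hook does before forwarding. Only SYS_close gets the close callback.
pub fn classify_syscall(number: i64, first_arg: i64) -> SyscallAction {
    if number == SYS_CLOSE {
        SyscallAction::CloseCallback { fd: first_arg as i32 }
    } else {
        SyscallAction::Forward
    }
}
```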
A rewrite via `writeFileSync` (or any `O_TRUNC` open) empties the file at
open, before worldline's open-time snapshot runs in the supervisor — so the
snapshot read back empty and overwrote the file's real prior content. A second
write to the same file then showed `"" -> "new"` instead of `"old" -> "new"`.

Only adopt the open-time read as the `before` when it is non-empty (a
non-truncating open, or a pre-existing file). An empty read for a path already
tracked is treated as a truncating open, and the last-recorded content (the
previous write's `after`) is kept as the `before`. Adds a node-free regression
test covering repeated truncating writes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
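Here is the `before` rule from this commit, sketched with plain maps. `choose_before` and the string-keyed map are assumptions. A later commit in this PR narrows the empty-read case further by comparing file identity.

```rust
use std::collections::HashMap;

/// `open_read` is what the supervisor read at open time; `last_after` maps
/// already-tracked paths to their most recent `after` content.
pub fn choose_before(path: &str, open_read: Vec<u8>, last_after: &HashMap<String, Vec<u8>>) -> Vec<u8> {
    if !open_read.is_empty() {
        return open_read; // non-truncating open, or a pre-existing file
    }
    match last_after.get(path) {
        // Empty read of a tracked path: treat it as a truncating (O_TRUNC) open
        // and keep the previously recorded content.
        Some(previous) => previous.clone(),
        None => open_read, // untracked and empty: a genuinely empty `before`
    }
}
```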
wan9chi force-pushed the claude/fspy-supervisor-callbacks-Uvauu branch 2 times, most recently from 0dc1576 to 097677a on June 1, 2026 17:47
…ace and fd reuse

Hardening from an audit of the capture pipeline, fixing four ways a write's
`before` content could be wrong:

- Rename-over and delete-and-recreate (the write-tmp-then-rename pattern is
  ubiquitous in build tools) resurrected the replaced file's stale content. The
  file's on-disk identity (inode+device, or volume+index on Windows) is now
  tracked, so an empty truncating-open read keeps the prior content only when
  it's the *same* file; a replaced path is treated as fresh (empty `before`).
- Under the seccomp backend (raw_fd == -1) every open of a path shared one slot,
  so a second overlapping write lost its `before`. Pending opens are now a LIFO
  stack per correlation key.
- A descriptor reused for another file (e.g. via dup2, which closes the old
  target without a close callback) could mispair a close with a leaked open of a
  different file. The close now validates the open's path and falls back to the
  file's prior content on a mismatch.
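Here is a sketch of these fixes under stated assumptions. Pending opens form a LIFO stack per correlation key (a string here). A path check rejects a mispaired open left behind by a reused descriptor. A same-file test gates keeping prior content. All names, and the choice to discard a mismatched open, are hypothetical.

```rust
use std::collections::HashMap;

/// On-disk identity: (device, inode) on Unix; (volume serial, file index) on Windows.
pub type FileId = (u64, u64);

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PendingOpen {
    pub path: String,
    pub before: Vec<u8>,
}

/// Pending opens per key as a LIFO stack, so overlapping opens that share a key
/// (seccomp's per-path key) each keep their own `before`.
#[derive(Default)]
pub struct PendingOpens {
    stacks: HashMap<String, Vec<PendingOpen>>,
}

impl PendingOpens {
    pub fn push(&mut self, key: &str, open: PendingOpen) {
        self.stacks.entry(key.to_string()).or_default().push(open);
    }

    /// Pops the most recent open for `key`. If it was for a different path (a
    /// reused descriptor), it is rejected so the caller falls back to prior content.
    pub fn pop_matching(&mut self, key: &str, close_path: &str) -> Option<PendingOpen> {
        let open = self.stacks.get_mut(key)?.pop()?;
        if open.path == close_path { Some(open) } else { None }
    }
}

/// An empty truncating-open read keeps prior content only for the same file.
pub fn keep_prior_on_empty_read(prior_id: Option<FileId>, current_id: FileId) -> bool {
    prior_id == Some(current_id)
}
```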

The in-place truncating-rewrite fix (keeping the prior content as `before`) is
preserved. Adds store-level regression tests for each case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
wan9chi force-pushed the claude/fspy-supervisor-callbacks-Uvauu branch from 097677a to 9aedfb9 on June 1, 2026 17:50
claude and others added 3 commits June 1, 2026 17:51
volume_serial_number / file_index on Windows (used in the capture
callback to compute a file's on-disk identity) are still gated behind
the unstable 'windows_by_handle' feature (issue #63010).

The feature is now stable in the pinned nightly-2026-05-24 toolchain
(rustc cross-checks via cargo-xwin compile cleanly without the gate),
and declaring it triggers '-D unused-features'. The earlier
'use of unstable library feature' CI Windows failure must have
been on a different toolchain refresh; retry with the gate dropped
and see if real Windows agrees with the cross-check.

… code)

- serve: `reconstruct` resolves each write's content via `LoroMap::get(path)` +
  `get_deep_value` instead of deep-cloning the entire map per lookup (was
  O(writes × all-file-content)).
- capture: read the content and the file identity from one opened handle (a
  single path walk + an `fstat`) rather than a separate `metadata` stat, on the
  callback hot path that blocks the traced process.
- capture: compute the fallback `before` frontier only when no matching open
  was observed, not on every close.
- ui: drop a dead file-list/file-row stylesheet block and a no-op `changed()`
  class binding (no `.event.changed` rule ever existed); rebuild the bundle.
- docs: correct the `.source-hash` path in the dist-up-to-date test header.
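Here is a sketch of the single-handle read. It assumes Unix and uses (device, inode) as the identity. `File::metadata` queries the already-open descriptor rather than walking the path again. On Windows the identity would come from the volume serial number and file index instead. `read_with_identity` is a hypothetical name.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Reads content and (device, inode) through one opened handle: one path walk
/// for the open, then a metadata query on the descriptor itself.
pub fn read_with_identity(path: &Path) -> io::Result<(Vec<u8>, (u64, u64))> {
    let mut file = File::open(path)?;
    let metadata = file.metadata()?; // queries the open descriptor, not the path
    let mut content = Vec::with_capacity(metadata.len() as usize);
    file.read_to_end(&mut content)?;
    Ok((content, (metadata.dev(), metadata.ino())))
}
```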

No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>