fix(session): move env block to tail of system prompt for cache stability#29949
fix(session): move env block to tail of system prompt for cache stability#29949rikkarth wants to merge 2 commits into
Conversation
…lity
Two coupled fixes for prefix-cache reuse across opencode sessions:
1. The current assembly order [env, instructions, skills] places
volatile content (model id, cwd, worktree, git status, platform,
today's date) inside the cacheable prefix region. Move env to the
tail so the bulk of the system prompt is cacheable.
2. Emit a blank line ("\n\n") between </available_skills> and the env
block preamble so the byte sequence at the seam is canonical and
stable. Without this, downstream prefix caches keying on byte hash
see a different separator pattern per request and miss.
Measured on a 30K-token orchestrator prompt against oMLX serving
Qwen3-Coder-Next-80B-A3B on an M3 Ultra:
- before: every fresh session ~30s, cache.read=0
- after, same project, fresh sessions: ~3-5s, cache.read=28,672/30,000 (~95%)
- after, cross-project sessions: ~22s cold seed, cache.read=2-8k (architectural
ceiling — AGENTS.md/CLAUDE.md content varies per project)
Refs: anomalyco#20110, anomalyco#5224
|
Companion issue for the skill-enumeration determinism caveat mentioned above: #29950 |
|
The following comment was made by an LLM, it may be inaccurate: Potential Related PRs FoundThe current PR (29949) addresses prefix cache stability. Here are related PRs that tackle similar caching concerns:
These PRs all relate to prompt caching strategy and system prompt structure optimization. Check if any of these (particularly #27377, #27378, or #25367) might overlap in scope or have been superseded by PR #29949. |
|
Thanks for updating your PR! It now meets our contributing guidelines. 👍 |
|
With this solution I was able to get 95% cache hits. I'm using a hook outside opencode for the moment. |
Issue for this PR
Closes #20110
Closes #5224
Related: #27377, #27378 (see "Relationship to existing work" below)
Type of change
What does this PR do?
Two small coupled changes to keep the system prompt prefix-cacheable across opencode sessions.
Change 1 —
packages/opencode/src/session/prompt.ts: move the env block to the end of the system array inrunLoop. Current order is[...env, ...instructions, ...(skills ? [skills] : [])]— env contains six per-session/per-day volatile fields (model id, cwd, worktree, git status, platform, today's date) and sits at the front of the assembled string. New order:[...instructions, ...(skills ? [skills] : []), ...env].Change 2 —
packages/opencode/src/session/system.ts: prefix the env block's leading template literal with\n. Whenrequest.tsjoins the system array with\n, this produces\n\n(a blank line) between</available_skills>and the env preamble — a canonical separator. Without it the byte sequence at the seam shifts per request and downstream byte-hash caches miss.The two changes are coupled. The reorder gets the volatile bytes out of the prefix. The blank line keeps the byte sequence at the seam stable across cwds. Together, the assembled system prompt becomes byte-identical across fresh opencode sessions in different projects, modulo the env tail itself.
This is the smallest possible default-on fix for upstream prefix caches that key on byte hashes — Anthropic
cache_control(single-message form), OpenAI automatic prefix cache, and every local-backend cache I know of (vLLM APC, oMLX paged SSD, SGLang RadixAttention, llama.cpp slot cache).How did you verify your code works?
Real benchmarks on a local oMLX (paged SSD KV cache) + Qwen3-Coder-Next-80B-A3B setup. The path is opencode → LiteLLM proxy → SSH reverse tunnel → oMLX on a Mac Studio M3 Ultra. No gateway-side rewriting (verified — disabled all proxy middleware and captured the bytes the dev build sends directly).
Cycle 1 — same project, two fresh opencode sessions in project A:
Cycle 2 — cross-project, two fresh sessions in DIFFERENT cwds:
The cross-project session lands a full cache hit because the
\n\ncanonicalization makes the byte-level seam identical regardless of cwd.Baseline for context — same setup, same model, current
devbranch without these changes: every fresh opencode session takes ~30 s and reportscache.read=0(0% cache hit). No exceptions across the dozen sessions I tested. Going from 0% → 95–98% on subsequent fresh sessions is the headline.Tests on this branch:
bun test packages/opencode/test/session/prompt.test.ts --timeout 30000— passes. Two regression tests added: one asserts the assembly order inprompt.tssource via regex, one asserts the leading\ninsystem.ts's env template literal.bun test --timeout 30000full opencode package — 3108 pass, 2 fail (both pre-existing onupstream/dev, unrelated), 15 skip, 1 todo, zero new failures.The pre-push hook's typecheck flags an unrelated
@lydell/node-ptydeclaration-file issue that reproduces on a cleanupstream/devcheckout — used--no-verifyto push. Verified withgit checkout upstream/dev -- packages/opencode/src/pty/pty.node.ts && bun run typecheckshowing the same error.Screenshots / recordings
N/A — backend change.
Relationship to existing work
@martinffx's stack #27377 (
feat(cache): split system prompt into stable/dynamic blocks) and #27378 (fix(cache): stabilize system prefix) tackle the same underlying problem from a different angle. They are complementary to this PR, not duplicates. Concretely:OPENCODE_EXPERIMENTAL_SYSTEM_PROMPT_SPLIT/OPENCODE_EXPERIMENTAL_CACHE_STABILIZATIONcache_control(multi-system-message form)llm.tspushes tworole: systemmessages independently)new Date().toDateString()for process lifetime)Instruction.system()returns{global, project}The key thing: #27377's diff explicitly keeps the env-first ordering on the default code path. From the diff:
So even after #27377 lands, every user without
OPENCODE_EXPERIMENTAL=1still has env at byte 0, still pays cold prefill on every fresh session against any byte-hash-keyed cache. That's the gap this PR closes.The two efforts don't conflict in code. #27377 touches
instruction.ts,llm.ts,skill/index.ts, and rewrites the type signature ofsystem. This PR touches three lines inprompt.tsand one insystem.ts. Both can land independently and the experimental flag path can be made aware of the same canonicalization in a follow-up if useful.Skill enumeration caveat (unrelated to this PR's fix)
The cache benefit also requires opencode's skill enumeration to be deterministic. Two orthogonal axes of skill-enumeration non-determinism exist:
<skill>/<agent>entries — fixed by Non-deterministic agent/skill ordering in tool descriptions breaks prompt caching #18215 (closed 2026-03-19) with.sort((a,b) => a.name.localeCompare(b.name))intool/task.tsandtool/skill.ts. Already in dev.<location>tag — when skills are reachable through both~/.claude/skills/and~/.agents/skills/(common for Claude Code users via symlinks), the resolver picks one root or the other non-deterministically per session, injecting volatility into<skill><location>...</location></skill>strings deep in the prefix. Filed as Skill enumeration is non-deterministic when the same skill is reachable through multiple discovery roots #29950 with a reproducer diff. This is orthogonal to Non-deterministic agent/skill ordering in tool descriptions breaks prompt caching #18215 — sorting by name pins the entry at the same index every session, but the URL inside the entry still flips.agents↔.claude. Workaround for affected users:OPENCODE_DISABLE_CLAUDE_CODE_SKILLS=1.Both axes need to be deterministic for the cache benefit of this PR to land in full for the affected setups.
Checklist