
fix(session): move env block to tail of system prompt for cache stability#29949

Open
rikkarth wants to merge 2 commits into anomalyco:dev from rikkarth:fix/session-prompt-cache-friendly-order

rikkarth commented May 30, 2026

Issue for this PR

Closes #20110
Closes #5224
Related: #27377, #27378 (see "Relationship to existing work" below)

Type of change

  • Bug fix

What does this PR do?

Two small coupled changes to keep the system prompt prefix-cacheable across opencode sessions.

Change 1 — packages/opencode/src/session/prompt.ts: move the env block to the end of the system array in runLoop. Current order is [...env, ...instructions, ...(skills ? [skills] : [])] — env contains six per-session/per-day volatile fields (model id, cwd, worktree, git status, platform, today's date) and sits at the front of the assembled string. New order: [...instructions, ...(skills ? [skills] : []), ...env].
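A minimal sketch of why the reorder matters, under stated assumptions: `assembleBefore`, `assembleAfter`, and the placeholder strings are hypothetical stand-ins for what runLoop assembles (the real env block in system.ts has more fields), and the array is joined with "\n" as request.ts does. Only the two array orders come from this PR. The sketch measures how many leading characters two sessions that differ only in cwd have in common.

```typescript
// Hypothetical stand-ins for the pieces runLoop assembles; not the real prompt.ts code.
function assembleBefore(env: string[], instructions: string[], skills?: string): string[] {
  // Old default order: volatile env first.
  return [...env, ...instructions, ...(skills ? [skills] : [])];
}

function assembleAfter(env: string[], instructions: string[], skills?: string): string[] {
  // Order proposed in this PR: stable content first, volatile env last.
  return [...instructions, ...(skills ? [skills] : []), ...env];
}

// Number of leading characters two assembled prompts share.
function commonPrefixLength(a: string, b: string): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

// Placeholder content; only the working directory differs between the two sessions.
const instructions = ["<instructions>stable agent rules</instructions>"];
const skills = "<available_skills>...</available_skills>";
const envA = ["<env>\nWorking directory: /work/project-a\n</env>"];
const envB = ["<env>\nWorking directory: /work/project-b\n</env>"];

const sharedBefore = commonPrefixLength(
  assembleBefore(envA, instructions, skills).join("\n"),
  assembleBefore(envB, instructions, skills).join("\n"),
);
const sharedAfter = commonPrefixLength(
  assembleAfter(envA, instructions, skills).join("\n"),
  assembleAfter(envB, instructions, skills).join("\n"),
);
console.log({ sharedBefore, sharedAfter });
```

With env first, the shared prefix ends inside the env block; with env last, the whole instructions-plus-skills region (and the joining newline) is shared as well.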

Change 2 — packages/opencode/src/session/system.ts: prefix the env block's leading template literal with \n. When request.ts joins the system array with \n, this produces \n\n (a blank line) between </available_skills> and the env preamble — a canonical separator. Without it the byte sequence at the seam shifts per request and downstream byte-hash caches miss.
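A small sketch of the separator this change produces, assuming a hypothetical `envBlock` template (the real wording and fields in system.ts differ) and the "\n" join described above. It only shows which bytes follow `</available_skills>` with and without the leading "\n"; it does not model how a downstream cache tokenizes or blocks the seam.

```typescript
// Hypothetical env template; the real text in system.ts is different.
function envBlock(cwd: string, leadingNewline: boolean): string {
  const body = `Environment details:\n<env>\n  Working directory: ${cwd}\n</env>`;
  return leadingNewline ? `\n${body}` : body;
}

// Return the two characters that follow `marker` once the parts are joined with "\n".
function seamAfter(parts: string[], marker: string): string {
  const joined = parts.join("\n");
  const end = joined.indexOf(marker) + marker.length;
  return joined.slice(end, end + 2);
}

const skills = "<available_skills>...</available_skills>";
const withFix = seamAfter([skills, envBlock("/work/project-a", true)], "</available_skills>");
const withoutFix = seamAfter([skills, envBlock("/work/project-a", false)], "</available_skills>");
console.log(JSON.stringify({ withFix, withoutFix }));
```

With the leading "\n", the join yields a blank line ("\n\n") at the seam; without it, the env preamble starts immediately after a single newline.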

The two changes are coupled. The reorder gets the volatile bytes out of the prefix. The blank line keeps the byte sequence at the seam stable across cwds. Together, the assembled system prompt becomes byte-identical across fresh opencode sessions in different projects, modulo the env tail itself.

This is the smallest possible default-on fix for upstream prefix caches that key on byte hashes — Anthropic cache_control (single-message form), OpenAI automatic prefix cache, and every local-backend cache I know of (vLLM APC, oMLX paged SSD, SGLang RadixAttention, llama.cpp slot cache).
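To make "key on byte hashes" concrete, here is a toy model of a chained block-hash prefix cache in the style of vLLM's automatic prefix caching. Each fixed-size block's key depends on its own content and on the previous block's key, so one changed byte invalidates every block after it. Assumptions: characters stand in for tokens, FNV-1a stands in for the real hash, and `PrefixCache` is hypothetical. Real backends differ in block size and hashing details, and some compare token prefixes directly rather than hashing blocks.

```typescript
// FNV-1a (32-bit) over UTF-16 code units, mixed with the parent block's key.
function fnv1a(text: string, seed: number): number {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Chained keys: block i's key covers every character before and inside it.
function blockKeys(prompt: string, blockSize: number): number[] {
  const keys: number[] = [];
  let parent = 0;
  // Only full blocks are cacheable; a trailing partial block is always recomputed.
  for (let start = 0; start + blockSize <= prompt.length; start += blockSize) {
    parent = fnv1a(prompt.slice(start, start + blockSize), parent);
    keys.push(parent);
  }
  return keys;
}

class PrefixCache {
  private seen = new Set<number>();

  // Count leading blocks already cached, then store every block of this prompt.
  lookupAndInsert(prompt: string, blockSize: number): number {
    const keys = blockKeys(prompt, blockSize);
    let hits = 0;
    while (hits < keys.length && this.seen.has(keys[hits])) hits++;
    for (const k of keys) this.seen.add(k);
    return hits;
  }
}

const BLOCK = 16;
const stable = "instructions+skills ".repeat(10); // placeholder for the long stable region
const envA = "\n\ncwd=/work/a";
const envB = "\n\ncwd=/work/b";

// Env first: the second session misses from the very first block.
const envFirst = new PrefixCache();
envFirst.lookupAndInsert(envA + stable, BLOCK);
const hitsEnvFirst = envFirst.lookupAndInsert(envB + stable, BLOCK);

// Env last: every full block before the cwd difference is reused.
const envLast = new PrefixCache();
envLast.lookupAndInsert(stable + envA, BLOCK);
const hitsEnvLast = envLast.lookupAndInsert(stable + envB, BLOCK);
console.log({ hitsEnvFirst, hitsEnvLast });
```

The chaining is the point of the model: because each key folds in its parent, a volatile field at byte 0 poisons the whole prompt, while the same field at the tail costs only the blocks that contain it.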

How did you verify your code works?

Real benchmarks on a local oMLX (paged SSD KV cache) + Qwen3-Coder-Next-80B-A3B setup. The path is opencode → LiteLLM proxy → SSH reverse tunnel → oMLX on a Mac Studio M3 Ultra. No gateway-side rewriting (verified — disabled all proxy middleware and captured the bytes the dev build sends directly).

Cycle 1 — same project, two fresh opencode sessions in project A:

| Session | Wall-time TTFT | cache.read | tokens.input | cache hit % |
|---|---|---|---|---|
| First run (partial warm from prior) | ~4 s | 4,096 | 26,919 | 13.2% |
| Immediate repeat | ~5 s | 28,672 | 1,329 | 95.6% |

Cycle 2 — cross-project, two fresh sessions in DIFFERENT cwds:

| Session | CWD | Wall-time TTFT | cache.read | tokens.input | cache hit % |
|---|---|---|---|---|---|
| Project A (warm from cycle 1) | project A | ~5 s | 28,672 | 1,329 | 95.6% |
| Project B (different project) | project B | ~4 s | 28,672 | 502 | 98.3% |
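The cache hit % column in both tables above matches cache.read / (cache.read + tokens.input), rounded to one decimal. That reading is inferred from the numbers rather than stated in the PR, and it assumes tokens.input counts only the uncached remainder; `cacheHitPercent` is a hypothetical helper.

```typescript
// Hypothetical helper: share of prompt tokens served from the prefix cache,
// assuming tokens.input excludes the cached tokens.
function cacheHitPercent(cacheRead: number, uncachedInput: number): number {
  const total = cacheRead + uncachedInput;
  if (total === 0) return 0;
  // Round to one decimal place, as in the tables.
  return Math.round((cacheRead / total) * 1000) / 10;
}

// Rows from the two tables: [cache.read, tokens.input].
const rows: Array<[number, number]> = [
  [4096, 26919],
  [28672, 1329],
  [28672, 502],
];
const percents = rows.map(([read, input]) => cacheHitPercent(read, input));
console.log(percents);
```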

The cross-project session lands a full cache hit because the \n\n canonicalization makes the byte-level seam identical regardless of cwd.

Baseline for context — same setup, same model, current dev branch without these changes: every fresh opencode session takes ~30 s and reports cache.read=0 (0% cache hit). No exceptions across the dozen sessions I tested. Going from 0% → 95–98% on subsequent fresh sessions is the headline.

Tests on this branch:

  • bun test packages/opencode/test/session/prompt.test.ts --timeout 30000 — passes. Two regression tests added: one asserts the assembly order in prompt.ts source via regex, one asserts the leading \n in system.ts's env template literal.
  • bun test --timeout 30000 full opencode package — 3108 pass, 2 fail (both pre-existing on upstream/dev, unrelated), 15 skip, 1 todo, zero new failures.
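The first regression test is described as a regex check on the assembly order in prompt.ts. A minimal sketch of that idea, assuming the source text is already loaded into a string; `envIsLast` and its regex are hypothetical and are not the test added in this PR. The two sample strings mimic the orders quoted under Change 1.

```typescript
// Hypothetical check: does the source contain the env-last assembly order?
// Whitespace-tolerant so reformatting the array literal doesn't break it.
const ENV_LAST =
  /\[\s*\.\.\.instructions\s*,\s*\.\.\.\(\s*skills\s*\?\s*\[\s*skills\s*\]\s*:\s*\[\s*\]\s*\)\s*,\s*\.\.\.env\s*\]/;

function envIsLast(source: string): boolean {
  return ENV_LAST.test(source);
}

const oldOrder = "const system = [...env, ...instructions, ...(skills ? [skills] : [])]";
const newOrder = "const system = [...instructions, ...(skills ? [skills] : []), ...env]";
console.log(envIsLast(oldOrder), envIsLast(newOrder));
```

A source-level regex is brittle against refactors (renaming `env` would fail it), but it is cheap and pins the exact ordering this fix depends on.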

The pre-push hook's typecheck flags an unrelated @lydell/node-pty declaration-file issue that also reproduces on a clean upstream/dev checkout, so I pushed with --no-verify. I confirmed this by running git checkout upstream/dev -- packages/opencode/src/pty/pty.node.ts && bun run typecheck, which reports the same error.

Screenshots / recordings

N/A — backend change.

Relationship to existing work

@martinffx's stack #27377 (feat(cache): split system prompt into stable/dynamic blocks) and #27378 (fix(cache): stabilize system prefix) tackle the same underlying problem from a different angle. They are complementary to this PR, not duplicates. Concretely:

| Property | This PR | #27377 + #27378 |
|---|---|---|
| Default behavior changed? | Yes | No (gated behind OPENCODE_EXPERIMENTAL_SYSTEM_PROMPT_SPLIT / OPENCODE_EXPERIMENTAL_CACHE_STABILIZATION) |
| Cache mechanism targeted | Byte/block-hash prefix caches (vLLM APC, oMLX, llama.cpp slot, OpenAI auto) | Anthropic per-block cache_control (multi-system-message form) |
| Multi-message system routing | No (single message) | Yes (llm.ts pushes two role: system messages independently) |
| Date freezing | No (env block moves; date still updates) | Yes (#27378 freezes new Date().toDateString() for process lifetime) |
| Files touched | 3 | 9 |
| Function signature changes | None | Instruction.system() returns {global, project} |

The key thing: #27377's diff explicitly keeps the env-first ordering on the default code path. From the diff:

const system: string[] | { stable: string[]; dynamic: string[] } = Flag.OPENCODE_EXPERIMENTAL_SYSTEM_PROMPT_SPLIT
  ? { stable: [...], dynamic: [...env, ...] }
  : [...env, ...instructions.global, ...skills.global, ...]   // default: env still at front

So even after #27377 lands, every user without OPENCODE_EXPERIMENTAL=1 still has env at byte 0, still pays cold prefill on every fresh session against any byte-hash-keyed cache. That's the gap this PR closes.

The two efforts don't conflict in code. #27377 touches instruction.ts, llm.ts, skill/index.ts, and rewrites the type signature of system. This PR touches three lines in prompt.ts and one in system.ts. Both can land independently and the experimental flag path can be made aware of the same canonicalization in a follow-up if useful.

Skill enumeration caveat (unrelated to this PR's fix)

The cache benefit also requires opencode's skill enumeration to be deterministic. Two orthogonal axes of skill-enumeration non-determinism exist:

  1. Order of <skill> / <agent> entries: fixed by #18215 (Non-deterministic agent/skill ordering in tool descriptions breaks prompt caching, closed 2026-03-19), which adds .sort((a,b) => a.name.localeCompare(b.name)) in tool/task.ts and tool/skill.ts. Already in dev.
  2. Resolved path inside an entry's <location> tag: when skills are reachable through both ~/.claude/skills/ and ~/.agents/skills/ (common for Claude Code users via symlinks), the resolver picks one root or the other non-deterministically per session, injecting volatility into <skill><location>...</location></skill> strings deep in the prefix. Filed as #29950 (Skill enumeration is non-deterministic when the same skill is reachable through multiple discovery roots) with a reproducer diff. This is orthogonal to #18215: sorting by name pins the entry at the same index every session, but the location inside the entry still flips between .agents and .claude. Workaround for affected users: OPENCODE_DISABLE_CLAUDE_CODE_SKILLS=1.

Both axes need to be deterministic for the cache benefit of this PR to land in full for the affected setups.
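A minimal sketch of what deterministic enumeration on both axes could look like, under assumptions: `SkillEntry`, `canonicalSkills`, and `renderSkills` are hypothetical, the paths are placeholders, and breaking duplicate ties by the lexicographically smallest location is just one possible policy (resolving symlinks to a single real path is another). It is not the fix proposed in #29950.

```typescript
// Hypothetical shape of a discovered skill.
interface SkillEntry {
  name: string;
  location: string;
}

// Axis 2: collapse duplicates reachable through several roots to one location.
// Axis 1: sort by name, using the same comparator as #18215.
function canonicalSkills(entries: SkillEntry[]): SkillEntry[] {
  const byName = new Map<string, SkillEntry>();
  for (const entry of entries) {
    const current = byName.get(entry.name);
    // Always keep the smallest location, so discovery order doesn't matter.
    if (!current || entry.location < current.location) byName.set(entry.name, entry);
  }
  return Array.from(byName.values()).sort((a, b) => a.name.localeCompare(b.name));
}

function renderSkills(entries: SkillEntry[]): string {
  return canonicalSkills(entries)
    .map((s) => `<skill><name>${s.name}</name><location>${s.location}</location></skill>`)
    .join("\n");
}

// The same skills, discovered in different orders through different roots.
const sessionOne: SkillEntry[] = [
  { name: "review", location: "/home/u/.claude/skills/review/SKILL.md" },
  { name: "deploy", location: "/home/u/.claude/skills/deploy/SKILL.md" },
  { name: "review", location: "/home/u/.agents/skills/review/SKILL.md" },
];
const sessionTwo: SkillEntry[] = [
  { name: "review", location: "/home/u/.agents/skills/review/SKILL.md" },
  { name: "deploy", location: "/home/u/.claude/skills/deploy/SKILL.md" },
  { name: "review", location: "/home/u/.claude/skills/review/SKILL.md" },
];
console.log(renderSkills(sessionOne) === renderSkills(sessionTwo));
```

Sorting alone fixes the entry order; only the tie-break on location makes the bytes inside each entry identical across sessions.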

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

rikkarth added 2 commits May 30, 2026 00:59
…lity

Two coupled fixes for prefix-cache reuse across opencode sessions:

1. The current assembly order [env, instructions, skills] places
   volatile content (model id, cwd, worktree, git status, platform,
   today's date) inside the cacheable prefix region. Move env to the
   tail so the bulk of the system prompt is cacheable.

2. Emit a blank line ("\n\n") between </available_skills> and the env
   block preamble so the byte sequence at the seam is canonical and
   stable. Without this, downstream prefix caches keying on byte hash
   see a different separator pattern per request and miss.

Measured on a 30K-token orchestrator prompt against oMLX serving
Qwen3-Coder-Next-80B-A3B on an M3 Ultra:
- before: every fresh session ~30s, cache.read=0
- after, same project, fresh sessions: ~3-5s, cache.read=28,672/30,000 (~95%)
- after, cross-project sessions: ~22s cold seed, cache.read=2-8k (architectural
  ceiling — AGENTS.md/CLAUDE.md content varies per project)

Refs: anomalyco#20110, anomalyco#5224
rikkarth (Author)

Companion issue for the skill-enumeration determinism caveat mentioned above: #29950

github-actions (Contributor)

The following comment was made by an LLM and may be inaccurate:

Potential Related PRs Found

The current PR (29949) addresses prefix cache stability. Here are related PRs that tackle similar caching concerns:

  1. #27377 - feat(cache): split system prompt into stable/dynamic blocks for independent caching

    • Directly related: addresses splitting stable vs. dynamic content in system prompt for cache optimization
  2. #27378 - fix(cache): stabilize system prefix behind OPENCODE_EXPERIMENTAL_CACHE_STABILIZATION

    • Directly related: experimental cache stabilization feature
  3. #25367 - fix(session): cache messages across prompt loop to preserve prompt cache byte-identity

    • Related: focuses on cache byte-identity preservation across prompt loops
  4. #14743 - fix(cache): improve Anthropic prompt cache hit rate with system split and tool stability

    • Related: earlier work on Anthropic prompt cache hit rates and system prompt structure
  5. #5422 - feat(provider): add provider-specific cache configuration system (significant token usage reduction)

    • Referenced in the PR description as adjacent work on provider-specific caching

These PRs all relate to prompt caching strategy and system prompt structure optimization. Check if any of these (particularly #27377, #27378, or #25367) might overlap in scope or have been superseded by PR #29949.

github-actions (bot) removed the needs:compliance label May 30, 2026
github-actions (Contributor)

Thanks for updating your PR! It now meets our contributing guidelines. 👍

rikkarth (Author)

With this solution I was able to get 95% cache hits. I'm using a hook outside opencode for the moment.


Development

Successfully merging this pull request may close these issues:

  • Anthropic prompt caching misses when dynamic user.system is merged into the static prefix
  • System environment prompt causes cache invalidation
