docs(rfc): propose SDK consumption entrypoints and file transfer #1590

Open
zanetworker wants to merge 2 commits into NVIDIA:main from zanetworker:docs/sdk-file-transfer-rfc

Conversation

@zanetworker
Contributor

@zanetworker zanetworker commented May 27, 2026

Related: #1044

Summary

RFC 0006 proposing official Python and TypeScript SDKs that make OpenShell consumable as programmable infrastructure for agent platforms and frameworks, plus streaming file transfer RPCs.

Why: Agent platforms (Anthropic Managed Agents, OpenAI Agents SDK, OpenClaw, Cloudflare) all need a secure execution layer. OpenShell has the enforcement (Landlock, seccomp, L4/L7 policy, credential injection, OCSF audit) and the API (54 gRPC RPCs). But only 8 RPCs are wrapped in the Python SDK, there's no TypeScript SDK, no file transfer RPC, and no OIDC auth in the SDK. Every integration must shell out to the CLI binary or build a custom gRPC client.

What this RFC proposes:

  • Extend the Python SDK with OIDC auth, provider attach/detach, streaming watch, and file transfer
  • Add streaming UploadFile/DownloadFile gRPC RPCs to the gateway (routed via existing RelayStream infrastructure)
  • Ship an official TypeScript SDK (for OpenClaw and Node.js frameworks)
  • Add OIDC authentication to both SDKs for cross-namespace K8s deployments
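
As a rough illustration of what a client-streaming UploadFile call might carry, the sketch below splits a file into a header message, fixed-size chunks, and a trailing SHA-256 digest, then reassembles and verifies it on the receiving side. The message names (`UploadHeader`, `UploadChunk`, `UploadTrailer`), their fields, and the 64 KiB chunk size are assumptions made for illustration; the RFC's actual proto definitions are not shown in this thread.

```python
import hashlib
import io
from dataclasses import dataclass
from typing import BinaryIO, Iterator, Union

CHUNK_SIZE = 64 * 1024  # assumed chunk size, not taken from the RFC


@dataclass(frozen=True)
class UploadHeader:  # hypothetical message
    sandbox_id: str
    dest_path: str


@dataclass(frozen=True)
class UploadChunk:  # hypothetical message
    offset: int
    data: bytes


@dataclass(frozen=True)
class UploadTrailer:  # hypothetical message
    total_bytes: int
    sha256_hex: str


UploadMessage = Union[UploadHeader, UploadChunk, UploadTrailer]


def upload_requests(sandbox_id: str, dest_path: str, src: BinaryIO,
                    chunk_size: int = CHUNK_SIZE) -> Iterator[UploadMessage]:
    """Yield the request stream for one upload: header, chunks, trailer."""
    yield UploadHeader(sandbox_id=sandbox_id, dest_path=dest_path)
    digest = hashlib.sha256()
    offset = 0
    while True:
        data = src.read(chunk_size)
        if not data:
            break
        digest.update(data)
        yield UploadChunk(offset=offset, data=data)
        offset += len(data)
    yield UploadTrailer(total_bytes=offset, sha256_hex=digest.hexdigest())


def receive(messages: Iterator[UploadMessage]) -> bytes:
    """Reassemble a stream and verify chunk order, length, and digest."""
    buf = io.BytesIO()
    digest = hashlib.sha256()
    trailer = None
    for msg in messages:
        if isinstance(msg, UploadChunk):
            if msg.offset != buf.tell():
                raise ValueError("chunk out of order")
            buf.write(msg.data)
            digest.update(msg.data)
        elif isinstance(msg, UploadTrailer):
            trailer = msg
    if trailer is None:
        raise ValueError("stream ended without trailer")
    if trailer.total_bytes != buf.tell() or trailer.sha256_hex != digest.hexdigest():
        raise ValueError("integrity check failed")
    return buf.getvalue()
```

Because the sender reads lazily, memory use stays bounded by the chunk size regardless of file size. A real design would still need to settle flow control, size limits, and what happens on a partially written file.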

Three sandbox modes covered:

  • Mode 1 (entire agent sandboxed): CLI-driven, no SDK needed
  • Mode 2 (platform-managed): SDK in your worker, brain on the platform (Anthropic, OpenAI Responses API)
  • Mode 3 (framework extension): SDK embedded in the developer's process (OpenAI Agents SDK, OpenClaw)
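
What Modes 2 and 3 have in common is that code outside the sandbox drives it through an SDK. A minimal sketch of that pattern follows. `SandboxClient`, its `create`/`exec`/`delete` methods, and the in-memory `FakeSandboxClient` are hypothetical stand-ins, not the OpenShell SDK's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class ExecResult:
    exit_code: int
    stdout: str


class SandboxClient(Protocol):  # hypothetical interface
    def create(self, policy: str) -> str: ...
    def exec(self, sandbox_id: str, argv: List[str]) -> ExecResult: ...
    def delete(self, sandbox_id: str) -> None: ...


@dataclass
class FakeSandboxClient:
    """In-memory stand-in so the sketch runs without a gateway."""
    live: Dict[str, str] = field(default_factory=dict)
    _next: int = 0

    def create(self, policy: str) -> str:
        self._next += 1
        sandbox_id = f"sb-{self._next}"
        self.live[sandbox_id] = policy
        return sandbox_id

    def exec(self, sandbox_id: str, argv: List[str]) -> ExecResult:
        if sandbox_id not in self.live:
            raise KeyError(sandbox_id)
        # Fake execution: echo the command back instead of running it.
        return ExecResult(exit_code=0, stdout=" ".join(argv))

    def delete(self, sandbox_id: str) -> None:
        self.live.pop(sandbox_id, None)


def handle_tool_call(client: SandboxClient, argv: List[str],
                     policy: str = "default") -> ExecResult:
    """Run one tool call in a fresh sandbox and always clean up."""
    sandbox_id = client.create(policy)
    try:
        return client.exec(sandbox_id, argv)
    finally:
        client.delete(sandbox_id)
```

In Mode 2, something like `handle_tool_call` would live in the worker that receives tool calls from the platform; in Mode 3 it would be called from the developer's own agent process. In Mode 1 no such code exists outside the sandbox.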

Five implementation phases with a dependency analysis showing Phase 1 (OIDC + providers) and Phase 2 (file transfer) can run in parallel.

Related Issues

Changes

  • rfc/0006-sdk-and-file-transfer/README.md - Full RFC document
  • rfc/0006-sdk-and-file-transfer/sdk-modes.png - Three sandbox modes architecture diagram
  • rfc/0006-sdk-and-file-transfer/sdk-phase-deps.{mmd,png} - Phase dependency diagram
  • rfc/0006-sdk-and-file-transfer/sdk-file-transfer.png - UploadFile/DownloadFile sequence diagram
  • rfc/0006-sdk-and-file-transfer/sdk-anthropic-worker.png - Anthropic Mode 2 end-to-end sequence

Testing

  • mise run pre-commit passes
  • Markdown lint clean
  • All diagrams render correctly

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • RFC follows the rfc/0000-template structure
  • RFC number (0006) does not conflict with existing or in-flight RFCs

Add RFC 0006 proposing official Python and TypeScript SDKs for
programmatic sandbox consumption by agent platforms and frameworks.

The RFC covers:
- Three sandbox modes and which the SDK serves (Mode 2 and 3)
- Extending the Python SDK with OIDC auth, provider management,
  streaming watch, and file transfer
- Streaming UploadFile/DownloadFile gRPC RPCs for the gateway
- A new TypeScript SDK for OpenClaw and Node.js frameworks
- Five implementation phases with dependency analysis
- Integration examples for Anthropic Managed Agents and OpenAI
  Agents SDK

Includes architecture diagrams for the three modes, file transfer
sequence flow, phase dependencies, and Anthropic worker end-to-end.

Signed-off-by: Adel Zaalouk <azaalouk@redhat.com>
@copy-pr-bot

copy-pr-bot Bot commented May 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

frameworks. Add streaming UploadFile/DownloadFile gRPC RPCs to the
gateway so SDK consumers can move files in and out of sandboxes
without shelling out to the CLI. Support OIDC authentication in both
SDKs so cross-namespace K8s deployments work without copying mTLS
Collaborator

can you clarify what is meant by cross namespace deployments?

Contributor Author

Good call. "Cross-namespace" is K8s jargon that does not belong here. What I meant: the current Python SDK only supports mTLS, which requires distributing TLS client certificates to every consumer. In a K8s deployment that means copying Secrets across namespaces, but the same friction applies outside K8s — any consumer on a different machine needs those certs.

OIDC removes that distribution problem regardless of deployment model. I will rewrite this section to frame it as "SDK consumers need bearer-token auth so they can connect to any OIDC-enabled gateway without distributing client certificates."
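
A minimal sketch of what bearer-token auth could look like on the client side: a small helper that caches an access token and renders it as per-call gRPC metadata. The `BearerTokenSource` class, the `fetch` callback, and the 30-second refresh margin are assumptions for illustration, not the SDK's actual design.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Metadata = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class Token:
    value: str
    expires_at: float  # seconds, on the same clock as `now`


class BearerTokenSource:
    """Caches a token and refreshes it shortly before expiry."""

    def __init__(self, fetch: Callable[[], Token], margin_s: float = 30.0,
                 now: Callable[[], float] = time.time) -> None:
        self._fetch = fetch  # hypothetical: performs the OIDC token request
        self._margin_s = margin_s
        self._now = now
        self._token: Optional[Token] = None

    def metadata(self) -> Metadata:
        """Return metadata for one call; gRPC metadata keys are lowercase."""
        expired = (
            self._token is None
            or self._now() >= self._token.expires_at - self._margin_s
        )
        if expired:
            self._token = self._fetch()
        return (("authorization", f"Bearer {self._token.value}"),)
```

With grpcio, the returned tuple can be passed through the `metadata=` keyword that stub methods accept, so no client certificate has to be distributed to the consumer.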


| Method | RPC | Why |
|--------|-----|-----|
| OIDC auth | gRPC metadata interceptor | mTLS-only locks SDK to single namespace. Every K8s production deployment needs cross-namespace auth. |
Collaborator

@derekwaynecarr derekwaynecarr May 28, 2026

I don't understand what OIDC has to do with K8s. Agreed that the SDK needs to work with a server that uses OIDC auth.

Contributor Author

Agreed — OIDC has nothing to do with K8s specifically. The SDK needs to work with any gateway that has OIDC auth enabled, whether that is on K8s, bare metal, or a managed service. I conflated "the most common deployment where this matters" with "the reason it matters." Will decouple the two in the next revision.

@zanetworker
Contributor Author

Positioning relative to RFC 0005 (#1617) and the Python SDK OIDC PR (#1621)

Three SDK efforts are in flight under #1044. Here is how they relate:

| PR | Scope | State |
|----|-------|-------|
| #1617 (RFC 0005, @maxdubrinsky) | Extract openshell-sdk Rust crate, ship @openshell/sdk TypeScript binding via napi-rs, refactor CLI onto SDK | RFC + working prototype |
| #1621 (@mrunalp) | Add OIDC bearer auth to the existing pure-Python SDK | Code, ready for review |
| #1590 (this RFC 0006) | Broader SDK roadmap: consumption patterns (Mode 1/2/3), file transfer RPCs, Python SDK surface expansion, platform integration examples (Anthropic, OpenAI, OpenClaw) | RFC, no code |

RFC 0005 and RFC 0006 are complementary, not competing. RFC 0005 delivers the shared Rust core and TypeScript binding — the "how" for the TS half. This RFC frames the "why" and "what" across both languages: which consumption patterns platforms need, what RPC gaps block adoption (file transfer), and how the SDK surface maps to real platform integrations.

Concretely, this RFC covers areas that RFC 0005 explicitly defers or does not address:

  • File transfer RPCs (UploadFile/DownloadFile): gateway proto changes, streaming design, routing via ConnectSupervisor/RelayStream
  • Python SDK surface expansion: provider attach/detach, watch, policy, services (wrapping existing RPCs, not rebuilding transport)
  • Platform consumption patterns: Anthropic Managed Agents (Mode 2), OpenAI Agents SDK (Mode 3), OpenClaw, CI/CD
  • Python-on-shared-core migration path: RFC 0005 defers this as a non-goal; this RFC scopes it as a future phase

The intended reading order: this RFC for the overall SDK strategy and surface area, RFC 0005 for the shared core architecture and TS binding implementation.

Will update the RFC text to reference RFC 0005 directly and to decouple OIDC from K8s framing per @derekwaynecarr feedback.

- Decouple OIDC motivation from K8s framing per @derekwaynecarr feedback
- Reframe as certificate distribution problem, not namespace problem
- Add relationship section positioning RFC 0006 relative to RFC 0005
- Add links to RFC 0005 (NVIDIA#1617), Python OIDC PR (NVIDIA#1621), roadmap (NVIDIA#1044)
Comment on lines +96 to +102
**Mode 1: Sandbox the entire agent.** The agent process runs inside
the sandbox. Interface: CLI. No SDK needed.

**Mode 2: Platform-managed sandbox.** The platform (Anthropic, OpenAI)
owns the agent loop. A separate worker on your infrastructure embeds
the SDK and creates sandboxes. Brain and worker are physically
separate systems. Mode 2 is a spectrum:
Collaborator

@drew drew Jun 2, 2026

I've heard this commonly described as: Agent in a Sandbox (mode 1) and Sandbox as a Tool (mode 2).

Reference: https://www.langchain.com/blog/the-two-patterns-by-which-agents-connect-sandboxes.

OpenShell has always intended to support both. If that analogy is similar to mode 1 and 2, I would suggest framing this RFC around those two terms. This is how I think we should document usage as well.


### Three sandbox modes

The SDK serves Mode 2 and Mode 3. Mode 1 stays CLI-driven.
Collaborator

nit: I think we still want to programmatically launch agents in a sandbox from the SDK. CLI and SDK should generally have API parity outside of things like interactive shell access. For example, I think we also want the policy prover in our SDK.

Comment on lines +120 to +122
- **Provider CRUD in the SDK.** Providers are created by the platform
engineer via CLI. SDK consumers attach existing providers, not
create new ones.
Collaborator

Any reason not to include this? I would expect API parity between the CLI and SDK.

Comment on lines +110 to +112
**Mode 3: Framework sandbox extension.** The developer's process owns
the agent loop. Harness + SDK live in one process. Logical separation.
Examples: OpenAI Agents SDK, OpenClaw, LangChain.
Collaborator

@drew drew Jun 2, 2026

I'm not understanding how this mode is different. How does this relate to OpenShell? It seems most similar to Mode 1, Agent in a Sandbox, but perhaps I'm missing some nuance.

Can you elaborate with more concrete examples?

Comment on lines +37 to +42
Areas this RFC covers that RFC 0005 explicitly defers:

- File transfer RPCs (UploadFile/DownloadFile)
- Python SDK surface expansion (provider, watch, policy, services)
- Platform consumption patterns (Anthropic, OpenAI, OpenClaw, CI/CD)
- Python-on-shared-core migration path
Collaborator

This RFC seems to also detail some of the "how" (see implementation plan). I think we should try and close both RFCs at the same time since they inform each other.

My recommendation is that we build a Rust core (#1617) so we're not re-implementing common workflows across all languages for features such as OIDC, file transfer, image building, etc. SDKs are thin wrappers over the Rust core.

This RFC can serve as:

  • guidance for how the SDK should be used, e.g., Agents in a Sandbox and Sandbox as a Tool.
  • guidance on what goes into the SDK vs. the CLI vs. gRPC service routes.

Comment on lines +440 to +443
4. **Provider CRUD in SDK.** This RFC scopes the SDK to attach/detach
(bind existing providers). Should full provider CRUD
(create/update/delete) be in scope for platforms that want fully
programmatic provider lifecycle?
Collaborator

Yes, I think we want to strive for API parity as a principle.

Comment on lines +357 to +361
**TypeScript SDK maintenance.** A second SDK doubles the maintenance
surface. Mitigation: both SDKs are thin wrappers around generated
proto stubs. The Python SDK is ~600 lines. The TypeScript SDK would
be similar. The proto files are the source of truth; SDK updates are
mechanical when protos change.
Collaborator

#1617 addresses this risk, right?

| Anthropic worker | Create sandboxes, download skills, run tool calls, retrieve artifacts | No OIDC auth, no file transfer RPC |
| OpenAI Agents SDK adapter | Implement SandboxClient: materialize Manifest, exec, snapshot | No file transfer RPC (session.write() for LocalDir has no clean implementation) |
| OpenClaw plugin | Create sandboxes, sync workspace, exec commands | No TypeScript SDK (plugins are TS-only), currently shells out to CLI 5+ times per command |
| Multi-tenant platform | Per-tenant sandboxes with policies and credentials | No OIDC auth, no provider attach/detach in SDK |
Collaborator

OIDC auth in OpenShell's current form != multi-tenancy, right? For example, if you and I both authenticate against a gateway and create sandboxes, we can see each other's work.

- "Issue #1044 (SDK roadmap)"
---

# RFC 0006 - SDK Consumption Entrypoints and File Transfer
Collaborator

nit: SDK design principles and streaming file transfers seem like two distinct things. Is there a reason these are linked together? Can we decouple the efforts?

| `expose_service()` / service CRUD | ExposeService, GetService, ListServices, DeleteService | Sandbox-hosted HTTP services |
| `get_logs()` | GetSandboxLogs | One-shot log retrieval for debugging |

### 2. Add streaming file transfer RPCs to the gateway
Collaborator

I'm supportive of this feature, but I think we'll want to do some more design work around the feature. Streaming file uploads can be hard!


3 participants