RFC: AGT Studio, a single unified UI for governance #2638

@Ricky-G

Summary

Consolidate the seven scattered AGT UI surfaces into a single first-class, policy-first UI called AGT Studio, shipped both as a standalone web app (agt ui) and as a VS Code / Cursor webview from one SPA codebase. Studio replaces six Streamlit dashboards and the UI panels in the IDE extensions with one canonical experience for browsing, authoring, testing, simulating, versioning, and observing AGT policy. Runtime enforcement hooks in the IDE and Chrome extensions stay where they are; only the UI surfaces collapse.

Full design is captured in ADR 0028 (PR to follow this issue).

Motivation

AGT today ships:

  • Six Streamlit dashboards (governance-dashboard, 06-trust-score-dashboard, agent-sre/examples/dashboard, agent-hypervisor/examples/dashboard, scak/dashboard.py, observability/dashboards.py) that mostly use simulated data and are rarely consumed in production.
  • IDE extensions (Cursor, Copilot, JetBrains) with overlapping tree views for policies, audit, and stats.
  • A Chrome extension targeted at browsing safety on GitHub, Jira, and AWS.
  • A static console.html pitch asset.

This creates three concrete problems:

  1. No front door for evaluators or customers. Sales and demo conversations consistently end with diagrams on a whiteboard because there is no single place to see what AGT does. The CLI plus YAML workflow does not land. Multiple maintainers have hit this in customer calls.
  2. The policy lifecycle has no UI. AGT ships 70+ example policies and 14 templates and has a full replay/test engine via agt test, but provides no visual way to author, lint, simulate, diff, or version policies. Policy is the project's core IP. The closest thing today is "open YAML in VS Code, run the CLI, eyeball the output."
  3. Surfaces drift. Each extension and dashboard exposes overlapping concepts (active policies, audit, decisions) with inconsistent shapes, no shared contract, and no canonical experience. New contributors do not know where to land UI work.

A separate review considered a full operator/SOC console with approvals, quarantine, and runtime control. That option was rejected: a write path requires SSO, RBAC, multi-tenancy, audit non-repudiation, and 24/7 support, all of which are out of scale for AGT, and Azure Sentinel, Defender, and Foundry already own that space. Studio is deliberately not that product.

Detailed Design

Naming and entry point. AGT Studio, launched with agt ui. A local sidecar agt serve exposes the engine over HTTP/WebSocket. The browser opens to the SPA.

Packaging. One SPA codebase, two shells:

  • Standalone: agt ui boots the sidecar plus a local web server, opens the browser tab.
  • VS Code / Cursor webview: same SPA inside a webview panel, talks to the sidecar via postMessage bridged by the extension host.
  • Shells inject the transport so the SPA is host-agnostic. No backend-specific UI code.
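
The transport injection described above can be sketched as follows. This is only an illustration of the shape, written in Python for consistency with the engine-side examples; the real SPA would presumably be written in a browser language. Every name here (`Transport`, `HttpTransport`, `PostMessageTransport`, `StudioClient`, the `/rpc/...` path, and the `policies.list` method) is hypothetical and not taken from AGT.

```python
from typing import Any, Callable, Protocol


class Transport(Protocol):
    """Hypothetical contract each shell must satisfy; not an AGT API."""

    def request(self, method: str, params: dict[str, Any]) -> dict[str, Any]: ...


class HttpTransport:
    """Standalone shell: would call the sidecar over HTTP (the call is injected here)."""

    def __init__(self, post: Callable[[str, dict[str, Any]], dict[str, Any]]) -> None:
        self._post = post

    def request(self, method: str, params: dict[str, Any]) -> dict[str, Any]:
        return self._post(f"/rpc/{method}", params)  # "/rpc/..." is an invented path


class PostMessageTransport:
    """Webview shell: would relay through the extension host's message bridge."""

    def __init__(self, send: Callable[[dict[str, Any]], dict[str, Any]]) -> None:
        self._send = send

    def request(self, method: str, params: dict[str, Any]) -> dict[str, Any]:
        return self._send({"type": "agt.request", "method": method, "params": params})


class StudioClient:
    """UI-side code depends only on Transport, never on a concrete shell."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def list_policies(self) -> list[str]:
        return self._transport.request("policies.list", {})["policies"]


def fake_sidecar(method: str) -> dict[str, Any]:
    """In-memory stand-in for the sidecar so the sketch runs without a server."""
    return {"policies": ["example-a", "example-b"]} if method == "policies.list" else {}


http_client = StudioClient(
    HttpTransport(lambda path, params: fake_sidecar(path.removeprefix("/rpc/")))
)
webview_client = StudioClient(
    PostMessageTransport(lambda message: fake_sidecar(message["method"]))
)
```

Both clients run the same `StudioClient` code and differ only in the transport handed to them, which is the property the "no backend-specific UI code" rule is after.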

Modelled on Azure Policy in the Azure Portal. Browse definitions, see active assignments, author, test, assign. Same mental model. This is a UX analogy, not a commitment to a specific framework or visual style.

In-scope capabilities:

| Capability | Phase |
| --- | --- |
| Browse all policies (70+ examples, 14 templates, plus Copilot, Claude, and Antigravity CLI policy configs) | MVP |
| Author policies with schema-aware editing and inline lint | MVP |
| Test and validate policies (wraps agent_compliance.policy_test.replay) | MVP |
| What-if simulator (action + identity + context to decision and rule) | MVP |
| Regression view (rule change to fixture pass/fail diff) | MVP |
| Version both engine and policies; test against either | MVP |
| Live decisions feed (from agentmesh.dashboard.api.DashboardAPI) | V1 |
| Shadow-agent and credential lifecycle signals | V1 |
| Trust network, scores, and decay views | V1 |
| Audit log viewer with chain-integrity badge and evidence export | V1 |
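
To make the what-if simulator and the regression view concrete, here is a minimal sketch of both. It assumes a toy rule format (first matching rule wins, default deny) that is not AGT's actual policy schema, and it does not call agent_compliance.policy_test.replay; `Rule`, `simulate`, and `regression_diff` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    match: dict    # field -> required value, looked up in the request
    decision: str  # "allow" or "deny"


def simulate(rules, action, identity, context):
    """Return (decision, rule_name) for the first rule whose fields all match."""
    request = {**context, "action": action, "identity": identity}
    for rule in rules:
        if all(request.get(key) == value for key, value in rule.match.items()):
            return rule.decision, rule.name
    return "deny", None  # assumed default-deny when no rule matches


def regression_diff(old_rules, new_rules, fixtures):
    """List fixtures whose pass/fail status differs between two rule sets."""
    changed = []
    for fixture in fixtures:
        args = (fixture["action"], fixture["identity"], fixture.get("context", {}))
        old_pass = simulate(old_rules, *args)[0] == fixture["expect"]
        new_pass = simulate(new_rules, *args)[0] == fixture["expect"]
        if old_pass != new_pass:
            changed.append((fixture["name"], old_pass, new_pass))
    return changed


# Version 2 adds a stricter rule ahead of the version 1 rule.
v1 = [Rule("allow-read", {"action": "file.read"}, "allow")]
v2 = [Rule("deny-prod-read", {"action": "file.read", "env": "prod"}, "deny")] + v1

fixtures = [
    {"name": "dev read", "action": "file.read", "identity": "agent-1",
     "context": {"env": "dev"}, "expect": "allow"},
    {"name": "prod read", "action": "file.read", "identity": "agent-1",
     "context": {"env": "prod"}, "expect": "allow"},
]

print(simulate(v2, "file.read", "agent-1", {"env": "prod"}))
print(regression_diff(v1, v2, fixtures))
```

The simulator answers "which rule decided this, and how"; the regression view is the same evaluation run over a fixture set for two policy versions, reporting only the fixtures that flipped.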

Explicitly out of scope:

  • Operator/SOC console capabilities (approvals queue, quarantine, hot-reload to production, write-path runtime control).
  • SSO/SAML/OIDC/RBAC/multi-tenancy. Studio is an authoring and visibility tool. Production deployments wrap the engine API behind their own auth.
  • Replacement for runtime enforcement hooks in the IDE extensions or the Chrome extension. Only the UI surfaces collapse into Studio.
  • SRE, FinOps, incident management, deployment tooling (belongs in Grafana, Sentinel, Datadog, PagerDuty, ServiceNow, Argo).

Triage of the six existing dashboards:

| # | Dashboard | Disposition |
| --- | --- | --- |
| 1 | examples/demos/governance-dashboard | Partial port to Studio: policy feed, shadow agents. Drop fleet table and lifecycle funnel. |
| 2 | agent-mesh/examples/06-trust-score-dashboard | Mostly port: trust graph, credentials, compliance. Protocol traffic moves to a Grafana template. |
| 3 | agent-sre/examples/dashboard | Replace with Grafana templates. |
| 4 | agent-hypervisor/examples/dashboard | Archive. |
| 5 | agent-os/modules/scak/dashboard.py | Archive with module. |
| 6 | agent-os/modules/observability/dashboards.py | Keep and expand. This is the embed-first integration story. |

Net effect: six dashboards collapse to one UI plus one Grafana template pack. console.html is deleted.

Sequencing:

  1. Engine API contract (agt serve) defined and stabilised (separate ADR, blocks Studio MVP).
  2. Studio MVP standalone (agt ui): browse + author + test + simulate + regression + version.
  3. VS Code / Cursor webview shell.
  4. V1 visibility port from existing dashboards.
  5. Deprecation cycle for the six existing dashboards (one release of overlap, clear migration banners).

Alternatives Considered

  1. Full operator/SOC console. Rejected. Write-path risk, auth surface, and ongoing support cost are out of scale for AGT; Sentinel, Defender, and Foundry are better homes for that telemetry.
  2. Embed-only via Grafana, Sentinel, and Foundry plugins. Useful and continues alongside Studio (the Grafana template pack is part of this proposal), but does not solve the policy authoring or demo-credibility gaps that motivated the decision.
  3. Status quo. Six surfaces stay scattered, the policy lifecycle stays CLI-only, demos keep failing to land. Already the de facto fallback; the problems above continue to compound.
  4. Multiple specialised UIs done well. Effectively where we are now. Each surface stays at "demo-grade" because no single surface gets enough investment to graduate.

Security Implications

Studio is intentionally an authoring and visibility tool. The proposal explicitly excludes write-path runtime control, approvals, quarantine, RBAC, and multi-tenancy from scope. That keeps the security surface roughly equivalent to a local developer tool:

  • New surface introduced: the agt serve sidecar HTTP/WebSocket API on localhost. Default bind is 127.0.0.1; never 0.0.0.0. No remote-access mode in MVP.
  • No new credentials or secrets handled by Studio itself. It reads policies from disk and queries an engine the user already has access to.
  • Audit-chain integrity must be verified, not asserted. The audit viewer must compute and display chain integrity using existing ADR 0017 (Merkle chain) mechanisms; it must not just render whatever the API returns.
  • Evidence export must preserve signatures. Aligns with ADR 0021 (CloudEvents envelope for audit).
  • XSS / CSP discipline. Policy YAML and decision payloads are user-supplied content. Renderers must escape; webview CSP must be tight; no innerHTML without sanitisation.
  • Dependency hygiene. Pinned dependencies, SHA-pinned actions, supply-chain rules per the project's standards apply from day one.
  • VS Code webview review. Standard Microsoft web extension security review applies for the webview shell.
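
The "verified, not asserted" requirement for the audit viewer can be illustrated with a small sketch. It uses a simplified linear SHA-256 hash chain, not the ADR 0017 Merkle structure, whose exact format is not described in this RFC; the entry fields (`payload`, `prev_hash`, `hash`), the `GENESIS` value, and the function names are all assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder predecessor for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous link together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> None:
    """Append a payload, linking it to the hash of the current last entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev_hash": prev, "hash": entry_hash(prev, payload)})


def verify_chain(log: list):
    """Recompute every link; return the index of the first bad entry, or None if intact."""
    prev = GENESIS
    for index, entry in enumerate(log):
        expected = entry_hash(prev, entry["payload"])
        if entry["prev_hash"] != prev or entry["hash"] != expected:
            return index
        prev = entry["hash"]
    return None


audit_log = []
for decision in ("allow", "deny", "allow"):
    append_entry(audit_log, {"action": "file.read", "decision": decision})

print(verify_chain(audit_log))  # intact chain

audit_log[1]["payload"]["decision"] = "allow"  # simulate tampering with a stored entry
print(verify_chain(audit_log))  # index of the first entry that no longer verifies
```

The point for the viewer is that the badge reflects the result of a recomputation like `verify_chain`, done on the client side of the API, rather than an `integrity: ok` flag returned by the server.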

Production deployments that want a centrally-hosted Studio behind SSO/RBAC are out of scope for this RFC; that would be a separate proposal with its own threat model.
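
The loopback-only bind rule for the sidecar could be enforced with a guard along these lines. This is a sketch using only the Python standard library, not the planned agt serve implementation; `check_bind_host`, `make_server`, and `HealthHandler` are hypothetical names, and the sketch accepts IPv4 literals only for simplicity.

```python
import ipaddress
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DEFAULT_HOST = "127.0.0.1"


def check_bind_host(host: str = DEFAULT_HOST) -> str:
    """Return host if it is an IPv4 loopback literal; raise ValueError otherwise."""
    try:
        address = ipaddress.IPv4Address(host)
    except ValueError:
        raise ValueError(f"bind host must be an IPv4 literal, got {host!r}") from None
    if not address.is_loopback:
        raise ValueError(f"refusing non-loopback bind address {host!r}")
    return host


class HealthHandler(BaseHTTPRequestHandler):
    """Placeholder handler; a real sidecar would expose the engine API here."""

    def do_GET(self):
        body = b'{"status": "ok"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging in the sketch


def make_server(host: str = DEFAULT_HOST, port: int = 0) -> ThreadingHTTPServer:
    """Create a server only after the bind address passes the loopback check."""
    return ThreadingHTTPServer((check_bind_host(host), port), HealthHandler)
```

Calling `make_server("0.0.0.0")` raises before any socket is opened, so a misconfigured host fails loudly instead of silently exposing the API on the network.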

Migration / Backward Compatibility

No breaking API changes. Migration is deprecation-only:

  • The six Streamlit dashboards keep working for one release after Studio MVP ships. Each README gains a banner pointing at agt ui.
  • After one release of overlap, the five dashboards in the "port/replace/archive" rows of the triage table (rows 1 to 5) move under examples/legacy/, then are removed in the following release.
  • agent-os/modules/observability/dashboards.py (Grafana templates) is retained and expanded.
  • IDE extensions keep their runtime enforcement hooks unchanged. The tree-view UI panels gain an "Open AGT Studio" command; tree views are deprecated in a follow-up cycle.
  • console.html is deleted.

Scope

  • Cross-package (2-3 packages)

(Touches agent-compliance for the agt ui/agt serve CLI, agent-mesh for the dashboard API backend, agent-os for the IDE extension integration. Per the project's escalation rules, this counts as a cross-cutting change requiring maintainer approval, which is what the ADR PR is asking for.)

Target Placement

  • New package

(Studio itself will live in a new package, name TBD during implementation. The CLI hooks land in agent-compliance.)

Prior Art

  • Azure Policy + Azure Portal. Browse definitions, see assignments, author, test, assign. Direct UX analogy.
  • OPA Playground (play.openpolicyagent.org). Closest analogue for the "author + test + simulate policy" core. Studio adds version management, fixture replay, and integration with a running engine.
  • Grafana for Prometheus, Kibana for Elastic, Jaeger UI for OpenTelemetry. Canonical UI patterns for observability OSS, referenced for the "one front door" framing.
  • GitLens, Jupyter, the GitHub PR extension. Established pattern of one SPA shipped as both a standalone web app and a VS Code webview.
  • AGT ADR 0015 (pluggable external policy backends). Studio must work with YAML rules and external backends (OPA/Rego, Cedar) via the same evaluator contract.
  • AGT ADR 0017 (Merkle chain for audit tamper evidence). Studio's audit-chain integrity badge depends on this.
  • AGT ADR 0021 (CloudEvents envelope for mesh audit). Studio's evidence export uses this envelope.
  • AGT ADR 0022 (compliance framework auto-mapping). Studio's compliance view surfaces these mappings.

Checklist

  • I have searched existing issues and RFCs for duplicates.
  • I have read the ADR index (docs/adr/index.md) for related decisions.
  • I am willing to implement this RFC or help review an implementation.

Labels: architecture, documentation, enhancement, needs-review:MEDIUM