# Skate Implementation Plan

This plan closes the gaps between the current codebase and the goals stated in DESIGN.md. The phases are ordered by dependency -- each phase builds on the previous.

## Current State Summary

Phase 0 (core loop) is functionally complete: the TUI renders conversations, the orchestrator drives the Claude API, tools execute inside a Landlock sandbox, and the channel boundary between TUI and core is properly maintained.

The major gaps are:

1. **Tool executor tarpc interface** -- the orchestrator calls tools directly rather than via a tarpc client/server split as DESIGN.md specifies. This is the biggest structural gap and a prerequisite for sub-agents (each agent gets its own client).
2. **Session logging** (JSONL, tree-addressable) -- no `session/` module exists yet.
3. **Token tracking** -- counts are debug-logged but not surfaced to the user.
4. **TUI introspection** -- tool blocks and thinking traces cannot be expanded/collapsed.
5. **Status bar is sparse** -- no token totals, no activity mode, no network state badge.
6. **Planning Mode** -- no dedicated harness instantiation with restricted sandbox.
7. **Sub-agents** -- no spawning mechanism, no independent context windows.
8. **Space-bar leader key** and which-key help overlay are absent.

---

## Phase 1 -- Tool Executor tarpc Interface

**Goal:** Introduce the harness/executor split described in DESIGN.md. The executor owns the `ToolRegistry` and `Sandbox`; the orchestrator (harness) communicates with it exclusively through a tarpc client. In this phase the transport is in-process (tarpc's unbounded channel pair), laying the groundwork for out-of-process execution in a later phase.

This is the largest structural change in the plan. Every subsequent phase benefits from the cleaner boundary: sub-agents each get their own executor client (Phase 7), and the sandbox policy becomes a constructor argument to the executor rather than something threaded through the orchestrator.
### 1.1 Define the tarpc service

Create `src/executor/mod.rs`:

```rust
#[tarpc::service]
pub trait Executor {
    /// Return the full list of tools this executor exposes, including their
    /// JSON Schema input descriptors. The harness calls this once at startup
    /// and caches the result for the lifetime of the conversation.
    async fn list_tools() -> Vec<ToolDefinition>;

    /// Invoke a single tool by name with a JSON-encoded argument object.
    /// Returns the text content to feed back to the model, or an error string
    /// that is also fed back (so the model can self-correct).
    async fn call_tool(name: String, input: serde_json::Value) -> Result<String, String>;
}
```

`ToolDefinition` is already defined in `core/types.rs` and is provider-agnostic -- no new types are needed on the wire.

### 1.2 Implement `ExecutorServer`

Still in `src/executor/mod.rs`, add:

```rust
pub struct ExecutorServer {
    registry: ToolRegistry,
    sandbox: Arc<Sandbox>,
}

impl ExecutorServer {
    pub fn new(registry: ToolRegistry, sandbox: Sandbox) -> Self { ... }
}

impl Executor for ExecutorServer {
    async fn list_tools(self, _: Context) -> Vec<ToolDefinition> {
        self.registry.definitions()
    }

    async fn call_tool(self, _: Context, name: String, input: Value) -> Result<String, String> {
        match self.registry.get(&name) {
            None => Err(format!("unknown tool: {name}")),
            Some(tool) => tool
                .execute(input, &self.sandbox)
                .await
                .map_err(|e| e.to_string()),
        }
    }
}
```

The `Arc<Sandbox>` is required because tarpc clones the server struct per request.

### 1.3 In-process transport helper

Add a function to `src/executor/mod.rs` (and re-export from `src/app/mod.rs`) that wires an `ExecutorServer` to a client over tarpc's in-memory channel:

```rust
/// Spawn an ExecutorServer on the current tokio runtime and return a client
/// connected to it via an in-process channel. The server task runs until
/// the client is dropped.
pub fn spawn_local(server: ExecutorServer) -> ExecutorClient {
    let (client_transport, server_transport) = tarpc::transport::channel::unbounded();
    let channel = tarpc::server::BaseChannel::with_defaults(server_transport);
    // NB: on recent tarpc versions `execute` yields a stream of per-request
    // handler futures; drive it with `.for_each(|f| async { tokio::spawn(f); })`.
    tokio::spawn(channel.execute(server.serve()));
    ExecutorClient::new(tarpc::client::Config::default(), client_transport).spawn()
}
```

### 1.4 Refactor `Orchestrator` to use the client

Currently `Orchestrator` holds `ToolRegistry` and `Sandbox` directly and calls `tool.execute(input, &sandbox)` in `run_turn`. Replace these fields with:

```rust
executor: ExecutorClient,
tool_definitions: Vec<ToolDefinition>, // fetched once at construction
```

`run_turn` changes from direct tool dispatch to:

```rust
let result = self.executor
    .call_tool(context::current(), name, input)
    .await;
```

The `tool_definitions` vec is passed to `provider.stream()` instead of being built from the registry on each call.

### 1.5 Update `app/mod.rs`

Replace the inline construction of `ToolRegistry + Sandbox` in `app::run` with:

```rust
let registry = build_tool_registry();
let sandbox = Sandbox::new(policy, project_dir, enforcement)?;
let executor = executor::spawn_local(ExecutorServer::new(registry, sandbox));
let orchestrator = Orchestrator::new(provider, executor, system_prompt);
```

### 1.6 Tests

- Unit: `ExecutorServer::call_tool` with a mock `ToolRegistry` returns correct output and maps errors to `Err(String)`.
- Integration: `spawn_local` -> `client.call_tool` round-trip through the in-process channel executes a real `read_file` against a temp dir.
- Integration: existing orchestrator integration tests continue to pass after the refactor (the mock provider path is unchanged; only tool dispatch changes).

### 1.7 Files touched

| Action | File |
|--------|------|
| New | `src/executor/mod.rs` |
| Modified | `src/core/orchestrator.rs` -- remove registry/sandbox, add executor client |
| Modified | `src/app/mod.rs` -- construct executor, pass client to orchestrator |
| Modified | `Cargo.toml` -- add `tarpc` with `tokio1` feature |

New dependency: `tarpc` (with `tokio1` and `serde-transport` features).

---

## Phase 2 -- Session Logging

**Goal:** Persist every event to a JSONL file. This is the foundation for token accounting, session resume, and future conversation branching.

### 2.1 Add `src/session/` module

Create `src/session/mod.rs` with the following public surface:

```rust
pub struct SessionWriter { ... }

impl SessionWriter {
    /// Open (or create) a JSONL log at the given path in append mode.
    pub async fn open(path: &Path) -> Result<Self, SessionError>;

    /// Append one event. Never rewrites history.
    pub async fn append(&self, event: &LogEvent) -> Result<(), SessionError>;
}

pub struct SessionReader { ... }

impl SessionReader {
    pub async fn load(path: &Path) -> Result<Vec<LogEvent>, SessionError>;
}
```

### 2.2 Define `LogEvent`

```rust
pub struct LogEvent {
    pub id: Uuid,
    pub parent_id: Option<Uuid>,
    pub timestamp: DateTime<Utc>,
    pub payload: LogPayload,
    pub token_usage: Option<TokenUsage>,
}

pub enum LogPayload {
    UserMessage { content: String },
    AssistantMessage { content: Vec<ContentBlock> },
    ToolCall { tool_name: String, input: serde_json::Value },
    ToolResult { tool_use_id: String, content: String, is_error: bool },
}

pub struct TokenUsage {
    pub input: u32,
    pub output: u32,
    pub cache_read: Option<u32>,
    pub cache_write: Option<u32>,
}
```

`id` and `parent_id` form a tree that enables future branching. For now the conversation is linear, so `parent_id` is always the id of the previous event.

### 2.3 Wire into Orchestrator

- `Orchestrator` holds an `Option<SessionWriter>`.
- Every time the orchestrator pushes to `ConversationHistory` it also appends a `LogEvent`. Token counts from `StreamEvent::InputTokens` / `OutputTokens` are stored on the final assistant event of each turn.
- The session file lives at `.skate/sessions/<session-id>.jsonl`.

### 2.4 Tests

- Unit: `SessionWriter::append` then `SessionReader::load` round-trips all payload variants.
- Unit: the `parent_id` chain is correct across a simulated multi-turn exchange.
- Integration: run the orchestrator with a mock provider against a temp dir; assert the JSONL file is written.

---

## Phase 3 -- Token Tracking & Status Bar

**Goal:** Surface token usage in the TUI per-turn and cumulatively.
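The accounting here splits per-turn counters (overwritten each turn) from session totals (monotonically accumulated). A minimal std-only sketch of that logic -- type and method names are illustrative, not the actual `AppState` API:

```rust
/// Sketch of per-turn vs cumulative token accounting (hypothetical names).
#[derive(Default)]
pub struct TokenCounters {
    pub turn_input: u32,
    pub turn_output: u32,
    pub total_input: u64,
    pub total_output: u64,
}

impl TokenCounters {
    /// Called when a turn finishes: record the finished turn's counts and
    /// fold them into the session totals.
    pub fn on_turn_complete(&mut self, input: u32, output: u32) {
        self.turn_input = input;
        self.turn_output = output;
        self.total_input += u64::from(input);
        self.total_output += u64::from(output);
    }
}
```

Keeping totals as `u64` while per-turn counts stay `u32` avoids overflow over a long session without widening the wire type.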
### 3.1 Per-turn token counts in UIEvent

Add a variant to `UIEvent`:

```rust
UIEvent::TurnComplete { input_tokens: u32, output_tokens: u32 }
```

The orchestrator already receives `StreamEvent::InputTokens` and `OutputTokens`; it should accumulate them during a turn and emit them in `TurnComplete`.

### 3.2 AppState token counters

Add to `AppState`:

```rust
pub turn_input_tokens: u32,
pub turn_output_tokens: u32,
pub total_input_tokens: u64,
pub total_output_tokens: u64,
```

`events.rs` updates these on `TurnComplete`.

### 3.3 Status bar redesign

The status bar currently shows only the mode indicator. Expand it to four sections:

```
[ MODE ] [ ACTIVITY ] [ i:1234 o:567 | total i:9999 o:2345 ] [ NET: off ]
```

- **MODE** -- Normal / Insert / Command
- **ACTIVITY** -- Plan / Execute (Phase 6 adds Plan; for now always "Execute")
- **Tokens** -- per-turn input/output, then session cumulative
- **NET** -- `on` (green) or `off` (red), reflecting `network_allowed`

Update `render.rs` to implement this layout using Ratatui `Layout::horizontal`.

### 3.4 Tests

- Unit: `AppState` accumulates totals correctly across multiple `TurnComplete` events.
- TUI snapshot test (TestBackend): the status bar renders all four sections with correct content after a synthetic `TurnComplete`.

---

## Phase 4 -- TUI Introspection (Expand/Collapse)

**Goal:** Support progressive disclosure -- tool calls and thinking traces start collapsed; the user can expand them.
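Progressive disclosure reduces to a per-block height function: a collapsed block contributes a one-line summary, an expanded block its full content, and the scroll range is the sum. A std-only sketch with a deliberately simplified block type (the real model is defined in 4.1 below):

```rust
/// Simplified stand-in for the display-block model; real fields differ.
pub struct Block {
    pub summary: String,    // one-line collapsed form
    pub full: Vec<String>,  // full lines shown when expanded
    pub expanded: bool,
}

impl Block {
    /// Height in terminal rows the renderer must reserve for this block.
    pub fn height(&self) -> usize {
        if self.expanded { self.full.len() } else { 1 }
    }
}

/// Total scrollable height of the transcript: the sum of block heights.
/// The scroll offset indexes into this row count, not into block indices.
pub fn content_height(blocks: &[Block]) -> usize {
    blocks.iter().map(Block::height).sum()
}
```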
### 4.1 Block model

Replace the flat message `Vec` in `AppState` with a `Vec<DisplayBlock>`:

```rust
pub enum DisplayBlock {
    UserMessage { content: String },
    AssistantText { content: String },
    ToolCall {
        display: ToolDisplay,
        result: Option<String>,
        expanded: bool,
    },
    Error { message: String },
}
```

### 4.2 Navigation in Normal mode

Add a block-level cursor to `AppState`:

```rust
pub focused_block: Option<usize>,
```

Keybindings (Normal mode):

| Key | Action |
|-----|--------|
| `[` | Move focus to previous block |
| `]` | Move focus to next block |
| `Enter` or `Space` | Toggle `expanded` on the focused ToolCall block |
| `j` / `k` | Line scroll (unchanged) |

The focused block is highlighted with a distinct border color.

### 4.3 Render changes

`render.rs` must calculate the height of each `DisplayBlock` depending on whether it is collapsed (1-2 summary lines) or expanded (full content). The scroll offset operates on rendered rows, not message indices.

Collapsed tool call shows: `> tool_name(arg_summary) -- result_summary`

Expanded tool call shows: the full input and output as formatted by `tool_display.rs`.

### 4.4 Tests

- Unit: toggling `expanded` on a `ToolCall` block changes the height calculation.
- TUI snapshot: collapsed vs expanded render output for `WriteFile` and `ShellExec`.

---

## Phase 5 -- Space-bar Leader Key & Which-Key Overlay

**Goal:** Support vim-style `<Space>` leader chords for configuration actions. This replaces the `:net on` / `:net off` text commands with discoverable hotkeys.

### 5.1 Leader key state machine

Extend `AppState` with:

```rust
pub leader_active: bool,
pub leader_timeout: Option<Instant>,
```

In Normal mode, pressing `Space` sets `leader_active = true` and starts a 1-second timeout. The next key is dispatched through the chord table. If the timeout fires or an unbound key is pressed, leader mode is cancelled with a brief status message.
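The state machine above can be sketched with std types only (names are illustrative; the real handler lives in the input layer and would fold the timeout-expired key back into normal dispatch, which this sketch simplifies to a cancel):

```rust
use std::time::{Duration, Instant};

const LEADER_TIMEOUT: Duration = Duration::from_secs(1);

#[derive(Debug, PartialEq)]
pub enum LeaderOutcome {
    Armed,             // Space pressed, waiting for the chord key
    Chord(char),       // a bound chord key arrived in time
    Cancelled,         // timeout fired or the key is unbound
    Passthrough(char), // leader not active; handle the key normally
}

pub struct LeaderState {
    armed_at: Option<Instant>,
}

impl LeaderState {
    pub fn new() -> Self {
        Self { armed_at: None }
    }

    /// Feed one key press observed at `now`; `bound` says whether the key
    /// appears in the chord table.
    pub fn on_key(&mut self, key: char, now: Instant, bound: bool) -> LeaderOutcome {
        match self.armed_at.take() {
            None if key == ' ' => {
                self.armed_at = Some(now);
                LeaderOutcome::Armed
            }
            None => LeaderOutcome::Passthrough(key),
            Some(t) if now.duration_since(t) > LEADER_TIMEOUT => LeaderOutcome::Cancelled,
            Some(_) if bound => LeaderOutcome::Chord(key),
            Some(_) => LeaderOutcome::Cancelled,
        }
    }
}
```

Because `armed_at` is `take()`-n on every key, each outcome leaves the machine disarmed, which matches the one-shot chord semantics described above.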
### 5.2 Initial chord table

| Chord | Action |
|-------|--------|
| `<Space> n` | Toggle network policy |
| `<Space> c` | Clear history (`:clear`) |
| `<Space> p` | Switch to Plan mode (Phase 6) |
| `<Space> ?` | Toggle which-key overlay |

### 5.3 Which-key overlay

A centered popup rendered over the output pane that lists all available chords and their descriptions. Rendered only while `leader_active = true` (after a short delay, ~200 ms, to avoid flicker for fast typists).

### 5.4 Remove `:net on/off` from the command parser

Once the leader-key network toggle is in place, remove the text-command duplicates to keep the command palette small and focused.

### 5.5 Tests

- Unit: leader key state machine transitions (activate, timeout, chord match, cancel).
- TUI snapshot: the which-key overlay renders with the correct chord list.

---

## Phase 6 -- Planning Mode

**Goal:** A dedicated planning harness with a restricted sandbox that writes a single plan file, plus a mechanism to pipe the plan into an execute harness.

### 6.1 Plan harness sandbox policy

In planning mode the orchestrator is instantiated with a `SandboxPolicy` that grants:

- `/` -- ReadOnly (same as execute)
- `/.skate/plan.md` -- ReadWrite (only this file)
- Network -- off

All other write attempts fail with a sandbox permission error returned to the model.

### 6.2 Survey tool

Add a new tool, `ask_user`, that allows the model to present structured questions to the user during planning:

```
// Input schema
{
  "question": "string",
  "options": ["string"] | null   // null means free-text answer
}
```

The orchestrator sends a new `UIEvent::SurveyRequest { question, options }`. The TUI renders an inline prompt. The user's answer is sent back as a `UserAction::SurveyResponse`.

### 6.3 TUI activity mode

`AppState` gets:

```rust
pub enum Activity { Plan, Execute }

// on AppState:
pub activity: Activity,
```

Switching activity (via `<Space> p`) instantiates a new orchestrator on a fresh channel pair. The old orchestrator is shut down cleanly. The status bar ACTIVITY section updates.
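The survey round-trip crosses the same channel boundary as every other TUI/core interaction: the core sends a request event, blocks on the reply, and the TUI answers on its own thread. A std-only sketch of that flow, with hypothetical single-variant enums standing in for the real `UIEvent`/`UserAction` variants:

```rust
use std::sync::mpsc;
use std::thread;

// Illustrative stand-ins for the real UIEvent / UserAction variants.
#[derive(Debug)]
enum UiMsg {
    SurveyRequest { question: String, options: Option<Vec<String>> },
}
#[derive(Debug)]
enum UserMsg {
    SurveyResponse(String),
}

/// Core side: emit the question, then block until the TUI answers.
fn ask(ui_tx: &mpsc::Sender<UiMsg>, user_rx: &mpsc::Receiver<UserMsg>) -> String {
    ui_tx
        .send(UiMsg::SurveyRequest {
            question: "Which database?".into(),
            options: Some(vec!["sqlite".into(), "postgres".into()]),
        })
        .unwrap();
    match user_rx.recv().unwrap() {
        UserMsg::SurveyResponse(answer) => answer,
    }
}

fn demo() -> String {
    let (ui_tx, ui_rx) = mpsc::channel();
    let (user_tx, user_rx) = mpsc::channel();
    // "TUI" thread: receive the request and reply with the first option.
    let tui = thread::spawn(move || {
        match ui_rx.recv().unwrap() {
            UiMsg::SurveyRequest { options, .. } => {
                let answer = options.unwrap()[0].clone();
                user_tx.send(UserMsg::SurveyResponse(answer)).unwrap();
            }
        }
    });
    let answer = ask(&ui_tx, &user_rx);
    tui.join().unwrap();
    answer
}
```

In the real app the core side is async and the channels are tokio's, but the request/reply shape is the same.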
### 6.4 Plan -> Execute handoff

When the user is satisfied with the plan (`<Space> x` or `:exec`):

1. The TUI reads `.skate/plan.md`.
2. Constructs a new system prompt: the base system prompt, followed by `\n\n## Plan\n` and the plan file contents.
3. Instantiates an Execute orchestrator with the full sandbox policy and the augmented system prompt.
4. Transitions `activity` to `Execute`. The old Plan orchestrator is dropped.

### 6.5 Edit plan in $EDITOR

Hotkey `<Space> e` (or `:edit-plan`) suspends the TUI (restores the terminal), opens `$EDITOR` on `.skate/plan.md`, then resumes the TUI after the editor exits.

### 6.6 Tests

- Integration: the plan harness rejects a write to any file other than plan.md.
- Integration: survey tool round-trip through the channel boundary.
- Unit: the plan -> execute handoff produces the correct augmented system prompt.

---

## Phase 7 -- Sub-Agents

**Goal:** The model can spawn independent sub-agents with their own context windows. Results are summarised and returned to the parent.

### 7.1 `spawn_agent` tool

Add a new tool with the input schema:

```
{
  "task": "string",     // instruction for the sub-agent
  "sandbox": {          // optional policy overrides
    "network": bool,
    "extra_write_paths": ["string"]
  }
}
```

### 7.2 Sub-agent lifecycle

When `spawn_agent` executes:

1. Create a new `Orchestrator` with an independent conversation history.
2. The sub-agent's system prompt is the parent's system prompt plus the task description.
3. The sub-agent runs autonomously (no user interaction) until it emits a `UserAction::Quit` equivalent or hits `MAX_TOOL_ITERATIONS`.
4. The final assistant message is returned as the tool result (the "summary").
5. The sub-agent's session is logged to a child JSONL file linked to the parent session by a `parent_session_id` field.

### 7.3 TUI sub-agent view

The agent tree is accessible via `<Space> a`. A side panel shows:

```
Parent
+-- sub-agent 1 [running]
+-- sub-agent 2 [done]
```

Pressing Enter on a sub-agent opens a read-only replay of its conversation (scroll only, no input).
This is a stretch goal within this phase -- the core spawning mechanism is the priority.

In this phase `spawn_agent` gains a natural implementation: it calls `executor::spawn_local` with a new `ExecutorServer` configured for the child policy, constructs a new `Orchestrator` with that client, and runs it to completion. The tarpc boundary from Phase 1 makes this straightforward.

### 7.4 Tests

- Integration: `spawn_agent` with a mock provider runs to completion and returns a summary string.
- Unit: the sub-agent session file has the correct `parent_session_id` link.
- Unit: the `MAX_TOOL_ITERATIONS` limit is respected within sub-agents.

---

## Phase 8 -- Prompt Caching

**Goal:** Use Anthropic's prompt caching to reduce cost and latency on long conversations. DESIGN.md notes this as a desired property of message construction.

### 8.1 Cache breakpoints

The Anthropic API supports `"cache_control": {"type": "ephemeral"}` on message content blocks. The optimal strategy is to mark the last user message of the longest stable prefix as a cache write point.

In `provider/claude.rs`, when serializing the messages array:

- Mark the system prompt content block with `cache_control` (it never changes).
- Mark the penultimate user message with `cache_control` (the conversation history that is stable for the current turn).

### 8.2 Cache token tracking

The `TokenUsage` struct in `session/` already reserves `cache_read` and `cache_write` fields. `StreamEvent` must be extended:

```rust
StreamEvent::CacheReadTokens(u32),
StreamEvent::CacheWriteTokens(u32),
```

The Anthropic `message_start` event contains `usage.cache_read_input_tokens` and `usage.cache_creation_input_tokens`. Parse these and emit the new variants.

### 8.3 Status bar update

Add cache tokens to the status bar display: `i:1234(c:800) o:567`.

### 8.4 Tests

- Provider unit test: replay a fixture that contains cache token fields; assert the new `StreamEvent` variants are emitted.
- Snapshot test: the status bar renders cache token counts correctly.
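Choosing the history breakpoint in 8.1 is pure index arithmetic over the serialized messages. A std-only sketch (a hypothetical helper, not the actual `provider/claude.rs` code):

```rust
/// Given message roles in order, return the index of the message that should
/// carry the conversation-history cache breakpoint: the penultimate user
/// message, i.e. the end of the prefix that is stable for the current turn.
/// The system prompt is marked separately since it never changes.
fn history_breakpoint(roles: &[&str]) -> Option<usize> {
    let user_idxs: Vec<usize> = roles
        .iter()
        .enumerate()
        .filter(|(_, r)| **r == "user")
        .map(|(i, _)| i)
        .collect();
    // Before the second user turn there is no stable history worth caching.
    if user_idxs.len() >= 2 {
        Some(user_idxs[user_idxs.len() - 2])
    } else {
        None
    }
}
```

The serializer would attach `"cache_control": {"type": "ephemeral"}` to the content block at the returned index on each request, moving the breakpoint forward one user turn at a time.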
---

## Dependency Graph

```
Phase 1 (tarpc executor)
|
+-- Phase 2 (session logging) -- orchestrator refactor is complete
|   |
|   +-- Phase 3 (token tracking) -- requires session TokenUsage struct
|   |
|   +-- Phase 7 (sub-agents) -- requires session parent_session_id
|
+-- Phase 7 (sub-agents) -- spawn_local reuse is natural after Phase 1

Phase 4 (expand/collapse) -- independent, can be done alongside Phase 3
Phase 5 (leader key)      -- independent, prerequisite for Phase 6
Phase 6 (planning mode)   -- requires Phase 5 (leader key chord <Space> p)
                          -- benefits from Phase 1 (separate executor per activity)
Phase 8 (prompt caching)  -- requires Phase 3 (cache token display)
```

Recommended order: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 8, with 7 after 2 and 6.

---

## Files Touched Per Phase

| Phase | New Files | Modified Files |
|-------|-----------|----------------|
| 1 | `src/executor/mod.rs` | `src/core/orchestrator.rs`, `src/core/types.rs`, `src/app/mod.rs`, `Cargo.toml` |
| 2 | `src/session/mod.rs` | `src/core/orchestrator.rs`, `src/app/mod.rs` |
| 3 | -- | `src/core/types.rs`, `src/core/orchestrator.rs`, `src/tui/events.rs`, `src/tui/render.rs` |
| 4 | -- | `src/tui/mod.rs`, `src/tui/render.rs`, `src/tui/events.rs`, `src/tui/input.rs` |
| 5 | -- | `src/tui/input.rs`, `src/tui/render.rs`, `src/tui/mod.rs` |
| 6 | `src/tools/ask_user.rs` | `src/core/types.rs`, `src/core/orchestrator.rs`, `src/tui/mod.rs`, `src/tui/input.rs`, `src/tui/render.rs`, `src/app/mod.rs` |
| 7 | -- | `src/executor/mod.rs`, `src/core/orchestrator.rs`, `src/tui/render.rs`, `src/tui/input.rs` |
| 8 | -- | `src/provider/claude.rs`, `src/core/types.rs`, `src/session/mod.rs`, `src/tui/render.rs` |

---

## New Dependencies

| Crate | Phase | Reason |
|-------|-------|--------|
| `tarpc` | 1 | RPC service trait + in-process transport |
| `uuid` | 2 | LogEvent ids |
| `chrono` | 2 | Event timestamps (check if already transitive) |

No other new dependencies are needed. All other required functionality (`serde_json`, `tokio`, `ratatui`, `tracing`) is already present.