skate/PLAN.md
Drew Galbraith 7420755800 Update the design and PLAN.md (#11)
Reviewed-on: #11
Co-authored-by: Drew Galbraith <drew@tiramisu.one>
Co-committed-by: Drew Galbraith <drew@tiramisu.one>
2026-03-14 21:52:38 +00:00


Skate Implementation Plan

This plan closes the gaps between the current codebase and the goals stated in DESIGN.md. The phases are ordered by dependency -- each phase builds on the previous.

Current State Summary

Phase 0 (core loop) is functionally complete: the TUI renders conversations, the orchestrator drives the Claude API, tools execute inside a Landlock sandbox, and the channel boundary between TUI and core is properly maintained.

The major gaps are:

  1. Tool executor tarpc interface -- the orchestrator calls tools directly rather than via a tarpc client/server split as DESIGN.md specifies. This is the biggest structural gap and a prerequisite for sub-agents (each agent gets its own client).
  2. Session logging (JSONL, tree-addressable) -- no session/ module exists yet.
  3. Token tracking -- counts are debug-logged but not surfaced to the user.
  4. TUI introspection -- tool blocks and thinking traces cannot be expanded/collapsed.
  5. Status bar is sparse -- no token totals, no activity mode, no network state badge.
  6. Planning Mode -- no dedicated harness instantiation with restricted sandbox.
  7. Sub-agents -- no spawning mechanism, no independent context windows.
  8. Space-bar leader key and which-key help overlay are absent.

Phase 1 -- Tool Executor tarpc Interface

Goal: Introduce the harness/executor split described in DESIGN.md. The executor owns the ToolRegistry and Sandbox; the orchestrator (harness) communicates with it exclusively through a tarpc client. In this phase the transport is in-process (tarpc's unbounded channel pair), laying the groundwork for out-of-process execution in a later phase.

This is the largest structural change in the plan. Every subsequent phase benefits from the cleaner boundary: sub-agents each get their own executor client (Phase 7), and the sandbox policy becomes a constructor argument to the executor rather than something threaded through the orchestrator.

1.1 Define the tarpc service

Create src/executor/mod.rs:

#[tarpc::service]
pub trait Executor {
    /// Return the full list of tools this executor exposes, including their
    /// JSON Schema input descriptors.  The harness calls this once at startup
    /// and caches the result for the lifetime of the conversation.
    async fn list_tools() -> Vec<ToolDefinition>;

    /// Invoke a single tool by name with a JSON-encoded argument object.
    /// Returns the text content to feed back to the model, or an error string
    /// that is also fed back (so the model can self-correct).
    async fn call_tool(name: String, input: serde_json::Value) -> Result<String, String>;
}

ToolDefinition is already defined in core/types.rs and is provider-agnostic -- no new types are needed on the wire.

1.2 Implement ExecutorServer

Still in src/executor/mod.rs, add:

#[derive(Clone)]
pub struct ExecutorServer {
    registry: ToolRegistry,
    sandbox: Arc<Sandbox>,
}

impl ExecutorServer {
    pub fn new(registry: ToolRegistry, sandbox: Sandbox) -> Self { ... }
}

impl Executor for ExecutorServer {
    async fn list_tools(self, _: Context) -> Vec<ToolDefinition> {
        self.registry.definitions()
    }

    async fn call_tool(self, _: Context, name: String, input: Value) -> Result<String, String> {
        match self.registry.get(&name) {
            None => Err(format!("unknown tool: {name}")),
            Some(tool) => tool
                .execute(input, &self.sandbox)
                .await
                .map_err(|e| e.to_string()),
        }
    }
}

The sandbox is held in an Arc because tarpc clones the server struct per request; ExecutorServer (and the ToolRegistry inside it) must therefore implement Clone.

1.3 In-process transport helper

Add a function to src/executor/mod.rs (and re-export from src/app/mod.rs) that wires an ExecutorServer to a client over tarpc's in-memory channel:

/// Spawn an ExecutorServer on the current tokio runtime and return a client
/// connected to it via an in-process channel.  The server task runs until
/// the client is dropped.
pub fn spawn_local(server: ExecutorServer) -> ExecutorClient {
    use futures::StreamExt;

    let (client_transport, server_transport) = tarpc::transport::channel::unbounded();
    let channel = tarpc::server::BaseChannel::with_defaults(server_transport);
    // `serve()` is generated by #[tarpc::service]; `execute` yields one future
    // per incoming request, each of which must itself be spawned.
    tokio::spawn(channel.execute(server.serve()).for_each(|request| async {
        tokio::spawn(request);
    }));
    ExecutorClient::new(tarpc::client::Config::default(), client_transport).spawn()
}

1.4 Refactor Orchestrator to use the client

Currently Orchestrator<P> holds ToolRegistry and Sandbox directly and calls tool.execute(input, &sandbox) in run_turn. Replace these fields with:

executor: ExecutorClient,
tool_definitions: Vec<ToolDefinition>,   // fetched once at construction

run_turn changes from direct tool dispatch to:

let result = self.executor
    .call_tool(context::current(), name, input)
    .await;

The tool_definitions vec is passed to provider.stream() instead of being built from the registry on each call.

1.5 Update app/mod.rs

Replace the inline construction of ToolRegistry + Sandbox in app::run with:

let registry = build_tool_registry();
let sandbox  = Sandbox::new(policy, project_dir, enforcement)?;
let executor = executor::spawn_local(ExecutorServer::new(registry, sandbox));
let orchestrator = Orchestrator::new(provider, executor, system_prompt);

1.6 Tests

  • Unit: ExecutorServer::call_tool with a mock ToolRegistry returns correct output and maps errors to Err(String).
  • Integration: spawn_local -> client.call_tool round-trip through the in-process channel executes a real read_file against a temp dir.
  • Integration: existing orchestrator integration tests continue to pass after the refactor (the mock provider path is unchanged; only tool dispatch changes).

1.7 Files touched

Action File
New src/executor/mod.rs
Modified src/core/orchestrator.rs -- remove registry/sandbox, add executor client
Modified src/app/mod.rs -- construct executor, pass client to orchestrator
Modified Cargo.toml -- add tarpc with tokio1 feature

New dependency: tarpc (with tokio1 and serde-transport features).


Phase 2 -- Session Logging

Goal: Persist every event to a JSONL file. This is the foundation for token accounting, session resume, and future conversation branching.

2.1 Add src/session/ module

Create src/session/mod.rs with the following public surface:

pub struct SessionWriter { ... }

impl SessionWriter {
    /// Open (or create) a JSONL log at the given path in append mode.
    pub async fn open(path: &Path) -> Result<Self, SessionError>;

    /// Append one event.  Never rewrites history.
    pub async fn append(&self, event: &LogEvent) -> Result<(), SessionError>;
}

pub struct SessionReader { ... }

impl SessionReader {
    pub async fn load(path: &Path) -> Result<Vec<LogEvent>, SessionError>;
}

2.2 Define LogEvent

pub struct LogEvent {
    pub id: Uuid,
    pub parent_id: Option<Uuid>,
    pub timestamp: DateTime<Utc>,
    pub payload: LogPayload,
    pub token_usage: Option<TokenUsage>,
}

pub enum LogPayload {
    UserMessage { content: String },
    AssistantMessage { content: Vec<ContentBlock> },
    ToolCall { tool_name: String, input: serde_json::Value },
    ToolResult { tool_use_id: String, content: String, is_error: bool },
}

pub struct TokenUsage {
    pub input: u32,
    pub output: u32,
    pub cache_read: Option<u32>,
    pub cache_write: Option<u32>,
}

id and parent_id form a tree that enables future branching. For now the conversation is linear so parent_id is always the id of the previous event.
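The linear chain can be sketched as follows. This is a simplified stand-in, not the real module: plain u64 ids replace Uuid and a String replaces LogPayload so the example stays dependency-free.

```rust
// Sketch of the linear parent chain described above. The real LogEvent uses
// Uuid ids and a chrono timestamp; u64 ids keep this example self-contained.
#[derive(Debug)]
pub struct LogEventSketch {
    pub id: u64,
    pub parent_id: Option<u64>,
    pub payload: String, // stands in for LogPayload
}

/// Append an event, linking it to the most recent one (linear history).
pub fn append_linear(log: &mut Vec<LogEventSketch>, id: u64, payload: &str) {
    let parent_id = log.last().map(|e| e.id);
    log.push(LogEventSketch { id, parent_id, payload: payload.to_string() });
}
```

Branching later means passing an explicit parent_id instead of always taking the last event's id; nothing else about the format changes.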

2.3 Wire into Orchestrator

  • Orchestrator holds an Option<SessionWriter>.
  • Every time the orchestrator pushes to ConversationHistory it also appends a LogEvent. Token counts from StreamEvent::InputTokens / OutputTokens are stored on the final assistant event of each turn.
  • Session file lives at .skate/sessions/<timestamp>.jsonl.

2.4 Tests

  • Unit: SessionWriter::append then SessionReader::load round-trips all payload variants.
  • Unit: parent_id chain is correct across a simulated multi-turn exchange.
  • Integration: run the orchestrator with a mock provider against a temp dir; assert the JSONL file is written.

Phase 3 -- Token Tracking & Status Bar

Goal: Surface token usage in the TUI per-turn and cumulatively.

3.1 Per-turn token counts in UIEvent

Add a variant to UIEvent:

UIEvent::TurnComplete { input_tokens: u32, output_tokens: u32 }

The orchestrator already receives StreamEvent::InputTokens and OutputTokens; it should accumulate them during a turn and emit them in TurnComplete.

3.2 AppState token counters

Add to AppState:

pub turn_input_tokens: u32,
pub turn_output_tokens: u32,
pub total_input_tokens: u64,
pub total_output_tokens: u64,

events.rs updates these on TurnComplete.
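A minimal sketch of that update, shown on a standalone struct rather than the full AppState (the surrounding event-handling context is assumed):

```rust
// Sketch of the counter update performed in events.rs on TurnComplete.
// Field names follow the plan; the enclosing AppState is elided.
#[derive(Default)]
pub struct TokenCounters {
    pub turn_input_tokens: u32,
    pub turn_output_tokens: u32,
    pub total_input_tokens: u64,
    pub total_output_tokens: u64,
}

impl TokenCounters {
    /// Handle UIEvent::TurnComplete: per-turn counters are replaced each
    /// turn, session totals accumulate.
    pub fn on_turn_complete(&mut self, input_tokens: u32, output_tokens: u32) {
        self.turn_input_tokens = input_tokens;
        self.turn_output_tokens = output_tokens;
        self.total_input_tokens += u64::from(input_tokens);
        self.total_output_tokens += u64::from(output_tokens);
    }
}
```

The totals are u64 precisely so that accumulating many u32 per-turn counts cannot overflow over a long session.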

3.3 Status bar redesign

The status bar currently shows only the mode indicator. Expand it to four sections:

[ MODE ] [ ACTIVITY ]          [ i:1234 o:567 | total i:9999 o:2345 ] [ NET: off ]
  • MODE -- Normal / Insert / Command
  • ACTIVITY -- Plan / Execute (Phase 6 adds Plan; for now always "Execute")
  • Tokens -- per-turn input/output, then session cumulative
  • NET -- on (green) or off (red) reflecting network_allowed

Update render.rs to implement this layout using Ratatui Layout::horizontal.

3.4 Tests

  • Unit: AppState accumulates totals correctly across multiple TurnComplete events.
  • TUI snapshot test (TestBackend): status bar renders all four sections with correct content after a synthetic TurnComplete.

Phase 4 -- TUI Introspection (Expand/Collapse)

Goal: Support progressive disclosure -- tool calls and thinking traces start collapsed; the user can expand them.

4.1 Block model

Replace the flat Vec<DisplayMessage> in AppState with a Vec<DisplayBlock>:

pub enum DisplayBlock {
    UserMessage { content: String },
    AssistantText { content: String },
    ToolCall {
        display: ToolDisplay,
        result: Option<String>,
        expanded: bool,
    },
    Error { message: String },
}

4.2 Navigation in Normal mode

Add block-level cursor to AppState:

pub focused_block: Option<usize>,

Keybindings (Normal mode):

Key Action
[ Move focus to previous block
] Move focus to next block
Enter Toggle expanded on focused ToolCall block (Space becomes the leader key in Phase 5)
j / k Line scroll (unchanged)

The focused block is highlighted with a distinct border color.

4.3 Render changes

render.rs must calculate the height of each DisplayBlock depending on whether it is collapsed (1-2 summary lines) or expanded (full content). The scroll offset operates on rendered terminal rows, not message indices.

Collapsed tool call shows: > tool_name(arg_summary) -- result_summary
Expanded tool call shows: full input and output as formatted by tool_display.rs.
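The height calculation can be sketched like this, assuming the wrapped line count of each block's full content has already been computed at the current pane width (the names here are illustrative, not the final render.rs API):

```rust
// Sketch of per-block height. `line_count` is the wrapped line count of the
// full content at the current pane width.
pub enum BlockKind {
    UserMessage,
    AssistantText,
    ToolCall { expanded: bool },
    Error,
}

pub fn block_height(kind: &BlockKind, line_count: u16) -> u16 {
    match kind {
        // Collapsed tool calls render as a one-line summary.
        BlockKind::ToolCall { expanded: false } => 1,
        // Expanded tool calls and all other blocks show full content.
        _ => line_count,
    }
}
```

Scrolling then sums block_height over the blocks above the viewport to map a row offset to a (block, line) position.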

4.4 Tests

  • Unit: toggling expanded on a ToolCall block changes height calculation.
  • TUI snapshot: collapsed vs expanded render output for WriteFile and ShellExec.

Phase 5 -- Space-bar Leader Key & Which-Key Overlay

Goal: Support vim-style <Space> leader chords for configuration actions. This replaces the :net on / :net off text commands with discoverable hotkeys.

5.1 Leader key state machine

Extend AppState with:

pub leader_active: bool,
pub leader_timeout: Option<Instant>,

In Normal mode, pressing Space sets leader_active = true and starts a 1-second timeout. The next key is dispatched through the chord table. If the timeout fires or an unbound key is pressed, leader mode is cancelled with a brief status message.
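Those transitions can be sketched as a small state machine. Chord-table lookup (bound vs unbound key) is left to the caller; the names here are illustrative:

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum LeaderOutcome {
    Activated,         // Space pressed while leader inactive
    Chord(char),       // next key arrived within the timeout
    Cancelled,         // timeout expired before the next key
    Passthrough(char), // ordinary key; dispatch through normal-mode bindings
}

pub struct LeaderState {
    active_since: Option<Instant>,
    timeout: Duration,
}

impl LeaderState {
    pub fn new() -> Self {
        LeaderState { active_since: None, timeout: Duration::from_secs(1) }
    }

    pub fn on_key(&mut self, key: char, now: Instant) -> LeaderOutcome {
        match self.active_since {
            None if key == ' ' => {
                self.active_since = Some(now);
                LeaderOutcome::Activated
            }
            None => LeaderOutcome::Passthrough(key),
            Some(t) if now.duration_since(t) > self.timeout => {
                self.active_since = None;
                LeaderOutcome::Cancelled
            }
            Some(_) => {
                self.active_since = None;
                LeaderOutcome::Chord(key)
            }
        }
    }
}
```

Taking `now` as a parameter keeps the timeout logic testable without sleeping; an unbound Chord result is where the caller shows the brief status message.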

5.2 Initial chord table

Chord Action
<Space> n Toggle network policy
<Space> c Clear history (:clear)
<Space> p Switch to Plan mode (Phase 6)
<Space> ? Toggle which-key overlay

5.3 Which-key overlay

A centered popup rendered over the output pane that lists all available chords and their descriptions. Rendered only while leader_active = true, after a short delay (~200 ms) to avoid flicker for fast typists.

5.4 Remove :net on/off from command parser

Once leader-key network toggle is in place, remove the text-command duplicates to keep the command palette small and focused.

5.5 Tests

  • Unit: leader key state machine transitions (activate, timeout, chord match, cancel).
  • TUI snapshot: which-key overlay renders with correct chord list.

Phase 6 -- Planning Mode

Goal: A dedicated planning harness with restricted sandbox that writes a single plan file, plus a mechanism to pipe the plan into an execute harness.

6.1 Plan harness sandbox policy

In planning mode the orchestrator is instantiated with a SandboxPolicy that grants:

  • / -- ReadOnly (same as execute)
  • <project_dir>/.skate/plan.md -- ReadWrite (only this file)
  • Network -- off

All other write attempts fail with a sandbox permission error returned to the model.
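The write rule reduces to a single allowed path. A sketch of the policy check, with illustrative names (real enforcement lives in the Landlock sandbox, and the actual SandboxPolicy API may differ):

```rust
use std::path::{Path, PathBuf};

// Sketch of the plan-mode policy: everything read-only, exactly one
// read-write path, network off.
pub struct PlanPolicy {
    pub read_write: PathBuf,
    pub network: bool,
}

impl PlanPolicy {
    pub fn new(project_dir: &Path) -> Self {
        PlanPolicy {
            read_write: project_dir.join(".skate/plan.md"),
            network: false,
        }
    }

    /// Would a write to `path` be permitted under plan mode?
    pub fn write_allowed(&self, path: &Path) -> bool {
        path == self.read_write.as_path()
    }
}
```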

6.2 Survey tool

Add a new tool ask_user that allows the model to present structured questions to the user during planning:

// Input schema
{
  "question": "string",
  "options": ["string"] | null   // null means free-text answer
}

The orchestrator sends a new UIEvent::SurveyRequest { question, options }. The TUI renders an inline prompt. The user's answer is sent back as a UserAction::SurveyResponse.

6.3 TUI activity mode

AppState gets:

pub activity: Activity,

pub enum Activity { Plan, Execute }

Switching activity (via <Space> p) instantiates a new orchestrator on a fresh channel pair. The old orchestrator is shut down cleanly. The status bar ACTIVITY section updates.

6.4 Plan -> Execute handoff

When the user is satisfied with the plan (<Space> x or :exec):

  1. TUI reads .skate/plan.md.
  2. Constructs a new system prompt: <original system prompt>\n\n## Plan\n<plan content>.
  3. Instantiates an Execute orchestrator with the full sandbox policy and the augmented system prompt.
  4. Transitions activity to Execute.

The old Plan orchestrator is dropped.
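Step 2 is plain string concatenation using the format given above; a sketch (the function name is illustrative):

```rust
/// Build the Execute system prompt from the original prompt and the
/// contents of .skate/plan.md.
pub fn augment_system_prompt(original: &str, plan: &str) -> String {
    format!("{original}\n\n## Plan\n{plan}")
}
```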

6.5 Edit plan in $EDITOR

Hotkey <Space> e (or :edit-plan) suspends the TUI (restores terminal), opens $EDITOR on .skate/plan.md, then resumes the TUI after the editor exits.

6.6 Tests

  • Integration: plan harness rejects write to a file other than plan.md.
  • Integration: survey tool round-trip through channel boundary.
  • Unit: plan -> execute handoff produces correct augmented system prompt.

Phase 7 -- Sub-Agents

Goal: The model can spawn independent sub-agents with their own context windows. Results are summarised and returned to the parent.

7.1 spawn_agent tool

Add a new tool with input schema:

{
  "task": "string",           // instruction for the sub-agent
  "sandbox": {                // optional policy overrides
    "network": bool,
    "extra_write_paths": ["string"]
  }
}

7.2 Sub-agent lifecycle

When spawn_agent executes:

  1. Create a new Orchestrator with an independent conversation history.
  2. The sub-agent's system prompt is the parent's system prompt plus the task description.
  3. The sub-agent runs autonomously (no user interaction) until it emits a UserAction::Quit equivalent or hits MAX_TOOL_ITERATIONS.
  4. The final assistant message is returned as the tool result (the "summary").
  5. The sub-agent's session is logged to a child JSONL file linked to the parent session by a parent_session_id field.

7.3 TUI sub-agent view

The agent tree is accessible via <Space> a. A side panel shows:

Parent
 +-- sub-agent 1  [running]
 +-- sub-agent 2  [done]

Pressing Enter on a sub-agent opens a read-only replay of its conversation (scroll only, no input). This is a stretch goal within this phase -- the core spawning mechanism is the priority.

7.4 Tests

  • Integration: spawn_agent with a mock provider runs to completion and returns a summary string.
  • Unit: sub-agent session file has correct parent_session_id link.
  • Unit: MAX_TOOL_ITERATIONS limit is respected within sub-agents.

In this phase spawn_agent gains a natural implementation: it calls executor::spawn_local with a new ExecutorServer configured for the child policy, constructs a new Orchestrator with that client, and runs it to completion. The tarpc boundary from Phase 1 makes this straightforward.


Phase 8 -- Prompt Caching

Goal: Use Anthropic's prompt caching to reduce cost and latency on long conversations. DESIGN.md notes this as a desired property of message construction.

8.1 Cache breakpoints

The Anthropic API supports "cache_control": {"type": "ephemeral"} on message content blocks. The optimal strategy is to mark the last user message of the longest stable prefix as a cache write point.

In provider/claude.rs, when serializing the messages array:

  • Mark the system prompt content block with cache_control (it never changes).
  • Mark the penultimate user message with cache_control (the conversation history that is stable for the current turn).
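The breakpoint selection can be sketched independently of the serializer. The struct here is a simplification of the real provider message types:

```rust
// Sketch: pick the penultimate user message as the cache write point.
pub struct MessageSketch {
    pub role: &'static str,  // "user" or "assistant"
    pub cache_control: bool, // serialized as {"type": "ephemeral"} when true
}

pub fn mark_cache_breakpoint(msgs: &mut [MessageSketch]) {
    let user_idxs: Vec<usize> = msgs
        .iter()
        .enumerate()
        .filter(|(_, m)| m.role == "user")
        .map(|(i, _)| i)
        .collect();
    // With fewer than two user messages there is no stable prefix to cache.
    if let Some(&idx) = user_idxs.iter().rev().nth(1) {
        msgs[idx].cache_control = true;
    }
}
```

The penultimate (not last) user message is chosen because the final user message begins the new turn and is not yet part of the stable prefix.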

8.2 Cache token tracking

The TokenUsage struct in session/ already reserves cache_read and cache_write fields. StreamEvent must be extended:

StreamEvent::CacheReadTokens(u32),
StreamEvent::CacheWriteTokens(u32),

The Anthropic message_start event contains usage.cache_read_input_tokens and usage.cache_creation_input_tokens. Parse these and emit the new variants.

8.3 Status bar update

Add cache tokens to the status bar display: i:1234(c:800) o:567.

8.4 Tests

  • Provider unit test: replay a fixture that contains cache token fields; assert the new StreamEvent variants are emitted.
  • Snapshot test: status bar renders cache token counts correctly.

Dependency Graph

Phase 1 (tarpc executor)
    |
    +-- Phase 2 (session logging) -- needs the completed orchestrator refactor
    |       |
    |       +-- Phase 3 (token tracking) -- requires session TokenUsage struct
    |       |
    |       +-- Phase 7 (sub-agents) -- requires session parent_session_id
    |
    +-- Phase 7 (sub-agents) -- spawn_local reuse is natural after Phase 1

Phase 4 (expand/collapse) -- independent, can be done alongside Phase 3

Phase 5 (leader key) -- independent, prerequisite for Phase 6

Phase 6 (planning mode) -- requires Phase 5 (leader key chord <Space> p)
                        -- benefits from Phase 1 (separate executor per activity)

Phase 8 (prompt caching) -- requires Phase 3 (cache token display)

Recommended order: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 8, with 7 after 2 and 6.


Files Touched Per Phase

Phase New Files Modified Files
1 src/executor/mod.rs src/core/orchestrator.rs, src/core/types.rs, src/app/mod.rs, Cargo.toml
2 src/session/mod.rs src/core/orchestrator.rs, src/app/mod.rs
3 -- src/core/types.rs, src/core/orchestrator.rs, src/tui/events.rs, src/tui/render.rs
4 -- src/tui/mod.rs, src/tui/render.rs, src/tui/events.rs, src/tui/input.rs
5 -- src/tui/input.rs, src/tui/render.rs, src/tui/mod.rs
6 src/tools/ask_user.rs src/core/types.rs, src/core/orchestrator.rs, src/tui/mod.rs, src/tui/input.rs, src/tui/render.rs, src/app/mod.rs
7 -- src/executor/mod.rs, src/core/orchestrator.rs, src/tui/render.rs, src/tui/input.rs
8 -- src/provider/claude.rs, src/core/types.rs, src/session/mod.rs, src/tui/render.rs

New Dependencies

Crate Phase Reason
tarpc 1 RPC service trait + in-process transport
uuid 2 LogEvent ids
chrono 2 Event timestamps (check if already transitive)

No other new dependencies are needed. All other required functionality (serde_json, tokio, ratatui, tracing) is already present.