# Skate Implementation Plan

This plan closes the gaps between the current codebase and the goals stated in DESIGN.md. The phases are ordered by dependency -- each phase builds on the previous.

## Current State Summary

Phase 0 (core loop) is functionally complete: the TUI renders conversations, the orchestrator drives the Claude API, tools execute inside a Landlock sandbox, and the channel boundary between TUI and core is properly maintained.

The major gaps are:

1. **Tool executor tarpc interface** -- the orchestrator calls tools directly rather than via a tarpc client/server split as DESIGN.md specifies. This is the biggest structural gap and a prerequisite for sub-agents (each agent gets its own client).
2. **Session logging** (JSONL, tree-addressable) -- no `session/` module exists yet.
3. **Token tracking** -- counts are debug-logged but not surfaced to the user.
4. **TUI introspection** -- tool blocks and thinking traces cannot be expanded/collapsed.
5. **Status bar is sparse** -- no token totals, no activity mode, no network state badge.
6. **Planning Mode** -- no dedicated harness instantiation with restricted sandbox.
7. **Sub-agents** -- no spawning mechanism, no independent context windows.
8. **Space-bar leader key** and which-key help overlay are absent.

---

## Phase 1 -- Tool Executor tarpc Interface

**Goal:** Introduce the harness/executor split described in DESIGN.md. The executor owns the `ToolRegistry` and `Sandbox`; the orchestrator (harness) communicates with it exclusively through a tarpc client. In this phase the transport is in-process (tarpc's unbounded channel pair), laying the groundwork for out-of-process execution in a later phase.

This is the largest structural change in the plan. Every subsequent phase benefits from the cleaner boundary: sub-agents each get their own executor client (Phase 7), and the sandbox policy becomes a constructor argument to the executor rather than something threaded through the orchestrator.
### 1.1 Define the tarpc service

Create `src/executor/mod.rs`:

```rust
#[tarpc::service]
pub trait Executor {
    /// Return the full list of tools this executor exposes, including their
    /// JSON Schema input descriptors. The harness calls this once at startup
    /// and caches the result for the lifetime of the conversation.
    async fn list_tools() -> Vec<ToolDefinition>;

    /// Invoke a single tool by name with a JSON-encoded argument object.
    /// Returns the text content to feed back to the model, or an error string
    /// that is also fed back (so the model can self-correct).
    async fn call_tool(name: String, input: serde_json::Value) -> Result<String, String>;
}
```

`ToolDefinition` is already defined in `core/types.rs` and is provider-agnostic -- no new types are needed on the wire.

### 1.2 Implement `ExecutorServer`

Still in `src/executor/mod.rs`, add:

```rust
pub struct ExecutorServer {
    registry: ToolRegistry,
    sandbox: Arc<Sandbox>,
}

impl ExecutorServer {
    pub fn new(registry: ToolRegistry, sandbox: Sandbox) -> Self { ... }
}

impl Executor for ExecutorServer {
    async fn list_tools(self, _: Context) -> Vec<ToolDefinition> {
        self.registry.definitions()
    }

    async fn call_tool(self, _: Context, name: String, input: Value) -> Result<String, String> {
        match self.registry.get(&name) {
            None => Err(format!("unknown tool: {name}")),
            Some(tool) => tool
                .execute(input, &self.sandbox)
                .await
                .map_err(|e| e.to_string()),
        }
    }
}
```

The `Arc<Sandbox>` is required because tarpc clones the server struct per request.

### 1.3 In-process transport helper

Add a function to `src/executor/mod.rs` (and re-export from `src/app/mod.rs`) that wires an `ExecutorServer` to a client over tarpc's in-memory channel:

```rust
/// Spawn an ExecutorServer on the current tokio runtime and return a client
/// connected to it via an in-process channel. The server task runs until
/// the client is dropped.
pub fn spawn_local(server: ExecutorServer) -> ExecutorClient {
    let (client_transport, server_transport) = tarpc::transport::channel::unbounded();
    let channel = tarpc::server::BaseChannel::with_defaults(server_transport);
    // NB: on recent tarpc versions `execute` yields a stream of per-request
    // handler futures; drive it with `.for_each(|f| async { tokio::spawn(f); })`.
    tokio::spawn(channel.execute(server.serve()));
    ExecutorClient::new(tarpc::client::Config::default(), client_transport).spawn()
}
```

### 1.4 Refactor `Orchestrator` to use the client

Currently `Orchestrator` holds `ToolRegistry` and `Sandbox` directly and calls `tool.execute(input, &sandbox)` in `run_turn`. Replace these fields with:

```rust
executor: ExecutorClient,
tool_definitions: Vec<ToolDefinition>, // fetched once at construction
```

`run_turn` changes from direct tool dispatch to:

```rust
let result = self.executor
    .call_tool(context::current(), name, input)
    .await;
```

The `tool_definitions` vec is passed to `provider.stream()` instead of being built from the registry on each call.

### 1.5 Update `app/mod.rs`

Replace the inline construction of `ToolRegistry + Sandbox` in `app::run` with:

```rust
let registry = build_tool_registry();
let sandbox = Sandbox::new(policy, project_dir, enforcement)?;
let executor = executor::spawn_local(ExecutorServer::new(registry, sandbox));
let orchestrator = Orchestrator::new(provider, executor, system_prompt);
```

### 1.6 Tests

- Unit: `ExecutorServer::call_tool` with a mock `ToolRegistry` returns correct output and maps errors to `Err(String)`.
- Integration: `spawn_local` -> `client.call_tool` round-trip through the in-process channel executes a real `read_file` against a temp dir.
- Integration: existing orchestrator integration tests continue to pass after the refactor (the mock provider path is unchanged; only tool dispatch changes).

### 1.7 Files touched

| Action | File |
|--------|------|
| New | `src/executor/mod.rs` |
| Modified | `src/core/orchestrator.rs` -- remove registry/sandbox, add executor client |
| Modified | `src/app/mod.rs` -- construct executor, pass client to orchestrator |
| Modified | `Cargo.toml` -- add `tarpc` with `tokio1` feature |

New dependency: `tarpc` (with `tokio1` and `serde-transport` features).

---

## Phase 2 -- Session Logging

**Goal:** Persist every event to a JSONL file. This is the foundation for token accounting, session resume, and future conversation branching.

### 2.1 Add `src/session/` module

Create `src/session/mod.rs` with the following public surface:

```rust
pub struct SessionWriter { ... }

impl SessionWriter {
    /// Open (or create) a JSONL log at the given path in append mode.
    pub async fn open(path: &Path) -> Result<Self, SessionError>;

    /// Append one event. Never rewrites history.
    pub async fn append(&self, event: &LogEvent) -> Result<(), SessionError>;
}

pub struct SessionReader { ... }

impl SessionReader {
    pub async fn load(path: &Path) -> Result<Vec<LogEvent>, SessionError>;
}
```

### 2.2 Define `LogEvent`

```rust
pub struct LogEvent {
    pub id: Uuid,
    pub parent_id: Option<Uuid>,
    pub timestamp: DateTime<Utc>,
    pub payload: LogPayload,
    pub token_usage: Option<TokenUsage>,
}

pub enum LogPayload {
    UserMessage { content: String },
    AssistantMessage { content: Vec<ContentBlock> },
    ToolCall { tool_name: String, input: serde_json::Value },
    ToolResult { tool_use_id: String, content: String, is_error: bool },
}

pub struct TokenUsage {
    pub input: u32,
    pub output: u32,
    pub cache_read: Option<u32>,
    pub cache_write: Option<u32>,
}
```

`id` and `parent_id` form a tree that enables future branching. For now the conversation is linear, so `parent_id` is always the id of the previous event.

### 2.3 Wire into Orchestrator

- `Orchestrator` holds an `Option<SessionWriter>`.
- Every time the orchestrator pushes to `ConversationHistory` it also appends a `LogEvent`. Token counts from `StreamEvent::InputTokens` / `OutputTokens` are stored on the final assistant event of each turn.
- The session file lives at `.skate/sessions/<session-id>.jsonl`.

### 2.4 Tests

- Unit: `SessionWriter::append` then `SessionReader::load` round-trips all payload variants.
- Unit: the `parent_id` chain is correct across a simulated multi-turn exchange.
- Integration: run the orchestrator with a mock provider against a temp dir; assert the JSONL file is written.

---

## Phase 3 -- Token Tracking & Status Bar

**Goal:** Surface token usage in the TUI per-turn and cumulatively.
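The accounting here splits per-turn counters (overwritten each turn) from session totals (monotonically accumulated). A minimal std-only sketch of that logic -- type and method names are illustrative, not the actual `AppState` API:

```rust
/// Sketch of per-turn vs cumulative token accounting (hypothetical names).
#[derive(Default)]
pub struct TokenCounters {
    pub turn_input: u32,
    pub turn_output: u32,
    pub total_input: u64,
    pub total_output: u64,
}

impl TokenCounters {
    /// Called when a turn finishes: record the finished turn's counts and
    /// fold them into the session totals.
    pub fn on_turn_complete(&mut self, input: u32, output: u32) {
        self.turn_input = input;
        self.turn_output = output;
        self.total_input += u64::from(input);
        self.total_output += u64::from(output);
    }
}
```

Keeping totals as `u64` while per-turn counts stay `u32` avoids overflow over a long session without widening the wire type.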
### 3.1 Per-turn token counts in UIEvent

Add a variant to `UIEvent`:

```rust
UIEvent::TurnComplete { input_tokens: u32, output_tokens: u32 }
```

The orchestrator already receives `StreamEvent::InputTokens` and `OutputTokens`; it should accumulate them during a turn and emit them in `TurnComplete`.

### 3.2 AppState token counters

Add to `AppState`:

```rust
pub turn_input_tokens: u32,
pub turn_output_tokens: u32,
pub total_input_tokens: u64,
pub total_output_tokens: u64,
```

`events.rs` updates these on `TurnComplete`.

### 3.3 Status bar redesign

The status bar currently shows only the mode indicator. Expand it to four sections:

```
[ MODE ] [ ACTIVITY ] [ i:1234 o:567 | total i:9999 o:2345 ] [ NET: off ]
```

- **MODE** -- Normal / Insert / Command
- **ACTIVITY** -- Plan / Execute (Phase 6 adds Plan; for now always "Execute")
- **Tokens** -- per-turn input/output, then session cumulative
- **NET** -- `on` (green) or `off` (red), reflecting `network_allowed`

Update `render.rs` to implement this layout using Ratatui `Layout::horizontal`.

### 3.4 Tests

- Unit: `AppState` accumulates totals correctly across multiple `TurnComplete` events.
- TUI snapshot test (TestBackend): the status bar renders all four sections with correct content after a synthetic `TurnComplete`.

---

## Phase 4 -- TUI Introspection (Expand/Collapse)

**Goal:** Support progressive disclosure -- tool calls and thinking traces start collapsed; the user can expand them.
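Progressive disclosure reduces to a per-block height function: a collapsed block contributes a one-line summary, an expanded block its full content, and the scroll range is the sum. A std-only sketch with a deliberately simplified block type (the real model is defined in 4.1 below):

```rust
/// Simplified stand-in for the display-block model; real fields differ.
pub struct Block {
    pub summary: String,    // one-line collapsed form
    pub full: Vec<String>,  // full lines shown when expanded
    pub expanded: bool,
}

impl Block {
    /// Height in terminal rows the renderer must reserve for this block.
    pub fn height(&self) -> usize {
        if self.expanded { self.full.len() } else { 1 }
    }
}

/// Total scrollable height of the transcript: the sum of block heights.
/// The scroll offset indexes into this row count, not into block indices.
pub fn content_height(blocks: &[Block]) -> usize {
    blocks.iter().map(Block::height).sum()
}
```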
### 4.1 Block model

Replace the flat message `Vec` in `AppState` with a `Vec<DisplayBlock>`:

```rust
pub enum DisplayBlock {
    UserMessage { content: String },
    AssistantText { content: String },
    ToolCall {
        display: ToolDisplay,
        result: Option<String>,
        expanded: bool,
    },
    Error { message: String },
}
```

### 4.2 Navigation in Normal mode

Add a block-level cursor to `AppState`:

```rust
pub focused_block: Option<usize>,
```

Keybindings (Normal mode):

| Key | Action |
|-----|--------|
| `[` | Move focus to previous block |
| `]` | Move focus to next block |
| `Enter` or `Space` | Toggle `expanded` on the focused ToolCall block |
| `j` / `k` | Line scroll (unchanged) |

The focused block is highlighted with a distinct border color.

### 4.3 Render changes

`render.rs` must calculate the height of each `DisplayBlock` depending on whether it is collapsed (1-2 summary lines) or expanded (full content). The scroll offset operates on rendered rows, not message indices.

Collapsed tool call shows: `> tool_name(arg_summary) -- result_summary`

Expanded tool call shows: the full input and output as formatted by `tool_display.rs`.

### 4.4 Tests

- Unit: toggling `expanded` on a `ToolCall` block changes the height calculation.
- TUI snapshot: collapsed vs expanded render output for `WriteFile` and `ShellExec`.

---

## Phase 5 -- Space-bar Leader Key & Which-Key Overlay

**Goal:** Support vim-style `<Space>` leader chords for configuration actions. This replaces the `:net on` / `:net off` text commands with discoverable hotkeys.

### 5.1 Leader key state machine

Extend `AppState` with:

```rust
pub leader_active: bool,
pub leader_timeout: Option<Instant>,
```

In Normal mode, pressing `Space` sets `leader_active = true` and starts a 1-second timeout. The next key is dispatched through the chord table. If the timeout fires or an unbound key is pressed, leader mode is cancelled with a brief status message.
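The state machine above can be sketched with std types only (names are illustrative; the real handler lives in the input layer and would fold the timeout-expired key back into normal dispatch, which this sketch simplifies to a cancel):

```rust
use std::time::{Duration, Instant};

const LEADER_TIMEOUT: Duration = Duration::from_secs(1);

#[derive(Debug, PartialEq)]
pub enum LeaderOutcome {
    Armed,             // Space pressed, waiting for the chord key
    Chord(char),       // a bound chord key arrived in time
    Cancelled,         // timeout fired or the key is unbound
    Passthrough(char), // leader not active; handle the key normally
}

pub struct LeaderState {
    armed_at: Option<Instant>,
}

impl LeaderState {
    pub fn new() -> Self {
        Self { armed_at: None }
    }

    /// Feed one key press observed at `now`; `bound` says whether the key
    /// appears in the chord table.
    pub fn on_key(&mut self, key: char, now: Instant, bound: bool) -> LeaderOutcome {
        match self.armed_at.take() {
            None if key == ' ' => {
                self.armed_at = Some(now);
                LeaderOutcome::Armed
            }
            None => LeaderOutcome::Passthrough(key),
            Some(t) if now.duration_since(t) > LEADER_TIMEOUT => LeaderOutcome::Cancelled,
            Some(_) if bound => LeaderOutcome::Chord(key),
            Some(_) => LeaderOutcome::Cancelled,
        }
    }
}
```

Because `armed_at` is `take()`-n on every key, each outcome leaves the machine disarmed, which matches the one-shot chord semantics described above.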
### 5.2 Initial chord table

| Chord | Action |
|-------|--------|
| `<Space> n` | Toggle network policy |
| `<Space> c` | Clear history (`:clear`) |
| `<Space> p` | Switch to Plan mode (Phase 6) |
| `<Space> ?` | Toggle which-key overlay |

### 5.3 Which-key overlay

A centered popup rendered over the output pane that lists all available chords and their descriptions. Rendered only while `leader_active = true` (after a short delay, ~200 ms, to avoid flicker for fast typists).

### 5.4 Remove `:net on/off` from the command parser

Once the leader-key network toggle is in place, remove the text-command duplicates to keep the command palette small and focused.

### 5.5 Tests

- Unit: leader key state machine transitions (activate, timeout, chord match, cancel).
- TUI snapshot: the which-key overlay renders with the correct chord list.

---

## Phase 6 -- Planning Mode

**Goal:** A dedicated planning harness with a restricted sandbox that writes a single plan file, plus a mechanism to pipe the plan into an execute harness.

### 6.1 Plan harness sandbox policy

In planning mode the orchestrator is instantiated with a `SandboxPolicy` that grants:

- `/` -- ReadOnly (same as execute)
- `/.skate/plan.md` -- ReadWrite (only this file)
- Network -- off

All other write attempts fail with a sandbox permission error returned to the model.

### 6.2 Survey tool

Add a new tool, `ask_user`, that allows the model to present structured questions to the user during planning:

```
// Input schema
{
  "question": "string",
  "options": ["string"] | null   // null means free-text answer
}
```

The orchestrator sends a new `UIEvent::SurveyRequest { question, options }`. The TUI renders an inline prompt. The user's answer is sent back as a `UserAction::SurveyResponse`.

### 6.3 TUI activity mode

`AppState` gets:

```rust
pub enum Activity { Plan, Execute }

// on AppState:
pub activity: Activity,
```

Switching activity (via `<Space> p`) instantiates a new orchestrator on a fresh channel pair. The old orchestrator is shut down cleanly. The status bar ACTIVITY section updates.
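The survey round-trip crosses the same channel boundary as every other TUI/core interaction: the core sends a request event, blocks on the reply, and the TUI answers on its own thread. A std-only sketch of that flow, with hypothetical single-variant enums standing in for the real `UIEvent`/`UserAction` variants:

```rust
use std::sync::mpsc;
use std::thread;

// Illustrative stand-ins for the real UIEvent / UserAction variants.
#[derive(Debug)]
enum UiMsg {
    SurveyRequest { question: String, options: Option<Vec<String>> },
}
#[derive(Debug)]
enum UserMsg {
    SurveyResponse(String),
}

/// Core side: emit the question, then block until the TUI answers.
fn ask(ui_tx: &mpsc::Sender<UiMsg>, user_rx: &mpsc::Receiver<UserMsg>) -> String {
    ui_tx
        .send(UiMsg::SurveyRequest {
            question: "Which database?".into(),
            options: Some(vec!["sqlite".into(), "postgres".into()]),
        })
        .unwrap();
    match user_rx.recv().unwrap() {
        UserMsg::SurveyResponse(answer) => answer,
    }
}

fn demo() -> String {
    let (ui_tx, ui_rx) = mpsc::channel();
    let (user_tx, user_rx) = mpsc::channel();
    // "TUI" thread: receive the request and reply with the first option.
    let tui = thread::spawn(move || {
        match ui_rx.recv().unwrap() {
            UiMsg::SurveyRequest { options, .. } => {
                let answer = options.unwrap()[0].clone();
                user_tx.send(UserMsg::SurveyResponse(answer)).unwrap();
            }
        }
    });
    let answer = ask(&ui_tx, &user_rx);
    tui.join().unwrap();
    answer
}
```

In the real app the core side is async and the channels are tokio's, but the request/reply shape is the same.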
### 6.4 Plan -> Execute handoff

When the user is satisfied with the plan (`<Space> x` or `:exec`):

1. The TUI reads `.skate/plan.md`.
2. Constructs a new system prompt: the base system prompt, followed by `\n\n## Plan\n` and the plan file contents.
3. Instantiates an Execute orchestrator with the full sandbox policy and the augmented system prompt.
4. Transitions `activity` to `Execute`. The old Plan orchestrator is dropped.

### 6.5 Edit plan in $EDITOR

Hotkey `<Space> e` (or `:edit-plan`) suspends the TUI (restores the terminal), opens `$EDITOR` on `.skate/plan.md`, then resumes the TUI after the editor exits.

### 6.6 Tests

- Integration: the plan harness rejects a write to any file other than plan.md.
- Integration: survey tool round-trip through the channel boundary.
- Unit: the plan -> execute handoff produces the correct augmented system prompt.

---

## Phase 7 -- Sub-Agents

**Goal:** The model can spawn independent sub-agents with their own context windows. Results are summarised and returned to the parent.

### 7.1 `spawn_agent` tool

Add a new tool with the input schema:

```
{
  "task": "string",     // instruction for the sub-agent
  "sandbox": {          // optional policy overrides
    "network": bool,
    "extra_write_paths": ["string"]
  }
}
```

### 7.2 Sub-agent lifecycle

When `spawn_agent` executes:

1. Create a new `Orchestrator` with an independent conversation history.
2. The sub-agent's system prompt is the parent's system prompt plus the task description.
3. The sub-agent runs autonomously (no user interaction) until it emits a `UserAction::Quit` equivalent or hits `MAX_TOOL_ITERATIONS`.
4. The final assistant message is returned as the tool result (the "summary").
5. The sub-agent's session is logged to a child JSONL file linked to the parent session by a `parent_session_id` field.

### 7.3 TUI sub-agent view

The agent tree is accessible via `<Space> a`. A side panel shows:

```
Parent
+-- sub-agent 1 [running]
+-- sub-agent 2 [done]
```

Pressing Enter on a sub-agent opens a read-only replay of its conversation (scroll only, no input).
This is a stretch goal within this phase -- the core spawning mechanism is the priority.

In this phase `spawn_agent` gains a natural implementation: it calls `executor::spawn_local` with a new `ExecutorServer` configured for the child policy, constructs a new `Orchestrator` with that client, and runs it to completion. The tarpc boundary from Phase 1 makes this straightforward.

### 7.4 Tests

- Integration: `spawn_agent` with a mock provider runs to completion and returns a summary string.
- Unit: the sub-agent session file has the correct `parent_session_id` link.
- Unit: the `MAX_TOOL_ITERATIONS` limit is respected within sub-agents.

---

## Phase 8 -- Prompt Caching

**Goal:** Use Anthropic's prompt caching to reduce cost and latency on long conversations. DESIGN.md notes this as a desired property of message construction.

### 8.1 Cache breakpoints

The Anthropic API supports `"cache_control": {"type": "ephemeral"}` on message content blocks. The optimal strategy is to mark the last user message of the longest stable prefix as a cache write point.

In `provider/claude.rs`, when serializing the messages array:

- Mark the system prompt content block with `cache_control` (it never changes).
- Mark the penultimate user message with `cache_control` (the conversation history that is stable for the current turn).

### 8.2 Cache token tracking

The `TokenUsage` struct in `session/` already reserves `cache_read` and `cache_write` fields. `StreamEvent` must be extended:

```rust
StreamEvent::CacheReadTokens(u32),
StreamEvent::CacheWriteTokens(u32),
```

The Anthropic `message_start` event contains `usage.cache_read_input_tokens` and `usage.cache_creation_input_tokens`. Parse these and emit the new variants.

### 8.3 Status bar update

Add cache tokens to the status bar display: `i:1234(c:800) o:567`.

### 8.4 Tests

- Provider unit test: replay a fixture that contains cache token fields; assert the new `StreamEvent` variants are emitted.
- Snapshot test: the status bar renders cache token counts correctly.
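Choosing the history breakpoint in 8.1 is pure index arithmetic over the serialized messages. A std-only sketch (a hypothetical helper, not the actual `provider/claude.rs` code):

```rust
/// Given message roles in order, return the index of the message that should
/// carry the conversation-history cache breakpoint: the penultimate user
/// message, i.e. the end of the prefix that is stable for the current turn.
/// The system prompt is marked separately since it never changes.
fn history_breakpoint(roles: &[&str]) -> Option<usize> {
    let user_idxs: Vec<usize> = roles
        .iter()
        .enumerate()
        .filter(|(_, r)| **r == "user")
        .map(|(i, _)| i)
        .collect();
    // Before the second user turn there is no stable history worth caching.
    if user_idxs.len() >= 2 {
        Some(user_idxs[user_idxs.len() - 2])
    } else {
        None
    }
}
```

The serializer would attach `"cache_control": {"type": "ephemeral"}` to the content block at the returned index on each request, moving the breakpoint forward one user turn at a time.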
---

## Dependency Graph

```
Phase 1 (tarpc executor)
|
+-- Phase 2 (session logging) -- orchestrator refactor is complete
|   |
|   +-- Phase 3 (token tracking) -- requires session TokenUsage struct
|   |
|   +-- Phase 7 (sub-agents) -- requires session parent_session_id
|
+-- Phase 7 (sub-agents) -- spawn_local reuse is natural after Phase 1

Phase 4 (expand/collapse) -- independent, can be done alongside Phase 3
Phase 5 (leader key)      -- independent, prerequisite for Phase 6
Phase 6 (planning mode)   -- requires Phase 5 (leader key chord <Space> p)
                          -- benefits from Phase 1 (separate executor per activity)
Phase 8 (prompt caching)  -- requires Phase 3 (cache token display)
```

Recommended order: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 8, with 7 after 2 and 6.

---

## Files Touched Per Phase

| Phase | New Files | Modified Files |
|-------|-----------|----------------|
| 1 | `src/executor/mod.rs` | `src/core/orchestrator.rs`, `src/core/types.rs`, `src/app/mod.rs`, `Cargo.toml` |
| 2 | `src/session/mod.rs` | `src/core/orchestrator.rs`, `src/app/mod.rs` |
| 3 | -- | `src/core/types.rs`, `src/core/orchestrator.rs`, `src/tui/events.rs`, `src/tui/render.rs` |
| 4 | -- | `src/tui/mod.rs`, `src/tui/render.rs`, `src/tui/events.rs`, `src/tui/input.rs` |
| 5 | -- | `src/tui/input.rs`, `src/tui/render.rs`, `src/tui/mod.rs` |
| 6 | `src/tools/ask_user.rs` | `src/core/types.rs`, `src/core/orchestrator.rs`, `src/tui/mod.rs`, `src/tui/input.rs`, `src/tui/render.rs`, `src/app/mod.rs` |
| 7 | -- | `src/executor/mod.rs`, `src/core/orchestrator.rs`, `src/tui/render.rs`, `src/tui/input.rs` |
| 8 | -- | `src/provider/claude.rs`, `src/core/types.rs`, `src/session/mod.rs`, `src/tui/render.rs` |

---

## New Dependencies

| Crate | Phase | Reason |
|-------|-------|--------|
| `tarpc` | 1 | RPC service trait + in-process transport |
| `uuid` | 2 | LogEvent ids |
| `chrono` | 2 | Event timestamps (check if already transitive) |

No other new dependencies are needed. All other required functionality (`serde_json`, `tokio`, `ratatui`, `tracing`) is already present.