# Skate Implementation Plan
This plan closes the gaps between the current codebase and the goals stated in DESIGN.md. The phases are ordered by dependency -- each phase builds on the previous.
## Current State Summary
Phase 0 (core loop) is functionally complete: the TUI renders conversations, the orchestrator drives the Claude API, tools execute inside a Landlock sandbox, and the channel boundary between TUI and core is properly maintained.
The major gaps are:
- Tool executor tarpc interface -- the orchestrator calls tools directly rather than via a tarpc client/server split as DESIGN.md specifies. This is the biggest structural gap and a prerequisite for sub-agents (each agent gets its own client).
- Session logging (JSONL, tree-addressable) -- no `session/` module exists yet.
- Token tracking -- counts are debug-logged but not surfaced to the user.
- TUI introspection -- tool blocks and thinking traces cannot be expanded/collapsed.
- Status bar is sparse -- no token totals, no activity mode, no network state badge.
- Planning Mode -- no dedicated harness instantiation with restricted sandbox.
- Sub-agents -- no spawning mechanism, no independent context windows.
- Space-bar leader key and which-key help overlay are absent.
## Phase 1 -- Tool Executor tarpc Interface
Goal: Introduce the harness/executor split described in DESIGN.md. The executor
owns the ToolRegistry and Sandbox; the orchestrator (harness) communicates with
it exclusively through a tarpc client. In this phase the transport is in-process
(tarpc's unbounded channel pair), laying the groundwork for out-of-process execution
in a later phase.
This is the largest structural change in the plan. Every subsequent phase benefits from the cleaner boundary: sub-agents each get their own executor client (Phase 7), and the sandbox policy becomes a constructor argument to the executor rather than something threaded through the orchestrator.
### 1.1 Define the tarpc service

Create `src/executor/mod.rs`:

```rust
#[tarpc::service]
pub trait Executor {
    /// Return the full list of tools this executor exposes, including their
    /// JSON Schema input descriptors. The harness calls this once at startup
    /// and caches the result for the lifetime of the conversation.
    async fn list_tools() -> Vec<ToolDefinition>;

    /// Invoke a single tool by name with a JSON-encoded argument object.
    /// Returns the text content to feed back to the model, or an error string
    /// that is also fed back (so the model can self-correct).
    async fn call_tool(name: String, input: serde_json::Value) -> Result<String, String>;
}
```
ToolDefinition is already defined in core/types.rs and is provider-agnostic --
no new types are needed on the wire.
### 1.2 Implement ExecutorServer

Still in `src/executor/mod.rs`, add:

```rust
pub struct ExecutorServer {
    registry: ToolRegistry,
    sandbox: Arc<Sandbox>,
}

impl ExecutorServer {
    pub fn new(registry: ToolRegistry, sandbox: Sandbox) -> Self { ... }
}

impl Executor for ExecutorServer {
    async fn list_tools(self, _: Context) -> Vec<ToolDefinition> {
        self.registry.definitions()
    }

    async fn call_tool(self, _: Context, name: String, input: Value) -> Result<String, String> {
        match self.registry.get(&name) {
            None => Err(format!("unknown tool: {name}")),
            Some(tool) => tool
                .execute(input, &self.sandbox)
                .await
                .map_err(|e| e.to_string()),
        }
    }
}
```
The Arc<Sandbox> is required because tarpc clones the server struct per request.
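A tiny illustration of why the `Arc` matters (the `Vec<String>` is a hypothetical stand-in for the real `Sandbox`; only the sharing semantics are the point):

```rust
use std::sync::Arc;

// Stand-in for the real ExecutorServer: tarpc clones the service value once
// per in-flight request, so the expensive state sits behind an Arc and each
// clone copies only the pointer, never the sandbox itself.
#[derive(Clone)]
pub struct ServerSketch {
    pub sandbox: Arc<Vec<String>>, // Arc<Sandbox> in the real server
}
```

Both the original and the per-request clone observe the same sandbox allocation; dropping the last clone frees it.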
### 1.3 In-process transport helper

Add a function to `src/executor/mod.rs` (and re-export from `src/app/mod.rs`) that
wires an `ExecutorServer` to a client over tarpc's in-memory channel:

```rust
/// Spawn an ExecutorServer on the current tokio runtime and return a client
/// connected to it via an in-process channel. The server task runs until
/// the client is dropped.
pub fn spawn_local(server: ExecutorServer) -> ExecutorClient {
    let (client_transport, server_transport) = tarpc::transport::channel::unbounded();
    let channel = tarpc::server::BaseChannel::with_defaults(server_transport);
    tokio::spawn(channel.execute(server.serve()));
    ExecutorClient::new(tarpc::client::Config::default(), client_transport).spawn()
}
```
### 1.4 Refactor Orchestrator to use the client

Currently `Orchestrator<P>` holds `ToolRegistry` and `Sandbox` directly and calls
`tool.execute(input, &sandbox)` in `run_turn`. Replace these fields with:

```rust
executor: ExecutorClient,
tool_definitions: Vec<ToolDefinition>, // fetched once at construction
```

`run_turn` changes from direct tool dispatch to:

```rust
let result = self.executor
    .call_tool(context::current(), name, input)
    .await;
```

The `tool_definitions` vec is passed to `provider.stream()` instead of being built
from the registry on each call.
### 1.5 Update app/mod.rs

Replace the inline construction of `ToolRegistry` + `Sandbox` in `app::run` with:

```rust
let registry = build_tool_registry();
let sandbox = Sandbox::new(policy, project_dir, enforcement)?;
let executor = executor::spawn_local(ExecutorServer::new(registry, sandbox));
let orchestrator = Orchestrator::new(provider, executor, system_prompt);
```
### 1.6 Tests

- Unit: `ExecutorServer::call_tool` with a mock `ToolRegistry` returns correct output and maps errors to `Err(String)`.
- Integration: `spawn_local` -> `client.call_tool` round-trip through the in-process channel executes a real `read_file` against a temp dir.
- Integration: existing orchestrator integration tests continue to pass after the refactor (the mock provider path is unchanged; only tool dispatch changes).
### 1.7 Files touched
| Action | File |
|---|---|
| New | src/executor/mod.rs |
| Modified | src/core/orchestrator.rs -- remove registry/sandbox, add executor client |
| Modified | src/app/mod.rs -- construct executor, pass client to orchestrator |
| Modified | Cargo.toml -- add tarpc with tokio1 feature |
New dependency: tarpc (with tokio1 and serde-transport features).
## Phase 2 -- Session Logging
Goal: Persist every event to a JSONL file. This is the foundation for token accounting, session resume, and future conversation branching.
### 2.1 Add src/session/ module

Create `src/session/mod.rs` with the following public surface:

```rust
pub struct SessionWriter { ... }

impl SessionWriter {
    /// Open (or create) a JSONL log at the given path in append mode.
    pub async fn open(path: &Path) -> Result<Self, SessionError>;

    /// Append one event. Never rewrites history.
    pub async fn append(&self, event: &LogEvent) -> Result<(), SessionError>;
}

pub struct SessionReader { ... }

impl SessionReader {
    pub async fn load(path: &Path) -> Result<Vec<LogEvent>, SessionError>;
}
```
### 2.2 Define LogEvent

```rust
pub struct LogEvent {
    pub id: Uuid,
    pub parent_id: Option<Uuid>,
    pub timestamp: DateTime<Utc>,
    pub payload: LogPayload,
    pub token_usage: Option<TokenUsage>,
}

pub enum LogPayload {
    UserMessage { content: String },
    AssistantMessage { content: Vec<ContentBlock> },
    ToolCall { tool_name: String, input: serde_json::Value },
    ToolResult { tool_use_id: String, content: String, is_error: bool },
}

pub struct TokenUsage {
    pub input: u32,
    pub output: u32,
    pub cache_read: Option<u32>,
    pub cache_write: Option<u32>,
}
```
id and parent_id form a tree that enables future branching. For now the
conversation is linear so parent_id is always the id of the previous event.
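The linking rule can be sketched in isolation (`u64` ids stand in for the real `Uuid`s so the logic is visible without the uuid crate):

```rust
// Simplified LogEvent: only the fields that form the tree.
#[derive(Debug, PartialEq)]
pub struct EventSketch {
    pub id: u64,
    pub parent_id: Option<u64>,
}

/// Append an event whose parent is the current tail of the log
/// (None for the very first event) -- the linear case described above.
pub fn append_linear(log: &mut Vec<EventSketch>, id: u64) {
    let parent_id = log.last().map(|e| e.id);
    log.push(EventSketch { id, parent_id });
}
```

Branching later just means appending an event whose `parent_id` points at a non-tail event; a reader reconstructs any branch by walking parent links.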
### 2.3 Wire into Orchestrator

- `Orchestrator` holds an `Option<SessionWriter>`.
- Every time the orchestrator pushes to `ConversationHistory` it also appends a `LogEvent`. Token counts from `StreamEvent::InputTokens`/`OutputTokens` are stored on the final assistant event of each turn.
- The session file lives at `.skate/sessions/<timestamp>.jsonl`.
### 2.4 Tests

- Unit: `SessionWriter::append` then `SessionReader::load` round-trips all payload variants.
- Unit: the `parent_id` chain is correct across a simulated multi-turn exchange.
- Integration: run the orchestrator with a mock provider against a temp dir; assert the JSONL file is written.
## Phase 3 -- Token Tracking & Status Bar
Goal: Surface token usage in the TUI per-turn and cumulatively.
### 3.1 Per-turn token counts in UIEvent

Add a variant to `UIEvent`:

```rust
UIEvent::TurnComplete { input_tokens: u32, output_tokens: u32 }
```
The orchestrator already receives StreamEvent::InputTokens and OutputTokens;
it should accumulate them during a turn and emit them in TurnComplete.
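The accumulation itself is trivial; a sketch with stand-in types mirroring the plan's variant names:

```rust
// Stand-ins for the relevant StreamEvent variants.
pub enum StreamEventSketch {
    InputTokens(u32),
    OutputTokens(u32),
}

/// Per-turn counters, reset at the start of each turn and emitted
/// in TurnComplete when the turn finishes.
#[derive(Default)]
pub struct TurnUsage {
    pub input_tokens: u32,
    pub output_tokens: u32,
}

impl TurnUsage {
    pub fn observe(&mut self, ev: &StreamEventSketch) {
        match ev {
            StreamEventSketch::InputTokens(n) => self.input_tokens += n,
            StreamEventSketch::OutputTokens(n) => self.output_tokens += n,
        }
    }
}
```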
### 3.2 AppState token counters

Add to `AppState`:

```rust
pub turn_input_tokens: u32,
pub turn_output_tokens: u32,
pub total_input_tokens: u64,
pub total_output_tokens: u64,
```
events.rs updates these on TurnComplete.
### 3.3 Status bar redesign

The status bar currently shows only the mode indicator. Expand it to four sections:

```
[ MODE ] [ ACTIVITY ] [ i:1234 o:567 | total i:9999 o:2345 ] [ NET: off ]
```
- MODE -- Normal / Insert / Command
- ACTIVITY -- Plan / Execute (Phase 6 adds Plan; for now always "Execute")
- Tokens -- per-turn input/output, then session cumulative
- NET -- `on` (green) or `off` (red), reflecting `network_allowed`
Update render.rs to implement this layout using Ratatui Layout::horizontal.
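The token section of that layout can be pinned down with a small formatting helper (a sketch: the function name is illustrative, and the real code renders Ratatui spans rather than a single string):

```rust
/// Format the token section of the status bar exactly as in the mock above.
pub fn token_section(turn_in: u32, turn_out: u32, total_in: u64, total_out: u64) -> String {
    format!("i:{turn_in} o:{turn_out} | total i:{total_in} o:{total_out}")
}
```

Keeping the formatting in a pure function makes the snapshot tests below cheap: they can assert on the string without driving a full render.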
### 3.4 Tests

- Unit: `AppState` accumulates totals correctly across multiple `TurnComplete` events.
- TUI snapshot test (TestBackend): status bar renders all four sections with correct content after a synthetic `TurnComplete`.
## Phase 4 -- TUI Introspection (Expand/Collapse)
Goal: Support progressive disclosure -- tool calls and thinking traces start collapsed; the user can expand them.
### 4.1 Block model

Replace the flat `Vec<DisplayMessage>` in `AppState` with a `Vec<DisplayBlock>`:

```rust
pub enum DisplayBlock {
    UserMessage { content: String },
    AssistantText { content: String },
    ToolCall {
        display: ToolDisplay,
        result: Option<String>,
        expanded: bool,
    },
    Error { message: String },
}
```
### 4.2 Navigation in Normal mode

Add a block-level cursor to `AppState`:

```rust
pub focused_block: Option<usize>,
```

Keybindings (Normal mode):

| Key | Action |
|---|---|
| `[` | Move focus to previous block |
| `]` | Move focus to next block |
| `Enter` or `Space` | Toggle `expanded` on focused `ToolCall` block |
| `j` / `k` | Line scroll (unchanged) |
The focused block is highlighted with a distinct border color.
### 4.3 Render changes

`render.rs` must calculate the height of each `DisplayBlock` depending on whether
it is collapsed (1-2 summary lines) or expanded (full content). The scroll offset
operates on rendered rows, not message indices.

Collapsed tool call shows: `> tool_name(arg_summary) -- result_summary`

Expanded tool call shows: full input and output as formatted by `tool_display.rs`.
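The height rule can be sketched with simplified block types (an assumption-laden sketch: a collapsed call is exactly one row here and content is pre-counted in lines, whereas the real code measures wrapped lines):

```rust
// Simplified DisplayBlock carrying only what height calculation needs.
pub enum BlockSketch {
    ToolCall { input_lines: usize, output_lines: usize, expanded: bool },
    Text { lines: usize },
}

/// Rows a block occupies in the output pane.
pub fn block_height(b: &BlockSketch) -> usize {
    match b {
        // Collapsed tool calls render as a single summary row.
        BlockSketch::ToolCall { expanded: false, .. } => 1,
        BlockSketch::ToolCall { input_lines, output_lines, expanded: true } => {
            input_lines + output_lines
        }
        BlockSketch::Text { lines } => *lines,
    }
}
```

The scroll offset then sums `block_height` over preceding blocks, which is why it operates on rows rather than message indices.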
### 4.4 Tests

- Unit: toggling `expanded` on a `ToolCall` block changes the height calculation.
- TUI snapshot: collapsed vs expanded render output for `WriteFile` and `ShellExec`.
## Phase 5 -- Space-bar Leader Key & Which-Key Overlay
Goal: Support vim-style <Space> leader chords for configuration actions. This
replaces the :net on / :net off text commands with discoverable hotkeys.
### 5.1 Leader key state machine

Extend `AppState` with:

```rust
pub leader_active: bool,
pub leader_timeout: Option<Instant>,
```

In Normal mode, pressing `Space` sets `leader_active = true` and starts a 1-second
timeout. The next key is dispatched through the chord table. If the timeout fires
or an unbound key is pressed, leader mode is cancelled with a brief status message.
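A sketch of those transitions, with the chord table hard-coded and the timeout passed in so it is testable without waiting (names are illustrative; the real table lives in the TUI input module):

```rust
use std::time::{Duration, Instant};

pub struct LeaderState {
    deadline: Option<Instant>, // Some(..) while leader mode is active
}

pub enum LeaderOutcome {
    Chord(&'static str), // a bound chord fired
    Cancelled,           // timeout elapsed or unbound key
    Inactive,            // leader was never activated
}

impl LeaderState {
    pub fn new() -> Self {
        Self { deadline: None }
    }

    /// Space pressed in Normal mode: arm the leader with a timeout.
    pub fn press_space(&mut self, now: Instant, timeout: Duration) {
        self.deadline = Some(now + timeout);
    }

    /// Next key after Space: dispatch through the chord table.
    pub fn press_key(&mut self, key: char, now: Instant) -> LeaderOutcome {
        let Some(deadline) = self.deadline.take() else {
            return LeaderOutcome::Inactive;
        };
        if now > deadline {
            return LeaderOutcome::Cancelled; // timeout fired first
        }
        match key {
            'n' => LeaderOutcome::Chord("toggle-network"),
            'c' => LeaderOutcome::Chord("clear-history"),
            _ => LeaderOutcome::Cancelled, // unbound key cancels leader mode
        }
    }
}
```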
### 5.2 Initial chord table

| Chord | Action |
|---|---|
| `<Space> n` | Toggle network policy |
| `<Space> c` | Clear history (`:clear`) |
| `<Space> p` | Switch to Plan mode (Phase 6) |
| `<Space> ?` | Toggle which-key overlay |
### 5.3 Which-key overlay
A centered popup rendered over the output pane that lists all available chords and
their descriptions. Rendered only when leader_active = true (after a short delay,
~200 ms, to avoid flicker on fast typists).
### 5.4 Remove `:net on`/`:net off` from command parser
Once leader-key network toggle is in place, remove the text-command duplicates to keep the command palette small and focused.
### 5.5 Tests
- Unit: leader key state machine transitions (activate, timeout, chord match, cancel).
- TUI snapshot: which-key overlay renders with correct chord list.
## Phase 6 -- Planning Mode
Goal: A dedicated planning harness with restricted sandbox that writes a single plan file, plus a mechanism to pipe the plan into an execute harness.
### 6.1 Plan harness sandbox policy

In planning mode the orchestrator is instantiated with a `SandboxPolicy` that grants:

- `/` -- ReadOnly (same as execute)
- `<project_dir>/.skate/plan.md` -- ReadWrite (only this file)
- Network -- off

All other write attempts fail with a sandbox permission error returned to the model.
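The policy reduces to a small predicate, sketched here for clarity (the real enforcement happens inside the Landlock sandbox, not in a path comparison; `plan_mode_allows` is a hypothetical name):

```rust
use std::path::Path;

pub enum Access {
    Read,
    Write,
}

/// Plan-mode policy: ReadOnly everywhere, ReadWrite only on the plan file,
/// mirroring the grants listed above.
pub fn plan_mode_allows(plan_file: &Path, target: &Path, access: Access) -> bool {
    match access {
        Access::Read => true,
        Access::Write => target == plan_file,
    }
}
```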
### 6.2 Survey tool

Add a new tool `ask_user` that allows the model to present structured questions to
the user during planning:

```
// Input schema
{
  "question": "string",
  "options": ["string"] | null  // null means free-text answer
}
```
The orchestrator sends a new UIEvent::SurveyRequest { question, options }. The TUI
renders an inline prompt. The user's answer is sent back as a UserAction::SurveyResponse.
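Validating a survey response against that schema might look like this (a sketch using plain Rust types in place of the JSON schema; the function name is illustrative):

```rust
/// `options: None` means free text (anything non-blank is accepted);
/// otherwise the answer must match one of the offered options exactly.
pub fn answer_valid(options: &Option<Vec<String>>, answer: &str) -> bool {
    match options {
        None => !answer.trim().is_empty(),
        Some(opts) => opts.iter().any(|o| o == answer),
    }
}
```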
### 6.3 TUI activity mode

`AppState` gets:

```rust
pub activity: Activity,

pub enum Activity { Plan, Execute }
```
Switching activity (via <Space> p) instantiates a new orchestrator on a fresh
channel pair. The old orchestrator is shut down cleanly. The status bar ACTIVITY
section updates.
### 6.4 Plan -> Execute handoff

When the user is satisfied with the plan (`<Space> x` or `:exec`):

- TUI reads `.skate/plan.md`.
- Constructs a new system prompt: `<original system prompt>\n\n## Plan\n<plan content>`.
- Instantiates an Execute orchestrator with the full sandbox policy and the augmented system prompt.
- Transitions `activity` to `Execute`.

The old Plan orchestrator is dropped.
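The prompt construction in the second step is plain string assembly (the function name is illustrative):

```rust
/// Build the Execute harness system prompt from the plan handoff:
/// original prompt, blank line, a "## Plan" heading, then the plan body.
pub fn augment_system_prompt(original: &str, plan: &str) -> String {
    format!("{original}\n\n## Plan\n{plan}")
}
```

Keeping this as a pure function makes the 6.6 unit test on the augmented prompt a one-line assertion.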
### 6.5 Edit plan in $EDITOR
Hotkey <Space> e (or :edit-plan) suspends the TUI (restores terminal), opens
$EDITOR on .skate/plan.md, then resumes the TUI after the editor exits.
### 6.6 Tests
- Integration: plan harness rejects write to a file other than plan.md.
- Integration: survey tool round-trip through channel boundary.
- Unit: plan -> execute handoff produces correct augmented system prompt.
## Phase 7 -- Sub-Agents
Goal: The model can spawn independent sub-agents with their own context windows. Results are summarised and returned to the parent.
### 7.1 spawn_agent tool

Add a new tool with input schema:

```
{
  "task": "string",          // instruction for the sub-agent
  "sandbox": {               // optional policy overrides
    "network": bool,
    "extra_write_paths": ["string"]
  }
}
```
### 7.2 Sub-agent lifecycle

When `spawn_agent` executes:

- Create a new `Orchestrator` with an independent conversation history.
- The sub-agent's system prompt is the parent's system prompt plus the task description.
- The sub-agent runs autonomously (no user interaction) until it emits a `UserAction::Quit` equivalent or hits `MAX_TOOL_ITERATIONS`.
- The final assistant message is returned as the tool result (the "summary").
- The sub-agent's session is logged to a child JSONL file linked to the parent session by a `parent_session_id` field.
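The autonomous loop with its iteration cap can be sketched abstractly (`Step` and the closure-driven shape are illustrative, not the real orchestrator API):

```rust
// One turn of the sub-agent either requests another tool iteration or
// finishes with the summary that becomes the parent's tool result.
pub enum Step {
    ToolCall,
    Done(String),
}

/// Drive a sub-agent to completion, enforcing the iteration cap
/// (MAX_TOOL_ITERATIONS in the real orchestrator).
pub fn run_subagent(mut next: impl FnMut() -> Step, max_iters: usize) -> Result<String, String> {
    for _ in 0..max_iters {
        match next() {
            Step::Done(summary) => return Ok(summary),
            Step::ToolCall => continue,
        }
    }
    Err("hit MAX_TOOL_ITERATIONS".to_string())
}
```

The `Err` path is what the parent sees as the tool result when a sub-agent fails to converge.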
### 7.3 TUI sub-agent view

The agent tree is accessible via `<Space> a`. A side panel shows:

```
Parent
+-- sub-agent 1 [running]
+-- sub-agent 2 [done]
```
Pressing Enter on a sub-agent opens a read-only replay of its conversation (scroll only, no input). This is a stretch goal within this phase -- the core spawning mechanism is the priority.
### 7.4 Tests

- Integration: `spawn_agent` with a mock provider runs to completion and returns a summary string.
- Unit: sub-agent session file has the correct `parent_session_id` link.
- Unit: the `MAX_TOOL_ITERATIONS` limit is respected within sub-agents.
In this phase spawn_agent gains a natural implementation: it calls
executor::spawn_local with a new ExecutorServer configured for the child policy,
constructs a new Orchestrator with that client, and runs it to completion. The
tarpc boundary from Phase 1 makes this straightforward.
## Phase 8 -- Prompt Caching
Goal: Use Anthropic's prompt caching to reduce cost and latency on long conversations. DESIGN.md notes this as a desired property of message construction.
### 8.1 Cache breakpoints

The Anthropic API supports `"cache_control": {"type": "ephemeral"}` on message
content blocks. The optimal strategy is to mark the last user message of the longest
stable prefix as a cache write point.

In `provider/claude.rs`, when serializing the messages array:

- Mark the system prompt content block with `cache_control` (it never changes).
- Mark the penultimate user message with `cache_control` (the conversation history that is stable for the current turn).
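The index selection can be sketched over a flat role list (a simplification: the real Anthropic API carries the system prompt in a separate `system` field, but treating it as index 0 keeps the rule visible; the function name is illustrative):

```rust
/// Return the indices that should carry cache_control: the system prompt
/// (index 0) and, when the conversation is long enough, the penultimate
/// user message -- the end of the prefix that is stable for this turn.
pub fn cache_breakpoints(roles: &[&str]) -> Vec<usize> {
    let mut marks = vec![0]; // system prompt never changes
    let user_idxs: Vec<usize> = roles
        .iter()
        .enumerate()
        .filter(|(_, r)| **r == "user")
        .map(|(i, _)| i)
        .collect();
    if user_idxs.len() >= 2 {
        marks.push(user_idxs[user_idxs.len() - 2]); // penultimate user message
    }
    marks
}
```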
### 8.2 Cache token tracking

The `TokenUsage` struct in `session/` already reserves `cache_read` and
`cache_write` fields. `StreamEvent` must be extended:

```rust
StreamEvent::CacheReadTokens(u32),
StreamEvent::CacheWriteTokens(u32),
```
The Anthropic message_start event contains usage.cache_read_input_tokens and
usage.cache_creation_input_tokens. Parse these and emit the new variants.
### 8.3 Status bar update

Add cache tokens to the status bar display: `i:1234(c:800) o:567`.
### 8.4 Tests
- Provider unit test: replay a fixture that contains cache token fields; assert the new StreamEvent variants are emitted.
- Snapshot test: status bar renders cache token counts correctly.
## Dependency Graph

```
Phase 1 (tarpc executor)
|
+-- Phase 2 (session logging) -- orchestrator refactor is complete
|   |
|   +-- Phase 3 (token tracking) -- requires session TokenUsage struct
|   |
|   +-- Phase 7 (sub-agents) -- requires session parent_session_id
|
+-- Phase 7 (sub-agents) -- spawn_local reuse is natural after Phase 1

Phase 4 (expand/collapse) -- independent, can be done alongside Phase 3
Phase 5 (leader key)      -- independent, prerequisite for Phase 6
Phase 6 (planning mode)   -- requires Phase 5 (leader key chord <Space> p)
                          -- benefits from Phase 1 (separate executor per activity)
Phase 8 (prompt caching)  -- requires Phase 3 (cache token display)
```
Recommended order: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 8, with 7 after 2 and 6.
## Files Touched Per Phase
| Phase | New Files | Modified Files |
|---|---|---|
| 1 | `src/executor/mod.rs` | `src/core/orchestrator.rs`, `src/core/types.rs`, `src/app/mod.rs`, `Cargo.toml` |
| 2 | `src/session/mod.rs` | `src/core/orchestrator.rs`, `src/app/mod.rs` |
| 3 | -- | `src/core/types.rs`, `src/core/orchestrator.rs`, `src/tui/events.rs`, `src/tui/render.rs` |
| 4 | -- | `src/tui/mod.rs`, `src/tui/render.rs`, `src/tui/events.rs`, `src/tui/input.rs` |
| 5 | -- | `src/tui/input.rs`, `src/tui/render.rs`, `src/tui/mod.rs` |
| 6 | `src/tools/ask_user.rs` | `src/core/types.rs`, `src/core/orchestrator.rs`, `src/tui/mod.rs`, `src/tui/input.rs`, `src/tui/render.rs`, `src/app/mod.rs` |
| 7 | -- | `src/executor/mod.rs`, `src/core/orchestrator.rs`, `src/tui/render.rs`, `src/tui/input.rs` |
| 8 | -- | `src/provider/claude.rs`, `src/core/types.rs`, `src/session/mod.rs`, `src/tui/render.rs` |
## New Dependencies
| Crate | Phase | Reason |
|---|---|---|
| `tarpc` | 1 | RPC service trait + in-process transport |
| `uuid` | 2 | `LogEvent` ids |
| `chrono` | 2 | Event timestamps (check if already transitive) |
No other new dependencies are needed. All other required functionality
(serde_json, tokio, ratatui, tracing) is already present.