skate/PLAN.md
2026-03-01 18:53:02 -08:00

Implementation Plan

Phase 3: Tool Execution

Step 3.1: Enrich the content model

  • Replace ConversationMessage { role, content: String } with content-block model
  • Define ContentBlock enum: Text(String), ToolUse { id, name, input: Value }, ToolResult { tool_use_id, content: String, is_error: bool }
  • Change ConversationMessage.content from String to Vec<ContentBlock>
  • Add ConversationMessage::text(role, s) helper to keep existing call sites clean
  • Update serialization, orchestrator, tests, TUI display
  • Files: src/core/types.rs, src/core/history.rs
  • Done when: cargo test passes with new model; all existing tests updated
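
A minimal sketch of the content-block model described in this step, to make the shape concrete. The Role enum and the has_tool_use helper are assumptions for illustration (the plan only names the role field), and Value is aliased to String here as a stand-in for serde_json::Value so the sketch compiles without external crates.

```rust
/// Stand-in for `serde_json::Value` (assumption, keeps the sketch std-only).
type Value = String;

/// Hypothetical role type; the plan only says `role`.
#[derive(Debug, Clone, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
enum ContentBlock {
    Text(String),
    ToolUse { id: String, name: String, input: Value },
    ToolResult { tool_use_id: String, content: String, is_error: bool },
}

#[derive(Debug, Clone, PartialEq)]
struct ConversationMessage {
    role: Role,
    content: Vec<ContentBlock>,
}

impl ConversationMessage {
    /// Builds a message holding a single text block, so call sites that
    /// previously passed a plain string stay one-liners.
    fn text(role: Role, s: impl Into<String>) -> Self {
        Self { role, content: vec![ContentBlock::Text(s.into())] }
    }

    /// True if any block is a tool call (hypothetical helper, not in the plan).
    fn has_tool_use(&self) -> bool {
        self.content.iter().any(|b| matches!(b, ContentBlock::ToolUse { .. }))
    }
}
```

Existing call sites would change from constructing the struct with a String to calling ConversationMessage::text(Role::User, "hello").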

Step 3.2: Send tool definitions in API requests

  • Add ToolDefinition { name, description, input_schema: Value } (provider-agnostic)
  • Extend ModelProvider::stream to accept &[ToolDefinition]
  • Include "tools" array in Claude provider request body
  • Files: src/provider/mod.rs, src/provider/claude.rs
  • Done when: API responses contain tool_use content blocks in raw SSE stream
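
A rough sketch of how the "tools" array could be rendered from ToolDefinition values. It assumes input_schema is already-serialized JSON Schema text (a stand-in for serde_json::Value), and the json_string and tools_json helpers are hypothetical; a real implementation would most likely derive serialization with serde instead of building strings by hand.

```rust
/// Provider-agnostic tool description. `input_schema` holds pre-serialized
/// JSON Schema text here (assumption; the plan uses `Value`).
#[derive(Debug, Clone)]
struct ToolDefinition {
    name: String,
    description: String,
    input_schema: String,
}

/// Escapes a string as a JSON string literal (hypothetical helper).
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Renders the value of the `"tools"` key: one object per tool with
/// `name`, `description`, and `input_schema` fields.
fn tools_json(tools: &[ToolDefinition]) -> String {
    let items: Vec<String> = tools
        .iter()
        .map(|t| {
            format!(
                "{{\"name\":{},\"description\":{},\"input_schema\":{}}}",
                json_string(&t.name),
                json_string(&t.description),
                t.input_schema
            )
        })
        .collect();
    format!("[{}]", items.join(","))
}
```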

Step 3.3: Parse tool-use blocks from SSE stream

  • Add StreamEvent::ToolUseStart { id, name }, ToolUseInputDelta(String), ToolUseDone
  • Handle content_block_start (type "tool_use"), content_block_delta (type "input_json_delta"), content_block_stop for tool blocks
  • Track current block type state in SSE parser
  • Files: src/provider/claude.rs, src/core/types.rs
  • Done when: Unit test with recorded tool-use SSE fixture asserts correct StreamEvent sequence
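
The block-type tracking above could look roughly like this. The sketch assumes the SSE data has already been decoded into a small SsePayload enum (a hypothetical type), and it adds a TextDelta variant to StreamEvent for contrast; neither name comes from the plan.

```rust
/// Decoded SSE payloads, reduced to the fields this sketch needs
/// (hypothetical type; JSON decoding is assumed to happen earlier).
#[derive(Debug, Clone)]
enum SsePayload {
    ContentBlockStart { block_type: String, id: Option<String>, name: Option<String> },
    ContentBlockDelta { delta_type: String, data: String },
    ContentBlockStop,
}

#[derive(Debug, Clone, PartialEq)]
enum StreamEvent {
    TextDelta(String), // assumed name for the existing text event
    ToolUseStart { id: String, name: String },
    ToolUseInputDelta(String),
    ToolUseDone,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum BlockKind {
    Text,
    ToolUse,
}

/// Remembers which kind of content block is currently open.
#[derive(Default)]
struct SseParser {
    current: Option<BlockKind>,
}

impl SseParser {
    fn handle(&mut self, payload: SsePayload) -> Option<StreamEvent> {
        match payload {
            SsePayload::ContentBlockStart { block_type, id, name } => {
                match (block_type.as_str(), id, name) {
                    ("tool_use", Some(id), Some(name)) => {
                        self.current = Some(BlockKind::ToolUse);
                        Some(StreamEvent::ToolUseStart { id, name })
                    }
                    ("text", _, _) => {
                        self.current = Some(BlockKind::Text);
                        None
                    }
                    // Unknown or malformed block: ignore its deltas.
                    _ => {
                        self.current = None;
                        None
                    }
                }
            }
            SsePayload::ContentBlockDelta { delta_type, data } => {
                match (self.current, delta_type.as_str()) {
                    (Some(BlockKind::ToolUse), "input_json_delta") => {
                        Some(StreamEvent::ToolUseInputDelta(data))
                    }
                    (Some(BlockKind::Text), "text_delta") => Some(StreamEvent::TextDelta(data)),
                    _ => None,
                }
            }
            SsePayload::ContentBlockStop => {
                // Only tool blocks emit an explicit "done" event.
                let was_tool = self.current.take() == Some(BlockKind::ToolUse);
                was_tool.then_some(StreamEvent::ToolUseDone)
            }
        }
    }
}
```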

Step 3.4: Orchestrator accumulates tool-use blocks

  • Accumulate ToolUseInputDelta fragments into JSON buffer per tool-use id
  • On ToolUseDone, parse JSON into ContentBlock::ToolUse
  • After StreamEvent::Done, if assistant message contains ToolUse blocks, enter tool-execution phase
  • Files: src/core/orchestrator.rs
  • Done when: Unit test with mock provider emitting tool-use events produces correct ContentBlocks
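
One way the accumulation could be sketched. Because ToolUseInputDelta carries no id in the event shape above, this version attaches fragments to the block opened by the most recent ToolUseStart. ToolUseAccumulator, PendingToolUse, and the ToolUse struct are hypothetical names; the buffered JSON is kept as text rather than parsed, and an empty buffer is treated as {} on the assumption that a tool called without arguments streams no input JSON.

```rust
#[derive(Debug, Clone, PartialEq)]
enum StreamEvent {
    ToolUseStart { id: String, name: String },
    ToolUseInputDelta(String),
    ToolUseDone,
}

/// Completed tool call; `input_json` would be parsed into a `Value`
/// in the real orchestrator (kept as text in this sketch).
#[derive(Debug, Clone, PartialEq)]
struct ToolUse {
    id: String,
    name: String,
    input_json: String,
}

/// Tool call whose input is still streaming in.
struct PendingToolUse {
    id: String,
    name: String,
    buffer: String,
}

#[derive(Default)]
struct ToolUseAccumulator {
    open: Option<PendingToolUse>,
    finished: Vec<ToolUse>,
}

impl ToolUseAccumulator {
    fn on_event(&mut self, event: StreamEvent) {
        match event {
            StreamEvent::ToolUseStart { id, name } => {
                self.open = Some(PendingToolUse { id, name, buffer: String::new() });
            }
            StreamEvent::ToolUseInputDelta(fragment) => {
                // Fragments outside an open tool block are dropped.
                if let Some(p) = self.open.as_mut() {
                    p.buffer.push_str(&fragment);
                }
            }
            StreamEvent::ToolUseDone => {
                if let Some(p) = self.open.take() {
                    let input_json = if p.buffer.trim().is_empty() {
                        "{}".to_string()
                    } else {
                        p.buffer
                    };
                    self.finished.push(ToolUse { id: p.id, name: p.name, input_json });
                }
            }
        }
    }

    /// After the stream ends, decides whether to enter the tool-execution phase.
    fn needs_tool_execution(&self) -> bool {
        !self.finished.is_empty()
    }
}
```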

Step 3.5: Tool trait, registry, and core tools

  • Tool trait: name(), description(), input_schema() -> Value, execute(input: Value, working_dir: &Path) -> Result<ToolOutput>
  • ToolOutput { content: String, is_error: bool }
  • ToolRegistry: stores tools, provides get(name) and definitions() -> Vec<ToolDefinition>
  • Risk level: AutoApprove (reads), RequiresApproval (writes/shell)
  • Implement: read_file (auto), list_directory (auto), write_file (approval), shell_exec (approval)
  • Path validation: canonicalize + starts_with check, reject paths outside working dir (no Landlock yet)
  • Files: New src/tools/ module: mod.rs, read_file.rs, write_file.rs, list_directory.rs, shell_exec.rs
  • Done when: Unit tests pass for each tool in temp dirs; path traversal rejected
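
A sketch of the canonicalize + starts_with check, using only the standard library. The function name resolve_in_working_dir is hypothetical. One detail worth noting: canonicalize fails for paths that do not exist yet, which matters for write_file creating a new file, so this version falls back to canonicalizing the parent directory in that case.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Resolves `requested` against `working_dir` and rejects anything that
/// lands outside it once symlinks and `..` components are resolved.
fn resolve_in_working_dir(working_dir: &Path, requested: &Path) -> io::Result<PathBuf> {
    let root = working_dir.canonicalize()?;
    // Note: joining an absolute `requested` replaces `root` entirely,
    // so absolute paths are checked like any other.
    let joined = root.join(requested);

    let resolved = match joined.canonicalize() {
        Ok(p) => p,
        // Target does not exist yet: canonicalize the parent and
        // re-attach the final component.
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            let parent = joined
                .parent()
                .ok_or_else(|| io::Error::from(io::ErrorKind::InvalidInput))?;
            // `file_name` is None for paths ending in `..`, which are rejected.
            let name = joined
                .file_name()
                .ok_or_else(|| io::Error::from(io::ErrorKind::InvalidInput))?;
            parent.canonicalize()?.join(name)
        }
        Err(e) => return Err(e),
    };

    // `Path::starts_with` compares whole components, so `/proj-evil`
    // does not count as being inside `/proj`.
    if resolved.starts_with(&root) {
        Ok(resolved)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "path escapes working directory",
        ))
    }
}
```

This check is still subject to races between validation and use, which is part of why the Landlock work in Phase 4 is a separate layer.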

Step 3.6: Approval gate (TUI <-> core)

  • New UIEvent::ToolApprovalRequest { tool_use_id, tool_name, input_summary }
  • New UserAction::ToolApprovalResponse { tool_use_id, approved: bool }
  • Orchestrator: check risk level -> auto-approve or send approval request and await response
  • Denied tools return ToolResult { is_error: true } with denial message
  • TUI: render approval prompt overlay with y/n keybindings
  • Files: src/core/types.rs, src/core/orchestrator.rs, src/tui/events.rs, src/tui/input.rs, src/tui/render.rs
  • Done when: Integration test: mock provider + mock TUI channel verifies approval flow
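
The approval round trip might be sketched as follows. This version uses blocking std::sync::mpsc channels as an assumption; the real orchestrator may well use async channels. RiskLevel as an enum and the await_approval function are illustrative names, and the enums are reduced to the variants this step adds.

```rust
use std::sync::mpsc::{Receiver, Sender};

#[derive(Debug, Clone, Copy, PartialEq)]
enum RiskLevel {
    AutoApprove,
    RequiresApproval,
}

#[derive(Debug, PartialEq)]
enum UIEvent {
    ToolApprovalRequest { tool_use_id: String, tool_name: String, input_summary: String },
}

#[derive(Debug, PartialEq)]
enum UserAction {
    ToolApprovalResponse { tool_use_id: String, approved: bool },
}

/// Returns true if the tool call may run. Fails closed: if either
/// channel is gone, the call is treated as denied.
fn await_approval(
    risk: RiskLevel,
    tool_use_id: &str,
    tool_name: &str,
    input_summary: &str,
    ui_tx: &Sender<UIEvent>,
    action_rx: &Receiver<UserAction>,
) -> bool {
    if risk == RiskLevel::AutoApprove {
        return true;
    }
    let request = UIEvent::ToolApprovalRequest {
        tool_use_id: tool_use_id.to_string(),
        tool_name: tool_name.to_string(),
        input_summary: input_summary.to_string(),
    };
    if ui_tx.send(request).is_err() {
        return false;
    }
    // Wait for the response that matches this tool call; responses for
    // other ids are skipped.
    while let Ok(action) = action_rx.recv() {
        let UserAction::ToolApprovalResponse { tool_use_id: id, approved } = action;
        if id == tool_use_id {
            return approved;
        }
    }
    false
}
```

When this returns false, the orchestrator would build the ToolResult { is_error: true } block with the denial message instead of executing the tool.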

Step 3.7: Tool results fed back to the model

  • After executing tool calls: append assistant message (with ToolUse blocks) to history, append user message with ToolResult blocks, re-call provider
  • Loop: model may respond with more tool calls or text
  • Cap at max iterations (25) to prevent runaway tool loops
  • Files: src/core/orchestrator.rs
  • Done when: Integration test: mock provider returns tool-use then text; orchestrator makes two calls. Max-iteration cap tested.
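
The feedback loop and its cap can be sketched with closures standing in for the provider and the tool executor. The run_turn function, its closure parameters, and the Role enum are assumptions for illustration; tool input is kept as JSON text instead of a Value. What happens to history when the cap is hit is not specified in the plan, so the error return here is only one option.

```rust
#[derive(Debug, Clone, PartialEq)]
enum ContentBlock {
    Text(String),
    ToolUse { id: String, name: String, input: String },
    ToolResult { tool_use_id: String, content: String, is_error: bool },
}

#[derive(Debug, Clone, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
struct ConversationMessage {
    role: Role,
    content: Vec<ContentBlock>,
}

const MAX_TOOL_ITERATIONS: usize = 25;

/// Calls the provider until it answers without tool calls or the cap is
/// reached. Returns the number of provider calls made.
fn run_turn(
    history: &mut Vec<ConversationMessage>,
    mut call_provider: impl FnMut(&[ConversationMessage]) -> Vec<ContentBlock>,
    mut execute_tool: impl FnMut(&str, &str) -> (String, bool),
) -> Result<usize, String> {
    for call in 1..=MAX_TOOL_ITERATIONS {
        let blocks = call_provider(history.as_slice());

        // Execute every tool call in the assistant message, in order.
        let mut results = Vec::new();
        for block in &blocks {
            if let ContentBlock::ToolUse { id, name, input } = block {
                let (content, is_error) = execute_tool(name, input);
                results.push(ContentBlock::ToolResult {
                    tool_use_id: id.clone(),
                    content,
                    is_error,
                });
            }
        }

        // Assistant message first, then the user message with the results.
        history.push(ConversationMessage { role: Role::Assistant, content: blocks });
        if results.is_empty() {
            return Ok(call); // plain text answer: the turn is over
        }
        history.push(ConversationMessage { role: Role::User, content: results });
    }
    Err(format!("stopped after {} provider calls", MAX_TOOL_ITERATIONS))
}
```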

Step 3.8: TUI display for tool activity

  • New UIEvent::ToolExecuting { tool_name, input_summary }, UIEvent::ToolResult { tool_name, output_summary, is_error }
  • Render tool calls as distinct visual blocks in conversation view
  • Render tool results inline (truncated if long)
  • Files: src/tui/render.rs, src/tui/events.rs
  • Done when: Visual check with cargo run; TestBackend test for tool block rendering
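
For the "truncated if long" rendering, a small helper along these lines could produce output_summary. The name summarize_output and the suffix format are assumptions; the point of the sketch is that it counts chars rather than bytes, so it never cuts a multi-byte character in half.

```rust
/// Shortens tool output for inline display. Keeps at most `max_chars`
/// characters and notes how many were dropped.
fn summarize_output(output: &str, max_chars: usize) -> String {
    let mut chars = output.chars();
    let head: String = chars.by_ref().take(max_chars).collect();
    let remaining = chars.count();
    if remaining == 0 {
        head
    } else {
        format!("{}... ({} more chars)", head, remaining)
    }
}
```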

Phase 3 verification (end-to-end)

  1. cargo test: all tests pass
  2. cargo clippy -- -D warnings: zero warnings
  3. cargo run -- --project-dir .: ask Claude to read a file, approve, see contents
  4. Ask Claude to write a file: approve, verify written
  5. Ask Claude to run a shell command: approve, verify output
  6. Deny an approval: Claude gets denial and responds gracefully

Phase 4: Sandboxing

  • Landlock: read-only system, read-write project dir, network blocked
  • Tools execute through Sandbox, never directly
  • :net on/off toggle, state in status bar
  • Graceful degradation on kernels without Landlock support (Linux before 5.13)
  • Done when: Writes outside project dir fail; network toggle works