Drew Galbraith 797d7564b7 Add tool use to the orchestrator (#4)
Add tool use without sandboxing.

Currently available tools are list dir, read file, write file and exec bash.

Reviewed-on: #4
Co-authored-by: Drew Galbraith <drew@tiramisu.one>
Co-committed-by: Drew Galbraith <drew@tiramisu.one>
2026-03-02 03:00:13 +00:00


# Implementation Plan
## Phase 3: Tool Execution
### Step 3.1: Enrich the content model
- Replace `ConversationMessage { role, content: String }` with a content-block model
- Define `ContentBlock` enum: `Text(String)`, `ToolUse { id, name, input: Value }`, `ToolResult { tool_use_id, content: String, is_error: bool }`
- Change `ConversationMessage.content` from `String` to `Vec<ContentBlock>`
- Add `ConversationMessage::text(role, s)` helper to keep existing call sites clean
- Update serialization, orchestrator, tests, TUI display
- **Files:** `src/core/types.rs`, `src/core/history.rs`
- **Done when:** `cargo test` passes with new model; all existing tests updated
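Below is a minimal Rust sketch of the Step 3.1 model. It makes a few assumptions. It introduces a `Role` enum with `User`/`Assistant` variants, because the plan only names `role`. It uses a raw JSON `String` in place of `serde_json::Value` so the sketch compiles with `std` alone. It also adds two hypothetical helpers: `text_content()` for display and `has_tool_use()` for the Step 3.4 check.

```rust
/// Stand-in for `serde_json::Value`; the real model would hold parsed JSON.
pub type Value = String;

/// Hypothetical role enum; the plan only says `role`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
}

/// One block of message content, mirroring the plan's three variants.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentBlock {
    Text(String),
    ToolUse { id: String, name: String, input: Value },
    ToolResult { tool_use_id: String, content: String, is_error: bool },
}

#[derive(Debug, Clone, PartialEq)]
pub struct ConversationMessage {
    pub role: Role,
    pub content: Vec<ContentBlock>,
}

impl ConversationMessage {
    /// Keeps text-only call sites as short as they were before the migration.
    pub fn text(role: Role, s: impl Into<String>) -> Self {
        Self { role, content: vec![ContentBlock::Text(s.into())] }
    }

    /// Concatenates the text blocks (e.g. for TUI display); other blocks are skipped.
    pub fn text_content(&self) -> String {
        self.content
            .iter()
            .filter_map(|b| match b {
                ContentBlock::Text(t) => Some(t.as_str()),
                _ => None,
            })
            .collect()
    }

    /// True if the message contains at least one tool call.
    pub fn has_tool_use(&self) -> bool {
        self.content.iter().any(|b| matches!(b, ContentBlock::ToolUse { .. }))
    }
}
```

If most call sites build messages through `ConversationMessage::text`, the migration mainly affects code that actually inspects message content.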
### Step 3.2: Send tool definitions in API requests
- Add `ToolDefinition { name, description, input_schema: Value }` (provider-agnostic)
- Extend `ModelProvider::stream` to accept `&[ToolDefinition]`
- Include `"tools"` array in Claude provider request body
- **Files:** `src/provider/mod.rs`, `src/provider/claude.rs`
- **Done when:** API responses contain `tool_use` content blocks in the raw SSE stream
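Below is one possible way to render the `"tools"` array, sketched with `std` only. The real provider would build this value with `serde_json`. Here `input_schema` holds raw JSON text as a stand-in for `Value`, and `json_string` is a hypothetical escaping helper. Each entry uses the `name`, `description` and `input_schema` keys that the Claude Messages API expects.

```rust
/// Provider-agnostic tool description; `input_schema` is raw JSON text (stand-in for `Value`).
#[derive(Debug, Clone)]
pub struct ToolDefinition {
    pub name: String,
    pub description: String,
    pub input_schema: String,
}

/// Encodes `s` as a JSON string literal: escapes quotes, backslashes and control characters.
pub fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl ToolDefinition {
    /// One entry of the Claude request's "tools" array; the schema is embedded verbatim.
    pub fn to_claude_json(&self) -> String {
        format!(
            "{{\"name\":{},\"description\":{},\"input_schema\":{}}}",
            json_string(&self.name),
            json_string(&self.description),
            self.input_schema
        )
    }
}

/// Renders the full "tools" array that goes into the request body.
pub fn tools_array(tools: &[ToolDefinition]) -> String {
    let items: Vec<String> = tools.iter().map(ToolDefinition::to_claude_json).collect();
    format!("[{}]", items.join(","))
}
```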
### Step 3.3: Parse tool-use blocks from SSE stream
- Add `StreamEvent::ToolUseStart { id, name }`, `ToolUseInputDelta(String)`, `ToolUseDone`
- Handle `content_block_start` (type "tool_use"), `content_block_delta` (type "input_json_delta"), `content_block_stop` for tool blocks
- Track the current block's type in the SSE parser, since `content_block_stop` carries only the block index, not its type
- **Files:** `src/provider/claude.rs`, `src/core/types.rs`
- **Done when:** Unit test with recorded tool-use SSE fixture asserts correct StreamEvent sequence
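Below is a sketch of the parser's state machine. It assumes the raw `data:` payloads have already been decoded into a hypothetical `SseEvent` enum. It also assumes that `StreamEvent` has a `TextDelta` variant for ordinary text. Content blocks are assumed to arrive one at a time, so a single slot for the open block's type is enough. If blocks could interleave, the parser would need to key state by `index` instead.

```rust
/// Already-decoded SSE payloads (hypothetical; the real parser reads these from `data:` JSON).
#[derive(Debug, Clone, PartialEq)]
pub enum SseEvent {
    ContentBlockStart { index: usize, block_type: String, id: Option<String>, name: Option<String> },
    TextDelta { index: usize, text: String },
    InputJsonDelta { index: usize, partial_json: String },
    ContentBlockStop { index: usize },
    MessageStop,
}

#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    TextDelta(String),
    ToolUseStart { id: String, name: String },
    ToolUseInputDelta(String),
    ToolUseDone,
    Done,
}

#[derive(Debug, Clone, Copy)]
enum BlockKind {
    Text,
    ToolUse,
}

/// Remembers the type of the open block, because `content_block_stop` does not repeat it.
#[derive(Default)]
pub struct SseParser {
    current: Option<BlockKind>,
}

impl SseParser {
    /// Maps one SSE payload to at most one `StreamEvent`.
    pub fn handle(&mut self, ev: SseEvent) -> Option<StreamEvent> {
        match ev {
            SseEvent::ContentBlockStart { block_type, id, name, .. } => {
                if block_type == "tool_use" {
                    self.current = Some(BlockKind::ToolUse);
                    Some(StreamEvent::ToolUseStart {
                        id: id.unwrap_or_default(),
                        name: name.unwrap_or_default(),
                    })
                } else {
                    self.current = Some(BlockKind::Text);
                    None
                }
            }
            SseEvent::TextDelta { text, .. } => Some(StreamEvent::TextDelta(text)),
            SseEvent::InputJsonDelta { partial_json, .. } => {
                Some(StreamEvent::ToolUseInputDelta(partial_json))
            }
            // Only tool-use blocks produce a closing event; text blocks close silently.
            SseEvent::ContentBlockStop { .. } => match self.current.take() {
                Some(BlockKind::ToolUse) => Some(StreamEvent::ToolUseDone),
                _ => None,
            },
            SseEvent::MessageStop => Some(StreamEvent::Done),
        }
    }
}
```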
### Step 3.4: Orchestrator accumulates tool-use blocks
- Accumulate `ToolUseInputDelta` fragments into JSON buffer per tool-use id
- On `ToolUseDone`, parse JSON into `ContentBlock::ToolUse`
- After `StreamEvent::Done`, if assistant message contains ToolUse blocks, enter tool-execution phase
- **Files:** `src/core/orchestrator.rs`
- **Done when:** Unit test with mock provider emitting tool-use events produces correct ContentBlocks
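Below is a sketch of the accumulation step. `ToolUseInputDelta` carries no id, so the accumulator remembers which tool-use block is open and appends each fragment to that id's buffer. JSON parsing is stubbed out: the raw buffer is stored in place of a `serde_json::from_str` call. Treating an empty buffer as `{}` is an assumption, intended for tools that are called without arguments. The `TextDelta` variant is also assumed.

```rust
use std::collections::HashMap;

pub type Value = String; // stand-in for serde_json::Value

#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    TextDelta(String),
    ToolUseStart { id: String, name: String },
    ToolUseInputDelta(String),
    ToolUseDone,
    Done,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ContentBlock {
    Text(String),
    ToolUse { id: String, name: String, input: Value },
}

#[derive(Default)]
pub struct Accumulator {
    blocks: Vec<ContentBlock>,
    current: Option<(String, String)>, // (id, name) of the open tool-use block
    buffers: HashMap<String, String>,  // partial JSON, keyed by tool-use id
}

impl Accumulator {
    /// Feeds one event; returns true once the model's turn is finished.
    pub fn push(&mut self, ev: StreamEvent) -> bool {
        match ev {
            StreamEvent::TextDelta(t) => {
                // Merge consecutive text deltas into one Text block.
                if let Some(ContentBlock::Text(s)) = self.blocks.last_mut() {
                    s.push_str(&t);
                } else {
                    self.blocks.push(ContentBlock::Text(t));
                }
            }
            StreamEvent::ToolUseStart { id, name } => {
                self.buffers.insert(id.clone(), String::new());
                self.current = Some((id, name));
            }
            StreamEvent::ToolUseInputDelta(frag) => {
                if let Some((id, _)) = &self.current {
                    self.buffers.entry(id.clone()).or_default().push_str(&frag);
                }
            }
            StreamEvent::ToolUseDone => {
                if let Some((id, name)) = self.current.take() {
                    let raw = self.buffers.remove(&id).unwrap_or_default();
                    // Real code would parse here: serde_json::from_str(&raw).
                    let input = if raw.trim().is_empty() { "{}".to_string() } else { raw };
                    self.blocks.push(ContentBlock::ToolUse { id, name, input });
                }
            }
            StreamEvent::Done => return true,
        }
        false
    }

    /// Decides whether the orchestrator enters the tool-execution phase.
    pub fn has_tool_use(&self) -> bool {
        self.blocks.iter().any(|b| matches!(b, ContentBlock::ToolUse { .. }))
    }

    pub fn into_blocks(self) -> Vec<ContentBlock> {
        self.blocks
    }
}
```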
### Step 3.5: Tool trait, registry, and core tools
- `Tool` trait: `name()`, `description()`, `input_schema() -> Value`, `execute(input: Value, working_dir: &Path) -> Result<ToolOutput>`
- `ToolOutput { content: String, is_error: bool }`
- `ToolRegistry`: stores tools, provides `get(name)` and `definitions() -> Vec<ToolDefinition>`
- Risk level: `AutoApprove` (reads), `RequiresApproval` (writes/shell)
- Implement: `read_file` (auto), `list_directory` (auto), `write_file` (approval), `shell_exec` (approval)
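- Path validation: `canonicalize` + `starts_with` check; reject paths outside the working dir (no Landlock yet). Targets that do not exist yet (e.g. a new file for `write_file`) cannot be canonicalized, so canonicalize their parent directory instead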
- **Files:** New `src/tools/` module: `mod.rs`, `read_file.rs`, `write_file.rs`, `list_directory.rs`, `shell_exec.rs`
- **Done when:** Unit tests pass for each tool in temp dirs; path traversal rejected
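Below is a sketch of the trait, the registry, and the path check. The `risk()` method, the `String` error type and the name `resolve_in_workdir` are all assumptions. As in the earlier sketches, `Value` is a raw-JSON stand-in. `std::fs::canonicalize` fails for paths that do not exist. For that case the check falls back to canonicalizing the parent directory, which lets `write_file` create new files.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

pub type Value = String; // stand-in for serde_json::Value

#[derive(Debug, Clone, PartialEq)]
pub struct ToolOutput {
    pub content: String,
    pub is_error: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskLevel {
    AutoApprove,
    RequiresApproval,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ToolDefinition {
    pub name: String,
    pub description: String,
    pub input_schema: Value,
}

pub trait Tool {
    fn name(&self) -> &str;
    fn description(&self) -> &str;
    fn input_schema(&self) -> Value;
    fn risk(&self) -> RiskLevel; // hypothetical: where the risk level could live
    fn execute(&self, input: Value, working_dir: &Path) -> Result<ToolOutput, String>;
}

#[derive(Default)]
pub struct ToolRegistry {
    tools: HashMap<String, Box<dyn Tool>>,
}

impl ToolRegistry {
    pub fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.insert(tool.name().to_string(), tool);
    }

    pub fn get(&self, name: &str) -> Option<&dyn Tool> {
        self.tools.get(name).map(|t| t.as_ref())
    }

    /// Definitions sorted by name so the request body is stable across runs.
    pub fn definitions(&self) -> Vec<ToolDefinition> {
        let mut defs: Vec<ToolDefinition> = self
            .tools
            .values()
            .map(|t| ToolDefinition {
                name: t.name().to_string(),
                description: t.description().to_string(),
                input_schema: t.input_schema(),
            })
            .collect();
        defs.sort_by(|a, b| a.name.cmp(&b.name));
        defs
    }
}

/// Resolves `requested` against `working_dir`, rejecting anything that lands outside it.
pub fn resolve_in_workdir(working_dir: &Path, requested: &str) -> Result<PathBuf, String> {
    let root = working_dir.canonicalize().map_err(|e| format!("bad working dir: {e}"))?;
    let joined = root.join(requested);
    let resolved = match joined.canonicalize() {
        Ok(p) => p,
        // Path does not exist yet: canonicalize the parent, then re-attach the file name.
        Err(_) => {
            let parent = joined.parent().ok_or("path has no parent")?;
            let file = joined.file_name().ok_or("path has no file name")?;
            parent
                .canonicalize()
                .map_err(|e| format!("cannot resolve {requested}: {e}"))?
                .join(file)
        }
    };
    if resolved.starts_with(&root) {
        Ok(resolved)
    } else {
        Err(format!("path escapes working dir: {requested}"))
    }
}
```

`HashMap` iteration order is unspecified, which is why `definitions()` sorts its output.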
### Step 3.6: Approval gate (TUI <-> core)
- New `UIEvent::ToolApprovalRequest { tool_use_id, tool_name, input_summary }`
- New `UserAction::ToolApprovalResponse { tool_use_id, approved: bool }`
- Orchestrator: check risk level -> auto-approve or send approval request and await response
- Denied tool calls return `ToolResult { is_error: true }` with a denial message
- TUI: render an approval prompt overlay with y/n keybindings
- **Files:** `src/core/types.rs`, `src/core/orchestrator.rs`, `src/tui/events.rs`, `src/tui/input.rs`, `src/tui/render.rs`
- **Done when:** Integration test: mock provider + mock TUI channel verifies approval flow
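Below is a blocking sketch of the gate. It uses `std::sync::mpsc` channels in place of whatever channels (likely async) connect the orchestrator and the TUI. The function name `approve` and the denial wording are hypothetical. A denial becomes an error `ToolResult` that is sent back to the model; it does not abort the turn.

```rust
use std::sync::mpsc::{Receiver, Sender};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskLevel {
    AutoApprove,
    RequiresApproval,
}

#[derive(Debug, Clone, PartialEq)]
pub enum UIEvent {
    ToolApprovalRequest { tool_use_id: String, tool_name: String, input_summary: String },
}

#[derive(Debug, Clone, PartialEq)]
pub enum UserAction {
    ToolApprovalResponse { tool_use_id: String, approved: bool },
}

#[derive(Debug, Clone, PartialEq)]
pub struct ToolResult {
    pub tool_use_id: String,
    pub content: String,
    pub is_error: bool,
}

/// Ok(()) means the tool may run; Err carries the denial result to feed back to the model.
pub fn approve(
    risk: RiskLevel,
    tool_use_id: &str,
    tool_name: &str,
    input_summary: &str,
    ui_tx: &Sender<UIEvent>,
    action_rx: &Receiver<UserAction>,
) -> Result<(), ToolResult> {
    if risk == RiskLevel::AutoApprove {
        return Ok(());
    }
    let deny = |msg: String| ToolResult { tool_use_id: tool_use_id.to_string(), content: msg, is_error: true };
    let request = UIEvent::ToolApprovalRequest {
        tool_use_id: tool_use_id.to_string(),
        tool_name: tool_name.to_string(),
        input_summary: input_summary.to_string(),
    };
    if ui_tx.send(request).is_err() {
        return Err(deny("approval UI is unavailable; the tool was not run".to_string()));
    }
    // Block until the response for this tool_use_id arrives; other responses are skipped.
    loop {
        match action_rx.recv() {
            Ok(UserAction::ToolApprovalResponse { tool_use_id: id, approved }) if id == tool_use_id => {
                return if approved {
                    Ok(())
                } else {
                    Err(deny(format!("the user denied the {tool_name} call")))
                };
            }
            Ok(_) => continue,
            Err(_) => return Err(deny("approval channel closed; the tool was not run".to_string())),
        }
    }
}
```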
### Step 3.7: Tool results fed back to the model
- After executing tool calls: append assistant message (with ToolUse blocks) to history, append user message with ToolResult blocks, re-call provider
- Loop: model may respond with more tool calls or text
- Cap the loop at a maximum of 25 iterations to prevent runaway tool use
- **Files:** `src/core/orchestrator.rs`
- **Done when:** Integration test: mock provider returns tool-use then text; orchestrator makes two calls. Max-iteration cap tested.
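Below is a sketch of the loop in which the provider and the tool executor are passed in as closures. The same mechanism lets a test inject a mock provider. Two details are assumptions about how the 25-iteration limit is applied: each model call counts as one iteration, and reaching the cap returns an error. `Role`, `Message` and the string-typed tool input are simplified stand-ins.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ContentBlock {
    Text(String),
    ToolUse { id: String, name: String, input: String },
    ToolResult { tool_use_id: String, content: String, is_error: bool },
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: Role,
    pub content: Vec<ContentBlock>,
}

pub const MAX_TOOL_ITERATIONS: usize = 25;

/// Calls the model until it replies without tool calls; returns the number of model calls.
/// `call_model` stands in for the provider, `run_tool` for the approval-gated executor.
pub fn run_turn(
    history: &mut Vec<Message>,
    mut call_model: impl FnMut(&[Message]) -> Vec<ContentBlock>,
    mut run_tool: impl FnMut(&str, &str) -> (String, bool),
) -> Result<usize, String> {
    for calls in 1..=MAX_TOOL_ITERATIONS {
        let reply = call_model(&history[..]);
        // Execute every ToolUse block and collect one ToolResult per call.
        let results: Vec<ContentBlock> = reply
            .iter()
            .filter_map(|b| match b {
                ContentBlock::ToolUse { id, name, input } => {
                    let (content, is_error) = run_tool(name, input);
                    Some(ContentBlock::ToolResult { tool_use_id: id.clone(), content, is_error })
                }
                _ => None,
            })
            .collect();
        history.push(Message { role: Role::Assistant, content: reply });
        if results.is_empty() {
            return Ok(calls); // plain text answer: the turn is over
        }
        history.push(Message { role: Role::User, content: results });
    }
    Err(format!("stopped after {MAX_TOOL_ITERATIONS} model calls"))
}
```

If the cap is hit, the last message in history is a user message that holds tool results. The orchestrator would still need to tell the user that the turn was cut short.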
### Step 3.8: TUI display for tool activity
- New `UIEvent::ToolExecuting { tool_name, input_summary }`, `UIEvent::ToolResult { tool_name, output_summary, is_error }`
- Render tool calls as distinct visual blocks in conversation view
- Render tool results inline (truncated if long)
- **Files:** `src/tui/render.rs`, `src/tui/events.rs`
- **Done when:** Visual check with `cargo run`; TestBackend test for tool block rendering
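Below is a plain-text sketch of the tool blocks, written independently of the TUI library. The glyphs, the 5-line / 400-character limits and the `[truncated]` marker are arbitrary choices. Truncation counts `char`s rather than bytes, so a multi-byte UTF-8 character is never split.

```rust
pub enum UIEvent {
    ToolExecuting { tool_name: String, input_summary: String },
    ToolResult { tool_name: String, output_summary: String, is_error: bool },
}

/// Keeps at most `max_lines` lines and `max_chars` characters, marking any cut.
pub fn truncate_for_display(output: &str, max_lines: usize, max_chars: usize) -> String {
    let total_lines = output.lines().count();
    let kept: Vec<&str> = output.lines().take(max_lines).collect();
    let mut shown = kept.join("\n");
    let mut truncated = total_lines > max_lines;
    if shown.chars().count() > max_chars {
        shown = shown.chars().take(max_chars).collect();
        truncated = true;
    }
    if truncated {
        shown.push_str("\n… [truncated]");
    }
    shown
}

/// Text lines for one tool block in the conversation view; results are indented.
pub fn tool_block_lines(ev: &UIEvent) -> Vec<String> {
    match ev {
        UIEvent::ToolExecuting { tool_name, input_summary } => {
            vec![format!("▶ {tool_name} {input_summary}")]
        }
        UIEvent::ToolResult { tool_name, output_summary, is_error } => {
            let status = if *is_error { "error" } else { "ok" };
            let mut lines = vec![format!("◀ {tool_name} ({status})")];
            lines.extend(truncate_for_display(output_summary, 5, 400).lines().map(|l| format!("  {l}")));
            lines
        }
    }
}
```

Producing plain lines first keeps the truncation logic testable without a terminal. The renderer then only has to apply styles.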
### Phase 3 verification (end-to-end)
1. `cargo test` -- all tests pass
2. `cargo clippy -- -D warnings` -- zero warnings
3. `cargo run -- --project-dir .` -- ask Claude to read a file; it runs without a prompt (`read_file` is auto-approved) and the contents appear
4. Ask Claude to write a file -- approve, verify the file was written
5. Ask Claude to run a shell command -- approve, verify the output
6. Deny an approval request -- Claude receives the denial and responds gracefully
## Phase 4: Sandboxing
- Landlock: read-only system, read-write project dir, network blocked
- Tools execute through `Sandbox`, never directly
- `:net on/off` toggle, state in status bar
- Graceful degradation on older kernels
- **Done when:** Writes outside project dir fail; network toggle works
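Below is a hypothetical sketch of the `:net on/off` command and the status-bar text; Landlock enforcement itself is not shown. The `landlock_available` flag stands in for whatever kernel-support probe drives the graceful-degradation path. All names here are illustrative.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NetCommand {
    On,
    Off,
}

/// Accepts exactly ":net on" or ":net off" (extra whitespace allowed).
pub fn parse_net_command(input: &str) -> Option<NetCommand> {
    let mut parts = input.split_whitespace();
    match (parts.next(), parts.next(), parts.next()) {
        (Some(":net"), Some("on"), None) => Some(NetCommand::On),
        (Some(":net"), Some("off"), None) => Some(NetCommand::Off),
        _ => None,
    }
}

/// Hypothetical sandbox state shown in the status bar.
pub struct SandboxPolicy {
    pub network_allowed: bool,
    pub landlock_available: bool, // false on kernels without Landlock support
}

impl SandboxPolicy {
    pub fn apply(&mut self, cmd: NetCommand) {
        self.network_allowed = cmd == NetCommand::On;
    }

    /// Status-bar text; flags when the sandbox cannot be enforced.
    pub fn status_label(&self) -> String {
        let net = if self.network_allowed { "on" } else { "off" };
        if self.landlock_available {
            format!("net: {net}")
        } else {
            format!("net: {net} | sandbox: unavailable")
        }
    }
}
```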