Add tool use without sandboxing. Currently available tools are list dir, read file, write file and exec bash. Reviewed-on: #4 Co-authored-by: Drew Galbraith <drew@tiramisu.one> Co-committed-by: Drew Galbraith <drew@tiramisu.one>
# Implementation Plan
## Phase 3: Tool Execution
### Step 3.1: Enrich the content model
- Replace `ConversationMessage { role, content: String }` with a content-block model
- Define `ContentBlock` enum: `Text(String)`, `ToolUse { id, name, input: Value }`, `ToolResult { tool_use_id, content: String, is_error: bool }`
- Change `ConversationMessage.content` from `String` to `Vec<ContentBlock>`
- Add `ConversationMessage::text(role, s)` helper to keep existing call sites clean
- Update serialization, orchestrator, tests, TUI display
- **Files:** `src/core/types.rs`, `src/core/history.rs`
- **Done when:** `cargo test` passes with new model; all existing tests updated
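
The content-block model above could look roughly like the following. This is a minimal sketch, not the project's actual types: `Role` is a hypothetical enum, and `Value` is aliased to `String` as a stand-in for `serde_json::Value` so the example compiles without external crates.

```rust
/// Stand-in for `serde_json::Value` (assumption: the real code holds parsed JSON).
type Value = String;

/// Hypothetical role type; the plan does not spell it out.
#[derive(Debug, Clone, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
enum ContentBlock {
    Text(String),
    ToolUse { id: String, name: String, input: Value },
    ToolResult { tool_use_id: String, content: String, is_error: bool },
}

#[derive(Debug, Clone, PartialEq)]
struct ConversationMessage {
    role: Role,
    content: Vec<ContentBlock>,
}

impl ConversationMessage {
    /// Builds a message holding a single text block, so call sites that
    /// used to pass a plain string stay one-liners.
    fn text(role: Role, s: impl Into<String>) -> Self {
        Self {
            role,
            content: vec![ContentBlock::Text(s.into())],
        }
    }
}
```

With a helper like this, an existing call site would change from building a struct with a `String` to `ConversationMessage::text(Role::User, "hello")`.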
### Step 3.2: Send tool definitions in API requests
- Add `ToolDefinition { name, description, input_schema: Value }` (provider-agnostic)
- Extend `ModelProvider::stream` to accept `&[ToolDefinition]`
- Include `"tools"` array in Claude provider request body
- **Files:** `src/provider/mod.rs`, `src/provider/claude.rs`
- **Done when:** The raw SSE stream contains `tool_use` content blocks when a prompt calls for a tool
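
To illustrate the shape of the `"tools"` array, here is a dependency-free sketch. It assumes `input_schema` is held as a raw JSON string instead of a `Value`, and the helpers `escape_json` and `tools_array` are hypothetical names. Real code would more likely serialize with `serde_json`.

```rust
/// Provider-agnostic tool description. `input_schema` is a raw JSON string
/// here (assumption; the plan uses a JSON `Value`).
struct ToolDefinition {
    name: String,
    description: String,
    input_schema: String,
}

/// Escapes a string for embedding inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders the definitions as the JSON array placed under `"tools"`.
fn tools_array(defs: &[ToolDefinition]) -> String {
    let items: Vec<String> = defs
        .iter()
        .map(|d| {
            format!(
                "{{\"name\":\"{}\",\"description\":\"{}\",\"input_schema\":{}}}",
                escape_json(&d.name),
                escape_json(&d.description),
                d.input_schema
            )
        })
        .collect();
    format!("[{}]", items.join(","))
}
```

The schema string is inserted verbatim, so it must already be valid JSON.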
### Step 3.3: Parse tool-use blocks from SSE stream
- Add `StreamEvent::ToolUseStart { id, name }`, `ToolUseInputDelta(String)`, `ToolUseDone`
- Handle `content_block_start` (type "tool_use"), `content_block_delta` (type "input_json_delta"), `content_block_stop` for tool blocks
- Track current block type state in SSE parser
- **Files:** `src/provider/claude.rs`, `src/core/types.rs`
- **Done when:** Unit test with recorded tool-use SSE fixture asserts correct StreamEvent sequence
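
The block-type tracking described above can be sketched as a small state machine. `RawEvent` is a hypothetical, already-decoded view of each SSE payload (the JSON decoding itself is left out), and `StreamEvent::TextDelta` is an assumed name for the existing text event.

```rust
/// Hypothetical decoded form of one SSE `data:` payload.
enum RawEvent<'a> {
    ContentBlockStart { block_type: &'a str, id: &'a str, name: &'a str },
    ContentBlockDelta { delta_type: &'a str, payload: &'a str },
    ContentBlockStop,
}

#[derive(Debug, PartialEq)]
enum StreamEvent {
    TextDelta(String), // assumed name for plain text output
    ToolUseStart { id: String, name: String },
    ToolUseInputDelta(String),
    ToolUseDone,
}

#[derive(Clone, Copy, PartialEq)]
enum BlockKind {
    Text,
    ToolUse,
}

/// Remembers which kind of content block is currently open.
#[derive(Default)]
struct SseParser {
    current: Option<BlockKind>,
}

impl SseParser {
    fn on_event(&mut self, ev: RawEvent<'_>) -> Option<StreamEvent> {
        match ev {
            RawEvent::ContentBlockStart { block_type: "tool_use", id, name } => {
                self.current = Some(BlockKind::ToolUse);
                Some(StreamEvent::ToolUseStart {
                    id: id.to_string(),
                    name: name.to_string(),
                })
            }
            RawEvent::ContentBlockStart { .. } => {
                self.current = Some(BlockKind::Text);
                None
            }
            // A delta is only forwarded if it matches the open block's kind.
            RawEvent::ContentBlockDelta { delta_type, payload } => {
                match (self.current, delta_type) {
                    (Some(BlockKind::ToolUse), "input_json_delta") => {
                        Some(StreamEvent::ToolUseInputDelta(payload.to_string()))
                    }
                    (Some(BlockKind::Text), "text_delta") => {
                        Some(StreamEvent::TextDelta(payload.to_string()))
                    }
                    _ => None,
                }
            }
            // Only a closing tool block produces an event.
            RawEvent::ContentBlockStop => match self.current.take() {
                Some(BlockKind::ToolUse) => Some(StreamEvent::ToolUseDone),
                _ => None,
            },
        }
    }
}
```

Feeding a recorded fixture through `on_event` and collecting the `Some` values yields the sequence that the unit test asserts on.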
### Step 3.4: Orchestrator accumulates tool-use blocks
- Accumulate `ToolUseInputDelta` fragments into a JSON buffer keyed by the tool-use id from the preceding `ToolUseStart`
- On `ToolUseDone`, parse JSON into `ContentBlock::ToolUse`
- After `StreamEvent::Done`, if assistant message contains ToolUse blocks, enter tool-execution phase
- **Files:** `src/core/orchestrator.rs`
- **Done when:** Unit test with mock provider emitting tool-use events produces correct ContentBlocks
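
One way to picture the accumulation step is shown below. It is a sketch under several assumptions: `ToolUseAccumulator` and `Pending` are hypothetical names, a single pending buffer tagged with its id stands in for a per-id map because blocks are assumed to arrive one at a time, and the finished input is kept as a raw JSON string where the real code would parse it into a `Value`. Treating an empty buffer as `{}` is also an assumption.

```rust
#[derive(Debug, PartialEq)]
enum StreamEvent {
    ToolUseStart { id: String, name: String },
    ToolUseInputDelta(String),
    ToolUseDone,
    Done,
}

#[derive(Debug, PartialEq)]
enum ContentBlock {
    Text(String),
    ToolUse { id: String, name: String, input: String },
}

/// The tool-use block currently being streamed.
struct Pending {
    id: String,
    name: String,
    buf: String,
}

#[derive(Default)]
struct ToolUseAccumulator {
    pending: Option<Pending>,
    finished: Vec<ContentBlock>,
}

impl ToolUseAccumulator {
    fn on_event(&mut self, ev: StreamEvent) {
        match ev {
            StreamEvent::ToolUseStart { id, name } => {
                self.pending = Some(Pending { id, name, buf: String::new() });
            }
            StreamEvent::ToolUseInputDelta(fragment) => {
                // Fragments are appended in arrival order.
                if let Some(p) = self.pending.as_mut() {
                    p.buf.push_str(&fragment);
                }
            }
            StreamEvent::ToolUseDone => {
                if let Some(p) = self.pending.take() {
                    // Real code would parse the buffer as JSON at this point.
                    let input = if p.buf.is_empty() { "{}".to_string() } else { p.buf };
                    self.finished.push(ContentBlock::ToolUse { id: p.id, name: p.name, input });
                }
            }
            StreamEvent::Done => {}
        }
    }

    /// True when the assistant message holds at least one tool-use block.
    fn needs_tool_execution(&self) -> bool {
        !self.finished.is_empty()
    }
}
```

After `StreamEvent::Done`, the orchestrator would check `needs_tool_execution()` to decide whether to enter the tool-execution phase.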
### Step 3.5: Tool trait, registry, and core tools
- `Tool` trait: `name()`, `description()`, `input_schema() -> Value`, `execute(input: Value, working_dir: &Path) -> Result<ToolOutput>`
- `ToolOutput { content: String, is_error: bool }`
- `ToolRegistry`: stores tools, provides `get(name)` and `definitions() -> Vec<ToolDefinition>`
- Risk level: `AutoApprove` (reads), `RequiresApproval` (writes/shell)
- Implement: `read_file` (auto), `list_directory` (auto), `write_file` (approval), `shell_exec` (approval)
- Path validation: `canonicalize` + `starts_with` check, reject paths outside working dir (Landlock enforcement arrives in Phase 4)
- **Files:** New `src/tools/` module: `mod.rs`, `read_file.rs`, `write_file.rs`, `list_directory.rs`, `shell_exec.rs`
- **Done when:** Unit tests pass for each tool in temp dirs; path traversal rejected
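
The trait, registry, and path check above might fit together as in this sketch. It is abbreviated and partly hypothetical: `description()` and `input_schema()` are omitted, `execute` takes the path as a plain string and returns `ToolOutput` directly instead of `Result<ToolOutput>`, and `resolve_in_dir`, `risk()`, and `register` are illustrative names.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, Copy, PartialEq)]
enum RiskLevel {
    AutoApprove,
    RequiresApproval,
}

#[derive(Debug, PartialEq)]
struct ToolOutput {
    content: String,
    is_error: bool,
}

/// Abbreviated version of the planned trait (see the assumptions above).
trait Tool {
    fn name(&self) -> &str;
    fn risk(&self) -> RiskLevel;
    fn execute(&self, input: &str, working_dir: &Path) -> ToolOutput;
}

/// Resolves `requested` against `working_dir` and rejects anything that
/// ends up outside it after symlinks and `..` are resolved.
fn resolve_in_dir(working_dir: &Path, requested: &str) -> io::Result<PathBuf> {
    let root = working_dir.canonicalize()?;
    let resolved = root.join(requested).canonicalize()?;
    if resolved.starts_with(&root) {
        Ok(resolved)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            format!("{} is outside the working directory", requested),
        ))
    }
}

struct ReadFile;

impl Tool for ReadFile {
    fn name(&self) -> &str {
        "read_file"
    }
    fn risk(&self) -> RiskLevel {
        RiskLevel::AutoApprove
    }
    fn execute(&self, input: &str, working_dir: &Path) -> ToolOutput {
        match resolve_in_dir(working_dir, input).and_then(|p| fs::read_to_string(p)) {
            Ok(content) => ToolOutput { content, is_error: false },
            Err(e) => ToolOutput { content: e.to_string(), is_error: true },
        }
    }
}

#[derive(Default)]
struct ToolRegistry {
    tools: HashMap<String, Box<dyn Tool>>,
}

impl ToolRegistry {
    fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.insert(tool.name().to_string(), tool);
    }
    fn get(&self, name: &str) -> Option<&dyn Tool> {
        self.tools.get(name).map(|t| t.as_ref())
    }
}
```

Note that `canonicalize` fails for paths that do not exist, so a `write_file` tool creating a new file would probably need to canonicalize the parent directory and then append the file name.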
### Step 3.6: Approval gate (TUI <-> core)
- New `UIEvent::ToolApprovalRequest { tool_use_id, tool_name, input_summary }`
- New `UserAction::ToolApprovalResponse { tool_use_id, approved: bool }`
- Orchestrator: check risk level -> auto-approve or send approval request and await response
- Denied tools return `ToolResult { is_error: true }` with denial message
- TUI: render approval prompt overlay with y/n keybindings
- **Files:** `src/core/types.rs`, `src/core/orchestrator.rs`, `src/tui/events.rs`, `src/tui/input.rs`, `src/tui/render.rs`
- **Done when:** Integration test: mock provider + mock TUI channel verifies approval flow
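
The approval round trip can be sketched with two channels. This example assumes blocking `std::sync::mpsc` channels for simplicity (the real orchestrator may well be async), and `is_approved` is a hypothetical function name.

```rust
use std::sync::mpsc::{Receiver, Sender};

#[derive(Debug, Clone, Copy, PartialEq)]
enum RiskLevel {
    AutoApprove,
    RequiresApproval,
}

#[derive(Debug, PartialEq)]
enum UIEvent {
    ToolApprovalRequest {
        tool_use_id: String,
        tool_name: String,
        input_summary: String,
    },
}

#[derive(Debug, PartialEq)]
enum UserAction {
    ToolApprovalResponse { tool_use_id: String, approved: bool },
}

/// Returns true if the tool call may run. Auto-approved tools never reach
/// the UI; everything else waits for a matching response.
fn is_approved(
    risk: RiskLevel,
    tool_use_id: &str,
    tool_name: &str,
    input_summary: &str,
    ui_tx: &Sender<UIEvent>,
    actions: &Receiver<UserAction>,
) -> bool {
    if risk == RiskLevel::AutoApprove {
        return true;
    }
    let request = UIEvent::ToolApprovalRequest {
        tool_use_id: tool_use_id.to_string(),
        tool_name: tool_name.to_string(),
        input_summary: input_summary.to_string(),
    };
    if ui_tx.send(request).is_err() {
        return false; // UI side is gone: treat as denied
    }
    // Skip responses for other tool calls; a closed channel counts as denial.
    while let Ok(UserAction::ToolApprovalResponse { tool_use_id: id, approved }) = actions.recv() {
        if id == tool_use_id {
            return approved;
        }
    }
    false
}
```

When this returns `false`, the orchestrator would build the `ToolResult { is_error: true }` block with a denial message instead of running the tool.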
### Step 3.7: Tool results fed back to the model
- After executing tool calls: append assistant message (with ToolUse blocks) to history, append user message with ToolResult blocks, re-call provider
- Loop: model may respond with more tool calls or text
- Cap the loop at 25 iterations to prevent a runaway tool-call cycle
- **Files:** `src/core/orchestrator.rs`
- **Done when:** Integration test: mock provider returns tool-use then text; orchestrator makes two calls. Max-iteration cap tested.
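
The feedback loop can be sketched as follows, with the provider and tool execution passed in as closures so a test can mock both. `run_turn`, `MAX_TOOL_ITERATIONS`, and the closure signatures are hypothetical, and tool input is a raw JSON string here.

```rust
const MAX_TOOL_ITERATIONS: usize = 25;

#[derive(Debug, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[derive(Debug, PartialEq)]
enum ContentBlock {
    Text(String),
    ToolUse { id: String, name: String, input: String },
    ToolResult { tool_use_id: String, content: String, is_error: bool },
}

#[derive(Debug, PartialEq)]
struct ConversationMessage {
    role: Role,
    content: Vec<ContentBlock>,
}

/// Calls the provider until it answers without tool calls.
/// Returns the number of provider calls made, or an error at the cap.
fn run_turn(
    history: &mut Vec<ConversationMessage>,
    mut call_provider: impl FnMut(&[ConversationMessage]) -> Vec<ContentBlock>,
    mut run_tool: impl FnMut(&str, &str) -> (String, bool),
) -> Result<usize, String> {
    for call in 1..=MAX_TOOL_ITERATIONS {
        let reply = call_provider(history.as_slice());
        // Execute every tool call in the reply and collect the results.
        let results: Vec<ContentBlock> = reply
            .iter()
            .filter_map(|block| match block {
                ContentBlock::ToolUse { id, name, input } => {
                    let (content, is_error) = run_tool(name.as_str(), input.as_str());
                    Some(ContentBlock::ToolResult {
                        tool_use_id: id.clone(),
                        content,
                        is_error,
                    })
                }
                _ => None,
            })
            .collect();
        history.push(ConversationMessage { role: Role::Assistant, content: reply });
        if results.is_empty() {
            return Ok(call); // plain text answer: the turn is finished
        }
        history.push(ConversationMessage { role: Role::User, content: results });
    }
    Err(format!("stopped after {} tool iterations", MAX_TOOL_ITERATIONS))
}
```

A mock provider that returns one tool call and then text makes this function return `Ok(2)`, which matches the two-call integration test described above.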
### Step 3.8: TUI display for tool activity
- New `UIEvent::ToolExecuting { tool_name, input_summary }`, `UIEvent::ToolResult { tool_name, output_summary, is_error }`
- Render tool calls as distinct visual blocks in conversation view
- Render tool results inline (truncated if long)
- **Files:** `src/tui/render.rs`, `src/tui/events.rs`
- **Done when:** Visual check with `cargo run`; TestBackend test for tool block rendering
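
Truncating long tool output for inline display could work as in the sketch below. Both function names and the label format are hypothetical, and the actual widget rendering is left out.

```rust
/// Keeps at most `max_chars` characters and notes how many were hidden.
/// Counting by `char` avoids cutting a multi-byte character in half.
fn truncate_for_display(output: &str, max_chars: usize) -> String {
    let mut chars = output.chars();
    let head: String = chars.by_ref().take(max_chars).collect();
    let hidden = chars.count();
    if hidden == 0 {
        head
    } else {
        format!("{}... ({} more chars)", head, hidden)
    }
}

/// Formats one tool result as a single labelled line.
fn render_tool_result(tool_name: &str, output: &str, is_error: bool, max_chars: usize) -> String {
    let status = if is_error { "error" } else { "ok" };
    format!(
        "[{}] {}: {}",
        tool_name,
        status,
        truncate_for_display(output, max_chars)
    )
}
```

The resulting string is what a `TestBackend` test could look for in the rendered buffer.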
### Phase 3 verification (end-to-end)
1. `cargo test` -- all tests pass
2. `cargo clippy -- -D warnings` -- zero warnings
3. `cargo run -- --project-dir .` -- ask Claude to read a file; `read_file` is auto-approved, so the contents appear without a prompt
4. Ask Claude to write a file -- approve, verify written
5. Ask Claude to run a shell command -- approve, verify output
6. Deny an approval request -- Claude receives the denial as an error tool result and responds gracefully
## Phase 4: Sandboxing
- Landlock: read-only system, read-write project dir, network blocked
- Tools execute through `Sandbox`, never directly
- `:net on/off` toggle, state in status bar
- Graceful degradation on older kernels
- **Done when:** Writes outside project dir fail; network toggle works
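
As a small illustration of the `:net on/off` toggle, the sketch below covers only the command parsing and the status-bar label. `SandboxState` and its methods are hypothetical names, and the Landlock enforcement itself is not shown.

```rust
/// UI-side view of the sandbox. The default is network blocked.
#[derive(Debug, Default, PartialEq)]
struct SandboxState {
    net_enabled: bool,
}

impl SandboxState {
    /// Applies `:net on` or `:net off`. Returns false for any other input
    /// and leaves the state unchanged.
    fn apply_command(&mut self, line: &str) -> bool {
        let mut words = line.split_whitespace();
        match (words.next(), words.next(), words.next()) {
            (Some(":net"), Some("on"), None) => {
                self.net_enabled = true;
                true
            }
            (Some(":net"), Some("off"), None) => {
                self.net_enabled = false;
                true
            }
            _ => false,
        }
    }

    /// Text for the status bar.
    fn status_label(&self) -> &'static str {
        if self.net_enabled {
            "net: on"
        } else {
            "net: off"
        }
    }
}
```

Starting from the blocked default keeps the toggle consistent with the "network blocked" policy listed above.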