Add tool use without sandboxing. Currently available tools are list dir, read file, write file and exec bash. Reviewed-on: #4 Co-authored-by: Drew Galbraith <drew@tiramisu.one> Co-committed-by: Drew Galbraith <drew@tiramisu.one>
Implementation Plan
Phase 3: Tool Execution
Step 3.1: Enrich the content model
- Replace `ConversationMessage { role, content: String }` with content-block model
- Define `ContentBlock` enum: `Text(String)`, `ToolUse { id, name, input: Value }`, `ToolResult { tool_use_id, content: String, is_error: bool }`
- Change `ConversationMessage.content` from `String` to `Vec<ContentBlock>`
- Add `ConversationMessage::text(role, s)` helper to keep existing call sites clean
- Update serialization, orchestrator, tests, TUI display
- Files: `src/core/types.rs`, `src/core/history.rs`
- Done when: `cargo test` passes with new model; all existing tests updated
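A minimal sketch of what the content-block model from Step 3.1 might look like. The `Role` enum and the derive lists are assumptions (the plan only names the `role` field), and `Value` is a plain `String` stand-in for `serde_json::Value` so the sketch builds with the standard library alone.

```rust
/// Who authored a message. Hypothetical: the plan only names a `role` field.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Role {
    User,
    Assistant,
}

/// Stand-in for `serde_json::Value` (assumption, keeps the sketch std-only).
pub type Value = String;

/// One piece of message content, as listed in Step 3.1.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentBlock {
    Text(String),
    ToolUse { id: String, name: String, input: Value },
    ToolResult { tool_use_id: String, content: String, is_error: bool },
}

#[derive(Debug, Clone, PartialEq)]
pub struct ConversationMessage {
    pub role: Role,
    pub content: Vec<ContentBlock>,
}

impl ConversationMessage {
    /// Helper for the common case of a single text block, so existing
    /// call sites that passed a plain string stay short.
    pub fn text(role: Role, s: impl Into<String>) -> Self {
        Self {
            role,
            content: vec![ContentBlock::Text(s.into())],
        }
    }
}
```

With this shape, an existing call site such as `ConversationMessage { role, content: s }` would become `ConversationMessage::text(role, s)`.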
Step 3.2: Send tool definitions in API requests
- Add `ToolDefinition { name, description, input_schema: Value }` (provider-agnostic)
- Extend `ModelProvider::stream` to accept `&[ToolDefinition]`
- Include `"tools"` array in Claude provider request body
- Files: `src/provider/mod.rs`, `src/provider/claude.rs`
- Done when: API responses contain `tool_use` content blocks in raw SSE stream
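As a rough illustration of Step 3.2, the sketch below turns a slice of `ToolDefinition` values into the JSON text of a `"tools"` array, where each entry carries `name`, `description`, and `input_schema`. It assumes `input_schema` is held as raw JSON text rather than `serde_json::Value`, and the `escape_json` and `tools_array` helpers are hypothetical; real code would more likely serialize with `serde_json`.

```rust
/// Provider-agnostic tool description from Step 3.2.
pub struct ToolDefinition {
    pub name: String,
    pub description: String,
    /// Raw JSON Schema text (assumption: stand-in for `serde_json::Value`).
    pub input_schema: String,
}

/// Escape a string for use inside a JSON string literal (hypothetical helper).
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the JSON text of the `"tools"` array for the request body.
/// The schema is spliced in verbatim, so it must already be valid JSON.
pub fn tools_array(defs: &[ToolDefinition]) -> String {
    let items: Vec<String> = defs
        .iter()
        .map(|d| {
            format!(
                "{{\"name\":\"{}\",\"description\":\"{}\",\"input_schema\":{}}}",
                escape_json(&d.name),
                escape_json(&d.description),
                d.input_schema
            )
        })
        .collect();
    format!("[{}]", items.join(","))
}
```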
Step 3.3: Parse tool-use blocks from SSE stream
- Add `StreamEvent::ToolUseStart { id, name }`, `ToolUseInputDelta(String)`, `ToolUseDone`
- Handle `content_block_start` (type `"tool_use"`), `content_block_delta` (type `"input_json_delta"`), `content_block_stop` for tool blocks
- Track current block type state in SSE parser
- Files: `src/provider/claude.rs`, `src/core/types.rs`
- Done when: Unit test with recorded tool-use SSE fixture asserts correct StreamEvent sequence
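The block-type tracking in Step 3.3 could be sketched as a small state machine. The sketch assumes the SSE `data:` payloads have already been decoded into a hypothetical `SsePayload` enum (the JSON decoding itself is omitted), and `StreamEvent::TextDelta` is an assumed name for the existing text event. The point is that `content_block_stop` only yields `ToolUseDone` when the block that is closing was a tool-use block.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    TextDelta(String), // assumed name for the pre-existing text event
    ToolUseStart { id: String, name: String },
    ToolUseInputDelta(String),
    ToolUseDone,
}

/// Already-decoded SSE payloads (hypothetical intermediate type).
pub enum SsePayload {
    ContentBlockStartText,
    ContentBlockStartToolUse { id: String, name: String },
    TextDelta(String),
    InputJsonDelta(String),
    ContentBlockStop,
}

#[derive(PartialEq)]
enum BlockKind {
    None,
    Text,
    ToolUse,
}

pub struct SseParser {
    current: BlockKind,
}

impl SseParser {
    pub fn new() -> Self {
        Self { current: BlockKind::None }
    }

    /// Map one payload to at most one `StreamEvent`, updating block state.
    pub fn handle(&mut self, payload: SsePayload) -> Option<StreamEvent> {
        match payload {
            SsePayload::ContentBlockStartText => {
                self.current = BlockKind::Text;
                None
            }
            SsePayload::ContentBlockStartToolUse { id, name } => {
                self.current = BlockKind::ToolUse;
                Some(StreamEvent::ToolUseStart { id, name })
            }
            SsePayload::TextDelta(t) => Some(StreamEvent::TextDelta(t)),
            // Input fragments only make sense inside a tool-use block.
            SsePayload::InputJsonDelta(j) if self.current == BlockKind::ToolUse => {
                Some(StreamEvent::ToolUseInputDelta(j))
            }
            SsePayload::InputJsonDelta(_) => None,
            SsePayload::ContentBlockStop => {
                let was_tool = self.current == BlockKind::ToolUse;
                self.current = BlockKind::None;
                if was_tool { Some(StreamEvent::ToolUseDone) } else { None }
            }
        }
    }
}
```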
Step 3.4: Orchestrator accumulates tool-use blocks
- Accumulate `ToolUseInputDelta` fragments into JSON buffer per tool-use id
- On `ToolUseDone`, parse JSON into `ContentBlock::ToolUse`
- After `StreamEvent::Done`, if assistant message contains ToolUse blocks, enter tool-execution phase
- Files: `src/core/orchestrator.rs`
- Done when: Unit test with mock provider emitting tool-use events produces correct ContentBlocks
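One possible shape for the accumulation in Step 3.4 is sketched below. `ToolUseAccumulator` and its methods are hypothetical names. The sketch keeps a single open buffer on the assumption that tool-use blocks arrive one at a time, stores the input as raw JSON text instead of parsing it (real code would likely call `serde_json::from_str` at `ToolUseDone`), and treats an empty buffer as `{}`, which is also an assumption.

```rust
/// Stand-in for `serde_json::Value`; this sketch keeps the raw JSON text.
pub type Value = String;

#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    ToolUseStart { id: String, name: String },
    ToolUseInputDelta(String),
    ToolUseDone,
    Done,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ContentBlock {
    Text(String),
    ToolUse { id: String, name: String, input: Value },
}

/// Collects tool-use events into finished `ContentBlock::ToolUse` values.
#[derive(Default)]
pub struct ToolUseAccumulator {
    /// (id, name, JSON buffer) of the block currently being streamed.
    open: Option<(String, String, String)>,
    finished: Vec<ContentBlock>,
}

impl ToolUseAccumulator {
    pub fn handle(&mut self, event: StreamEvent) {
        match event {
            StreamEvent::ToolUseStart { id, name } => {
                self.open = Some((id, name, String::new()));
            }
            StreamEvent::ToolUseInputDelta(fragment) => {
                if let Some((_, _, buf)) = self.open.as_mut() {
                    buf.push_str(&fragment);
                }
            }
            StreamEvent::ToolUseDone => {
                if let Some((id, name, buf)) = self.open.take() {
                    // Assumption: a tool called with no arguments yields an empty buffer.
                    let input = if buf.trim().is_empty() { "{}".to_string() } else { buf };
                    self.finished.push(ContentBlock::ToolUse { id, name, input });
                }
            }
            StreamEvent::Done => {}
        }
    }

    /// True when the assistant message should trigger the tool-execution phase.
    pub fn needs_tool_execution(&self) -> bool {
        !self.finished.is_empty()
    }

    pub fn into_blocks(self) -> Vec<ContentBlock> {
        self.finished
    }
}
```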
Step 3.5: Tool trait, registry, and core tools
- `Tool` trait: `name()`, `description()`, `input_schema() -> Value`, `execute(input: Value, working_dir: &Path) -> Result<ToolOutput>`
- `ToolOutput { content: String, is_error: bool }`
- `ToolRegistry`: stores tools, provides `get(name)` and `definitions() -> Vec<ToolDefinition>`
- Risk level: `AutoApprove` (reads), `RequiresApproval` (writes/shell)
- Implement: `read_file` (auto), `list_directory` (auto), `write_file` (approval), `shell_exec` (approval)
- Path validation: `canonicalize` + `starts_with` check, reject paths outside working dir (no Landlock yet)
- Files: New `src/tools/` module: `mod.rs`, `read_file.rs`, `write_file.rs`, `list_directory.rs`, `shell_exec.rs`
- Done when: Unit tests pass for each tool in temp dirs; path traversal rejected
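A minimal sketch of how the trait, registry, and path check from Step 3.5 could fit together. Several details are assumptions: `Value` is a `String` stand-in whose content is simply the requested path, `execute` returns `io::Result`, the `risk()` and `register()` methods and the `resolve_in_dir` helper are hypothetical names, and `definitions()` is left out. Note that `canonicalize` fails for paths that do not exist yet, so `write_file` would need to validate the parent directory instead.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Stand-in for `serde_json::Value`; here the input is just the path text.
pub type Value = String;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RiskLevel {
    AutoApprove,
    RequiresApproval,
}

#[derive(Debug, PartialEq)]
pub struct ToolOutput {
    pub content: String,
    pub is_error: bool,
}

pub trait Tool {
    fn name(&self) -> &str;
    fn description(&self) -> &str;
    fn input_schema(&self) -> Value;
    fn risk(&self) -> RiskLevel;
    fn execute(&self, input: Value, working_dir: &Path) -> io::Result<ToolOutput>;
}

/// Resolve `requested` against `working_dir` and reject anything that ends up
/// outside it after symlinks and `..` are resolved.
pub fn resolve_in_dir(working_dir: &Path, requested: &str) -> io::Result<PathBuf> {
    let root = working_dir.canonicalize()?;
    // `join` with an absolute path replaces the base, which the check below catches.
    let resolved = root.join(requested).canonicalize()?;
    if resolved.starts_with(&root) {
        Ok(resolved)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "path escapes working directory",
        ))
    }
}

pub struct ReadFile;

impl Tool for ReadFile {
    fn name(&self) -> &str { "read_file" }
    fn description(&self) -> &str { "Read a file inside the working directory" }
    fn input_schema(&self) -> Value { "{\"type\":\"object\"}".to_string() }
    fn risk(&self) -> RiskLevel { RiskLevel::AutoApprove }

    fn execute(&self, input: Value, working_dir: &Path) -> io::Result<ToolOutput> {
        // Failures are reported to the model as error output, not propagated.
        match resolve_in_dir(working_dir, &input).and_then(|p| fs::read_to_string(p)) {
            Ok(content) => Ok(ToolOutput { content, is_error: false }),
            Err(e) => Ok(ToolOutput { content: e.to_string(), is_error: true }),
        }
    }
}

#[derive(Default)]
pub struct ToolRegistry {
    tools: HashMap<String, Box<dyn Tool>>,
}

impl ToolRegistry {
    pub fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.insert(tool.name().to_string(), tool);
    }

    pub fn get(&self, name: &str) -> Option<&dyn Tool> {
        self.tools.get(name).map(|t| &**t)
    }
}
```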
Step 3.6: Approval gate (TUI <-> core)
- New `UIEvent::ToolApprovalRequest { tool_use_id, tool_name, input_summary }`
- New `UserAction::ToolApprovalResponse { tool_use_id, approved: bool }`
- Orchestrator: check risk level -> auto-approve or send approval request and await response
- Denied tools return `ToolResult { is_error: true }` with denial message
- TUI: render approval prompt overlay with y/n keybindings
- Files: `src/core/types.rs`, `src/core/orchestrator.rs`, `src/tui/events.rs`, `src/tui/input.rs`, `src/tui/render.rs`
- Done when: Integration test: mock provider + mock TUI channel verifies approval flow
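The approval round trip in Step 3.6 might look roughly like the following. It uses blocking `std::sync::mpsc` channels purely for illustration; the plan does not say which channel type or runtime the project uses, and `is_approved` is a hypothetical function name. Auto-approved tools never touch the UI channel, and a closed channel is treated as a denial.

```rust
use std::sync::mpsc::{Receiver, Sender};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RiskLevel {
    AutoApprove,
    RequiresApproval,
}

#[derive(Debug, PartialEq)]
pub enum UIEvent {
    ToolApprovalRequest { tool_use_id: String, tool_name: String, input_summary: String },
}

#[derive(Debug, PartialEq)]
pub enum UserAction {
    ToolApprovalResponse { tool_use_id: String, approved: bool },
}

/// Decide whether a tool call may run, asking the TUI when the risk level
/// requires it. Blocks until the matching response arrives.
pub fn is_approved(
    risk: RiskLevel,
    tool_use_id: &str,
    tool_name: &str,
    input_summary: &str,
    ui_tx: &Sender<UIEvent>,
    actions_rx: &Receiver<UserAction>,
) -> bool {
    if risk == RiskLevel::AutoApprove {
        return true;
    }
    let request = UIEvent::ToolApprovalRequest {
        tool_use_id: tool_use_id.to_string(),
        tool_name: tool_name.to_string(),
        input_summary: input_summary.to_string(),
    };
    if ui_tx.send(request).is_err() {
        return false; // UI is gone: fail closed
    }
    loop {
        match actions_rx.recv() {
            Ok(UserAction::ToolApprovalResponse { tool_use_id: id, approved })
                if id == tool_use_id =>
            {
                return approved;
            }
            Ok(_) => continue, // response for a different tool call; keep waiting
            Err(_) => return false, // channel closed: fail closed
        }
    }
}
```

When this returns `false`, the orchestrator would build a `ToolResult` with `is_error: true` and a denial message instead of executing the tool.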
Step 3.7: Tool results fed back to the model
- After executing tool calls: append assistant message (with ToolUse blocks) to history, append user message with ToolResult blocks, re-call provider
- Loop: model may respond with more tool calls or text
- Cap at max iterations (25) to prevent runaway
- Files: `src/core/orchestrator.rs`
- Done when: Integration test: mock provider returns tool-use then text; orchestrator makes two calls. Max-iteration cap tested.
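The loop in Step 3.7 can be sketched as follows, with the provider and tool execution passed in as closures so the sketch stays self-contained. `run_turn`, `MAX_TOOL_ITERATIONS`, the `Role` enum, and the `String` stand-in for the tool input are all assumptions; the cap of 25 comes from the plan. It returns the number of model calls made, or an error once the cap is hit.

```rust
pub const MAX_TOOL_ITERATIONS: usize = 25;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ContentBlock {
    Text(String),
    ToolUse { id: String, name: String, input: String },
    ToolResult { tool_use_id: String, content: String, is_error: bool },
}

#[derive(Debug, Clone, PartialEq)]
pub struct ConversationMessage {
    pub role: Role,
    pub content: Vec<ContentBlock>,
}

/// Call the model, run any requested tools, feed the results back, and repeat
/// until the model answers without tool calls or the cap is reached.
pub fn run_turn(
    history: &mut Vec<ConversationMessage>,
    mut call_model: impl FnMut(&[ConversationMessage]) -> Vec<ContentBlock>,
    mut run_tool: impl FnMut(&str, &str) -> (String, bool),
) -> Result<usize, String> {
    for call in 1..=MAX_TOOL_ITERATIONS {
        let reply = call_model(history.as_slice());

        // Execute every tool call in the reply and collect the results.
        let results: Vec<ContentBlock> = reply
            .iter()
            .filter_map(|block| match block {
                ContentBlock::ToolUse { id, name, input } => {
                    let (content, is_error) = run_tool(name.as_str(), input.as_str());
                    Some(ContentBlock::ToolResult { tool_use_id: id.clone(), content, is_error })
                }
                _ => None,
            })
            .collect();

        // The assistant message goes into history first, tool results second.
        history.push(ConversationMessage { role: Role::Assistant, content: reply });
        if results.is_empty() {
            return Ok(call);
        }
        history.push(ConversationMessage { role: Role::User, content: results });
    }
    Err(format!("stopped after {} model calls", MAX_TOOL_ITERATIONS))
}
```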
Step 3.8: TUI display for tool activity
- New `UIEvent::ToolExecuting { tool_name, input_summary }`, `UIEvent::ToolResult { tool_name, output_summary, is_error }`
- Render tool calls as distinct visual blocks in conversation view
- Render tool results inline (truncated if long)
- Files: `src/tui/render.rs`, `src/tui/events.rs`
- Done when: Visual check with `cargo run`; TestBackend test for tool block rendering
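For the "truncated if long" rendering in Step 3.8, one simple approach is to cut the output after a fixed number of characters and note how much was dropped. The function name and the suffix format below are hypothetical. Counting `char`s rather than bytes avoids slicing in the middle of a multi-byte UTF-8 sequence.

```rust
/// Shorten tool output for inline display, keeping at most `max_chars`
/// characters and appending a note about how many were omitted.
pub fn truncate_for_display(output: &str, max_chars: usize) -> String {
    let mut chars = output.chars();
    // `by_ref` lets us keep using the iterator after taking the head.
    let head: String = chars.by_ref().take(max_chars).collect();
    let remaining = chars.count();
    if remaining == 0 {
        head
    } else {
        format!("{}... ({} more chars)", head, remaining)
    }
}
```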
Phase 3 verification (end-to-end)
- `cargo test` -- all tests pass
- `cargo clippy -- -D warnings` -- zero warnings
- `cargo run -- --project-dir .` -- ask Claude to read a file, approve, see contents
- Ask Claude to write a file -- approve, verify written
- Ask Claude to run a shell command -- approve, verify output
- Deny an approval -- Claude gets denial and responds gracefully
Phase 4: Sandboxing
- Landlock: read-only system, read-write project dir, network blocked
- Tools execute through `Sandbox`, never directly
- `:net on/off` toggle, state in status bar
- Graceful degradation on older kernels
- Done when: Writes outside project dir fail; network toggle works