Design/Plan/Claude.md from Claude.
commit
42e3ddacc2
5 changed files with 310 additions and 0 deletions
1
.gitignore
vendored
Normal file
|
|
@@ -0,0 +1 @@
|
|||
target/
|
||||
63
CLAUDE.md
Normal file
|
|
@@ -0,0 +1,63 @@
|
|||
# CLAUDE.md
|
||||
|
||||
Rust TUI coding agent. Ratatui + Crossterm + Tokio. See DESIGN.md for architecture decisions and PLAN.md for implementation phases.
|
||||
|
||||
## Commands
|
||||
|
||||
- `cargo build`: Build the project
|
||||
- `cargo test`: Run all unit and integration tests
|
||||
- `cargo test --lib`: Unit tests only
|
||||
- `cargo test --test '*'`: Integration tests only
|
||||
- `cargo clippy -- -D warnings`: Lint (must pass with zero warnings)
|
||||
- `cargo fmt --check`: Format check
|
||||
- `cargo run -- --project-dir <path>`: Run against a project directory
|
||||
|
||||
## Architecture
|
||||
|
||||
Seven modules with strict boundaries:
|
||||
|
||||
- `src/app/` — Wiring, lifecycle, tokio runtime setup
|
||||
- `src/tui/` — Ratatui rendering, input handling, vim modes. Communicates with core ONLY via channels (`UserAction` → core, `UIEvent` ← core). Never touches conversation state directly.
|
||||
- `src/core/` — Conversation tree, orchestrator loop, sub-agent lifecycle
|
||||
- `src/provider/` — `ModelProvider` trait + Claude implementation. Leaf module, no internal dependencies.
|
||||
- `src/tools/` — `Tool` trait, registry, built-in tools. Depends only on `sandbox`.
|
||||
- `src/sandbox/` — Landlock policy, path validation, command execution. Leaf module.
|
||||
- `src/session/` — JSONL logging, session read/write. Leaf module.
|
||||
|
||||
The channel boundary between `tui` and `core` is critical — never bypass it. The TUI is a frontend; core is the engine. This separation enables headless mode for benchmarking.
|
||||
|
||||
## Code Style
|
||||
|
||||
- Use `thiserror` for error types in library code; `anyhow` is allowed only in `main.rs`/`app`
|
||||
- Prefer `impl Trait` return types over boxing when possible
|
||||
- All public types need doc comments
|
||||
- No `unwrap()` in non-test code — use `?` or explicit error handling
|
||||
- Async functions should be cancel-safe where possible
|
||||
- Use `tracing` for structured logging, not `println!` or `log`
|
||||
|
||||
## Conversation Data Model
|
||||
|
||||
Events use parent IDs forming a tree (not a flat list). This enables future branching. Every event has: id, parent_id, timestamp, event_type, token_usage. A "turn" is all events between two user messages — this is the unit for token tracking.
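As a rough illustration of the data model described above, the following sketch stores events by ID, rebuilds the root-to-leaf path through parent links, and sums token usage per turn. The field types, the `EventType` variants, and the method names (`path_to`, `turn_totals`) are assumptions made for this example; only the five field names come from the text.

```rust
use std::collections::HashMap;

/// Kind of event. Variant names are illustrative assumptions.
#[derive(Debug, Clone, PartialEq)]
enum EventType {
    UserMessage,
    AssistantMessage,
    ToolCall,
    ToolResult,
}

/// Token counts attached to one event (field split is an assumption).
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct TokenUsage {
    input: u32,
    output: u32,
}

/// One conversation event; `parent_id` is `None` only for the root.
#[derive(Debug, Clone)]
struct Event {
    id: u64,
    parent_id: Option<u64>,
    timestamp: u64, // assumed: seconds since the Unix epoch
    event_type: EventType,
    token_usage: TokenUsage,
}

/// Events indexed by ID; siblings sharing a parent form branches.
#[derive(Debug, Default)]
struct ConversationTree {
    events: HashMap<u64, Event>,
}

impl ConversationTree {
    fn insert(&mut self, event: Event) {
        self.events.insert(event.id, event);
    }

    /// Follow parent links from `leaf` up to the root, returned root-first.
    fn path_to(&self, leaf: u64) -> Vec<&Event> {
        let mut path = Vec::new();
        let mut cursor = self.events.get(&leaf);
        while let Some(event) = cursor {
            path.push(event);
            cursor = event.parent_id.and_then(|parent| self.events.get(&parent));
        }
        path.reverse();
        path
    }

    /// Sum token usage per turn along the path to `leaf`.
    /// A new turn starts at every user message.
    fn turn_totals(&self, leaf: u64) -> Vec<TokenUsage> {
        let mut turns: Vec<TokenUsage> = Vec::new();
        for event in self.path_to(leaf) {
            if event.event_type == EventType::UserMessage || turns.is_empty() {
                turns.push(TokenUsage::default());
            }
            if let Some(current) = turns.last_mut() {
                current.input += event.token_usage.input;
                current.output += event.token_usage.output;
            }
        }
        turns
    }
}
```

Because a branch is just another event with an existing `parent_id`, calling `path_to` on different leaves yields different linear histories over the same stored events.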
|
||||
|
||||
## Testing
|
||||
|
||||
- Unit tests go in the same file as the code (`#[cfg(test)] mod tests`)
|
||||
- Integration tests go in `tests/`
|
||||
- TUI widget tests use `ratatui::backend::TestBackend` + `insta` snapshots
|
||||
- Provider tests replay recorded SSE fixtures from `tests/fixtures/`
|
||||
- Sandbox tests use `tempdir` and skip Landlock-specific assertions if kernel < 5.13
|
||||
- Run `cargo test` before every commit
|
||||
|
||||
## Key Constraints
|
||||
|
||||
- All file I/O and process spawning in tools MUST go through `Sandbox` — never use `std::fs` or `std::process::Command` directly in tool implementations
|
||||
- The `ModelProvider` trait must remain provider-agnostic — no Claude-specific types in the trait interface
|
||||
- Session JSONL is append-only. Never rewrite history. Branching works by writing new events with different parent IDs.
|
||||
- Token usage must be tracked per-event and aggregatable per-turn
|
||||
|
||||
## Do Not
|
||||
|
||||
- Add MCP support (deferred, but keep tool trait compatible)
|
||||
- Use `unsafe` without discussion
|
||||
- Add dependencies without checking if an existing dep covers the use case
|
||||
- Modify test fixtures without re-recording from a real API session
|
||||
111
DESIGN.md
Normal file
|
|
@@ -0,0 +1,111 @@
|
|||
# Design Decisions
|
||||
|
||||
## Stack
|
||||
- **Language:** Rust
|
||||
- **TUI Framework:** Ratatui + Crossterm
|
||||
- **Async Runtime:** Tokio
|
||||
|
||||
## Architecture
|
||||
- Channel boundary between TUI and core (fully decoupled)
|
||||
- Module decomposition: `app`, `tui`, `core`, `provider`, `tools`, `sandbox`, `session`
|
||||
- Headless mode: core without TUI, driven by script (enables benchmarking and CI)
|
||||
|
||||
## Model Integration
|
||||
- Claude-first, multi-model via `ModelProvider` trait
|
||||
- Common `StreamEvent` internal representation across providers
|
||||
- Prompt caching-aware message construction
|
||||
|
||||
## UI
|
||||
- **Agent view:** Tree-based hierarchy (not flat tabs) for sub-agent inspection
|
||||
- **Modes:** Normal, Insert, Command (`:` prefix from Normal mode)
|
||||
- **Activity modes:** Plan and Execute are visually distinct activities in the TUI
|
||||
- **Streaming:** Barebones styled text initially, full markdown rendering deferred
|
||||
- **Token usage:** Per-turn display (between user inputs), cumulative in status bar
|
||||
- **Status bar:** Mode indicator, current activity (Plan/Execute), token totals, network policy state
|
||||
|
||||
## Planning Mode
|
||||
- Distinct activity from execution — planner agent produces a plan file, does not execute
|
||||
- Plan file is structured markdown: steps with descriptions, files involved, acceptance criteria
|
||||
- Plan is reviewable and editable before execution (`:edit-plan` opens `$EDITOR`)
|
||||
- User explicitly approves plan before execution begins
|
||||
- Executor agent receives the plan file + project context, not the planning conversation
|
||||
- Plan-step progress tracked during execution (complete/in-progress/failed)
|
||||
|
||||
## Sub-Agents
|
||||
- Independent context windows with summary passed back to parent
|
||||
- Fully autonomous once spawned
|
||||
- Hard deny on unpermitted actions
|
||||
- Plan executor is a specialized sub-agent where the plan replaces the summary
|
||||
- Direct user interaction with sub-agents deferred
|
||||
|
||||
## Tool System
|
||||
- Built-in tool system with `Tool` trait
|
||||
- Core tools: `read_file`, `write_file`, `edit_file`, `shell_exec`, `list_directory`, `search_files`
|
||||
- Approval gates by risk level: auto-approve (reads), confirm (writes/shell), deny
|
||||
- MCP not implemented but interface designed to allow future adapter
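
One possible shape for the approval gates listed above is sketched here. The `RiskLevel` and `Approval` enums, the two-method `Tool` trait, and the `gate_for` lookup are assumptions for illustration; the real trait would also need an execute method and argument schema, which are omitted.

```rust
/// How risky a tool invocation is. Variant names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RiskLevel {
    Read,
    Write,
    Shell,
    Forbidden,
}

/// What the approval gate decides for a given risk level.
#[derive(Debug, PartialEq, Eq)]
enum Approval {
    AutoApprove,
    Confirm,
    Deny,
}

/// Minimal stand-in for the `Tool` trait (execution omitted).
trait Tool {
    fn name(&self) -> &str;
    fn risk(&self) -> RiskLevel;
}

/// Reads are auto-approved, writes and shell need confirmation,
/// everything else is denied.
fn gate(risk: RiskLevel) -> Approval {
    match risk {
        RiskLevel::Read => Approval::AutoApprove,
        RiskLevel::Write | RiskLevel::Shell => Approval::Confirm,
        RiskLevel::Forbidden => Approval::Deny,
    }
}

struct ReadFile;
impl Tool for ReadFile {
    fn name(&self) -> &str {
        "read_file"
    }
    fn risk(&self) -> RiskLevel {
        RiskLevel::Read
    }
}

struct ShellExec;
impl Tool for ShellExec {
    fn name(&self) -> &str {
        "shell_exec"
    }
    fn risk(&self) -> RiskLevel {
        RiskLevel::Shell
    }
}

/// Registry that dispatches by tool name.
#[derive(Default)]
struct ToolRegistry {
    tools: Vec<Box<dyn Tool>>,
}

impl ToolRegistry {
    fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.push(tool);
    }

    /// Unknown tool names are denied rather than treated as an error.
    fn gate_for(&self, name: &str) -> Approval {
        self.tools
            .iter()
            .find(|tool| tool.name() == name)
            .map(|tool| gate(tool.risk()))
            .unwrap_or(Approval::Deny)
    }
}
```

Keeping the trait free of provider or transport details is one way to leave room for a later MCP adapter, since an adapter could implement the same two methods for remote tools.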
|
||||
|
||||
## Sandboxing
|
||||
- **Landlock** (Linux kernel-level):
  - Read-only: system-wide (`/`)
  - Read-write: project directory, temp directory
  - Network: blocked by default, toggleable via `:net on/off`
  - Graceful degradation on older kernels
|
||||
- All tool execution goes through `Sandbox` — tools never touch filesystem directly
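
The path-validation half of `Sandbox` could look roughly like the sketch below, which resolves a requested path against the project root and rejects anything that escapes it. The function name `resolve_in_root` and the `PathError` type are hypothetical. The check is purely lexical, so it does not follow symlinks; a real implementation would likely canonicalize as well and rely on Landlock as the enforcing layer.

```rust
use std::path::{Component, Path, PathBuf};

/// Why a requested path was rejected (hypothetical error type).
#[derive(Debug, PartialEq, Eq)]
enum PathError {
    /// A `..` component climbed above the project root.
    EscapesRoot,
    /// An absolute path that does not start with the project root.
    OutsideRoot,
}

/// Lexically resolve `requested` against `root`.
/// Does not touch the filesystem, so symlinks are NOT resolved.
fn resolve_in_root(root: &Path, requested: &Path) -> Result<PathBuf, PathError> {
    // Absolute paths are accepted only when they live under the root.
    let relative = if requested.is_absolute() {
        requested
            .strip_prefix(root)
            .map_err(|_| PathError::OutsideRoot)?
    } else {
        requested
    };

    let mut resolved = root.to_path_buf();
    let mut depth = 0usize; // components pushed below the root so far
    for component in relative.components() {
        match component {
            Component::Normal(part) => {
                resolved.push(part);
                depth += 1;
            }
            Component::CurDir => {}
            Component::ParentDir => {
                if depth == 0 {
                    return Err(PathError::EscapesRoot);
                }
                resolved.pop();
                depth -= 1;
            }
            Component::RootDir | Component::Prefix(_) => {
                return Err(PathError::OutsideRoot);
            }
        }
    }
    Ok(resolved)
}
```

Tools would then receive only the resolved `PathBuf`, which keeps `std::fs` calls out of tool implementations as required above.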
|
||||
|
||||
## Session Logging
|
||||
- JSONL format, one event per line
|
||||
- Events: user message, assistant message, tool call, tool result, sub-agent spawn/result, plan created, plan step status
|
||||
- Tree-addressable via parent IDs (enables conversation branching later)
|
||||
- Token usage stored per event
|
||||
- Linear UX for now, branching deferred
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Unit Tests
|
||||
- **`provider`:** SSE stream parsing from byte fixtures, message/tool serialization, `StreamEvent` variant correctness
|
||||
- **`tools`:** Path canonicalization, traversal prevention, risk level classification, registry dispatch
|
||||
- **`sandbox`:** Landlock policy construction, path validation logic (without applying kernel rules)
|
||||
- **`core`:** Conversation tree operations (insert, query by parent, turn computation, token totals), orchestrator state machine transitions against mock `StreamEvent` sequences
|
||||
- **`session`:** JSONL serialization roundtrips, parent ID chain reconstruction
|
||||
- **`tui`:** Widget rendering via Ratatui `TestBackend`, snapshot tests with `insta` crate for layout/mode indicator/token display
|
||||
|
||||
### Integration Tests — Component Boundaries
|
||||
- **Core ↔ Provider:** Mock `ModelProvider` replaying recorded API sessions (full SSE streams with tool use). Tests the complete orchestration loop deterministically without network.
|
||||
- **Core ↔ TUI (channel boundary):** Orchestrator with mock provider connected to channels. Assert correct `UIEvent` sequence, inject `UserAction` messages, verify approval/denial flow.
|
||||
- **Tools ↔ Sandbox:** Real file operations and shell commands in temp directories. Verify write confinement, path traversal rejection, network blocking. Skip Landlock-specific tests on older kernels in CI.
|
||||
|
||||
### Integration Tests — End to End
|
||||
- **Recorded session replay:** Capture real Claude API HTTP request/response pairs, replay deterministically. Exercises full stack (core + channel + mock TUI) without cost or network dependency. Primary E2E test strategy.
|
||||
- **Live API tests:** Small suite behind feature flag / env var. Verifies real API integration. Run manually before releases, not in CI.
|
||||
|
||||
### Snapshot Testing
|
||||
- `insta` crate for TUI visual regression testing from Phase 2 onward
|
||||
- Capture rendered `TestBackend` buffers as string snapshots
|
||||
- Catches layout, mode indicator, and token display regressions
|
||||
|
||||
### Benchmarking — SWE-bench
|
||||
- **Target:** SWE-bench Verified (500 curated problems) as primary benchmark
|
||||
- **Secondary:** SWE-bench Pro for testing planning mode on longer-horizon tasks
|
||||
- **Approach:** Headless mode (core without TUI) produces unified diff patches, evaluated via SWE-bench Docker harness
|
||||
- **Baseline:** mini-swe-agent (~100 lines Python, >74% on Verified) as calibration — if we score significantly below with same model, the issue is scaffolding
|
||||
- **Cadence:** Milestone checks, not continuous CI (too expensive/slow)
|
||||
- **Requirements:** x86_64, 120GB+ storage, 16GB RAM, 8 CPU cores
|
||||
|
||||
### Test Sequencing
|
||||
- Phase 1: Unit tests for SSE parser, event types, message serialization
|
||||
- Phase 2: Snapshot tests for TUI with `insta`
|
||||
- Phase 4: Recorded session replay infrastructure (core loop complex enough to warrant it)
|
||||
- Phase 6-7: Headless mode + first SWE-bench Verified run
|
||||
|
||||
## Configuration (Deferred)
|
||||
- Single-user, hardcoded defaults for now
|
||||
- Designed for later: global config, per-project `.agent.toml`, configurable keybindings
|
||||
|
||||
## Deferred Features
|
||||
- Conversation branching (tree structure in log, linear UX for now)
|
||||
- Direct sub-agent interaction
|
||||
- MCP adapter
|
||||
- Full markdown/syntax-highlighted rendering
|
||||
- Session log viewer
|
||||
- Per-project configuration
|
||||
- Structured plan editor in TUI (use `$EDITOR` for now)
|
||||
47
IDEAS.md
Normal file
|
|
@@ -0,0 +1,47 @@
|
|||
# IDEAS
|
||||
|
||||
Notes based on ideas I've had.
|
||||
|
||||
## Token Usage Visualization
|
||||
- Per-turn token breakdown (input/output/cache) inline in conversation
|
||||
- Cumulative session totals in status bar
|
||||
- Estimated cost of usage
|
||||
|
||||
## Planning Mode
|
||||
- Activity mode distinction in TUI (Plan vs Execute), visible in status bar
|
||||
- Planner agent: has tool access (reads, search) but no write/exec permissions
|
||||
- Plan output as structured markdown (steps, files, acceptance criteria)
|
||||
- `:edit-plan` command to open plan in `$EDITOR` before execution
|
||||
- Explicit plan approval gate before transitioning to execution
|
||||
- Executor agent spawned with plan file + project context (not planning conversation)
|
||||
- Plan-step progress tracking (complete/in-progress/failed) visible in TUI
|
||||
- **Done when:** Can plan a task, review/edit the plan, then execute it as a separate activity
|
||||
|
||||
## Sub-Agents
|
||||
- `spawn_agent` tool, independent `ConversationTree` per sub-agent
|
||||
- Agent tree sidebar in TUI, navigable in Normal mode
|
||||
- Sub-agents follow same approval policy with hard deny on unpermitted actions
|
||||
- Plan executor refactored as a sub-agent specialization
|
||||
- **Done when:** Agent delegates to sub-agent, user can inspect it, result flows back
|
||||
|
||||
## Context Window Management
|
||||
- Token counting for outgoing payloads
|
||||
- Compaction strategy: summarize older turns, preserve full history in session log
|
||||
- Stable message prefix for prompt caching
|
||||
- **Done when:** Conversations run indefinitely without hitting context limits
|
||||
|
||||
## Automated Anomaly Notation
|
||||
- Similar to Jon's SESSION.md: https://github.com/jonhoo/configs/blob/master/agentic/AGENTS.md
|
||||
- Allows the agents to note an anomaly or bad design decision.
|
||||
|
||||
## Deferred TODO List
|
||||
- Allow the user to note things that should be fixed after the agent has finished its full loop.
|
||||
- Potentially add a way to iterate through the todo list at the end.
|
||||
|
||||
## Session Logging
|
||||
- JSONL `SessionWriter` with `Event` structure
|
||||
- Parent IDs, timestamps, token usage per event
|
||||
- Predictable file location with session IDs
|
||||
- **Done when:** Session files are coherent, parseable, with token counts per turn
|
||||
|
||||
|
||||
88
PLAN.md
Normal file
|
|
@@ -0,0 +1,88 @@
|
|||
# Implementation Plan
|
||||
|
||||
## Phase 1: Minimal Conversation Loop
|
||||
|
||||
**Done when:** Multi-turn streaming conversation with Claude works in terminal
|
||||
|
||||
### 1.1 Project Scaffolding
|
||||
- `Cargo.toml` with initial dependencies:
  - `ratatui`, `crossterm` — TUI
  - `tokio` (full features) — async runtime
  - `serde`, `serde_json` — serialization
  - `thiserror` — error types
  - `tracing`, `tracing-subscriber` — structured logging
  - `reqwest` (with `stream` feature) — HTTP client for SSE
  - `futures` — stream combinators
|
||||
- Establish `src/{app,tui,core,provider}/mod.rs` stubs
|
||||
- `cargo build` passes; `cargo clippy -- -D warnings` passes on empty stubs
|
||||
|
||||
### 1.2 Shared Types (`src/core/types.rs`)
|
||||
- `StreamEvent` enum: `TextDelta(String)`, `InputTokens(u32)`, `OutputTokens(u32)`, `Done`, `Error(String)`
|
||||
- `UserAction` enum (TUI → core channel): `SendMessage(String)`, `Quit`
|
||||
- `UIEvent` enum (core → TUI channel): `StreamDelta(String)`, `TurnComplete`, `Error(String)`
|
||||
- `ConversationMessage` struct: `role: Role`, `content: String`
|
||||
- All types derive `Debug`; all public types have doc comments
|
||||
|
||||
### 1.3 Provider: `ModelProvider` Trait + Claude SSE (`src/provider/`)
|
||||
- `ModelProvider` trait: `async fn stream(&self, messages: &[ConversationMessage]) -> impl Stream<Item = StreamEvent>`
|
||||
- `ClaudeProvider` struct: API key from env, `reqwest` HTTP client
|
||||
- Serialize messages to Anthropic Messages API JSON format
|
||||
- Parse SSE byte stream → `StreamEvent` (handle `content_block_delta`, `message_delta` for tokens, `message_stop`)
|
||||
- Unit tests: SSE parsing from hardcoded byte fixtures in `#[cfg(test)]`
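
The framing step of the SSE parser can be sketched independently of the JSON payloads. The example below buffers incoming text and emits one frame per blank-line-terminated block; `SseFrame` and `SseParser` are hypothetical names. It assumes `\n` line endings and already-decoded UTF-8 chunks, and it leaves decoding of the `data` JSON into `StreamEvent` (for instance with `serde_json`) out of scope.

```rust
/// One complete SSE frame: the `event:` name and the joined `data:` lines.
#[derive(Debug, PartialEq, Eq)]
struct SseFrame {
    event: String,
    data: String,
}

/// Incremental frame splitter. Chunks may end mid-frame; the remainder
/// stays buffered until the terminating blank line arrives.
#[derive(Default)]
struct SseParser {
    buffer: String,
}

impl SseParser {
    /// Append a chunk and return every frame completed by it.
    fn feed(&mut self, chunk: &str) -> Vec<SseFrame> {
        self.buffer.push_str(chunk);
        let mut frames = Vec::new();

        // A frame ends at the first blank line ("\n\n").
        while let Some(end) = self.buffer.find("\n\n") {
            let raw: String = self.buffer.drain(..end + 2).collect();
            let mut event = String::new();
            let mut data_lines: Vec<&str> = Vec::new();

            for line in raw.lines() {
                if let Some(value) = line.strip_prefix("event:") {
                    // Drop the single optional space after the colon.
                    event = value.strip_prefix(' ').unwrap_or(value).to_string();
                } else if let Some(value) = line.strip_prefix("data:") {
                    data_lines.push(value.strip_prefix(' ').unwrap_or(value));
                }
                // Comments (":...") and unknown fields are ignored.
            }

            if !event.is_empty() || !data_lines.is_empty() {
                frames.push(SseFrame {
                    event,
                    data: data_lines.join("\n"),
                });
            }
        }
        frames
    }
}
```

A byte fixture test can feed the recorded stream in arbitrary chunk sizes and assert that the resulting frame sequence is the same each time, which exercises the buffering path.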
|
||||
|
||||
### 1.4 Core: Conversation State + Orchestrator Loop (`src/core/`)
|
||||
- `ConversationHistory`: `Vec<ConversationMessage>` with `push` and `messages()` (flat list, no tree yet)
|
||||
- `Orchestrator` struct holding history, provider, channel senders/receivers
|
||||
- Orchestrator loop:
  1. Await `UserAction` from TUI channel
  2. On `SendMessage`: append user message, call `provider.stream()`
  3. Forward each `StreamEvent` as `UIEvent` to TUI
  4. Accumulate deltas into assistant message; append to history on `Done`
  5. On `Quit`: break loop
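
The five loop steps above can be sketched with synchronous stand-ins. This version uses `std::sync::mpsc` in place of `tokio::sync::mpsc` and a hypothetical `MockProvider` trait that returns a `Vec<StreamEvent>` instead of an async stream, so it runs without any dependencies. The enum shapes follow section 1.2; `run` and `CannedProvider` are names invented for the example.

```rust
use std::sync::mpsc::{Receiver, Sender};

#[derive(Debug, Clone, PartialEq)]
enum StreamEvent {
    TextDelta(String),
    InputTokens(u32),
    OutputTokens(u32),
    Done,
    Error(String),
}

#[derive(Debug)]
enum UserAction {
    SendMessage(String),
    Quit,
}

#[derive(Debug, PartialEq)]
enum UIEvent {
    StreamDelta(String),
    TurnComplete,
    Error(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
struct ConversationMessage {
    role: Role,
    content: String,
}

/// Synchronous stand-in for `ModelProvider` (assumption for this sketch).
trait MockProvider {
    fn stream(&self, messages: &[ConversationMessage]) -> Vec<StreamEvent>;
}

/// Provider that always answers "Hello" in two deltas.
struct CannedProvider;
impl MockProvider for CannedProvider {
    fn stream(&self, _messages: &[ConversationMessage]) -> Vec<StreamEvent> {
        vec![
            StreamEvent::TextDelta("Hel".to_string()),
            StreamEvent::TextDelta("lo".to_string()),
            StreamEvent::OutputTokens(2),
            StreamEvent::Done,
        ]
    }
}

/// Runs until `Quit` arrives or the TUI side hangs up; returns the history.
fn run(
    provider: &dyn MockProvider,
    actions: Receiver<UserAction>,
    ui: Sender<UIEvent>,
) -> Vec<ConversationMessage> {
    let mut history = Vec::new();
    // Step 1: wait for the next action from the TUI.
    while let Ok(action) = actions.recv() {
        match action {
            // Step 5: leave the loop.
            UserAction::Quit => break,
            // Step 2: record the user message and start streaming.
            UserAction::SendMessage(text) => {
                history.push(ConversationMessage { role: Role::User, content: text });
                let mut reply = String::new();
                for event in provider.stream(&history) {
                    match event {
                        // Steps 3 and 4: forward the delta and accumulate it.
                        StreamEvent::TextDelta(delta) => {
                            reply.push_str(&delta);
                            let _ = ui.send(UIEvent::StreamDelta(delta));
                        }
                        // Step 4: commit the assistant message on `Done`.
                        StreamEvent::Done => {
                            history.push(ConversationMessage {
                                role: Role::Assistant,
                                content: std::mem::take(&mut reply),
                            });
                            let _ = ui.send(UIEvent::TurnComplete);
                        }
                        StreamEvent::Error(message) => {
                            let _ = ui.send(UIEvent::Error(message));
                        }
                        // Token counts are ignored in this Phase 1 sketch.
                        StreamEvent::InputTokens(_) | StreamEvent::OutputTokens(_) => {}
                    }
                }
            }
        }
    }
    history
}
```

Send errors on the UI channel are deliberately ignored here: if the TUI has gone away, the next `recv` on the action channel fails and the loop ends on its own.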
|
||||
|
||||
### 1.5 TUI: Layout + Input + Streaming Display (`src/tui/`)
|
||||
- `AppState` struct: `messages: Vec<(Role, String)>`, `input: String`, `scroll: u16`
|
||||
- Ratatui layout: full-height `Paragraph` output area (scrollable) + single-line `Paragraph` input
|
||||
- Insert mode only — printable chars append to `input`, Enter sends `UserAction::SendMessage`, Backspace deletes
|
||||
- On `UIEvent::StreamDelta`: append to last assistant message in `messages`, re-render
|
||||
- On `UIEvent::TurnComplete`: finalize assistant message
|
||||
- Crossterm raw mode enter/exit; restore terminal on panic or clean exit
|
||||
|
||||
### 1.6 App Wiring + Entry Point (`src/app/`, `src/main.rs`)
|
||||
- `main.rs`: parse `--project-dir <path>` CLI arg
|
||||
- Initialize `tracing_subscriber` (log to file, not stdout — avoids TUI interference)
|
||||
- Create `tokio::sync::mpsc` channel pair for `UserAction` and `UIEvent`
|
||||
- Spawn `Orchestrator::run()` as a tokio task
|
||||
- Run TUI event loop on main thread (keeps terminal setup and restore in one place; neither Ratatui nor Crossterm strictly requires the main thread)
|
||||
- On `UserAction::Quit` or Ctrl-C: signal orchestrator shutdown, restore terminal, exit cleanly
|
||||
|
||||
### 1.7 Phase 1 Unit Tests
|
||||
- Provider: SSE byte fixture → correct `StreamEvent` sequence
|
||||
- Provider: `ConversationMessage` vec → correct Anthropic API JSON shape
|
||||
- Core: `ConversationHistory` push/read roundtrip
|
||||
- Core: Orchestrator state transitions against mock `StreamEvent` sequence (no real API)
|
||||
|
||||
## Phase 2: Vim Modes and Navigation
|
||||
- Normal, Insert, Command modes with visual indicator
|
||||
- `j`/`k` scroll in Normal mode
|
||||
- `:quit`, `:clear` commands
|
||||
- **Done when:** Mode switching and scrolling feel fluid and vim-native
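
The mode handling in this phase can be modeled as a small state machine that is testable without a terminal. In the sketch below, `Key`, `Effect`, and `ModeMachine` are hypothetical types, and entering Insert mode with `i` and leaving with Esc are assumptions borrowed from vim conventions; only the three modes, `j`/`k` scrolling, and the `:` prefix come from the plan.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Normal,
    Insert,
    Command,
}

/// Simplified key input (a stand-in for crossterm key events).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Key {
    Char(char),
    Esc,
    Enter,
}

/// What the TUI should do in response to a key.
#[derive(Debug, PartialEq, Eq)]
enum Effect {
    None,
    ScrollDown,
    ScrollUp,
    InsertChar(char),
    Submit,
    RunCommand(String),
}

struct ModeMachine {
    mode: Mode,
    command: String, // text typed after ':' in Command mode
}

impl ModeMachine {
    fn new() -> Self {
        Self { mode: Mode::Normal, command: String::new() }
    }

    /// Apply one key press, updating the mode and returning the effect.
    fn handle(&mut self, key: Key) -> Effect {
        match (self.mode, key) {
            (Mode::Normal, Key::Char('i')) => {
                self.mode = Mode::Insert;
                Effect::None
            }
            (Mode::Normal, Key::Char(':')) => {
                self.mode = Mode::Command;
                self.command.clear();
                Effect::None
            }
            (Mode::Normal, Key::Char('j')) => Effect::ScrollDown,
            (Mode::Normal, Key::Char('k')) => Effect::ScrollUp,
            (Mode::Normal, _) => Effect::None,

            (Mode::Insert, Key::Esc) => {
                self.mode = Mode::Normal;
                Effect::None
            }
            (Mode::Insert, Key::Char(c)) => Effect::InsertChar(c),
            (Mode::Insert, Key::Enter) => Effect::Submit,

            (Mode::Command, Key::Esc) => {
                self.mode = Mode::Normal;
                Effect::None
            }
            (Mode::Command, Key::Char(c)) => {
                self.command.push(c);
                Effect::None
            }
            (Mode::Command, Key::Enter) => {
                self.mode = Mode::Normal;
                Effect::RunCommand(std::mem::take(&mut self.command))
            }
        }
    }
}
```

Typing `:quit` followed by Enter from Normal mode yields `Effect::RunCommand("quit")` and returns to Normal mode, so command dispatch can stay outside the state machine.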
|
||||
|
||||
## Phase 3: Tool Execution
|
||||
- `Tool` trait, `ToolRegistry`, core tools (`read_file`, `write_file`, `shell_exec`)
|
||||
- Tool definitions in API requests, parse tool-use responses
|
||||
- Approval gate: core → TUI pending event → user approve/deny → result back
|
||||
- Working directory confinement + path validation (no Landlock yet)
|
||||
- **Done when:** Claude can read, modify files, and run commands with user approval
|
||||
|
||||
## Phase 4: Sandboxing
|
||||
- Landlock: read-only system, read-write project dir, network blocked
|
||||
- Tools execute through `Sandbox`, never directly
|
||||
- `:net on/off` toggle, state in status bar
|
||||
- Graceful degradation on older kernels
|
||||
- **Done when:** Writes outside project dir fail; network toggle works