From 42e3ddacc2f4e908deeefb0fe859d0ae944a4779 Mon Sep 17 00:00:00 2001
From: Drew Galbraith
Date: Mon, 23 Feb 2026 21:39:31 -0800
Subject: [PATCH] Design/Plan/Claude.md from Claude.

---
 .gitignore |   1 +
 CLAUDE.md  |  63 ++++++++++++++++++++++++++++++
 DESIGN.md  | 111 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 IDEAS.md   |  47 +++++++++++++++++++++++
 PLAN.md    |  88 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 310 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 CLAUDE.md
 create mode 100644 DESIGN.md
 create mode 100644 IDEAS.md
 create mode 100644 PLAN.md

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..2f7896d
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+target/
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..adb63e4
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,63 @@
+# CLAUDE.md
+
+Rust TUI coding agent. Ratatui + Crossterm + Tokio. See DESIGN.md for architecture decisions and PLAN.md for implementation phases.
+
+## Commands
+
+- `cargo build`: Build the project
+- `cargo test`: Run all unit and integration tests
+- `cargo test --lib`: Unit tests only
+- `cargo test --test '*'`: Integration tests only
+- `cargo clippy -- -D warnings`: Lint (must pass with zero warnings)
+- `cargo fmt --check`: Format check
+- `cargo run -- --project-dir `: Run against a project directory
+
+## Architecture
+
+Seven modules with strict boundaries:
+
+- `src/app/` — Wiring, lifecycle, tokio runtime setup
+- `src/tui/` — Ratatui rendering, input handling, vim modes. Communicates with core ONLY via channels (`UserAction` → core, `UIEvent` ← core). Never touches conversation state directly.
+- `src/core/` — Conversation tree, orchestrator loop, sub-agent lifecycle
+- `src/provider/` — `ModelProvider` trait + Claude implementation. Leaf module, no internal dependencies.
+- `src/tools/` — `Tool` trait, registry, built-in tools. Depends only on `sandbox`.
+- `src/sandbox/` — Landlock policy, path validation, command execution. Leaf module.
+- `src/session/` — JSONL logging, session read/write. Leaf module.
+
+The channel boundary between `tui` and `core` is critical — never bypass it. The TUI is a frontend; core is the engine. This separation enables headless mode for benchmarking.
+
+## Code Style
+
+- Use `thiserror` for error types, not `anyhow` in library code (`anyhow` only in `main.rs`/`app`)
+- Prefer `impl Trait` return types over boxing when possible
+- All public types need doc comments
+- No `unwrap()` in non-test code — use `?` or explicit error handling
+- Async functions should be cancel-safe where possible
+- Use `tracing` for structured logging, not `println!` or `log`
+
+## Conversation Data Model
+
+Events use parent IDs forming a tree (not a flat list). This enables future branching. Every event has: id, parent_id, timestamp, event_type, token_usage. A "turn" is all events between two user messages — this is the unit for token tracking.
+
+## Testing
+
+- Unit tests go in the same file as the code (`#[cfg(test)] mod tests`)
+- Integration tests go in `tests/`
+- TUI widget tests use `ratatui::backend::TestBackend` + `insta` snapshots
+- Provider tests replay recorded SSE fixtures from `tests/fixtures/`
+- Sandbox tests use `tempdir` and skip Landlock-specific assertions if kernel < 5.13
+- Run `cargo test` before every commit
+
+## Key Constraints
+
+- All file I/O and process spawning in tools MUST go through `Sandbox` — never use `std::fs` or `std::process::Command` directly in tool implementations
+- The `ModelProvider` trait must remain provider-agnostic — no Claude-specific types in the trait interface
+- Session JSONL is append-only. Never rewrite history. Branching works by writing new events with different parent IDs.
+- Token usage must be tracked per-event and aggregatable per-turn
+
+## Do Not
+
+- Add MCP support (deferred, but keep tool trait compatible)
+- Use `unsafe` without discussion
+- Add dependencies without checking if an existing dep covers the use case
+- Modify test fixtures without re-recording from a real API session
diff --git a/DESIGN.md b/DESIGN.md
new file mode 100644
index 0000000..3733849
--- /dev/null
+++ b/DESIGN.md
@@ -0,0 +1,111 @@
+# Design Decisions
+
+## Stack
+- **Language:** Rust
+- **TUI Framework:** Ratatui + Crossterm
+- **Async Runtime:** Tokio
+
+## Architecture
+- Channel boundary between TUI and core (fully decoupled)
+- Module decomposition: `app`, `tui`, `core`, `provider`, `tools`, `sandbox`, `session`
+- Headless mode: core without TUI, driven by script (enables benchmarking and CI)
+
+## Model Integration
+- Claude-first, multi-model via `ModelProvider` trait
+- Common `StreamEvent` internal representation across providers
+- Prompt caching-aware message construction
+
+## UI
+- **Agent view:** Tree-based hierarchy (not flat tabs) for sub-agent inspection
+- **Modes:** Normal, Insert, Command (`:` prefix from Normal mode)
+- **Activity modes:** Plan and Execute are visually distinct activities in the TUI
+- **Streaming:** Barebones styled text initially, full markdown rendering deferred
+- **Token usage:** Per-turn display (between user inputs), cumulative in status bar
+- **Status bar:** Mode indicator, current activity (Plan/Execute), token totals, network policy state
+
+## Planning Mode
+- Distinct activity from execution — planner agent produces a plan file, does not execute
+- Plan file is structured markdown: steps with descriptions, files involved, acceptance criteria
+- Plan is reviewable and editable before execution (`:edit-plan` opens `$EDITOR`)
+- User explicitly approves plan before execution begins
+- Executor agent receives the plan file + project context, not the planning conversation
+- Plan-step progress tracked during execution (complete/in-progress/failed)
+
+## Sub-Agents
+- Independent context windows with summary passed back to parent
+- Fully autonomous once spawned
+- Hard deny on unpermitted actions
+- Plan executor is a specialized sub-agent where the plan replaces the summary
+- Direct user interaction with sub-agents deferred
+
+## Tool System
+- Built-in tool system with `Tool` trait
+- Core tools: `read_file`, `write_file`, `edit_file`, `shell_exec`, `list_directory`, `search_files`
+- Approval gates by risk level: auto-approve (reads), confirm (writes/shell), deny
+- MCP not implemented but interface designed to allow future adapter
+
+## Sandboxing
+- **Landlock** (Linux kernel-level):
+  - Read-only: system-wide (`/`)
+  - Read-write: project directory, temp directory
+  - Network: blocked by default, toggleable via `:net on/off`
+- Graceful degradation on older kernels
+- All tool execution goes through `Sandbox` — tools never touch filesystem directly
+
+## Session Logging
+- JSONL format, one event per line
+- Events: user message, assistant message, tool call, tool result, sub-agent spawn/result, plan created, plan step status
+- Tree-addressable via parent IDs (enables conversation branching later)
+- Token usage stored per event
+- Linear UX for now, branching deferred
+
+## Testing Strategy
+
+### Unit Tests
+- **`provider`:** SSE stream parsing from byte fixtures, message/tool serialization, `StreamEvent` variant correctness
+- **`tools`:** Path canonicalization, traversal prevention, risk level classification, registry dispatch
+- **`sandbox`:** Landlock policy construction, path validation logic (without applying kernel rules)
+- **`core`:** Conversation tree operations (insert, query by parent, turn computation, token totals), orchestrator state machine transitions against mock `StreamEvent` sequences
+- **`session`:** JSONL serialization roundtrips, parent ID chain reconstruction
+- **`tui`:** Widget rendering via Ratatui `TestBackend`, snapshot tests with `insta` crate for layout/mode indicator/token display
+
+### Integration Tests — Component Boundaries
+- **Core ↔ Provider:** Mock `ModelProvider` replaying recorded API sessions (full SSE streams with tool use). Tests the complete orchestration loop deterministically without network.
+- **Core ↔ TUI (channel boundary):** Orchestrator with mock provider connected to channels. Assert correct `UIEvent` sequence, inject `UserAction` messages, verify approval/denial flow.
+- **Tools ↔ Sandbox:** Real file operations and shell commands in temp directories. Verify write confinement, path traversal rejection, network blocking. Skip Landlock-specific tests on older kernels in CI.
+
+### Integration Tests — End to End
+- **Recorded session replay:** Capture real Claude API HTTP request/response pairs, replay deterministically. Exercises full stack (core + channel + mock TUI) without cost or network dependency. Primary E2E test strategy.
+- **Live API tests:** Small suite behind feature flag / env var. Verifies real API integration. Run manually before releases, not in CI.
+
+### Snapshot Testing
+- `insta` crate for TUI visual regression testing from Phase 2 onward
+- Capture rendered `TestBackend` buffers as string snapshots
+- Catches layout, mode indicator, and token display regressions
+
+### Benchmarking — SWE-bench
+- **Target:** SWE-bench Verified (500 curated problems) as primary benchmark
+- **Secondary:** SWE-bench Pro for testing planning mode on longer-horizon tasks
+- **Approach:** Headless mode (core without TUI) produces unified diff patches, evaluated via SWE-bench Docker harness
+- **Baseline:** mini-swe-agent (~100 lines Python, >74% on Verified) as calibration — if we score significantly below with same model, the issue is scaffolding
+- **Cadence:** Milestone checks, not continuous CI (too expensive/slow)
+- **Requirements:** x86_64, 120GB+ storage, 16GB RAM, 8 CPU cores
+
+### Test Sequencing
+- Phase 1: Unit tests for SSE parser, event types, message serialization
+- Phase 2: Snapshot tests for TUI with `insta`
+- Phase 4: Recorded session replay infrastructure (core loop complex enough to warrant it)
+- Phase 6-7: Headless mode + first SWE-bench Verified run
+
+## Configuration (Deferred)
+- Single-user, hardcoded defaults for now
+- Designed for later: global config, per-project `.agent.toml`, configurable keybindings
+
+## Deferred Features
+- Conversation branching (tree structure in log, linear UX for now)
+- Direct sub-agent interaction
+- MCP adapter
+- Full markdown/syntax-highlighted rendering
+- Session log viewer
+- Per-project configuration
+- Structured plan editor in TUI (use `$EDITOR` for now)
diff --git a/IDEAS.md b/IDEAS.md
new file mode 100644
index 0000000..0c3c97f
--- /dev/null
+++ b/IDEAS.md
@@ -0,0 +1,47 @@
+# IDEAS
+
+Notes based on ideas I've had.
+
+## Token Usage Visualization
+- Per-turn token breakdown (input/output/cache) inline in conversation
+- Cumulative session totals in status bar
+- Estimated Cost of Usage
+
+## Planning Mode
+- Activity mode distinction in TUI (Plan vs Execute), visible in status bar
+- Planner agent: has tool access (reads, search) but no write/exec permissions
+- Plan output as structured markdown (steps, files, acceptance criteria)
+- `:edit-plan` command to open plan in `$EDITOR` before execution
+- Explicit plan approval gate before transitioning to execution
+- Executor agent spawned with plan file + project context (not planning conversation)
+- Plan-step progress tracking (complete/in-progress/failed) visible in TUI
+- **Done when:** Can plan a task, review/edit the plan, then execute it as a separate activity
+
+## Sub-Agents
+- `spawn_agent` tool, independent `ConversationTree` per sub-agent
+- Agent tree sidebar in TUI, navigable in Normal mode
+- Sub-agents follow same approval policy with hard deny on unpermitted actions
+- Plan executor refactored as a sub-agent specialization
+- **Done when:** Agent delegates to sub-agent, user can inspect it, result flows back
+
+## Context Window Management
+- Token counting for outgoing payloads
+- Compaction strategy: summarize older turns, preserve full history in session log
+- Stable message prefix for prompt caching
+- **Done when:** Conversations run indefinitely without hitting context limits
+
+## Automated Anomaly Notation
+- Similar to Jon's SESSION.md: https://github.com/jonhoo/configs/blob/master/agentic/AGENTS.md
+- Allows the agents to note an anomaly or bad design decision.
+
+## Deferred TODO list
+- Allow the user to note things that should be fixed after the agent has iterated on its full loop.
+- Potentially add a way to iterate through the todo list at the end.
+
+## Session Logging
+- JSONL `SessionWriter` with `Event` structure
+- Parent IDs, timestamps, token usage per event
+- Predictable file location with session IDs
+- **Done when:** Session files are coherent, parseable, with token counts per turn
+
+
diff --git a/PLAN.md b/PLAN.md
new file mode 100644
index 0000000..0e3a1f2
--- /dev/null
+++ b/PLAN.md
@@ -0,0 +1,88 @@
+# Implementation Plan
+
+## Phase 1: Minimal Conversation Loop
+
+**Done when:** Multi-turn streaming conversation with Claude works in terminal
+
+### 1.1 Project Scaffolding
+- `Cargo.toml` with initial dependencies:
+  - `ratatui`, `crossterm` — TUI
+  - `tokio` (full features) — async runtime
+  - `serde`, `serde_json` — serialization
+  - `thiserror` — error types
+  - `tracing`, `tracing-subscriber` — structured logging
+  - `reqwest` (with `stream` feature) — HTTP client for SSE
+  - `futures` — stream combinators
+- Establish `src/{app,tui,core,provider}/mod.rs` stubs
+- `cargo build` passes; `cargo clippy -- -D warnings` passes on empty stubs
+
+### 1.2 Shared Types (`src/core/types.rs`)
+- `StreamEvent` enum: `TextDelta(String)`, `InputTokens(u32)`, `OutputTokens(u32)`, `Done`, `Error(String)`
+- `UserAction` enum (TUI → core channel): `SendMessage(String)`, `Quit`
+- `UIEvent` enum (core → TUI channel): `StreamDelta(String)`, `TurnComplete`, `Error(String)`
+- `ConversationMessage` struct: `role: Role`, `content: String`
+- All types derive `Debug`; all public types have doc comments
+
+### 1.3 Provider: `ModelProvider` Trait + Claude SSE (`src/provider/`)
+- `ModelProvider` trait: `async fn stream(&self, messages: &[ConversationMessage]) -> impl Stream`
+- `ClaudeProvider` struct: API key from env, `reqwest` HTTP client
+- Serialize messages to Anthropic Messages API JSON format
+- Parse SSE byte stream → `StreamEvent` (handle `content_block_delta`, `message_delta` for tokens, `message_stop`)
+- Unit tests: SSE parsing from hardcoded byte fixtures in `#[cfg(test)]`
+
+### 1.4 Core: Conversation State + Orchestrator Loop (`src/core/`)
+- `ConversationHistory`: `Vec` with `push` and `messages()` (flat list, no tree yet)
+- `Orchestrator` struct holding history, provider, channel senders/receivers
+- Orchestrator loop:
+  1. Await `UserAction` from TUI channel
+  2. On `SendMessage`: append user message, call `provider.stream()`
+  3. Forward each `StreamEvent` as `UIEvent` to TUI
+  4. Accumulate deltas into assistant message; append to history on `Done`
+  5. On `Quit`: break loop
+
+### 1.5 TUI: Layout + Input + Streaming Display (`src/tui/`)
+- `AppState` struct: `messages: Vec<(Role, String)>`, `input: String`, `scroll: u16`
+- Ratatui layout: full-height `Paragraph` output area (scrollable) + single-line `Paragraph` input
+- Insert mode only — printable chars append to `input`, Enter sends `UserAction::SendMessage`, Backspace deletes
+- On `UIEvent::StreamDelta`: append to last assistant message in `messages`, re-render
+- On `UIEvent::TurnComplete`: finalize assistant message
+- Crossterm raw mode enter/exit; restore terminal on panic or clean exit
+
+### 1.6 App Wiring + Entry Point (`src/app/`, `src/main.rs`)
+- `main.rs`: parse `--project-dir ` CLI arg
+- Initialize `tracing_subscriber` (log to file, not stdout — avoids TUI interference)
+- Create `tokio::sync::mpsc` channel pair for `UserAction` and `UIEvent`
+- Spawn `Orchestrator::run()` as a tokio task
+- Run TUI event loop on main thread (Ratatui requires main thread for crossterm)
+- On `UserAction::Quit` or Ctrl-C: signal orchestrator shutdown, restore terminal, exit cleanly
+
+### 1.7 Phase 1 Unit Tests
+- Provider: SSE byte fixture → correct `StreamEvent` sequence
+- Provider: `ConversationMessage` vec → correct Anthropic API JSON shape
+- Core: `ConversationHistory` push/read roundtrip
+- Core: Orchestrator state transitions against mock `StreamEvent` sequence (no real API)
+
+## Phase 2: Vim Modes and Navigation
+- Normal, Insert, Command modes with visual indicator
+- `j`/`k` scroll in Normal mode
+- `:quit`, `:clear` commands
+- **Done when:** Fluid mode switching and scrolling feels vim-native
+
+## Phase 3: Tool Execution
+- `Tool` trait, `ToolRegistry`, core tools (`read_file`, `write_file`, `shell_exec`)
+- Tool definitions in API requests, parse tool-use responses
+- Approval gate: core → TUI pending event → user approve/deny → result back
+- Working directory confinement + path validation (no Landlock yet)
+- **Done when:** Claude can read, modify files, and run commands with user approval
+
+## Phase 4: Sandboxing
+- Landlock: read-only system, read-write project dir, network blocked
+- Tools execute through `Sandbox`, never directly
+- `:net on/off` toggle, state in status bar
+- Graceful degradation on older kernels
+- **Done when:** Writes outside project dir fail; network toggle works
+
+
+
+
+
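Review note on Phase 3's "working directory confinement + path validation (no Landlock yet)": one way this could look is a purely lexical check that resolves `.` and `..` and then requires the result to stay under the project directory. The sketch below is only an illustration. The names `confine` and `PathError` are hypothetical (the patch does not define them), and it assumes `project_dir` is already absolute and normalized. It does not follow symlinks, so a real implementation would likely also need `std::fs::canonicalize` or the Phase 4 Landlock rules.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical error type for this sketch; the plan does not name one.
#[derive(Debug, PartialEq)]
pub enum PathError {
    /// The resolved path would land outside the project directory.
    OutsideProject(PathBuf),
}

/// Lexically resolve `requested` against `project_dir` and reject escapes.
///
/// Assumes `project_dir` is absolute and contains no `.` or `..` components.
pub fn confine(project_dir: &Path, requested: &Path) -> Result<PathBuf, PathError> {
    // Relative paths are interpreted relative to the project directory.
    let joined = if requested.is_absolute() {
        requested.to_path_buf()
    } else {
        project_dir.join(requested)
    };

    // Fold away `.` and `..` without touching the filesystem.
    let mut resolved = PathBuf::new();
    for component in joined.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                // `pop` is a no-op at the root, so `/..` stays `/`.
                resolved.pop();
            }
            other => resolved.push(other.as_os_str()),
        }
    }

    // `starts_with` compares whole components, so `/projx` is not under `/proj`.
    if resolved.starts_with(project_dir) {
        Ok(resolved)
    } else {
        Err(PathError::OutsideProject(resolved))
    }
}
```

A tool such as `write_file` would call `confine` before any I/O and surface the error as a denied action. Keeping the check free of filesystem access also makes it easy to unit test, in line with the `tools` unit tests listed in DESIGN.md.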