Add modal editing to the agent TUI. (#2)

Adds a status line indicating which mode the user is in.
Adds a "normal" mode with keyboard shortcuts (including a chorded shortcut 'gg').
Adds a command mode with several basic commands that can be entered into an overlay.

Chores:
- Cleans up design/claude/plan.md to avoid confusing Claude.
- Adds some TODOs based on Claude feedback.

Reviewed-on: #2
Co-authored-by: Drew Galbraith <drew@tiramisu.one>
Co-committed-by: Drew Galbraith <drew@tiramisu.one>
commit 3fd448d431 (parent 5d213b43d3)
Author: Drew, 2026-02-25 01:16:16 +00:00, committed by Drew
9 changed files with 725 additions and 101 deletions


@@ -67,7 +67,7 @@
- **`sandbox`:** Landlock policy construction, path validation logic (without applying kernel rules)
- **`core`:** Conversation tree operations (insert, query by parent, turn computation, token totals), orchestrator state machine transitions against mock `StreamEvent` sequences
- **`session`:** JSONL serialization roundtrips, parent ID chain reconstruction
-- **`tui`:** Widget rendering via Ratatui `TestBackend`, snapshot tests with `insta` crate for layout/mode indicator/token display
+- **`tui`:** Widget rendering via Ratatui `TestBackend`
### Integration Tests — Component Boundaries
- **Core ↔ Provider:** Mock `ModelProvider` replaying recorded API sessions (full SSE streams with tool use). Tests the complete orchestration loop deterministically without network.
@@ -78,11 +78,6 @@
- **Recorded session replay:** Capture real Claude API HTTP request/response pairs, replay deterministically. Exercises full stack (core + channel + mock TUI) without cost or network dependency. Primary E2E test strategy.
- **Live API tests:** Small suite behind feature flag / env var. Verifies real API integration. Run manually before releases, not in CI.
-### Snapshot Testing
-- `insta` crate for TUI visual regression testing from Phase 2 onward
-- Capture rendered `TestBackend` buffers as string snapshots
-- Catches layout, mode indicator, and token display regressions
### Benchmarking — SWE-bench
- **Target:** SWE-bench Verified (500 curated problems) as primary benchmark
- **Secondary:** SWE-bench Pro for testing planning mode on longer-horizon tasks
@@ -93,7 +88,6 @@
### Test Sequencing
- Phase 1: Unit tests for SSE parser, event types, message serialization
-- Phase 2: Snapshot tests for TUI with `insta`
- Phase 4: Recorded session replay infrastructure (core loop complex enough to warrant it)
- Phase 6-7: Headless mode + first SWE-bench Verified run