# Skate Design

This is a TUI coding agent harness built for one user. Its design goals, which set it apart
from other coding agents, are:

1) Allow autonomous execution without permission prompts, without fully sacrificing security.
The user configures which permissions the coding agent has before execution, and these
are enforced using kernel-level sandboxing.

2) The UI supports introspection to better understand how the harness is performing.
Information may start collapsed, but it is possible to inspect things like tool uses
and thinking chains. Additionally, token usage is displayed prominently to show where the harness
is performing inefficiently.

3) The UI is modal and supports Neovim-like hotkeys for navigation and configuration
(e.g. using the space bar as a leader key). We prefer hotkeys over adding custom
slash commands (e.g. `/model`) to the text chat interface. The text chat should be reserved for
things that go straight to the underlying model.

## Architecture

The coding agent is broken into three main components: the TUI, the harness, and the tool executor.

The harness communicates with the tool executor via a tarpc interface.

The TUI and harness communicate over a channel boundary and are fully decoupled,
which supports running the harness without the TUI (e.g. in scripting mode).

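The channel boundary can be illustrated with a minimal sketch using only the standard library. Everything named here (`UiToHarness`, `HarnessToUi`, `run_harness`, `run_script`) is hypothetical and chosen for illustration; the stand-in harness merely echoes its input rather than calling a model.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Messages a front end sends to the harness (hypothetical names).
#[derive(Debug, Clone, PartialEq)]
pub enum UiToHarness {
    UserMessage(String),
    Shutdown,
}

/// Events the harness sends back to whichever front end is attached.
#[derive(Debug, Clone, PartialEq)]
pub enum HarnessToUi {
    AssistantText(String),
    Done,
}

/// Stand-in harness: runs until `Shutdown` arrives or the sender is dropped.
/// It only echoes input; a real harness would call the model here.
pub fn run_harness(rx: Receiver<UiToHarness>, tx: Sender<HarnessToUi>) {
    for msg in rx {
        match msg {
            UiToHarness::UserMessage(text) => {
                // Ignore send errors: the front end may already have gone away.
                let _ = tx.send(HarnessToUi::AssistantText(format!("echo: {text}")));
            }
            UiToHarness::Shutdown => break,
        }
    }
    let _ = tx.send(HarnessToUi::Done);
}

/// Scripting-mode driver: feeds prompts to the harness with no TUI attached
/// and collects every event it emits.
pub fn run_script(prompts: &[&str]) -> Vec<HarnessToUi> {
    let (to_harness, harness_rx) = channel();
    let (to_ui, ui_rx) = channel();
    let worker = thread::spawn(move || run_harness(harness_rx, to_ui));

    for prompt in prompts {
        to_harness
            .send(UiToHarness::UserMessage(prompt.to_string()))
            .expect("harness thread exited early");
    }
    to_harness
        .send(UiToHarness::Shutdown)
        .expect("harness thread exited early");

    // The iterator ends once the harness drops its sender.
    let events: Vec<HarnessToUi> = ui_rx.iter().collect();
    worker.join().expect("harness thread panicked");
    events
}
```

Because the harness only ever sees the two channel ends, a TUI and a script driver such as `run_script` are interchangeable from its point of view.
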
## Harness Design

The harness follows a fairly straightforward loop.

1. Send the message to the underlying model.
2. If the model requests a tool use, execute it (via a call to the executor) and return to step 1.
3. Otherwise, wait for further user input.

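The three steps above can be sketched as a single function. This is an illustration under stated assumptions, not the actual implementation: the `Model` and `Executor` traits, the `Turn` and `ModelReply` types, and the two test doubles are all hypothetical, and tool arguments are plain strings for brevity.

```rust
use std::collections::VecDeque;

/// What the model asks for after each request (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub enum ModelReply {
    ToolUse { name: String, args: String },
    Text(String),
}

/// One entry in the conversation sent to the model.
#[derive(Debug, Clone, PartialEq)]
pub enum Turn {
    User(String),
    Assistant(String),
    ToolCall { name: String, args: String },
    ToolResult(Result<String, String>),
}

pub trait Model {
    fn respond(&mut self, history: &[Turn]) -> ModelReply;
}

pub trait Executor {
    fn call_tool(&mut self, name: &str, args: &str) -> Result<String, String>;
}

/// Runs steps 1 to 3 for a single user message and returns the final text.
pub fn run_turn(
    model: &mut dyn Model,
    executor: &mut dyn Executor,
    history: &mut Vec<Turn>,
    user_input: &str,
) -> String {
    history.push(Turn::User(user_input.to_string()));
    loop {
        // Step 1: send the conversation so far to the model.
        match model.respond(history.as_slice()) {
            // Step 2: run the tool, record the outcome, and go back to step 1.
            // Errors are recorded too, so the model sees permission failures.
            ModelReply::ToolUse { name, args } => {
                let outcome = executor.call_tool(&name, &args);
                history.push(Turn::ToolCall { name, args });
                history.push(Turn::ToolResult(outcome));
            }
            // Step 3: no tool requested, so hand control back to the user.
            ModelReply::Text(text) => {
                history.push(Turn::Assistant(text.clone()));
                return text;
            }
        }
    }
}

/// Test double: replays canned replies in order.
pub struct ScriptedModel {
    replies: VecDeque<ModelReply>,
}

impl ScriptedModel {
    pub fn new(replies: Vec<ModelReply>) -> Self {
        Self { replies: VecDeque::from(replies) }
    }
}

impl Model for ScriptedModel {
    fn respond(&mut self, _history: &[Turn]) -> ModelReply {
        self.replies
            .pop_front()
            .unwrap_or_else(|| ModelReply::Text(String::new()))
    }
}

/// Test double: allows only a `shout` tool and denies everything else.
pub struct FakeExecutor;

impl Executor for FakeExecutor {
    fn call_tool(&mut self, name: &str, args: &str) -> Result<String, String> {
        if name == "shout" {
            Ok(args.to_uppercase())
        } else {
            Err(format!("permission denied: {name}"))
        }
    }
}
```
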
### Harness Instantiation

The harness is instantiated with a system prompt and a tarpc client to the tool executor.
(In the first iteration we use an in-process channel for the tarpc client.)

### Model Integration

The harness uses a trait system to make it agnostic to the underlying model provider.

This trait unifies a variety of APIs using a `StreamEvent` interface for streaming responses
from the API.

Currently, only Anthropic's Claude API is supported.

Messages are constructed to support prompt caching when available.

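As a rough sketch of what such a trait could look like: only the name `StreamEvent` comes from this document. Its variants, the `ModelClient` trait, `CannedClient`, and `collect_text` are assumptions made for illustration, and the stream is modelled as a blocking iterator rather than an async stream.

```rust
/// Provider-independent streaming events (variant names are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    TextDelta(String),
    ThinkingDelta(String),
    ToolUse { name: String, args_json: String },
    Usage { input_tokens: u64, output_tokens: u64 },
    Done,
}

/// Hypothetical provider trait: each backend maps its wire format to `StreamEvent`.
pub trait ModelClient {
    /// Sends the prompt and yields events in arrival order.
    fn stream(&mut self, prompt: &str) -> Box<dyn Iterator<Item = StreamEvent>>;
}

/// Stand-in provider that streams a fixed reply one word at a time.
pub struct CannedClient {
    pub reply: String,
}

impl ModelClient for CannedClient {
    fn stream(&mut self, _prompt: &str) -> Box<dyn Iterator<Item = StreamEvent>> {
        let mut events: Vec<StreamEvent> = self
            .reply
            .split_inclusive(' ')
            .map(|word| StreamEvent::TextDelta(word.to_string()))
            .collect();
        events.push(StreamEvent::Done);
        Box::new(events.into_iter())
    }
}

/// Provider-agnostic consumer: concatenates the text deltas of one response.
pub fn collect_text(client: &mut dyn ModelClient, prompt: &str) -> String {
    let mut text = String::new();
    for event in client.stream(prompt) {
        match event {
            StreamEvent::TextDelta(chunk) => text.push_str(&chunk),
            StreamEvent::Done => break,
            // Thinking, tool-use, and usage events would be routed elsewhere.
            _ => {}
        }
    }
    text
}
```

The consumer depends only on `StreamEvent`, so adding a second provider would mean writing another `ModelClient` implementation without touching `collect_text`.
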
### Session Logging
- JSONL format, one event per line
- Events: user message, assistant message, tool call, tool result
- Tree-addressable via parent IDs (enables conversation branching later)
- Token usage stored per event
- Linear UX for now, branching deferred

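A minimal sketch of one log event and its JSONL encoding follows. The field names (`id`, `parent_id`, `kind`, `content`, `tokens`) and the hand-rolled JSON writer are assumptions made to keep the example dependency-free; they are not the harness's actual schema. `path_to_root` shows how parent IDs make a linear log addressable as a tree.

```rust
/// The four event kinds from the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EventKind {
    UserMessage,
    AssistantMessage,
    ToolCall,
    ToolResult,
}

impl EventKind {
    fn as_str(self) -> &'static str {
        match self {
            EventKind::UserMessage => "user_message",
            EventKind::AssistantMessage => "assistant_message",
            EventKind::ToolCall => "tool_call",
            EventKind::ToolResult => "tool_result",
        }
    }
}

/// One line of the session log (field names are hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct LogEvent {
    pub id: u64,
    pub parent_id: Option<u64>,
    pub kind: EventKind,
    pub content: String,
    pub tokens: u64,
}

/// Escapes a string for embedding inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl LogEvent {
    /// Serializes the event as a single JSON object with no trailing newline.
    pub fn to_jsonl(&self) -> String {
        let parent = match self.parent_id {
            Some(id) => id.to_string(),
            None => "null".to_string(),
        };
        format!(
            "{{\"id\":{},\"parent_id\":{},\"kind\":\"{}\",\"content\":\"{}\",\"tokens\":{}}}",
            self.id,
            parent,
            self.kind.as_str(),
            escape_json(&self.content),
            self.tokens
        )
    }
}

/// Walks parent IDs from `leaf` up to the root and returns the IDs root-first.
pub fn path_to_root(events: &[LogEvent], leaf: u64) -> Vec<u64> {
    let mut path = Vec::new();
    let mut cursor = Some(leaf);
    while let Some(id) = cursor {
        // Guard against malformed logs whose parent IDs form a cycle.
        if path.len() == events.len() {
            break;
        }
        match events.iter().find(|e| e.id == id) {
            Some(event) => {
                path.push(id);
                cursor = event.parent_id;
            }
            None => break,
        }
    }
    path.reverse();
    path
}
```

Two events sharing the same `parent_id` represent a branch, which is why the format can support conversation branching later even though the UX is linear for now.
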
## Executor Design

The key aspect of the executor design is that it is configured with sandbox permissions
that allow tool use without any user prompting. Either the tool use succeeds within the
sandbox and its result is returned to the model, or it fails and a permission error is returned to the model.

The sandboxing allows running arbitrary shell commands without prompting.

### Executor Interface

The executor interface exposed to the harness has the following methods.

- `list_available_tools`: takes no arguments and returns tool names, descriptions, and argument schemas.
- `call_tool`: takes a tool name and its arguments and returns either a result or an error.

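The two methods above could be expressed as a plain Rust trait, sketched below. This is not a tarpc service definition, which would be declared and invoked differently. `ToolInfo`, `ToolError`, `InProcessExecutor`, and the `echo` tool are hypothetical, and arguments and schemas are passed as JSON text in plain strings.

```rust
use std::collections::BTreeMap;

/// Metadata returned by `list_available_tools` (field names are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct ToolInfo {
    pub name: String,
    pub description: String,
    pub argument_schema: String,
}

/// Ways a tool call can fail (hypothetical variants).
#[derive(Debug, Clone, PartialEq)]
pub enum ToolError {
    UnknownTool(String),
    PermissionDenied(String),
    Failed(String),
}

/// The interface the harness sees.
pub trait ToolExecutor {
    fn list_available_tools(&self) -> Vec<ToolInfo>;
    fn call_tool(&mut self, name: &str, arguments: &str) -> Result<String, ToolError>;
}

type ToolFn = fn(&str) -> Result<String, ToolError>;

/// Simple in-process registry keyed by tool name.
pub struct InProcessExecutor {
    tools: BTreeMap<String, (ToolInfo, ToolFn)>,
}

impl InProcessExecutor {
    pub fn new() -> Self {
        Self { tools: BTreeMap::new() }
    }

    pub fn register(&mut self, info: ToolInfo, run: ToolFn) {
        self.tools.insert(info.name.clone(), (info, run));
    }
}

impl ToolExecutor for InProcessExecutor {
    fn list_available_tools(&self) -> Vec<ToolInfo> {
        self.tools.values().map(|(info, _)| info.clone()).collect()
    }

    fn call_tool(&mut self, name: &str, arguments: &str) -> Result<String, ToolError> {
        match self.tools.get(name) {
            Some((_, run)) => (*run)(arguments),
            None => Err(ToolError::UnknownTool(name.to_string())),
        }
    }
}

/// Example tool: returns its arguments unchanged.
fn echo(arguments: &str) -> Result<String, ToolError> {
    Ok(arguments.to_string())
}

/// Builds an executor with the single `echo` tool registered.
pub fn demo_executor() -> InProcessExecutor {
    let mut executor = InProcessExecutor::new();
    executor.register(
        ToolInfo {
            name: "echo".to_string(),
            description: "Returns its arguments unchanged".to_string(),
            argument_schema: "{\"type\":\"string\"}".to_string(),
        },
        echo,
    );
    executor
}
```

Returning errors as values instead of prompting matches the executor design: a denied call simply becomes a `ToolError` that the harness forwards to the model.
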
### Sandboxing

Sandboxing is done using the Linux kernel feature "Landlock".

This allows restricting file system access (read-only, read/write, or no access)
as well as network access (on or off).

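The permission model described above can be sketched as a small data structure. Note that this only models the policy in user space: it makes no Landlock system calls and enforces nothing. `SandboxPolicy`, `FsAccess`, and `plan_mode_policy` are hypothetical names. The sketch follows an allowlist reading in which anything not granted is denied and overlapping grants combine to the most permissive one.

```rust
use std::path::{Path, PathBuf};

/// File system access levels, ordered from least to most permissive.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum FsAccess {
    None,
    ReadOnly,
    ReadWrite,
}

/// Hypothetical policy: per-directory grants plus a network switch.
#[derive(Debug, Clone)]
pub struct SandboxPolicy {
    pub grants: Vec<(PathBuf, FsAccess)>,
    pub network: bool,
}

impl SandboxPolicy {
    /// Most permissive grant whose root contains `path`; no access by default.
    pub fn access_for(&self, path: &Path) -> FsAccess {
        self.grants
            .iter()
            .filter(|(root, _)| path.starts_with(root))
            .map(|(_, access)| *access)
            .max()
            .unwrap_or(FsAccess::None)
    }

    pub fn can_read(&self, path: &Path) -> bool {
        self.access_for(path) != FsAccess::None
    }

    pub fn can_write(&self, path: &Path) -> bool {
        self.access_for(path) == FsAccess::ReadWrite
    }
}

/// Policy matching the planning mode described later in this document:
/// read access to the project directory, write access to one plan file.
/// Whether the network is enabled is left to the caller.
pub fn plan_mode_policy(project: &Path, plan_file: &Path, network: bool) -> SandboxPolicy {
    SandboxPolicy {
        grants: vec![
            (project.to_path_buf(), FsAccess::ReadOnly),
            (plan_file.to_path_buf(), FsAccess::ReadWrite),
        ],
        network,
    }
}
```

`Path::starts_with` compares whole path components, so a grant on `/proj` does not accidentally cover `/project`.
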
## TUI Design

The bulk of the complexity of this coding agent is pushed to the TUI in this design.

The driving goals of the TUI are:

- Support (neo)vim-style keyboard navigation and modal editing.
- Full progressive disclosure of information: high-level information is grokkable at a glance,
but full tool use and thinking traces can be expanded.
- Support for instantiating multiple different instances of the core harness (e.g. different
instantiations for code review vs planning vs implementation).

## UI
- **Agent view:** Tree-based hierarchy (not flat tabs) for sub-agent inspection
- **Modes:** Normal, Insert, Command (`:` prefix from Normal mode)
- **Activity modes:** Plan and Execute are visually distinct activities in the TUI
- **Streaming:** Barebones styled text initially, full markdown rendering deferred
- **Token usage:** Per-turn display (between user inputs), cumulative in status bar
- **Status bar:** Mode indicator, current activity (Plan/Execute), token totals, network policy state

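The three modes and the space-bar leader key could be modelled as a small state machine, sketched below. The `Key`, `Action`, and `Keymap` types are hypothetical, as are the specific bindings (`i` to enter Insert mode, Enter to submit, Esc to return to Normal mode); only the mode names, the `:` prefix, and the space-bar leader come from this document.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Normal,
    Insert,
    Command,
}

/// Simplified key events (a real TUI would receive these from its terminal library).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Key {
    Char(char),
    Esc,
    Enter,
}

/// What the TUI should do in response to a key press.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Action {
    None,
    InsertChar(char),
    SubmitInput,
    RunCommand(String),
    Leader(char),
}

pub struct Keymap {
    pub mode: Mode,
    command_buf: String,
    leader_pending: bool,
}

impl Keymap {
    pub fn new() -> Self {
        Self { mode: Mode::Normal, command_buf: String::new(), leader_pending: false }
    }

    /// Advances the state machine by one key press.
    pub fn handle(&mut self, key: Key) -> Action {
        match (self.mode, key) {
            // Esc always returns to Normal mode and clears pending state.
            (_, Key::Esc) => {
                self.mode = Mode::Normal;
                self.leader_pending = false;
                self.command_buf.clear();
                Action::None
            }
            // The key typed after the leader selects a hotkey action.
            (Mode::Normal, Key::Char(c)) if self.leader_pending => {
                self.leader_pending = false;
                Action::Leader(c)
            }
            (Mode::Normal, Key::Char(' ')) => {
                self.leader_pending = true;
                Action::None
            }
            (Mode::Normal, Key::Char('i')) => {
                self.mode = Mode::Insert;
                Action::None
            }
            (Mode::Normal, Key::Char(':')) => {
                self.mode = Mode::Command;
                self.command_buf.clear();
                Action::None
            }
            (Mode::Insert, Key::Char(c)) => Action::InsertChar(c),
            (Mode::Insert, Key::Enter) => Action::SubmitInput,
            (Mode::Command, Key::Char(c)) => {
                self.command_buf.push(c);
                Action::None
            }
            (Mode::Command, Key::Enter) => {
                self.mode = Mode::Normal;
                Action::RunCommand(std::mem::take(&mut self.command_buf))
            }
            _ => Action::None,
        }
    }
}
```

Only Insert mode produces text for the chat input, which keeps the chat reserved for content that goes to the model while configuration stays on hotkeys and `:` commands.
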
## Planning Mode

In planning mode the TUI instantiates a harness with read access to the project directory
and write access to a single plan markdown file.

The TUI then provides a glue mechanism that can pipe that plan into a new instantiation of the
harness in execute mode.

Additionally, we specify a schema for "surveys" that allow the model to ask the user questions about
the plan.

We also provide a hotkey (`Ctrl+G` or `:edit-plan`) that opens the plan in the user's `$EDITOR`.

## Sub-Agents
- Independent context windows with summary passed back to parent
- Fully autonomous once spawned
- Hard deny on unpermitted actions
- Plan executor is a specialized sub-agent where the plan replaces the summary
- Direct user interaction with sub-agents deferred