Real Time Design

I built Real Time Design because I wanted coding to feel closer to talking across the desk.

A realtime voice layer for local coding agents: open a repo, run rtd, speak, and let Codex or Claude work against the files already on disk.

Terminal mockup (rtd at idle): the Realtime voice panel reads "listening / Listening for a coding task...", the Tasks panel reads "No tasks dispatched yet.", and the status bar shows ● listening · turns 0 · queued 0 · 0.0KB · $0.00 · Codex gpt-5.3-codex-spark · agents 0 · q/m/u/a.

The terminal panel mirrors the actual Ink app structure: realtime voice state, waveform, active runs, model picker, queue count, agent count, and the q/m/u/a controls.

The idea is simple: open a terminal in a repo, run rtd, and just speak. The app listens continuously, turns finished voice instructions into small coding tasks, and sends those tasks to a local coding agent. Codex is the default worker, Claude is supported too, and the terminal stays focused on the live state of the work. No giant transcript feed taking over the screen.

The interaction was inspired by the work Jaytel has been sharing around voice-driven software creation: speak naturally, keep the tool close to the work, and let the interface show just enough state to stay trustworthy.

Video: Jaytel's voice-driven software creation demo (on X).

What it does

Real Time Design is a realtime voice layer for local coding agents. You can say something like "Make the header smaller and run the tests." The voice layer hears that as two separate jobs: edit the header, then run the tests.

Diagram: "Make the header smaller and run the tests." splits into two tasks, 1. edit · header ("Make the header smaller") and 2. run · tests ("Run the tests").

Then the scheduler starts separate agent runs when those jobs can happen independently. If you correct yourself while something is running, it treats that as a barge-in. Say "Make the header red." and then "Actually make it blue instead.", and the first run is stopped and restarted with the correction as the new source of truth.

Diagram: "Make the header red." starts an edit · header run; "Actually make it blue instead." interrupts it and starts a new edit · header run as the new source of truth.

That was the main interaction I cared about. Voice coding only feels good if interruptions work. Real speech is messy. People revise themselves mid-sentence, add one more thing, cancel a request, or change their mind while the machine is already doing something. The app is built around that mess instead of pretending every prompt arrives fully polished.

The shape of the system

The app has four main pieces:

1. Microphone: 24 kHz PCM from the local mic.
2. OpenAI Realtime: semantic VAD waits for complete thoughts.
3. Code intents: speech becomes edit, run, create, delete, explain, or undo.
4. Local agents: Codex or Claude receives repo-scoped tasks.

Realtime handles voice and intent. The scheduler owns queueing, cancellation, and parallel dispatch.
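The microphone stage is mostly format conversion. Here is a minimal sketch, assuming the mic layer delivers mono Float32 samples in [-1, 1] at 24 kHz and that the Realtime session expects little-endian 16-bit PCM (the pcm16 input format); floatToPcm16 and chunkBytes are hypothetical helper names, not rtd's source. At 2 bytes per sample, one second of 24 kHz mono audio is 48,000 bytes.

```typescript
// Assumption: mono Float32 samples in [-1, 1] at 24 kHz from the mic layer.
const SAMPLE_RATE_HZ = 24_000;
const BYTES_PER_SAMPLE = 2; // 16-bit PCM

// Convert float samples to little-endian signed 16-bit PCM bytes.
function floatToPcm16(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * BYTES_PER_SAMPLE);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clip out-of-range samples
    // Scale asymmetrically so -1 maps to -32768 and 1 maps to 32767.
    const value = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
    view.setInt16(i * BYTES_PER_SAMPLE, value, true); // true = little-endian
  }
  return bytes;
}

// Byte size of a chunk of audio of the given duration.
function chunkBytes(durationMs: number): number {
  return Math.round((SAMPLE_RATE_HZ * durationMs) / 1000) * BYTES_PER_SAMPLE;
}
```

In Node, Buffer.from(floatToPcm16(samples)).toString("base64") produces the base64 string that a Realtime input_audio_buffer.append event carries in its audio field.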

The Realtime model has one job in this project: voice and intent. It listens to audio, waits for a complete thought with semantic VAD, and then calls a small tool named code_intents with a payload like this:

```json
{
  "intents": [
    {
      "action": "edit",
      "target": "navigation",
      "description": "Make the nav red"
    },
    {
      "action": "run",
      "target": "typecheck",
      "description": "Run the typecheck"
    }
  ]
}
```
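On the TypeScript side, a payload like this can be typed and validated before anything reaches the scheduler. This is a sketch under assumptions: the six action names (edit, run, create, delete, explain, undo) are the ones the post lists, while CodeIntent and parseCodeIntents are hypothetical names, not the project's actual source. Malformed or unknown intents are dropped rather than guessed at.

```typescript
// Hypothetical types mirroring the code_intents payload; not rtd's actual source.
type IntentAction = "edit" | "run" | "create" | "delete" | "explain" | "undo";
const ACTIONS: readonly string[] = ["edit", "run", "create", "delete", "explain", "undo"];

interface CodeIntent {
  action: IntentAction;
  target: string;
  description: string;
}

// Parse the tool call's JSON arguments, keeping only well-formed intents.
function parseCodeIntents(argumentsJson: string): CodeIntent[] {
  const data: unknown = JSON.parse(argumentsJson);
  const list = (data as { intents?: unknown } | null)?.intents;
  if (!Array.isArray(list)) {
    throw new Error("code_intents: expected an intents array");
  }
  const intents: CodeIntent[] = [];
  for (const item of list) {
    if (typeof item !== "object" || item === null) continue;
    const { action, target, description } = item as Record<string, unknown>;
    if (
      typeof action === "string" &&
      ACTIONS.indexOf(action) !== -1 &&
      typeof target === "string" &&
      typeof description === "string"
    ) {
      intents.push({ action: action as IntentAction, target, description });
    }
  }
  return intents;
}
```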

After that, Real Time Design owns the workflow. It dedupes repeated events, decides whether the new instruction is a fresh task, a cancellation, or a correction, and dispatches work to the selected coding CLI.
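The dedupe and classification step might look something like the following. It is a hypothetical sketch: it assumes each tool call carries a unique call id to dedupe on, and the keyword rules for cancellations and corrections are illustrative stand-ins for whatever heuristics the real scheduler uses.

```typescript
// Hypothetical router; the keyword rules are illustrative, not rtd's actual logic.
type Decision = "duplicate" | "fresh" | "cancel" | "correction";

const CANCEL = /^(cancel|stop|never ?mind|forget it)\b/;
const CORRECTION_START = /^(actually|wait|no)\b/;
const CORRECTION_ANYWHERE = /\binstead\b/;

class IntentRouter {
  // Call ids already handled, so repeated events are ignored.
  private seen = new Set<string>();

  classify(callId: string, description: string): Decision {
    if (this.seen.has(callId)) return "duplicate";
    this.seen.add(callId);
    const text = description.trim().toLowerCase();
    if (CANCEL.test(text)) return "cancel";
    if (CORRECTION_START.test(text) || CORRECTION_ANYWHERE.test(text)) {
      return "correction";
    }
    return "fresh";
  }
}
```

Keeping this decision in plain code, outside the voice model, is what lets the scheduler stay in charge of cancellation.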

I like this split because each layer stays narrow. The voice model decides whether I said something actionable. The coding agent has the repo, the tools, and the responsibility to make the change.

Terminal UI decisions

I used Ink, so the UI is React rendered into the terminal. That sounds like a weird choice until you build something state-heavy in a terminal. This app has connection state, mic state, transcript state, queue state, active agent runs, finished runs, current model selection, errors, and keyboard shortcuts. I wanted normal component boundaries instead of manually repainting strings.

The UI is intentionally compact: the top panel shows listening state, waveform, and latest heard text; the middle panel shows active tasks and recent finished tasks; the bottom bar shows status, queue count, active model, and shortcuts.

The terminal should answer three questions fast: is it listening, what did it think I said, and what agents are working right now?
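Treating the bottom bar as a pure function of app state keeps it easy to test outside Ink. This is a sketch with hypothetical field names; the layout is modeled loosely on the rtd status bar, and the paused indicator is an invented placeholder.

```typescript
// Hypothetical app state; field names are illustrative, not rtd's actual source.
interface StatusState {
  listening: boolean;
  turns: number;
  queued: number;
  agent: "Codex" | "Claude";
  model: string;
  activeAgents: number;
}

// One line that answers "is it listening?" and "what is running?".
function formatStatusBar(s: StatusState): string {
  const state = s.listening ? "● listening" : "○ paused"; // "paused" is a placeholder
  return [
    state,
    `turns ${s.turns}`,
    `queued ${s.queued}`,
    `${s.agent} ${s.model}`,
    `agents ${s.activeAgents}`,
    "q/m/u/a",
  ].join(" · ");
}
```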

I also run the app on the terminal's alternate screen when possible. That makes it feel like a real tool instead of a command that sprays logs into your shell. When you quit, your previous terminal buffer comes back.
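Switching to the alternate screen comes down to two standard escape sequences: CSI ?1049h to enter and CSI ?1049l to leave. This minimal sketch assumes the output stream exposes isTTY and write the way Node's process.stdout does; enterAltScreen is a hypothetical helper, and skipping non-TTY output is one reading of "when possible".

```typescript
// Anything with an optional isTTY flag and a write method, like process.stdout.
interface TtyLike {
  isTTY?: boolean;
  write(chunk: string): boolean;
}

const ENTER_ALT_SCREEN = "\x1b[?1049h"; // save cursor, switch to alternate buffer
const LEAVE_ALT_SCREEN = "\x1b[?1049l"; // restore the previous buffer and cursor

// Enter the alternate screen; returns an idempotent restore function.
function enterAltScreen(out: TtyLike): () => void {
  if (!out.isTTY) return () => {}; // piped output: leave the stream alone
  out.write(ENTER_ALT_SCREEN);
  let restored = false;
  return () => {
    if (restored) return;
    restored = true;
    out.write(LEAVE_ALT_SCREEN);
  };
}
```

In the app, calling const restore = enterAltScreen(process.stdout) and then process.on("exit", restore) would bring the previous buffer back on quit.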

Why the UI does not show everything

The coding agents can produce a lot of output. Full logs are useful for debugging, but they are terrible as the main interface. Watching every token scroll by turns the app into a slot machine.

So each task shows one short inferred step, such as "step: editing src/components/Nav.tsx", "step: running npm run typecheck", or "step: finished". That step is guessed from agent output. It is imperfect, but still useful enough to show whether the agent is reading, editing, testing, or done.
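A guessed step can be as simple as a few regular expressions run over each new line of agent output. This sketch is hypothetical: the patterns are assumptions about what CLI output tends to contain, not the formats Codex or Claude actually emit, and any file-path mention counts as editing, so reads will sometimes be mislabeled.

```typescript
// Assumed patterns; tune them against real agent output.
const RUN_PATTERN = /\b((?:npm|pnpm|yarn) (?:run \S+|test)|pytest|cargo test|go test \S*)/;
const FILE_PATTERN = /\b((?:[\w.-]+\/)+[\w.-]+\.(?:tsx?|jsx?|css|json|md|py|rs|go))\b/;
const DONE_PATTERN = /\b(done|finished|completed)\b/i;

// Map one line of agent output to a short step label,
// keeping the previous label when nothing matches.
function inferStep(line: string, previous: string): string {
  const run = RUN_PATTERN.exec(line);
  if (run) return `step: running ${run[1]}`;
  const file = FILE_PATTERN.exec(line);
  if (file) return `step: editing ${file[1]}`; // may really be a read
  if (DONE_PATTERN.test(line)) return "step: finished";
  return previous;
}
```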

The design goal was calm status, not a live dump.

Parallel work

One of the more fun parts is the scheduler. By default, Real Time Design can run up to four local agents at once. If the Realtime layer splits one spoken request into three independent intents, the scheduler can start all three.

This is risky if two agents touch the same file, so every generated prompt reminds the worker that other agents may be running in the same workspace and that it must inspect the current tree before editing. The scheduler also keeps barge-in decisions target-aware, so "actually make the nav blue" tries to restart the nav task instead of killing some unrelated run.

It is still early. Conflict resolution mostly comes down to inspecting first, keeping scope tight, and handling the easy parallel cases well. The point is to make independent work feel instant while keeping the model aware that the workspace is shared.
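A minimal sketch of that scheduler follows, assuming a runner callback that starts one agent process per task and honors an AbortSignal; Scheduler, Runner, and the wording of the workspace note are hypothetical. It caps concurrency at four, prefixes every prompt with the shared-workspace reminder, and handles a barge-in by aborting only the runs and queued tasks that share the corrected target.

```typescript
// Hypothetical scheduler sketch; names and prompt wording are illustrative.
interface Task {
  id: number;
  target: string;
  description: string;
}

// Starts one agent run; should settle when the agent exits or the signal aborts.
type Runner = (task: Task, prompt: string, signal: AbortSignal) => Promise<void>;

const SHARED_WORKSPACE_NOTE =
  "Other agents may be running in this workspace. Inspect the current tree before editing.";

class Scheduler {
  private queue: Task[] = [];
  private active = new Map<number, { task: Task; controller: AbortController }>();
  private nextId = 1;
  private readonly run: Runner;
  private readonly maxConcurrent: number;

  constructor(run: Runner, maxConcurrent = 4) {
    this.run = run;
    this.maxConcurrent = maxConcurrent;
  }

  get activeCount(): number { return this.active.size; }
  get queuedCount(): number { return this.queue.length; }

  submit(target: string, description: string): Task {
    const task: Task = { id: this.nextId++, target, description };
    this.queue.push(task);
    this.pump();
    return task;
  }

  // Barge-in: drop queued work and abort active runs for the same target only,
  // then submit the correction as the new source of truth.
  correct(target: string, description: string): Task {
    this.queue = this.queue.filter((t) => t.target !== target);
    this.active.forEach(({ task, controller }) => {
      if (task.target === target) controller.abort();
    });
    return this.submit(target, description);
  }

  // Start queued tasks until the concurrency cap is reached.
  private pump(): void {
    while (this.active.size < this.maxConcurrent && this.queue.length > 0) {
      const task = this.queue.shift()!;
      const controller = new AbortController();
      this.active.set(task.id, { task, controller });
      const prompt = `${SHARED_WORKSPACE_NOTE}\n\nTask (${task.target}): ${task.description}`;
      // Free the slot however the run ends; errors are not surfaced in this sketch.
      const release = (): void => {
        this.active.delete(task.id);
        this.pump();
      };
      this.run(task, prompt, controller.signal).then(release, release);
    }
  }
}
```

An aborted run keeps its slot until the runner's promise settles, so a runner that ignored the signal would hold up the queue; a real implementation would also kill the child process.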

Agent and model picker

The app detects installed CLIs and prefers Codex when both Codex and Claude are available. You can also pick explicitly with rtd --agent codex or rtd --agent claude.
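The selection rule fits in a small pure function. This sketch assumes detection has already produced the set of CLI names found on PATH; pickAgent is hypothetical, and throwing on a missing or unknown --agent value is an assumed behavior, not documented rtd behavior.

```typescript
type AgentName = "codex" | "claude";
const PREFERENCE: AgentName[] = ["codex", "claude"]; // Codex first when both exist

// Honor an explicit --agent flag, otherwise take the first preferred installed CLI.
function pickAgent(argv: string[], installed: ReadonlySet<string>): AgentName {
  const i = argv.indexOf("--agent");
  if (i !== -1) {
    const requested = argv[i + 1];
    if (requested !== "codex" && requested !== "claude") {
      throw new Error(`unknown agent: ${requested ?? "(missing)"}`);
    }
    if (!installed.has(requested)) throw new Error(`${requested} CLI not found`);
    return requested;
  }
  const found = PREFERENCE.find((a) => installed.has(a));
  if (!found) throw new Error("no supported coding CLI found");
  return found;
}
```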

The default model is gpt-5.3-codex-spark because it makes the loop feel almost realtime. It is magical to talk, finish a sentence, and see the app start changing just seconds later.

Inside the TUI, pressing "a" opens a small picker for future tasks. Active jobs keep the model they started with; the picker only changes what happens next. That was a small but important product decision: switching the model globally while work is already running would make the UI harder to reason about.
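One way to make "active jobs keep their model" hold by construction is to copy the selected model into each job at dispatch time. This is a sketch with hypothetical names; "other-model" in the test is a placeholder, not a real model.

```typescript
// Hypothetical names; each job snapshots the model selected at dispatch time.
interface Job {
  readonly id: number;
  readonly description: string;
  readonly model: string;
}

class ModelSelection {
  private current: string;
  private nextId = 1;

  constructor(initialModel: string) {
    this.current = initialModel;
  }

  get selected(): string {
    return this.current;
  }

  // Called by the picker: affects only jobs dispatched after this point.
  pick(model: string): void {
    this.current = model;
  }

  dispatch(description: string): Job {
    return { id: this.nextId++, description, model: this.current };
  }
}
```

Because each job stores its own model, changing the selection later cannot affect work that has already started.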

Audio and Realtime plumbing

The important part is not the audio plumbing. From the user's perspective, the voice layer should feel continuous. You start rtd, talk naturally, pause, correct yourself, and keep going without pressing a button or managing a transcript.

Realtime handles the listening loop and waits until an instruction sounds complete. When there is a coding task, it emits structured intents for the scheduler. When I am thinking out loud, it should stay quiet.
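That behavior is configured when the Realtime session is set up. The following is a hedged sketch: its shape follows the beta-era session.update event (turn_detection, tools, tool_choice), newer API versions nest some of these fields differently, and buildSessionUpdate is a hypothetical helper. The text-only modality and the "low" eagerness value are assumptions, not settings the post confirms; the tool schema mirrors the code_intents payload.

```typescript
// Hypothetical builder; field names follow the beta-era Realtime session.update event.
const INTENT_ACTIONS = ["edit", "run", "create", "delete", "explain", "undo"];

function buildSessionUpdate(instructions: string) {
  return {
    type: "session.update",
    session: {
      modalities: ["text"], // assumption: no spoken replies, only tool calls
      input_audio_format: "pcm16", // 16-bit PCM, 24 kHz mono
      // Semantic VAD ends a turn when speech sounds complete, not just after silence.
      turn_detection: { type: "semantic_vad", eagerness: "low" }, // "low" is an assumption
      instructions,
      tools: [
        {
          type: "function",
          name: "code_intents",
          description: "Report actionable coding tasks heard in the user's speech.",
          parameters: {
            type: "object",
            properties: {
              intents: {
                type: "array",
                items: {
                  type: "object",
                  properties: {
                    action: { type: "string", enum: INTENT_ACTIONS },
                    target: { type: "string" },
                    description: { type: "string" },
                  },
                  required: ["action", "target", "description"],
                },
              },
            },
            required: ["intents"],
          },
        },
      ],
      tool_choice: "auto", // the model may also decide not to call the tool
    },
  };
}
```

The object would be sent as JSON over the Realtime WebSocket once the connection opens; an instruction such as "call code_intents only for concrete coding requests" is what lets thinking out loud pass without a tool call.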

That is the product feel I care about: no push-to-talk, no separate send step, no ceremony. Just speak, revise, and let the terminal decide when something is actionable.

The rough edges

This is a prototype. Barge-in matching is heuristic, reconnect behavior is basic, progress summaries are inferred from CLI output, parallel agents can still collide if I ask for overlapping edits, and the UI is built for a focused terminal size rather than every tiny pane.

I am fine with those rough edges right now. The important loop works: talk, get structured tasks, dispatch local agents, interrupt them when needed, and keep the terminal readable while it happens.

Why I like this project

The most interesting design work was deciding where each responsibility should live. Realtime handles listening and intent extraction. The scheduler handles task shape, dedupe, queueing, cancellation, and parallel dispatch. Codex or Claude handles actual repo work. Ink handles the terminal as a proper stateful interface.

That separation keeps the prototype understandable. Each piece can be improved without turning the whole thing into one giant prompt.

It also makes the tool feel different from normal chat-based coding. I can stay in the repo, talk through small changes, correct myself naturally, and watch a few local agents move at once. When it works, the terminal starts to feel like a little workshop.

P.S. This is also really good for using LLMs as a writing companion. I mostly spoke this entire post, corrected myself, asked for visualizations, and kept shaping the page out loud. Give it a try: the code is on GitHub, and you can install it with npm install -g real-time-design.

Pedro Marques