The AI works fast enough that human typing speed can become the bottleneck for conveying prompts. People already use third-party transcription apps like SuperWhisper to convey instructions quickly (see the original vibe coding tweet).
I think built-in voice input would be a great feature for codex, especially since there is now first-party support for realtime transcription via the OpenAI Realtime API.
https://platform.openai.com/docs/guides/realtime-transcription#handling-transcriptions
It would work something like:
codex --voice-mode
(colloquially --vibe-mode)
It would start an interactive session, same as it does now, but with a persistent connection to the Realtime API. You could then use a push-to-talk key (Space by default?, but configurable) to provide input or interrupt the AI.
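To make the push-to-talk behaviour concrete, here is a minimal sketch of how the key handling could be modelled as a small state machine. Everything in it is hypothetical: the names `Mode`, `PttEvent`, `Action`, `step`, and `run` do not exist in codex, and the actions are just labels standing in for the real audio capture, the Realtime API connection, and the agent interrupt.

```typescript
// Hypothetical sketch only: none of these names exist in codex today.

// What the session is currently doing.
type Mode = "idle" | "recording" | "agentBusy";

// Inputs: the push-to-talk key going down or up, and the agent finishing a turn.
type PttEvent = "keyDown" | "keyUp" | "agentFinished";

// Side effects the caller would perform (audio capture, API calls, etc.).
type Action = "startCapture" | "stopCaptureAndSubmit" | "interruptAgent";

interface Step {
  mode: Mode;
  actions: Action[];
}

// Pure transition function: given the current mode and an event,
// return the next mode and the side effects to trigger.
function step(mode: Mode, event: PttEvent): Step {
  switch (event) {
    case "keyDown":
      if (mode === "idle") {
        return { mode: "recording", actions: ["startCapture"] };
      }
      if (mode === "agentBusy") {
        // Pressing the key while the agent is working interrupts it first.
        return { mode: "recording", actions: ["interruptAgent", "startCapture"] };
      }
      return { mode, actions: [] }; // key repeat while already recording
    case "keyUp":
      if (mode === "recording") {
        return { mode: "agentBusy", actions: ["stopCaptureAndSubmit"] };
      }
      return { mode, actions: [] };
    case "agentFinished":
      if (mode === "agentBusy") {
        return { mode: "idle", actions: [] };
      }
      return { mode, actions: [] }; // ignore a stale finish after an interrupt
  }
}

// Fold a sequence of events into the final mode and the ordered list of actions.
function run(events: PttEvent[], start: Mode = "idle"): Step {
  let mode = start;
  const actions: Action[] = [];
  for (const event of events) {
    const next = step(mode, event);
    mode = next.mode;
    actions.push(...next.actions);
  }
  return { mode, actions };
}

// Usage: speak a prompt, then interrupt the agent with a second prompt.
const result = run(["keyDown", "keyUp", "keyDown", "keyUp", "agentFinished"]);
console.log(result.mode, result.actions);
```

Keeping the transitions pure like this would let the key handling be tested without a microphone or network connection. One open question for a real implementation is that many terminals do not report key releases, so a toggle (press to start, press again to stop) might be needed instead of hold-to-talk.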
I'm willing to work on a PR for this.