Applied AI Engineer focused on multimodal AI, human-robot interaction, voice AI, and empathetic human-centered systems.
I build AI applications that combine vision, speech, reasoning, and agentic interaction to solve real-world problems.
- Multimodal AI
- Human-Robot Interaction
- Empathetic AI
- LLM-powered applications
- Empathetic AI systems
- JournY: Empathetic AI journaling companion (in progress)
- Wren Meditation: AI voice-guided meditation app
Collaborative project for real-time conversation support using Whisper, GPT-4, and ElevenLabs TTS.
My Contributions:
- Backend pipeline (Node.js + TypeScript + Express): Wired up the API surface (/api/transcribe, /api/analyze, /api/tts, /api/omi), covering Whisper transcription, a second-stage GPT "Unawkward" awkwardness/social analysis that uses pause/filler scores and speaker turns, ElevenLabs TTS, optional OMI auto-processing via omiWatcher, and static serving of the web app.
- Product behavior: Implemented the two-stage flow described in the README (audio in → transcribe → multi-signal awkwardness reasoning → short rescue suggestions), with hooks for speaker labeling (including which side is the user/OMI wearer).
- Web client: Shipped the vanilla JS/HTML/CSS frontend under public/ (recording UX, status/visual feedback, and integration with the backend APIs).
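
To illustrate the multi-signal step in the flow above, here is a minimal TypeScript sketch of how pause/filler scores and speaker turns might be derived from a transcript before the second-stage GPT call. Everything in it is an assumption made for illustration: the `Segment` shape, the filler list, the 2-second pause threshold, and the `computeSignals` and `buildAnalysisPrompt` names are hypothetical and not taken from the project's source.

```typescript
// Hypothetical transcript segment; the real project may shape this differently.
interface Segment {
  speaker: string; // speaker label, e.g. "user" for the OMI wearer
  start: number;   // segment start time in seconds
  end: number;     // segment end time in seconds
  text: string;
}

interface Signals {
  fillerScore: number;  // fraction of words that are fillers (0..1)
  pauseScore: number;   // fraction of gaps between segments longer than the threshold (0..1)
  speakerTurns: number; // number of times the speaker changes
}

// Assumed filler list, chosen only for this example.
const FILLERS: string[] = ["um", "uh", "er", "hmm"];

function computeSignals(segments: Segment[], pauseThresholdSec: number = 2.0): Signals {
  let words = 0;
  let fillers = 0;
  let longPauses = 0;
  let turns = 0;

  for (let i = 0; i < segments.length; i++) {
    const seg = segments[i];
    // Count words and fillers, ignoring punctuation and case.
    for (const raw of seg.text.toLowerCase().split(/\s+/)) {
      const word = raw.replace(/[^a-z']/g, "");
      if (word.length === 0) continue;
      words++;
      if (FILLERS.indexOf(word) !== -1) fillers++;
    }
    // Compare with the previous segment for pauses and speaker changes.
    if (i > 0) {
      const prev = segments[i - 1];
      if (seg.start - prev.end > pauseThresholdSec) longPauses++;
      if (seg.speaker !== prev.speaker) turns++;
    }
  }

  const gaps = Math.max(segments.length - 1, 0);
  return {
    fillerScore: words === 0 ? 0 : fillers / words,
    pauseScore: gaps === 0 ? 0 : longPauses / gaps,
    speakerTurns: turns,
  };
}

// Builds the text that a second-stage model call could receive.
function buildAnalysisPrompt(segments: Segment[], signals: Signals, userSpeaker: string): string {
  const lines = segments.map((s) => {
    const label = s.speaker === userSpeaker ? s.speaker + " (user)" : s.speaker;
    return label + ": " + s.text;
  });
  return [
    "Signals: " + JSON.stringify(signals),
    "Transcript:",
    lines.join("\n"),
    "Rate the awkwardness and suggest one short rescue line for the user.",
  ].join("\n");
}
```

Computing the signals in plain code and passing them alongside the transcript keeps the timing information, which the text alone does not carry, visible to the model.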