[Enhancement] Adding support for gpt-4o-audio by amitsnow · Pull Request #61 · ServiceNow/SyGra

amitsnow · 2025-11-11T05:23:20Z

Summary

This PR implements support for OpenAI's gpt-4o-audio multimodal model in SyGra. Unlike traditional TTS models (tts-1) that use the audio.speech.create API, gpt-4o-audio uses the chat.completions.create API and supports bidirectional audio/text transformations with context-aware conversational responses.

Features implemented:

Audio Format Conversion: Automatically converts SyGra's audio_url format to OpenAI's input_audio format with proper data and format fields
Dynamic Modalities Management: Intelligently sets modalities (text, audio, or both) based on input_type, output_type, and actual audio content in messages
Consistency: Maintains consistency with other multimodal features (TTS, image generation, etc.)

Performance impact (if any):

N/A - No significant performance impact. The implementation adds a lightweight routing check and uses existing client infrastructure instead of manual processing.

How to Test

Prerequisites

Set up OpenAI API key: export OPENAI_API_KEY=your_key_here
Ensure SyGra is installed with all dependencies

Test Case 1: Text-to-Audio (TTS via Chat Completions)

model:
  name: gpt4o_audio
  model: gpt-4o-audio-preview
  output_type: audio
  parameters:
    voice: alloy
    response_format: wav

prompts:
  - user: "Please read this: Hello, this is a test of the GPT-4o audio model."

Steps:

Run pipeline
Check that audio file is created in multimodal_output/audio/
Verify the audio file plays back the text content

Test Case 2: Audio-to-Text (Transcription)

model:
  name: gpt4o_audio
  model: gpt-4o-audio-preview
  
prompts:
  - user:
      - type: audio_url
        audio_url: "{<audio_field>}"
      - type: text
        text: "Please transcribe this audio."

Steps:

Run pipeline
Verify output contains transcribed text
Check logs for proper audio input detection

Test Case 3: Audio-to-Audio (Translation/Transformation)

model:
  name: gpt4o_audio
  model: gpt-4o-audio-preview
  output_type: audio
  parameters:
    voice: nova
    response_format: mp3

Steps:

Provide audio input with transformation instruction
Run the pipeline
Verify audio output is generated with specified voice and format

Test Case 4: Run Unit Tests

Expected Result: All tests should pass

Screenshots (if applicable)

N/A

Example Configuration:

model:
  name: gpt4o_audio
  model: gpt-4o-audio-preview
  output_type: audio
  parameters:
    voice: alloy  # Options: alloy, echo, fable, onyx, nova, shimmer
    response_format: wav  # Options: wav, mp3, opus, aac, flac, pcm

Example Output:

{
  "id": "record_0",
  "response": "file:multimodal_output/audio/record_0_response_0.wav"
}

Checklist

Lint fixes and unit testing done
End to end task testing
Documentation updated

psriramsnc

LGTM 🚀

amitsnow and others added 10 commits November 11, 2025 10:50

Adding support for gpt-4o-audio

d00433e

test cases

9dac4bd

Merge branch 'main' into scratch/gpt-4o-audio

f775d57

Model Response changes

8a2a8f4

fixes on dynamic modality control and dummy model config in models.yaml

bdee60b

test fixes

24a94f8

gpt_4o_audio documentation

f845384

Refactoring code to remove redundancy

a715929

formating changes

a199d95

lint fixes

fc0826d

amitsnow marked this pull request as ready for review November 12, 2025 19:11

amitsnow requested a review from a team as a code owner November 12, 2025 19:11

amitsnow self-assigned this Nov 12, 2025

amitsnow added the enhancement New feature or request label Nov 12, 2025

amitsnow changed the title ~~Adding support for gpt-4o-audio~~ [Enhancement] Adding support for gpt-4o-audio Nov 12, 2025

Adding example reference for gpt-4o-audio

b6bff28

psriramsnc approved these changes Nov 13, 2025

View reviewed changes

This comment was marked as duplicate.

Sign in to view

Merge branch 'main' into scratch/gpt-4o-audio

5b66bf9

psriramsnc requested review from vipul-mittal and zephyrzilla November 13, 2025 10:55

vipul-mittal approved these changes Nov 13, 2025

View reviewed changes

amitsnow requested a review from a team November 14, 2025 04:24

zephyrzilla approved these changes Nov 14, 2025

View reviewed changes

amitsnow merged commit 6f8937f into main Nov 14, 2025
6 checks passed

amitsnow deleted the scratch/gpt-4o-audio branch November 14, 2025 05:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Enhancement] Adding support for gpt-4o-audio#61

[Enhancement] Adding support for gpt-4o-audio#61
amitsnow merged 12 commits intomainfrom
scratch/gpt-4o-audio

amitsnow commented Nov 11, 2025 •

edited

Loading

Uh oh!

psriramsnc left a comment

Uh oh!

This comment was marked as duplicate.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

amitsnow commented Nov 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Features implemented:

Performance impact (if any):

How to Test

Prerequisites

Test Case 1: Text-to-Audio (TTS via Chat Completions)

Test Case 2: Audio-to-Text (Transcription)

Test Case 3: Audio-to-Audio (Translation/Transformation)

Test Case 4: Run Unit Tests

Screenshots (if applicable)

Example Configuration:

Example Output:

Checklist

Uh oh!

psriramsnc left a comment

Choose a reason for hiding this comment

Uh oh!

This comment was marked as duplicate.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

amitsnow commented Nov 11, 2025 •

edited

Loading