Skip to content

Add live audio transcription streaming support to Foundry Local C# SDK #485

Open
rui-ren wants to merge 11 commits into main from ruiren/audio-streaming-support-sdk

rui-ren commented Mar 5, 2026

Here's the updated PR description based on the latest changes (renamed types, CoreInterop routing fix, mermaid updates):


Title: Add live audio transcription streaming support to Foundry Local C# SDK

Description:

Adds real-time audio streaming support to the Foundry Local C# SDK, enabling live microphone-to-text transcription via ONNX Runtime GenAI's StreamingProcessor API (Nemotron ASR).

The existing OpenAIAudioClient only supports file-based transcription. This PR introduces LiveAudioTranscriptionSession that accepts continuous PCM audio chunks (e.g., from a microphone) and returns partial/final transcription results as an async stream.
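To make the "partial/final results as an async stream" shape concrete, here is a minimal sketch that uses only System.Threading.Channels, not the SDK. It assumes results are written to a Channel<T> by a background producer and surfaced to callers through ChannelReader<T>.ReadAllAsync(). The (Text, IsFinal) tuple and the local GetTranscriptionStream function are hypothetical stand-ins, since the PR only shows that LiveAudioTranscriptionResult has a Text property.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical result shape: IsFinal is an assumption for illustration only.
var results = Channel.CreateUnbounded<(string Text, bool IsFinal)>();

// Producer: stands in for the native decoder emitting partial results, then a final one.
var producer = Task.Run(async () =>
{
    foreach (var partial in new[] { "hello", "hello world" })
    {
        await results.Writer.WriteAsync((partial, false));
    }
    await results.Writer.WriteAsync(("hello world.", true));
    // Completing the writer is what ends the consumer's await foreach loop.
    results.Writer.Complete();
});

// Consumer side: the async stream a caller iterates with await foreach.
IAsyncEnumerable<(string Text, bool IsFinal)> GetTranscriptionStream() =>
    results.Reader.ReadAllAsync();

var received = new List<(string Text, bool IsFinal)>();
await foreach (var r in GetTranscriptionStream())
{
    received.Add(r);
    Console.WriteLine($"{(r.IsFinal ? "final" : "partial")}: {r.Text}");
}
await producer;
```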

What's included

New files

  • src/OpenAI/LiveAudioTranscriptionClient.cs — Streaming session with StartAsync(), AppendAsync(), GetTranscriptionStream(), StopAsync()
  • src/OpenAI/LiveAudioTranscriptionTypes.cs — LiveAudioTranscriptionResult and CoreErrorResponse types

Modified files

  • src/OpenAI/AudioClient.cs — Added CreateLiveTranscriptionSession() factory method
  • src/Detail/ICoreInterop.cs — Added StreamingRequestBuffer struct, StartAudioStream, PushAudioData, StopAudioStream interface methods
  • src/Detail/CoreInterop.cs — Routes audio commands through existing execute_command / execute_command_with_binary native entry points (no separate audio exports needed)
  • src/Detail/JsonSerializationContext.cs — Registered LiveAudioTranscriptionResult for AOT compatibility
  • test/FoundryLocal.Tests/Utils.cs — Updated to use CreateLiveTranscriptionSession()

Documentation


API surface

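var audioClient = await model.GetAudioClientAsync();
var session = audioClient.CreateLiveTranscriptionSession();

// Configure before StartAsync(); settings are snapshot-copied and frozen for the session
session.Settings.SampleRate = 16000;
session.Settings.Channels = 1;
session.Settings.Language = "en";

await session.StartAsync();

// Read results as an async stream, concurrently with pushing audio
var readTask = Task.Run(async () =>
{
    await foreach (var result in session.GetTranscriptionStream())
    {
        Console.Write(result.Text);
    }
});

// Push audio from the microphone callback (safe to call from any thread)
await session.AppendAsync(pcmBytes);

// Stop the session, then wait for the reader to drain the remaining results
await session.StopAsync();
await readTask;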
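AppendAsync receives raw PCM bytes. The sketch below shows one way to produce them, assuming 16-bit signed little-endian samples at the 16 kHz mono settings above (the sample project in this PR also sets BitsPerSample = 16). ToPcm16 and BytesPerChunk are hypothetical helpers, not SDK APIs.

```csharp
using System;
using System.Buffers.Binary;

// Audio format assumed to match the session settings above.
const int SampleRate = 16000;   // samples per second
const int Channels = 1;         // mono
const int BytesPerSample = 2;   // 16-bit PCM

// Bytes needed for a chunk of the given duration: 16000 * 1 * 2 * 100 / 1000 = 3200 for 100 ms.
int BytesPerChunk(int milliseconds) =>
    SampleRate * Channels * BytesPerSample * milliseconds / 1000;

// Hypothetical helper: convert float samples in [-1, 1] to 16-bit little-endian PCM bytes.
byte[] ToPcm16(ReadOnlySpan<float> samples)
{
    var bytes = new byte[samples.Length * BytesPerSample];
    for (int i = 0; i < samples.Length; i++)
    {
        // Clamp to avoid overflow, then scale to the Int16 range.
        float clamped = Math.Clamp(samples[i], -1f, 1f);
        short value = (short)(clamped * short.MaxValue);
        BinaryPrimitives.WriteInt16LittleEndian(bytes.AsSpan(i * BytesPerSample), value);
    }
    return bytes;
}

Console.WriteLine(BytesPerChunk(100));               // 3200
byte[] pcmBytes = ToPcm16(new float[] { 0f, 1f, -1f });
Console.WriteLine(BitConverter.ToString(pcmBytes));  // 00-00-FF-7F-01-80
```

Under these assumptions, a microphone callback that delivers 100 ms buffers would pass 3,200 bytes per AppendAsync call.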

Design highlights

  • Internal push queue — Bounded Channel<T> serializes audio pushes from any thread (safe for mic callbacks) with backpressure
  • Retry policy — Transient native errors retried with exponential backoff (3 attempts); permanent errors terminate the session
  • Settings freeze — Audio format settings are snapshot-copied at StartAsync() and immutable during the session
  • Cancellation-safe stop — StopAsync always calls native stop even if cancelled, preventing native session leaks
  • Dedicated session CTS — Push loop uses its own CancellationTokenSource, decoupled from the caller's token
  • Routes through existing exports — StartAudioStream and StopAudioStream route through execute_command; PushAudioData routes through execute_command_with_binary — no new native entry points required
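The push-side bullets above (bounded queue, retry with backoff, dedicated CTS, stop that always runs) can be sketched with standard .NET APIs. This is a minimal sketch, not the PR's implementation: PushNative stands in for the native PushAudioData call and fails once to simulate a transient error, IsTransient is a hypothetical classifier, and the queue capacity and 10 ms base delay are illustrative values.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical native call: throws on its first invocation to simulate a transient error.
var pushed = new List<int>();
int calls = 0;
void PushNative(byte[] chunk)
{
    if (Interlocked.Increment(ref calls) == 1)
        throw new IOException("transient native error");
    lock (pushed) pushed.Add(chunk.Length);
}
bool IsTransient(Exception ex) => ex is IOException;

// Bounded queue: writers wait when it is full (backpressure); a single reader drains it.
var queue = Channel.CreateBounded<byte[]>(new BoundedChannelOptions(capacity: 8)
{
    FullMode = BoundedChannelFullMode.Wait,
    SingleReader = true,
});

// Session-owned token, decoupled from any caller's token.
using var sessionCts = new CancellationTokenSource();

async Task PushLoopAsync(CancellationToken ct)
{
    await foreach (var chunk in queue.Reader.ReadAllAsync(ct))
    {
        for (int attempt = 1; ; attempt++)
        {
            try { PushNative(chunk); break; }
            catch (Exception ex) when (IsTransient(ex) && attempt < 3)
            {
                // Exponential backoff: 10 ms before attempt 2, 20 ms before attempt 3.
                await Task.Delay(10 * (1 << (attempt - 1)), ct);
            }
        }
    }
}

var pushLoop = PushLoopAsync(sessionCts.Token);

// Equivalent of AppendAsync: safe to call from any thread.
await queue.Writer.WriteAsync(new byte[3200]);
await queue.Writer.WriteAsync(new byte[3200]);

// Stop: complete the queue and drain it; the finally block runs native stop regardless.
try
{
    queue.Writer.Complete();
    await pushLoop;
}
finally
{
    Console.WriteLine("native stop called"); // stands in for StopAudioStream
}
Console.WriteLine($"pushed {pushed.Count} chunks after {calls} native calls");
```

FullMode = BoundedChannelFullMode.Wait is what provides the backpressure: when the queue is full, WriteAsync waits instead of dropping audio.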

Core integration (neutron-server)

The Core side (AudioStreamingSession.cs) uses StreamingProcessor + Generator + Tokenizer + TokenizerStream from onnxruntime-genai to perform real-time RNNT decoding. The native commands (audio_stream_start/push/stop) are handled as cases in NativeInterop.ExecuteCommandManaged / ExecuteCommandWithBinaryManaged.
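The command routing can be pictured as a switch on the command name, with the push command taking the binary path. A minimal sketch, assuming audio_stream_start/push/stop expand to audio_stream_start, audio_stream_push and audio_stream_stop. The ExecuteCommand and ExecuteCommandWithBinary local functions, their return strings, and the in-memory buffer are hypothetical stand-ins for NativeInterop.ExecuteCommandManaged, ExecuteCommandWithBinaryManaged, and the GenAI decoding in AudioStreamingSession.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-session state standing in for AudioStreamingSession.
var buffered = new List<byte>();
bool started = false;

// Text-only commands (execute_command path).
string ExecuteCommand(string command) => command switch
{
    "audio_stream_start" => Start(),
    "audio_stream_stop" => Stop(),
    _ => throw new ArgumentException($"Unknown command: {command}"),
};

// Commands carrying a binary payload (execute_command_with_binary path).
string ExecuteCommandWithBinary(string command, byte[] payload) => command switch
{
    "audio_stream_push" => Push(payload),
    _ => throw new ArgumentException($"Unknown binary command: {command}"),
};

string Start() { started = true; buffered.Clear(); return "started"; }

string Push(byte[] pcm)
{
    if (!started) throw new InvalidOperationException("Stream not started");
    buffered.AddRange(pcm); // a real session would feed the streaming decoder here
    return $"buffered {buffered.Count} bytes";
}

string Stop() { started = false; return "stopped"; }

Console.WriteLine(ExecuteCommand("audio_stream_start"));
Console.WriteLine(ExecuteCommandWithBinary("audio_stream_push", new byte[3200]));
Console.WriteLine(ExecuteCommand("audio_stream_stop"));
```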

Verified working

  • ✅ SDK build succeeds (0 errors)
  • ✅ GenAI StreamingProcessor pipeline verified with WAV file (correct transcript)
  • ✅ Core TranscribeChunk byte[] PCM path matches reference float[] path exactly
  • ✅ Full E2E simulation: SDK Channel + JSON serialization + session management (32 partial + 1 final result)
  • ✅ Live microphone test: 67s real-time transcription through SDK → Core → GenAI
  • ✅ Full SDK → Core → GenAI E2E with locally built Core DLL and GenAI NuGet 0.13.0-dev
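The TranscribeChunk check compares a byte[] PCM path against a float[] reference. A sketch of such a comparison, assuming 16-bit signed little-endian PCM normalized by dividing by 32768 (a common convention; the Core's actual conversion is not shown in this PR). Pcm16ToFloat and SamePath are hypothetical names.

```csharp
using System;
using System.Buffers.Binary;

// Decode 16-bit little-endian PCM into floats in [-1, 1).
float[] Pcm16ToFloat(ReadOnlySpan<byte> pcm)
{
    if (pcm.Length % 2 != 0)
        throw new ArgumentException("PCM16 data must have an even number of bytes");
    var samples = new float[pcm.Length / 2];
    for (int i = 0; i < samples.Length; i++)
    {
        short s = BinaryPrimitives.ReadInt16LittleEndian(pcm.Slice(i * 2, 2));
        samples[i] = s / 32768f;
    }
    return samples;
}

// Compare the decoded byte[] path against a float[] reference, sample by sample (exact match).
bool SamePath(float[] reference, byte[] pcm)
{
    var decoded = Pcm16ToFloat(pcm);
    if (decoded.Length != reference.Length) return false;
    for (int i = 0; i < decoded.Length; i++)
        if (decoded[i] != reference[i]) return false;
    return true;
}

byte[] pcmData = { 0x00, 0x00, 0x00, 0x40, 0x00, 0xC0 }; // 0, 16384, -16384
float[] expected = { 0f, 0.5f, -0.5f };
Console.WriteLine(SamePath(expected, pcmData)); // True
```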


vercel bot commented Mar 5, 2026

The latest updates on your projects.

Project | Deployment | Actions | Updated (UTC)
foundry-local | Ready | Preview, Comment | Mar 18, 2026 3:50am


ruiren_microsoft added 2 commits March 10, 2026 18:09
rui-ren changed the title from "Add real-time audio streaming support (Microphone ASR) - c#" to "Add live audio transcription streaming support to Foundry Local C# SDK" on Mar 13, 2026

Copilot AI left a comment


Pull request overview

Adds a new C# SDK API for live/streaming audio transcription sessions (push PCM chunks, receive incremental/final text results) and includes a Windows microphone demo sample.

Changes:

  • Introduces LiveAudioTranscriptionSession + result/error types for streaming ASR over Core interop.
  • Extends Core interop to support audio stream start/push/stop (including binary payload routing).
  • Adds a samples/cs/LiveAudioTranscription demo project and updates the audio client factory API.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 9 comments.

File | Description
sdk_v2/cs/test/FoundryLocal.Tests/Utils.cs | Replaced prior test utilities with ad-hoc top-level streaming harness code (currently breaks test build).
sdk_v2/cs/test/FoundryLocal.Tests/ModelTests.cs | Adds trailing blank lines (formatting noise).
sdk_v2/cs/src/OpenAI/LiveAudioTranscriptionTypes.cs | Adds LiveAudioTranscriptionResult and a structured Core error type.
sdk_v2/cs/src/OpenAI/LiveAudioTranscriptionClient.cs | Adds LiveAudioTranscriptionSession implementation (channels, retry, stop semantics).
sdk_v2/cs/src/OpenAI/AudioClient.cs | Adds CreateLiveTranscriptionSession() and removes the public file streaming transcription API.
sdk_v2/cs/src/Detail/JsonSerializationContext.cs | Registers new audio streaming types for source-gen JSON.
sdk_v2/cs/src/Detail/ICoreInterop.cs | Adds interop structs + methods for audio stream start/push/stop.
sdk_v2/cs/src/Detail/CoreInterop.cs | Implements binary command routing via execute_command_with_binary and start/stop routing via execute_command.
sdk_v2/cs/src/AssemblyInfo.cs | Adds InternalsVisibleTo("AudioStreamTest").
samples/cs/LiveAudioTranscription/README.md | Documentation for the live transcription demo sample.
samples/cs/LiveAudioTranscription/Program.cs | Windows microphone demo using NAudio + new session API.
samples/cs/LiveAudioTranscription/LiveAudioTranscription.csproj | Adds sample project dependencies and references the SDK project (path currently incorrect).
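The "path currently incorrect" note on the sample's .csproj can be checked by resolving the reference relative to the project directory, as MSBuild does for ProjectReference paths. A sketch, assuming samples/ and sdk_v2/ both sit at the repository root (as the file paths in the table suggest); /repo is a placeholder root.

```csharp
using System;
using System.IO;

// Placeholder repository root; the sample project lives in samples/cs/LiveAudioTranscription.
string sampleDir = Path.Combine("/repo", "samples", "cs", "LiveAudioTranscription");

// Resolve a Windows-style relative path against the sample directory, normalized to '/'.
string Resolve(string relative) =>
    Path.GetFullPath(Path.Combine(sampleDir, relative.Replace('\\', Path.DirectorySeparatorChar)))
        .Replace('\\', '/');

// Two levels up only reaches samples/, so this points at samples/sdk_v2/...
string asWritten = Resolve(@"..\..\sdk_v2\cs\src\Microsoft.AI.Foundry.Local.csproj");
// Three levels up reaches the repository root.
string threeUp = Resolve(@"..\..\..\sdk_v2\cs\src\Microsoft.AI.Foundry.Local.csproj");

Console.WriteLine(asWritten);
Console.WriteLine(threeUp);
```

Under that layout assumption, the reference needs three ..\ segments (..\..\..\sdk_v2\cs\src\Microsoft.AI.Foundry.Local.csproj) to reach the SDK project.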


Comment on lines +36 to +38
// --- NEW: Audio streaming types ---
[JsonSerializable(typeof(LiveAudioTranscriptionResult))]
[JsonSerializable(typeof(CoreErrorResponse))]

<Project Sdk="Microsoft.NET.Sdk">

  <ItemGroup>
    <ProjectReference Include="..\..\sdk_v2\cs\src\Microsoft.AI.Foundry.Local.csproj" />
Comment on lines +55 to +56


var streamingClient = audioClient.CreateLiveTranscriptionSession();
streamingClient.Settings.SampleRate = 16000;
streamingClient.Settings.Channels = 1;
streamingClient.Settings.BitsPerSample = 16;
Comment on lines +98 to 101

private async IAsyncEnumerable<AudioCreateTranscriptionResponse> TranscribeAudioStreamingImplAsync(
string audioFilePath, [EnumeratorCancellation] CancellationToken ct)
{
Comment on lines +9 to +14
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Globalization;
using System.Threading.Channels;
using Microsoft.AI.Foundry.Local.Detail;
using Microsoft.Extensions.Logging;
///
/// Created via <see cref="OpenAIAudioClient.CreateLiveTranscriptionSession"/>.
///
/// Thread safety: PushAudioAsync can be called from any thread (including high-frequency
Comment on lines +10 to 11
[assembly: InternalsVisibleTo("AudioStreamTest")]
[assembly: InternalsVisibleTo("DynamicProxyGenAssembly2")] // for Mock of ICoreInterop
Comment on lines +1 to +18
using Microsoft.AI.Foundry.Local;
using Microsoft.Extensions.Logging;

var loggerFactory = LoggerFactory.Create(b => b.AddConsole().SetMinimumLevel(LogLevel.Debug));
var logger = loggerFactory.CreateLogger("AudioStreamTest");

// Point to the directory containing Core + ORT DLLs
var corePath = @"C:\Users\ruiren\Desktop\audio-stream-test\Microsoft.AI.Foundry.Local.Core.dll";

var config = new Configuration
{
    AppName = "AudioStreamTest",
    LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Debug,
    AdditionalSettings = new Dictionary<string, string>
    {
        { "FoundryLocalCorePath", corePath }
    }
};

Lines removed from the previous Utils.cs:

using Microsoft.VisualStudio.TestPlatform.TestHost;
using Moq;

internal static class Utils
{
    internal struct TestCatalogInfo
    {
        internal readonly List<ModelInfo> TestCatalog { get; }
        internal readonly string ModelListJson { get; }

        internal TestCatalogInfo(bool includeCuda)
        {

            TestCatalog = Utils.BuildTestCatalog(includeCuda);
            ModelListJson = JsonSerializer.Serialize(TestCatalog, JsonSerializationContext.Default.ListModelInfo);
        }
    }

    internal static readonly TestCatalogInfo TestCatalog = new(true);

    [Before(Assembly)]
    public static void AssemblyInit(AssemblyHookContext _)
    {
        using var loggerFactory = LoggerFactory.Create(builder =>
        {
            builder
                .AddConsole()
                .SetMinimumLevel(LogLevel.Debug);
        });

        ILogger logger = loggerFactory.CreateLogger<Program>();

        // Read configuration from appsettings.Test.json
        logger.LogDebug("Reading configuration from appsettings.Test.json");
        var configuration = new ConfigurationBuilder()
            .SetBasePath(Directory.GetCurrentDirectory())
            .AddJsonFile("appsettings.Test.json", optional: true, reloadOnChange: false)
            .Build();

        var testModelCacheDirName = "test-data-shared";
        string testDataSharedPath;
        if (Path.IsPathRooted(testModelCacheDirName) ||
            testModelCacheDirName.Contains(Path.DirectorySeparatorChar) ||
            testModelCacheDirName.Contains(Path.AltDirectorySeparatorChar))
        {
            // It's a relative or complete filepath, resolve from current directory
            testDataSharedPath = Path.GetFullPath(testModelCacheDirName);
        }
        else
        {
            // It's just a directory name, combine with repo root parent
            testDataSharedPath = Path.GetFullPath(Path.Combine(GetRepoRoot(), "..", testModelCacheDirName));
        }

        logger.LogInformation("Using test model cache directory: {testDataSharedPath}", testDataSharedPath);

        if (!Directory.Exists(testDataSharedPath))
        {
            throw new DirectoryNotFoundException($"Test model cache directory does not exist: {testDataSharedPath}");

        }

        var config = new Configuration
        {
            AppName = "FoundryLocalSdkTest",
            LogLevel = Local.LogLevel.Debug,
            Web = new Configuration.WebService
            {
                Urls = "http://127.0.0.1:0"
            },
            ModelCacheDir = testDataSharedPath,
            LogsDir = Path.Combine(GetRepoRoot(), "sdk_v2", "cs", "logs")
        };

        // Initialize the singleton instance.
        FoundryLocalManager.CreateAsync(config, logger).GetAwaiter().GetResult();

        // standalone instance for testing individual components that skips the 'initialize' command
        CoreInterop = new CoreInterop(logger);
    }

    internal static ICoreInterop CoreInterop { get; private set; } = default!;

    internal static Mock<ILogger> CreateCapturingLoggerMock(List<string> sink)
    {
        var mock = new Mock<ILogger>();
        mock.Setup(x => x.Log(
            It.IsAny<LogLevel>(),
            It.IsAny<EventId>(),
            It.IsAny<It.IsAnyType>(),
            It.IsAny<Exception?>(),
            (Func<It.IsAnyType, Exception?, string>)It.IsAny<object>()))
            .Callback((LogLevel level, EventId id, object state, Exception? ex, Delegate formatter) =>
            {
                var message = formatter.DynamicInvoke(state, ex) as string;
                sink.Add($"{level}: {message}");
            });

        return mock;
    }

    internal sealed record InteropCommandInterceptInfo
    {
        public string CommandName { get; init; } = default!;
        public string? CommandInput { get; init; }
        public string ResponseData { get; init; } = default!;
        public string? ResponseError { get; init; }
    }

    internal static Mock<ICoreInterop> CreateCoreInteropWithIntercept(ICoreInterop coreInterop,
        List<InteropCommandInterceptInfo> intercepts)
    {
        var mock = new Mock<ICoreInterop>();
        var interceptNames = new HashSet<string>(StringComparer.InvariantCulture);

        foreach (var intercept in intercepts)
        {
            if (!interceptNames.Add(intercept.CommandName))
            {
                throw new ArgumentException($"Duplicate intercept for command {intercept.CommandName}");
            }

            mock.Setup(x => x.ExecuteCommand(It.Is<string>(s => s == intercept.CommandName), It.IsAny<CoreInteropRequest?>()))
                .Returns(new ICoreInterop.Response
                {
                    Data = intercept.ResponseData,
                    Error = intercept.ResponseError
                });

            mock.Setup(x => x.ExecuteCommandAsync(It.Is<string>(s => s == intercept.CommandName),
                                                  It.IsAny<CoreInteropRequest?>(),
                                                  It.IsAny<CancellationToken?>()))
                .ReturnsAsync(new ICoreInterop.Response
                {
                    Data = intercept.ResponseData,
                    Error = intercept.ResponseError
                });
        }

        mock.Setup(x => x.ExecuteCommand(It.Is<string>(s => !interceptNames.Contains(s)),
                                         It.IsAny<CoreInteropRequest?>()))
            .Returns((string commandName, CoreInteropRequest? commandInput) =>
                coreInterop.ExecuteCommand(commandName, commandInput));

        mock.Setup(x => x.ExecuteCommandAsync(It.Is<string>(s => !interceptNames.Contains(s)),
                                              It.IsAny<CoreInteropRequest?>(),
                                              It.IsAny<CancellationToken?>()))
            .Returns((string commandName, CoreInteropRequest? commandInput, CancellationToken? ct) =>
                coreInterop.ExecuteCommandAsync(commandName, commandInput, ct));

        return mock;
    }

    internal static bool IsRunningInCI()
    {
        var azureDevOps = Environment.GetEnvironmentVariable("TF_BUILD");
        var githubActions = Environment.GetEnvironmentVariable("GITHUB_ACTIONS");
        var isCI = string.Equals(azureDevOps, "True", StringComparison.OrdinalIgnoreCase) ||
                   string.Equals(githubActions, "true", StringComparison.OrdinalIgnoreCase);

        return isCI;
    }