Merge pull request #55 from fishaudio/s2-pro

kcui5 · web-flow · commit 2a87d7a12d14 · 2026-03-10T00:18:53.000-07:00
s2 pro everywhere
diff --git a/api-reference/emotion-reference.mdx b/api-reference/emotion-reference.mdx
@@ -12,7 +12,7 @@ import SpecialEffects from '/snippets/emotion-list-special.mdx';
 
 ## Complete Emotion List
 
-This reference guide provides a comprehensive list of all 64+ supported emotional expressions and voice styles available in Fish Audio's TTS models.
+This reference guide provides a comprehensive list of all 64+ supported emotional expressions and voice styles available in Fish Audio's S1 TTS model. The latest S2-Pro model supports free-form natural language emotion tags.
 
 ## Basic Emotions (24)
 
@@ -103,6 +103,7 @@ That's hilarious! Ha ha ha!  // Natural laughter
 |-------|-------|----------|-------|---------|----------|
 | Fish Speech 1.5 | ✓ | Limited | ✓ | 6/10 | No |
 | Fish Audio S1 | ✓ | ✓ | ✓ | ✓ | ✓ |
+| Fish Audio S2-Pro | ✓ | ✓ | ✓ | ✓ | ✓ |
 
 ## Tips for Natural Speech
 
diff --git a/api-reference/openapi.json b/api-reference/openapi.json
@@ -23,13 +23,14 @@
           {
             "in": "header",
             "name": "model",
-            "description": "Specify which TTS model to use. We recommend `s1`",
+            "description": "Specify which TTS model to use. We recommend `s2-pro`",
             "required": true,
             "schema": {
               "type": "string",
-              "default": "s1",
+              "default": "s2-pro",
               "enum": [
-                "s1"
+                "s1",
+                "s2-pro"
               ]
             }
           }
diff --git a/api-reference/sdk/javascript/api-reference.mdx b/api-reference/sdk/javascript/api-reference.mdx
@@ -169,7 +169,7 @@ Fields: `speed` (0.5–2.0), `volume` (-20 to 20)
 The backend model to use.
 
 ```typescript
-Backends = 's1;
+Backends = 's1' | 's2-pro';
 ```
 
 ## Response Classes
diff --git a/developer-guide/core-features/text-to-speech.mdx b/developer-guide/core-features/text-to-speech.mdx
@@ -231,7 +231,8 @@ Choose the right model for your needs:
 
 | Model | Best For | Quality | Speed |
 |---|---|---|---|
-| **s1** | Latest features | Excellent | Fast |
+| **s1** | Prototyping | Excellent | Fast |
+| **s2-pro** | Latest features | Excellent | Fastest |
 
 Specify a model in your request:
 
@@ -244,10 +245,10 @@ Specify a model in your request:
   </Tab>
   <Tab title="JavaScript">
     ```javascript
-    // Using the latest S1 model
+    // Using the latest S2-Pro model
     const audio = await fishAudio.textToSpeech.convert(
         { text: "Hello world" },
-        "s1"
+        "s2-pro"
     );
     ```
   </Tab>
@@ -355,7 +356,7 @@ For direct API calls without the SDK:
             headers={
                 "authorization": "Bearer YOUR_API_KEY",
                 "content-type": "application/msgpack",
-                "model": "s1"
+                "model": "s2-pro"
             }
         )
         
@@ -380,7 +381,7 @@ For direct API calls without the SDK:
         headers: {
             Authorization: "Bearer <YOUR_API_KEY>",
             "Content-Type": "application/msgpack",
-            model: "s1",
+            model: "s2-pro",
         },
         body,
     });
diff --git a/developer-guide/getting-started/introduction.mdx b/developer-guide/getting-started/introduction.mdx
@@ -47,7 +47,7 @@ Our technology brings dynamic, natural-sounding voices to your applications, ena
 <Tip>
     Introducing our latest generation voice models:
 
-    **Fish Audio S1:** Our latest model delivers unparalleled naturalness and emotion, setting a new standard for AI-generated speech. [Learn more about our models →](/developer-guide/models-pricing/models-overview)
+    **Fish Audio S2-Pro:** Our latest model delivers unparalleled naturalness and emotion, setting a new standard for AI-generated speech. [Learn more about our models →](/developer-guide/models-pricing/models-overview)
 </Tip>
 
 ## Core Capabilities
diff --git a/developer-guide/getting-started/quickstart.mdx b/developer-guide/getting-started/quickstart.mdx
@@ -72,7 +72,7 @@ Choose your preferred method to generate speech:
         curl -X POST https://api.fish.audio/v1/tts \
           -H "Authorization: Bearer $FISH_API_KEY" \
           -H "Content-Type: application/json" \
-          -H "model: s1" \
+          -H "model: s2-pro" \
           -d '{
             "text": "Hello! Welcome to Fish Audio. This is my first AI-generated voice.",
             "format": "mp3"
@@ -242,7 +242,7 @@ Then generate speech with your chosen voice:
     curl -X POST https://api.fish.audio/v1/tts \
       -H "Authorization: Bearer $FISH_API_KEY" \
       -H "Content-Type: application/json" \
-      -H "model: s1" \
+      -H "model: s2-pro" \
       -d '{
         "text": "This is a custom voice from Fish Audio! You can explore hundreds of different voices on the platform, or even create your own.",
         "reference_id": "'"$REFERENCE_ID"'",
diff --git a/developer-guide/models-pricing/choosing-a-model.mdx b/developer-guide/models-pricing/choosing-a-model.mdx
@@ -41,6 +41,6 @@ import { AudioTranscript } from '/snippets/audio-transcript.jsx';
     }
   ]} />
 
-We recommend using **Fish Audio S1** for all projects - our flagship model with industry-leading quality and performance.
+We recommend using **Fish Audio S2-Pro** for all projects - our flagship model with industry-leading quality and performance.
 
 <Support />
diff --git a/developer-guide/models-pricing/deprecations.mdx b/developer-guide/models-pricing/deprecations.mdx
@@ -45,7 +45,8 @@ import { AudioTranscript } from '/snippets/audio-transcript.jsx';
 ## Available Models
 
 Currently available models:
-- **Fish Audio S1** (Recommended) - Latest generation with best performance
+- **Fish Audio S2-Pro** (Recommended) - Latest generation with best performance
+- **Fish Audio S1** - Highly expressive and natural sounding
 
 ## Deprecated Models
 - **speech-1.6** - Fish Speech v1.6 has been deprecated on February, 28th, 2026
diff --git a/developer-guide/models-pricing/models-overview.mdx b/developer-guide/models-pricing/models-overview.mdx
@@ -48,18 +48,28 @@ Fish Audio offers state-of-the-art text-to-speech models optimized for different
 
 ### Current Model
 
-<Card title="s1" icon="star">
-  **Fish Audio S1** - Our flagship model with industry-leading quality
-  - 4 billion parameters
-  - 0.008 WER (0.8% word error rate)
-  - Best performance and naturalness
-  - Full emotional control capabilities
+<Card title="s2-pro" icon="star">
+  **Fish Audio S2-Pro** - Our latest flagship model
+  - Natural language free-form emotion tags like `[whispers sweetly]` or `[laughing nervously]`
+  - Multi-speaker dialogue support
+  - 80+ languages
+  - 100ms time-to-first-audio
+  - Full SGLang-based serving stack
 </Card>
 
 <Note>
-We recommend using `s1` for all new projects to access the latest capabilities and performance improvements. Legacy models remain available for existing integrations.
+We recommend using `s2-pro` for all new projects to access the latest capabilities and performance improvements. S1 remains available for existing integrations.
 </Note>
 
+### Previous Model
+
+<Card title="s1" icon="microchip">
+  **Fish Audio S1** - High-quality voice generation
+  - 4 billion parameters
+  - 0.008 WER (0.8% word error rate)
+  - Full emotional control capabilities
+</Card>
+
 ## Model Specifications
 
 ### Fish Audio S1 Performance Metrics
@@ -71,21 +81,33 @@ We recommend using `s1` for all new projects to access the latest capabilities a
 
 ## Supported Languages
 
-Fish Audio models support text-to-speech generation in 13 languages with full emotional expression capabilities and more to come soon.
+### S2-Pro
+
+S2-Pro supports 80+ languages with automatic language detection.
+
+<Info>
+Language detection is automatic - simply provide text in your target language.
+</Info>
+
+### S1
+
+S1 supports text-to-speech generation in 13 languages with full emotional expression capabilities.
 
 ```
 English, Chinese, Japanese, German,
 French, Spanish, Korean, Arabic,
 Russian, Dutch, Italian, Polish, Portuguese
 ```
 
-<Info>
-Language detection is automatic - simply provide text in your target language.
-</Info>
-
 ## Voice Styles and Emotions
 
-Fish Audio models support 64+ emotional expressions and voice styles that can be controlled through text markers in your input.
+### S2-Pro
+
+S2-Pro supports natural language free-form emotion tags. You can use any descriptive expression in brackets to control the voice style, such as `[whispers sweetly]` or `[laughing nervously]`.
+
+### S1
+
+S1 supports 64+ emotional expressions and voice styles that can be controlled through text markers in your input.
 
 ### Basic Emotions (24 expressions)
 ```
diff --git a/developer-guide/models-pricing/pricing-and-rate-limits.mdx b/developer-guide/models-pricing/pricing-and-rate-limits.mdx
@@ -52,6 +52,7 @@ TTS pricing is based on the size of input text, measured in millions of UTF-8 by
 | Model Name   | Price (USD)            |
 |--------------|------------------------|
 | `s1`         | $15.00 / M UTF-8 bytes |
+| `s2-pro`     | $15.00 / M UTF-8 bytes |
 
 <Info>
 1M UTF-8 bytes is approximately 180,000 English words, or about 12 hours of speech
diff --git a/developer-guide/sdk-guide/javascript/text-to-speech.mdx b/developer-guide/sdk-guide/javascript/text-to-speech.mdx
@@ -168,13 +168,13 @@ const audio = await client.textToSpeech.convert({
 
 ## Choosing Backend
 
-Our state-of-the-art [S1 model](/developer-guide/models-pricing/models-overview)
+Our state-of-the-art [S2-Pro model](/developer-guide/models-pricing/models-overview)
 is the default backend model for TTS. Optionally specify the model via the second argument (`backend: Backends`).
 
 ```typescript
 const audio = await fishAudio.textToSpeech.convert({
   text: "Hello, world!",
-}, "s1");
+}, "s2-pro");
 ```
 
 ## Streaming
diff --git a/developer-guide/sdk-guide/javascript/websocket.mdx b/developer-guide/sdk-guide/javascript/websocket.mdx
@@ -167,7 +167,7 @@ connection.on(RealtimeEvents.ERROR, (err) => {
 
 Customize WebSocket behavior by configuring the client.<br />
 Optionally specify the backend model to use.
-Our state-of-the-art [S1 model](/developer-guide/models-pricing/models-overview) is the default:
+Our state-of-the-art [S2-Pro model](/developer-guide/models-pricing/models-overview) is the default:
 
 ```typescript
 // Custom endpoint
@@ -180,7 +180,7 @@ const fishAudio = new FishAudioClient({
 const conn = await fishAudio.textToSpeech.convertRealtime(
   request,
   makeTextStream(),
-  backend: "s1"
+  "s2-pro" // backend
 );
 ```
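
A minimal sketch of how a caller might guard the `model` header against the identifiers this change allows. The enum in `api-reference/openapi.json` lists only `s1` and `s2-pro`, so a bare `s2` would not match it. `build_tts_headers` is a hypothetical helper, not part of any Fish Audio SDK; the header names mirror the direct-API Python example in `text-to-speech.mdx`.

```python
# Model identifiers copied from the `enum` in api-reference/openapi.json in this diff.
SUPPORTED_MODELS = ("s1", "s2-pro")
DEFAULT_MODEL = "s2-pro"  # matches the new schema default


def build_tts_headers(api_key: str, model: str = DEFAULT_MODEL) -> dict:
    """Hypothetical helper: build headers for a direct TTS request.

    Raises ValueError when `model` is not one of the documented identifiers.
    """
    if model not in SUPPORTED_MODELS:
        raise ValueError(
            f"unknown model {model!r}; expected one of {SUPPORTED_MODELS}"
        )
    return {
        "authorization": f"Bearer {api_key}",
        "content-type": "application/msgpack",
        "model": model,
    }
```

Calling `build_tts_headers("YOUR_API_KEY")` selects `s2-pro` by default, while passing `"s1"` keeps an existing integration on the previous model.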
 
