Commit 2a87d7a

Merge pull request #55 from fishaudio/s2-pro
s2 pro everywhere
2 parents 1131aef + a1e7861 commit 2a87d7a

12 files changed

Lines changed: 59 additions & 32 deletions

api-reference/emotion-reference.mdx

Lines changed: 2 additions & 1 deletion
@@ -12,7 +12,7 @@ import SpecialEffects from '/snippets/emotion-list-special.mdx';

 ## Complete Emotion List

-This reference guide provides a comprehensive list of all 64+ supported emotional expressions and voice styles available in Fish Audio's TTS models.
+This reference guide provides a comprehensive list of all 64+ supported emotional expressions and voice styles available in Fish Audio's S1 TTS model. The latest S2-Pro model supports free-form natural language emotion tags.

 ## Basic Emotions (24)

@@ -103,6 +103,7 @@ That's hilarious! Ha ha ha! // Natural laughter
 |-------|-------|----------|-------|---------|----------|
 | Fish Speech 1.5 || Limited || 6/10 | No |
 | Fish Audio S1 ||||||
+| Fish Audio S2-Pro ||||||

 ## Tips for Natural Speech

api-reference/openapi.json

Lines changed: 4 additions & 3 deletions
@@ -23,13 +23,14 @@
 {
   "in": "header",
   "name": "model",
-  "description": "Specify which TTS model to use. We recommend `s1`",
+  "description": "Specify which TTS model to use. We recommend `s2`",
   "required": true,
   "schema": {
     "type": "string",
-    "default": "s1",
+    "default": "s2-pro",
     "enum": [
-      "s1"
+      "s1",
+      "s2-pro"
     ]
   }
 }
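
The `model` header defined in this schema can be checked on the client before a request goes out. The following is a minimal sketch, assuming a hypothetical helper named `model_header` that is not part of any Fish Audio SDK. It validates a model name against the enum in the updated schema and falls back to the schema default.

```python
# Values copied from the updated OpenAPI parameter schema in this diff.
ALLOWED_MODELS = ("s1", "s2-pro")
DEFAULT_MODEL = "s2-pro"


def model_header(model=None):
    """Return the `model` request header, validated against the schema enum.

    Hypothetical helper for illustration; not part of any Fish Audio SDK.
    """
    chosen = DEFAULT_MODEL if model is None else model
    if chosen not in ALLOWED_MODELS:
        raise ValueError(
            f"unsupported model {chosen!r}; expected one of {ALLOWED_MODELS}"
        )
    return {"model": chosen}
```

Under this check, `model_header("s2")` raises `ValueError`, because the enum lists `s2-pro` but not `s2`.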

api-reference/sdk/javascript/api-reference.mdx

Lines changed: 1 addition & 1 deletion
@@ -169,7 +169,7 @@ Fields: `speed` (0.5–2.0), `volume` (-20 to 20)
 The backend model to use.

 ```typescript
-Backends = 's1;
+Backends = 's1' | 's2-pro';
 ```

 ## Response Classes

developer-guide/core-features/text-to-speech.mdx

Lines changed: 6 additions & 5 deletions
@@ -231,7 +231,8 @@ Choose the right model for your needs:

 | Model | Best For | Quality | Speed |
 |---|---|---|---|
-| **s1** | Latest features | Excellent | Fast |
+| **s1** | Prototyping | Excellent | Fast |
+| **s2-pro** | Latest features | Excellent | Fastest |

 Specify a model in your request:

@@ -244,10 +245,10 @@ Specify a model in your request:
 </Tab>
 <Tab title="JavaScript">
 ```javascript
-// Using the latest S1 model
+// Using the latest S2-Pro model
 const audio = await fishAudio.textToSpeech.convert(
   { text: "Hello world" },
-  "s1"
+  "s2-pro"
 );
 ```
 </Tab>
@@ -355,7 +356,7 @@ For direct API calls without the SDK:
     headers={
         "authorization": "Bearer YOUR_API_KEY",
         "content-type": "application/msgpack",
-        "model": "s1"
+        "model": "s2-pro"
     }
 )

@@ -380,7 +381,7 @@ For direct API calls without the SDK:
   headers: {
     Authorization: "Bearer <YOUR_API_KEY>",
     "Content-Type": "application/msgpack",
-    model: "s1",
+    model: "s2-pro",
   },
   body,
 });

developer-guide/getting-started/introduction.mdx

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ Our technology brings dynamic, natural-sounding voices to your applications, ena
 <Tip>
 Introducing our latest generation voice models:

-**Fish Audio S1:** Our latest model delivers unparalleled naturalness and emotion, setting a new standard for AI-generated speech. [Learn more about our models →](/developer-guide/models-pricing/models-overview)
+**Fish Audio S2-Pro:** Our latest model delivers unparalleled naturalness and emotion, setting a new standard for AI-generated speech. [Learn more about our models →](/developer-guide/models-pricing/models-overview)
 </Tip>

 ## Core Capabilities

developer-guide/getting-started/quickstart.mdx

Lines changed: 2 additions & 2 deletions
@@ -72,7 +72,7 @@ Choose your preferred method to generate speech:
 curl -X POST https://api.fish.audio/v1/tts \
   -H "Authorization: Bearer $FISH_API_KEY" \
   -H "Content-Type: application/json" \
-  -H "model: s1" \
+  -H "model: s2-pro" \
   -d '{
     "text": "Hello! Welcome to Fish Audio. This is my first AI-generated voice.",
     "format": "mp3"
@@ -242,7 +242,7 @@ Then generate speech with your chosen voice:
 curl -X POST https://api.fish.audio/v1/tts \
   -H "Authorization: Bearer $FISH_API_KEY" \
   -H "Content-Type: application/json" \
-  -H "model: s1" \
+  -H "model: s2" \
   -d '{
     "text": "This is a custom voice from Fish Audio! You can explore hundreds of different voices on the platform, or even create your own.",
     "reference_id": "'"$REFERENCE_ID"'",

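The quickstart requests in this diff can also be assembled with Python's standard library. This is a minimal sketch, not an official client: `build_tts_request` is a hypothetical helper, the endpoint and header names are taken from the curl commands, and the payload covers only the fields visible in the hunks (`text`, `format`, `reference_id`). The second request body is truncated in the diff, so sending `format` alongside `reference_id` is an assumption.

```python
import json
import urllib.request

# Endpoint taken from the curl commands in the quickstart diff.
TTS_URL = "https://api.fish.audio/v1/tts"


def build_tts_request(api_key, text, model="s2-pro",
                      reference_id=None, audio_format="mp3"):
    """Build, but do not send, a POST request mirroring the curl calls.

    Hypothetical helper for illustration only.
    """
    payload = {"text": text, "format": audio_format}
    if reference_id is not None:
        # Only included for the custom-voice variant of the request.
        payload["reference_id"] = reference_id
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "model": model,
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. That call is left out so the sketch runs without network access or a real API key.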
developer-guide/models-pricing/choosing-a-model.mdx

Lines changed: 1 addition & 1 deletion
@@ -41,6 +41,6 @@ import { AudioTranscript } from '/snippets/audio-transcript.jsx';
 }
 ]} />

-We recommend using **Fish Audio S1** for all projects - our flagship model with industry-leading quality and performance.
+We recommend using **Fish Audio S2-Pro** for all projects - our flagship model with industry-leading quality and performance.

 <Support />

developer-guide/models-pricing/deprecations.mdx

Lines changed: 2 additions & 1 deletion
@@ -45,7 +45,8 @@ import { AudioTranscript } from '/snippets/audio-transcript.jsx';
 ## Available Models

 Currently available models:
-- **Fish Audio S1** (Recommended) - Latest generation with best performance
+- **Fish Audio S2** (Recommended) - Latest generation with best performance
+- **Fish Audio S1** - Highly expressive and natural sounding

 ## Deprecated Models
 - **speech-1.6** - Fish Speech v1.6 has been deprecated on February, 28th, 2026

developer-guide/models-pricing/models-overview.mdx

Lines changed: 35 additions & 13 deletions
@@ -48,18 +48,28 @@ Fish Audio offers state-of-the-art text-to-speech models optimized for different

 ### Current Model

-<Card title="s1" icon="star">
-**Fish Audio S1** - Our flagship model with industry-leading quality
-- 4 billion parameters
-- 0.008 WER (0.8% word error rate)
-- Best performance and naturalness
-- Full emotional control capabilities
+<Card title="s2-pro" icon="star">
+**Fish Audio S2-Pro** - Our latest flagship model
+- Natural language free-form emotion tags like `[whispers sweetly]` or `[laughing nervously]`
+- Multi-speaker dialogue support
+- 80+ languages
+- 100ms time-to-first-audio
+- Full SGLang-based serving stack
 </Card>

 <Note>
-We recommend using `s1` for all new projects to access the latest capabilities and performance improvements. Legacy models remain available for existing integrations.
+We recommend using `s2-pro` for all new projects to access the latest capabilities and performance improvements. S1 remains available for existing integrations.
 </Note>

+### Previous Model
+
+<Card title="s1" icon="microchip">
+**Fish Audio S1** - High-quality voice generation
+- 4 billion parameters
+- 0.008 WER (0.8% word error rate)
+- Full emotional control capabilities
+</Card>
+
 ## Model Specifications

 ### Fish Audio S1 Performance Metrics
@@ -71,21 +81,33 @@ We recommend using `s1` for all new projects to access the latest capabilities a

 ## Supported Languages

-Fish Audio models support text-to-speech generation in 13 languages with full emotional expression capabilities and more to come soon.
+### S2-Pro
+
+S2-Pro supports 80+ languages with automatic language detection.
+
+<Info>
+Language detection is automatic - simply provide text in your target language.
+</Info>
+
+### S1
+
+S1 supports text-to-speech generation in 13 languages with full emotional expression capabilities.

 ```
 English, Chinese, Japanese, German,
 French, Spanish, Korean, Arabic,
 Russian, Dutch, Italian, Polish, Portuguese
 ```

-<Info>
-Language detection is automatic - simply provide text in your target language.
-</Info>
-
 ## Voice Styles and Emotions

-Fish Audio models support 64+ emotional expressions and voice styles that can be controlled through text markers in your input.
+### S2-Pro
+
+S2-Pro supports natural language free-form emotion tags. You can use any descriptive expression in brackets to control the voice style, such as `[whispers sweetly]` or `[laughing nervously]`.
+
+### S1
+
+S1 supports 64+ emotional expressions and voice styles that can be controlled through text markers in your input.

 ### Basic Emotions (24 expressions)
 ```

developer-guide/models-pricing/pricing-and-rate-limits.mdx

Lines changed: 1 addition & 0 deletions
@@ -52,6 +52,7 @@ TTS pricing is based on the size of input text, measured in millions of UTF-8 by
 | Model Name | Price (USD) |
 |--------------|------------------------|
 | `s1` | $15.00 / M UTF-8 bytes |
+| `s2-pro` | $15.00 / M UTF-8 bytes |

 <Info>
 1M UTF-8 bytes is approximately 180,000 English words, or about 12 hours of speech
