api-reference/emotion-reference.mdx
Lines changed: 2 additions & 1 deletion
@@ -12,7 +12,7 @@ import SpecialEffects from '/snippets/emotion-list-special.mdx';

 ## Complete Emotion List

-This reference guide provides a comprehensive list of all 64+ supported emotional expressions and voice styles available in Fish Audio's TTS models.
+This reference guide provides a comprehensive list of all 64+ supported emotional expressions and voice styles available in Fish Audio's S1 TTS model. The latest S2-Pro model supports free-form natural language emotion tags.

 ## Basic Emotions (24)

@@ -103,6 +103,7 @@ That's hilarious! Ha ha ha! // Natural laughter
developer-guide/getting-started/introduction.mdx
Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ Our technology brings dynamic, natural-sounding voices to your applications, ena
 <Tip>
 Introducing our latest generation voice models:

-**Fish Audio S1:** Our latest model delivers unparalleled naturalness and emotion, setting a new standard for AI-generated speech. [Learn more about our models →](/developer-guide/models-pricing/models-overview)
+**Fish Audio S2-Pro:** Our latest model delivers unparalleled naturalness and emotion, setting a new standard for AI-generated speech. [Learn more about our models →](/developer-guide/models-pricing/models-overview)
developer-guide/models-pricing/models-overview.mdx
Lines changed: 35 additions & 13 deletions
@@ -48,18 +48,28 @@ Fish Audio offers state-of-the-art text-to-speech models optimized for different

 ### Current Model

-<Card title="s1" icon="star">
-**Fish Audio S1** - Our flagship model with industry-leading quality
-- 4 billion parameters
-- 0.008 WER (0.8% word error rate)
-- Best performance and naturalness
-- Full emotional control capabilities
+<Card title="s2-pro" icon="star">
+**Fish Audio S2-Pro** - Our latest flagship model
+- Natural language free-form emotion tags like `[whispers sweetly]` or `[laughing nervously]`
+- Multi-speaker dialogue support
+- 80+ languages
+- 100ms time-to-first-audio
+- Full SGLang-based serving stack
 </Card>

 <Note>
-We recommend using `s1` for all new projects to access the latest capabilities and performance improvements. Legacy models remain available for existing integrations.
+We recommend using `s2-pro` for all new projects to access the latest capabilities and performance improvements. S1 remains available for existing integrations.
 </Note>

+### Previous Model
+
+<Card title="s1" icon="microchip">
+**Fish Audio S1** - High-quality voice generation
+- 4 billion parameters
+- 0.008 WER (0.8% word error rate)
+- Full emotional control capabilities
+</Card>
+
 ## Model Specifications

 ### Fish Audio S1 Performance Metrics
@@ -71,21 +81,33 @@ We recommend using `s1` for all new projects to access the latest capabilities a

 ## Supported Languages

-Fish Audio models support text-to-speech generation in 13 languages with full emotional expression capabilities and more to come soon.
+### S2-Pro
+
+S2-Pro supports 80+ languages with automatic language detection.
+
+<Info>
+Language detection is automatic - simply provide text in your target language.
+</Info>
+
+### S1
+
+S1 supports text-to-speech generation in 13 languages with full emotional expression capabilities.

 ```
 English, Chinese, Japanese, German,
 French, Spanish, Korean, Arabic,
 Russian, Dutch, Italian, Polish, Portuguese
 ```

-<Info>
-Language detection is automatic - simply provide text in your target language.
-</Info>
-
 ## Voice Styles and Emotions

-Fish Audio models support 64+ emotional expressions and voice styles that can be controlled through text markers in your input.
+### S2-Pro
+
+S2-Pro supports natural language free-form emotion tags. You can use any descriptive expression in brackets to control the voice style, such as `[whispers sweetly]` or `[laughing nervously]`.
+
+### S1
+
+S1 supports 64+ emotional expressions and voice styles that can be controlled through text markers in your input.
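The bracketed, free-form tag convention that this change documents for S2-Pro can be illustrated with a small sketch. The helpers `with_emotion` and `extract_tags` are hypothetical and are not part of any Fish Audio SDK. They assume that a tag is placed immediately before the text it should affect, which this change does not specify. They only build and inspect input strings and never call the API.

```python
import re

# Hypothetical helpers (not a Fish Audio API): build and inspect S2-Pro style
# input text. Assumption: a tag goes directly before the text it affects.

# Matches one bracketed tag such as "[whispers sweetly]" and captures its contents.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def with_emotion(tag: str, text: str) -> str:
    """Prefix `text` with a free-form bracketed tag, e.g. "[laughing nervously] Hi."."""
    tag = tag.strip()
    if not tag or "[" in tag or "]" in tag:
        raise ValueError("tag must be non-empty and must not contain brackets")
    return f"[{tag}] {text}"


def extract_tags(text: str) -> list[str]:
    """Return the contents of every bracketed tag in `text`, in order of appearance."""
    return TAG_PATTERN.findall(text)


if __name__ == "__main__":
    line = (
        with_emotion("whispers sweetly", "Good night.")
        + " "
        + with_emotion("laughing nervously", "I didn't expect that.")
    )
    print(line)  # [whispers sweetly] Good night. [laughing nervously] I didn't expect that.
    print(extract_tags(line))  # ['whispers sweetly', 'laughing nervously']
```

The helpers do not check whether a tag is meaningful to the model. Per this change, S2-Pro accepts free-form descriptions, while S1 expects the markers listed in its emotion reference.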