Fish Agent V0.1 3B
High-precision speech-to-speech model for capturing and generating environmental audio information.
CommonProductProductivitySpeech-to-SpeechText-to-Speech
Fish Agent V0.1 3B is a groundbreaking speech-to-speech model capable of capturing and generating environmental audio information with unprecedented accuracy. The model utilizes a non-semantic tagging architecture, eliminating the need for traditional semantic encoders/decoders. Additionally, it is a cutting-edge text-to-speech (TTS) model trained on 700,000 hours of multilingual audio content. As a continuation of the Qwen-2.5-3B-Instruct pre-trained version, it has been trained on 200 billion speech and text tags. The model supports eight languages, including English and Chinese, with approximately 300,000 hours of training data for each of these languages and around 20,000 hours for others.
Fish Agent V0.1 3B Visit Over Time
Monthly Visits
20899836
Bounce Rate
46.04%
Page per Visit
5.2
Visit Duration
00:04:57