Text-to-speech engine that synthesizes speech in Tohoku dialects of Japanese, using VoiceVox Core.
Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
As little as 1 minute of voice data can be used to train a good TTS model (few-shot voice cloning).
Port of OpenAI's Whisper model in C/C++
A deep learning toolkit for Text-to-Speech, battle-tested in research and production.
A generative speech model for daily dialogue.
AI voice cloning: clone a voice in 5 seconds and generate arbitrary speech in real time.
Instant voice cloning by MIT and MyShell. Audio foundation model.
SoftVC VITS Singing Voice Conversion
DeepSpeech is an open-source embedded (offline, on-device) speech-to-text engine that can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.
PDF scientific paper translation with preserved formats: AI-based full-text bilingual translation of PDF documents that fully preserves the layout, supports services such as Google/DeepL/Ollama/OpenAI, and provides CLI/GUI/Docker/Zotero.