Subtitle generation with Speaker Diarization using Whisper and pyannote.audio
A deep learning toolkit for Text-to-Speech, battle-tested in research and production
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
kaldi-asr/kaldi is the official location of the Kaldi project.
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.
A PyTorch-based Speech Toolkit
End-to-End Speech Processing Toolkit
EmotiVoice: a Multi-Voice and Prompt-Controlled TTS Engine
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
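A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction.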