SynthAVSR

Public

This repository contains the development of SynthAVSR, the first Audiovisual Speech Recognition (AVSR) system tailored for the Spanish and Catalan languages. Based on the AV-HuBERT (Audio-Visual Hidden Unit BERT) model, SynthAVSR leverages synthetic audiovisual data to bridge the gap in speech recognition technology for these languages.

asr avsr catalan multimodal multimodal-deep-learning multimodality spanish synth synthetic-data vsr

Created: 2024-10-28T01:14:53

Updated: 2025-01-20T18:13:43

Related projects

WhisperX

asr

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

19085

1 year ago

+35 today

NeMo

asr

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

16245

3 years ago

+18 today

Vosk Api

android

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

13768

1 year ago

+27 today

PaddleSpeech

asr

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

12414

1 year ago

+6 today

Speechbrain

asr

A PyTorch-based Speech Toolkit

10895

1 year ago

+14 today

Sherpa Onnx

aarch64

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, and websocket server/client, with support for 11 programming languages.

9179

1 year ago

+45 today

SenseVoice

Multilingual Voice Understanding Model

7112

1 year ago

+17 today

Wukong Robot

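wukong-robot is a simple, flexible, and elegant Chinese voice dialogue robot / smart speaker project. It supports multi-turn ChatGPT conversations and may be the first open-source smart speaker project to support brain-computer interaction.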

7048

1 year ago

+1 today

Nexa Sdk

asr

Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR), and text-to-speech (TTS) capabilities.

6145

1 year ago

+23 today

Silero Models

asr

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

5651

1 year ago

+3 today
