Multimodal-voice-assistant

Public

This project is a multi-modal AI voice assistant that uses OpenAI's GPT-4, audio processing with WhisperModel, speech recognition, clipboard extraction, and image processing to respond to user prompts.

ai assistant image-processing llm multimodal openai search-engine text-to-speech transcription tts

Created: 2024-06-22T10:02:42

Updated: 2025-03-21T00:19:57

Related projects

AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

176953

5 months ago

+33 today

Stable Diffusion Webui

Stable Diffusion web UI

154533

1 year ago

+34 today

Transformers

bert

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

146991

1 year ago

+46 today

N8n

Hot

Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.

118501

4 years ago

+295 today

Dify

Hot

agent

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

107097

3 months ago

+127 today