vision-voice-multimodal-app
PublicAn AI-powered chatbot that combines image understanding, voice input, and multilingual speech output. Users can upload images, ask questions by voice, and receive intelligent spoken answers. Showcases state-of-the-art models (Gemini, Whisper, Kokoro TTS) and robust Python engineering.