MAVNet-Multimodal-Audio-Visual-Network-for-Cross-Modal-Understanding
MAVNet is a deep learning framework that integrates audio and visual modalities for intelligent perception, enabling tasks such as event recognition, autonomous surveillance, and wildlife detection through synchronized sound and vision analysis.
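To make the idea of combining the two modalities concrete, here is a minimal sketch of one common approach, late fusion by feature concatenation. It is not MAVNet's actual architecture, which this description does not specify. The function names (`late_fusion`, `softmax`), the feature vectors, and the weights are all hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def late_fusion(audio_features, visual_features, weights, biases):
    # Hypothetical fusion step: concatenate per-modality feature vectors
    # that are assumed to come from the same (synchronized) time window.
    fused = list(audio_features) + list(visual_features)
    # One linear layer over the fused vector, one row of weights per class.
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)

# Made-up features: 2 audio dimensions, 3 visual dimensions, 2 event classes.
audio = [0.2, 0.8]
visual = [0.5, 0.1, 0.9]
weights = [
    [1.0, 0.0, 0.5, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.5, 1.0],
]
biases = [0.0, 0.0]

probs = late_fusion(audio, visual, weights, biases)
print(probs)
```

In a real system the feature vectors would come from learned audio and visual encoders, and the fusion weights would be trained rather than fixed by hand.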