MAVNet-Multimodal-Audio-Visual-Network-for-Cross-Modal-Understanding

Public

MAVNet is a deep learning framework that combines audio and visual inputs for perception tasks such as event recognition, autonomous surveillance, and wildlife detection, analyzing sound and video that have been synchronized in time.
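To make "synchronized sound and vision analysis" concrete, here is a minimal sketch of one common way to pair the two streams: each video frame is matched to the audio feature window nearest in time, and the two feature vectors are concatenated. This is an illustration only and is not taken from the MAVNet code; the function names (`nearest_audio_index`, `fuse_synchronized`) and the parameters (`video_fps`, `audio_hop_s`) are hypothetical, and the repository may fuse modalities in a different way.

```python
from typing import List, Sequence


def nearest_audio_index(frame_time_s: float, audio_hop_s: float, n_audio: int) -> int:
    """Return the index of the audio window closest in time to a video frame.

    Assumes (hypothetically) that audio window k starts at k * audio_hop_s seconds.
    The result is clamped so it always points at an existing window.
    """
    idx = round(frame_time_s / audio_hop_s)
    return max(0, min(n_audio - 1, idx))


def fuse_synchronized(
    video_feats: Sequence[Sequence[float]],
    audio_feats: Sequence[Sequence[float]],
    video_fps: float,
    audio_hop_s: float,
) -> List[List[float]]:
    """Concatenate each video-frame feature with its time-aligned audio feature."""
    fused = []
    for i, v in enumerate(video_feats):
        t = i / video_fps  # timestamp of frame i in seconds
        a = audio_feats[nearest_audio_index(t, audio_hop_s, len(audio_feats))]
        fused.append(list(v) + list(a))
    return fused


# Toy data: 3 video frames at 2 fps, 5 audio windows with a 0.25 s hop.
video = [[1.0], [2.0], [3.0]]
audio = [[10.0], [11.0], [12.0], [13.0], [14.0]]
fused = fuse_synchronized(video, audio, video_fps=2.0, audio_hop_s=0.25)
```

In a real model the fused vectors would typically be fed to a classifier; the toy lists above only stand in for learned embeddings.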

Created: 2025-10-21T00:03:52
Updated: 2025-10-21T04:09:08
Stars: 0
Stars increase: 0