Multi-modal-Sandtable
Multi-sensor fusion traffic sand table. An ESP32 running MicroPython reads environment sensors and publishes the readings to MQTT. A microphone captures speech, which is transcribed to text, and an RTSP camera with YOLO identifies cars and finger positions. An LLM performs intent recognition and slot filling on the text question concatenated with the semantic vision data, so the system can answer questions about MQTT and visual data and execute operations.
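
The fusion step described above, where the text question is concatenated with semantic vision data before it goes to the LLM, might look roughly like the sketch below. It is illustrative only: the topic name `sandtable/env`, the payload fields, the detection dictionary layout, and the prompt wording are all assumptions rather than this project's actual format, and the MQTT client and LLM call are deliberately left out.

```python
import json

# Hypothetical MQTT topic; the project does not state its topic names.
SENSOR_TOPIC = "sandtable/env"


def sensor_payload(temperature_c, humidity_pct):
    """Serialize one environment reading as a JSON string for an MQTT message body.

    The field names are assumptions made for this sketch.
    """
    reading = {"temperature_c": temperature_c, "humidity_pct": humidity_pct}
    return json.dumps(reading, sort_keys=True)


def build_prompt(question, detections, sensor_json):
    """Concatenate the transcribed question with vision and sensor context.

    `detections` is assumed to be a list of dicts with `label`, `x`, `y` keys,
    e.g. derived from YOLO boxes and fingertip positions.
    """
    vision_lines = [f"- {d['label']} at ({d['x']}, {d['y']})" for d in detections]
    parts = [
        f"Question: {question}",
        "Vision:",
        *vision_lines,
        f"Sensors: {sensor_json}",
        'Reply with JSON containing "intent" and "slots".',
    ]
    return "\n".join(parts)


if __name__ == "__main__":
    payload = sensor_payload(21.5, 40)
    detections = [
        {"label": "car", "x": 120, "y": 80},
        {"label": "finger", "x": 118, "y": 76},
    ]
    print(SENSOR_TOPIC, payload)
    print(build_prompt("Which car am I pointing at?", detections, payload))
```

Keeping the prompt builder free of any network or model calls makes it easy to test on a desktop before wiring it to the real MQTT broker, speech recognizer, and camera pipeline.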