This is a quantized version of Google's Gemma-3n-E2B-it. Weights and activations are quantized to the FP8 data type. The model accepts multimodal input (audio, vision, and text) and produces text output. It is intended for efficient deployment with vLLM, significantly improving inference throughput while preserving accuracy close to the original model.
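As a sketch of such a deployment, the checkpoint could be served with vLLM's OpenAI-compatible server and queried over HTTP. The model ID below is a placeholder (this card does not state the repository name), and the context-length flag is an assumption, not a documented requirement:

```shell
# Placeholder model ID -- substitute the actual repository name.
# vLLM detects FP8 quantization from the checkpoint's config, so no
# extra quantization flag is usually needed for a pre-quantized model.
vllm serve <org>/<gemma-3n-e2b-it-fp8-model> \
  --max-model-len 8192   # assumed value; tune to your hardware

# Once the server is up, query the OpenAI-compatible endpoint:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "<org>/<gemma-3n-e2b-it-fp8-model>",
        "messages": [{"role": "user", "content": "Summarize FP8 quantization in one sentence."}]
      }'
```

Serving requires a GPU whose compute capability supports FP8 (or vLLM's FP8 emulation path); the command above is a deployment sketch, not a verified recipe for this specific checkpoint.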