LFM2-VL-3B is a multimodal vision-language model developed by Liquid AI and built on the LFM2 backbone architecture. It offers strong visual understanding and reasoning, and performs particularly well on fine-grained perception tasks. The model processes text and image inputs efficiently and handles images natively at resolutions up to 512×512.
Multimodal
Transformers
Multiple Languages
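
The 512×512 native resolution mentioned above can be checked before an image is sent to the model. The following is a minimal sketch in plain Python, not part of any Liquid AI tooling: the helper names `fits_native` and `downscale_to_native` are hypothetical, and shrinking a larger image to fit is an assumption made for illustration, since the description does not say how the model treats images above 512×512.

```python
# Hypothetical helpers; the 512 limit comes from the description above.
NATIVE_MAX = 512


def fits_native(width: int, height: int, limit: int = NATIVE_MAX) -> bool:
    """Return True if both sides are within the native resolution limit."""
    return width <= limit and height <= limit


def downscale_to_native(width: int, height: int, limit: int = NATIVE_MAX) -> tuple[int, int]:
    """Return a size that fits within limit x limit, preserving aspect ratio.

    Assumption: larger images are simply shrunk. The model's actual handling
    of oversized inputs is not stated in the description.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if fits_native(width, height, limit):
        return (width, height)  # already within the limit, leave unchanged
    scale = limit / max(width, height)  # the longer side maps onto the limit
    return (max(1, round(width * scale)), max(1, round(height * scale)))


if __name__ == "__main__":
    for size in [(300, 200), (512, 512), (1024, 512)]:
        print(size, fits_native(*size), downscale_to_native(*size))
```

The resulting size can be passed to any image library's resize routine before the image is handed to the model's preprocessing step.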