Hugging Face has released IDEFICS, an open-source multimodal AI model that accepts both images and text as input and generates coherent text output. An open reproduction of DeepMind's Flamingo visual-language model, IDEFICS is available in two sizes, with 9 billion and 80 billion parameters. The release gives researchers and developers a powerful open-source visual-language model, demonstrates the potential of generative models for handling multimodal inputs, and is expected to accelerate the development of multimodal AI applications.