A vision-language model specifically designed to extract structured financial information from cheque images and generate JSON-formatted output containing key information such as cheque number, payee, amount, and issue date.
Multimodal
TransformersEnglish