NVIDIA Nemotron Parse v1.1 TC is a document semantic understanding model that extracts text and table elements, together with their spatial positions, from page images and generates structured annotations: formatted text, bounding boxes, and semantic categories. Compared with the previous version, it is 20% faster and preserves the page order of unordered elements.
Multimodal
Transformers
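The structured annotations described above pair each extracted element's formatted text with a bounding box and a semantic category. The sketch below shows one way downstream code might hold and filter such records. It is a hypothetical illustration, not the model's actual output schema: the `ParsedElement` class, its field names, the `(x_min, y_min, x_max, y_max)` pixel box convention, and the category labels are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical record for one parsed element. Field names and the bbox
# convention are illustrative assumptions, not the model's real schema.
@dataclass
class ParsedElement:
    text: str                                # formatted text (e.g. Markdown)
    bbox: Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max) in pixels
    category: str                            # semantic class label (illustrative)


def select(elements: List[ParsedElement], category: str) -> List[ParsedElement]:
    """Return elements of one category, keeping their original page order."""
    return [e for e in elements if e.category == category]


def area(e: ParsedElement) -> float:
    """Area of the element's bounding box; degenerate boxes count as 0."""
    x0, y0, x1, y1 = e.bbox
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


# Illustrative page: the list order stands in for the page order of elements.
page = [
    ParsedElement("# Quarterly report", (40.0, 30.0, 560.0, 70.0), "Title"),
    ParsedElement("| Q1 | Q2 |\n|---|---|\n| 10 | 12 |", (40.0, 120.0, 400.0, 220.0), "Table"),
    ParsedElement("Revenue grew in Q2.", (40.0, 240.0, 560.0, 270.0), "Text"),
]

tables = select(page, "Table")
print(len(tables), area(tables[0]))  # prints "1 36000.0"
```

Filtering with a list comprehension keeps elements in the order they were produced, so any page order carried by the annotations survives the selection step.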