CogAgent is a GUI agent built on the optimized GLM-4V-9B vision-language model, with significant improvements in GUI perception, reasoning accuracy, action-space completeness, and task generalization. It supports interaction in both Chinese and English and has been applied in the GLM-PC product.
Multimodal
Transformers
Multiple Languages