Stepfun
Input tokens/M: $1
Output tokens/M: $2
Context Length: 32
OFA-Sys
InsTagger is a tool for automatically tagging instructions. It is built from the tagging results produced by InsTag and is mainly used to analyze the supervised fine-tuning (SFT) data used to align large language models with human preferences.
A lightweight text-to-image generation model, nearly half the size of the original Stable Diffusion model while maintaining similar generation quality.
Chinese CLIP is a multimodal model based on the Vision Transformer architecture, supporting Chinese vision-language tasks.
Chinese CLIP is a simplified implementation of CLIP trained on approximately 200 million Chinese image-text pairs, using ViT-L/14@336px as the image encoder and RoBERTa-wwm-base as the text encoder.
A Chinese CLIP model based on the ViT architecture, supporting Chinese vision-language tasks.
The base version of Chinese CLIP, using ViT-B/16 as the image encoder and RoBERTa-wwm-base as the text encoder, trained on a large-scale dataset of approximately 200 million Chinese image-text pairs.
Intel
This model was obtained by fine-tuning a pre-trained 80% 1x4 block-sparse Prune OFA BERT-Large model with knowledge distillation, and it performs strongly on the SQuADv1.1 question-answering task.
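To make the "80% 1x4 block sparse" wording concrete, the following is a minimal sketch of what such a sparsity pattern looks like: weights are grouped into blocks of 1 row by 4 columns, and 80% of those blocks are set entirely to zero. The helper names (`prune_1x4_blocks`, `zero_fraction`) and the magnitude-based selection rule are assumptions made for illustration; the description above does not say how Prune OFA chooses which blocks to remove.

```python
import random


def prune_1x4_blocks(weights, sparsity):
    """Return a copy of `weights` with the weakest 1x4 blocks zeroed.

    Illustrative only: blocks are ranked by the sum of absolute values
    (an assumed criterion, not necessarily the one Prune OFA uses).
    """
    block = 4
    scored = []
    for r, row in enumerate(weights):
        if len(row) % block != 0:
            raise ValueError("row length must be a multiple of 4")
        # Score every 1x4 block in this row by its L1 norm.
        for c in range(0, len(row), block):
            scored.append((sum(abs(v) for v in row[c:c + block]), r, c))

    scored.sort()  # weakest blocks first
    n_prune = round(len(scored) * sparsity)

    pruned = [list(row) for row in weights]
    for _, r, c in scored[:n_prune]:
        pruned[r][c:c + block] = [0.0] * block
    return pruned


def zero_fraction(weights):
    """Fraction of individual weights that are exactly zero."""
    total = sum(len(row) for row in weights)
    zeros = sum(1 for row in weights for v in row if v == 0.0)
    return zeros / total


# Hypothetical 10x20 weight matrix, i.e. 50 blocks of shape 1x4.
random.seed(0)
dense = [[random.gauss(0.0, 1.0) for _ in range(20)] for _ in range(10)]
sparse = prune_1x4_blocks(dense, sparsity=0.8)
print(f"zero fraction after pruning: {zero_fraction(sparse):.2f}")
```

Because zeros always come in aligned runs of four, a structured pattern like this is generally easier for hardware and kernels to exploit than unstructured sparsity at the same ratio.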