ezelikman
Built on Mistral-7B, this model is continued-pretrained with the Quiet-STaR method, generating 8 latent reasoning tokens before each output token to strengthen its reasoning ability.
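A much-simplified sketch of the Quiet-STaR decoding idea: before each visible token, the model first emits a short hidden "thought" that conditions the next-token prediction and is then dropped from the output. The real method also trains special start/end-of-thought tokens and a learned mixing head, both omitted here; `gpt2` is used as a stand-in checkpoint so the sketch runs anywhere.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def step_with_thought(ids, num_thought_tokens=8):
    # 1) Let the model "think": extend the context with hidden tokens.
    thought = model.generate(ids, max_new_tokens=num_thought_tokens,
                             do_sample=True, pad_token_id=tok.eos_token_id)
    # 2) Predict one visible token conditioned on prompt + thought.
    with torch.no_grad():
        logits = model(thought).logits[:, -1, :]
    next_id = logits.argmax(dim=-1, keepdim=True)
    # 3) Keep only the visible token; the thought itself is discarded.
    return torch.cat([ids, next_id], dim=-1)

ids = tok("2 + 2 =", return_tensors="pt").input_ids
for _ in range(5):
    ids = step_with_thought(ids)
print(tok.decode(ids[0]))
```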
vasugoel
K-12BERT is a BERT model obtained by continued pretraining on K-12 education data, tailored specifically to educational scenarios.
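A minimal usage sketch, assuming the checkpoint is published under the Hugging Face repo id `vasugoel/K-12BERT` (the id is an assumption) and exposes the standard BERT masked-LM head, so it works with the `fill-mask` pipeline out of the box.

```python
from transformers import pipeline

# Repo id assumed; swap in the actual checkpoint path if it differs.
fill = pipeline("fill-mask", model="vasugoel/K-12BERT")

# Probe the education-domain vocabulary with a classroom-style cloze.
for pred in fill("Photosynthesis converts sunlight into chemical [MASK]."):
    print(f"{pred['token_str']:>12}  {pred['score']:.3f}")
```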