GLUE-X

Public

We use 14 datasets as out-of-distribution (OOD) test data and evaluate 21 widely used models on 8 NLU tasks. Our findings confirm that OOD accuracy in NLP tasks deserves more attention: significant performance decay relative to in-distribution (ID) accuracy appears in all settings.

benchmark bert natural-language-processing nlp ood

Created: 2022-11-19T21:55:14

Updated: 2025-03-12T10:47:15

http://gluexbenchmark.com/
