HallusionBench

Public

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

benchmark benchmarks gpt-4 gpt-4v hallucination large-language-models large-vision-language-models llava llm lmm

Created: 2023-10-23T04:17:32

Updated: 2025-03-21T05:26:22

316

Stars


Related projects

AutoGPT

Hot

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

180179

1 year ago

+59 today

Dify

Hot

agent

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

120901

1 year ago

+280 today

Generative Ai For Beginners

Hot

21 Lessons, Get Started Building with Generative AI https://microsoft.github.io/generative-ai-for-beginners/

102828

1 year ago

+172 today

NextChat

LLMs From Scratch

Hot

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

80697

1 year ago

+221 today

Gpt_academic

academic

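Provides a practical interactive interface for GPT, GLM, and other large language models, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other codebases; PDF/LaTeX paper translation and summarization; parallel querying of multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, moss, and others.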

69771

1 year ago

+15 today

Openai Cookbook

chatgpt

Examples and guides for using the OpenAI API

69550

1 year ago

+43 today

Lobe Chat

Hot

Lobe Chat - an open-source, modern-design AI chat framework. Supports Multi AI Providers (OpenAI / Claude 3 / Gemini / Ollama / DeepSeek / Qwen), Knowledge Base (file upload / knowledge management / RAG), Multi-Modals (Plugins/Artifacts) and Thinking. One-click FREE deployment of your private ChatGPT / Claude / DeepSeek application.

68780

1 year ago

+205 today

PaddleOCR

Hot

ai4science

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 80+ languages.

65993

9 months ago

+167 today

Vllm

Hot

amd

A high-throughput and memory-efficient inference and serving engine for LLMs

64909

2 years ago

+236 today
