
Zehan Qi

Tsinghua University

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

Aug 12, 2024

DebateQA: Evaluating Question Answering on Debatable Knowledge

Aug 02, 2024

Walking in Others' Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias

Jul 22, 2024

MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

Jun 20, 2024

ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

Jun 18, 2024

Preemptive Answer "Attacks" on Chain-of-Thought Reasoning

May 31, 2024

NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts

May 07, 2024

Knowledge Conflicts for LLMs: A Survey

Mar 13, 2024

Prejudice and Caprice: A Statistical Framework for Measuring Social Discrimination in Large Language Models

Feb 29, 2024

Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity

Oct 18, 2023