Hangyu Guo

DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models

Apr 25, 2025

WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis

Dec 04, 2024

PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos

Dec 02, 2024

Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models

Nov 13, 2024

2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision

Oct 25, 2024

MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models

Oct 15, 2024

ING-VP: MLLMs cannot Play Easy Vision-based Games Yet

Oct 09, 2024

GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models

Jun 20, 2024

GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation

Jun 17, 2024

Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models

Mar 14, 2024