Gang Zhang

Michael Pokorny

MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning

Jul 29, 2025

On Data Synthesis and Post-training for Visual Abstract Reasoning

Apr 02, 2025

Humanity's Last Exam

Jan 24, 2025

Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models

Jan 03, 2025

Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception

Dec 18, 2024

ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts

Dec 11, 2024

Continual SFT Matches Multimodal RLHF with Negative Supervision

Nov 22, 2024

R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models

Oct 23, 2024

Improving Multi-modal Large Language Model through Boosting Vision Capabilities

Oct 17, 2024

Add-SD: Rational Generation without Manual Reference

Jul 30, 2024