
Chi Chen

Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model

May 10, 2025
Via arXiv

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization

Mar 31, 2025
Via arXiv

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

Mar 17, 2025
Via arXiv

Towards Self-Improving Systematic Cognition for Next-Generation Foundation MLLMs

Mar 16, 2025
Via arXiv

How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game

Mar 13, 2025
Via arXiv

DNA Origami Nanostructures Observed in Transmission Electron Microscopy Images can be Characterized through Convolutional Neural Networks

Mar 13, 2025
Via arXiv

Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models

Jan 13, 2025
Via arXiv

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation

Jan 11, 2025
Via arXiv

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

Jan 09, 2025
Via arXiv

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

Dec 18, 2024
Via arXiv