Zhao Xu

Text is All You Need for Vision-Language Model Jailbreaking

Jan 31, 2026

Triplets Better Than Pairs: Towards Stable and Effective Self-Play Fine-Tuning for LLMs

Jan 13, 2026

Deep But Reliable: Advancing Multi-turn Reasoning for Thinking with Images

Dec 19, 2025

Omni-View: Unlocking How Generation Facilitates Understanding in Unified 3D Model based on Multiview images

Nov 10, 2025

TransBench: Benchmarking Machine Translation for Industrial-Scale Applications

May 20, 2025

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

May 05, 2025

CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation

Feb 18, 2025

Evaluating Image Caption via Cycle-consistent Text-to-Image Generation

Jan 08, 2025

MDP3: A Training-free Approach for List-wise Frame Selection in Video-LLMs

Jan 06, 2025

UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation

Dec 25, 2024