Xiaojun Chang

Measuring Social Bias in Vision-Language Models with Face-Only Counterfactuals from Real Photos

Jan 11, 2026

Parallel Diffusion Solver via Residual Dirichlet Policy Optimization

Dec 28, 2025

CARE What Fails: Contrastive Anchored-REflection for Verifiable Multimodal

Dec 22, 2025

User-Feedback-Driven Continual Adaptation for Vision-and-Language Navigation

Dec 11, 2025

GLaD: Geometric Latent Distillation for Vision-Language-Action Models

Dec 10, 2025

FusionFM: All-in-One Multi-Modal Image Fusion with Flow Matching

Nov 17, 2025

EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis

Nov 16, 2025

Towards Efficient General Feature Prediction in Masked Skeleton Modeling

Sep 03, 2025

MDK12-Bench: A Comprehensive Evaluation of Multimodal Large Language Models on Multidisciplinary Exams

Aug 09, 2025

Ground-R1: Incentivizing Grounded Visual Reasoning via Reinforcement Learning

May 26, 2025