Linli Xu

MVP: Enhancing Video Large Language Models via Self-supervised Masked Video Prediction

Jan 07, 2026

DiG: Differential Grounding for Enhancing Fine-Grained Perception in Multimodal Large Language Model

Dec 14, 2025

Large Reasoning Embedding Models: Towards Next-Generation Dense Retrieval Paradigm

Oct 16, 2025

CROP: Integrating Topological and Spatial Structures via Cross-View Prefixes for Molecular LLMs

Aug 09, 2025

BASIC: Boosting Visual Alignment with Intrinsic Refined Embeddings in Multimodal Large Language Models

Aug 09, 2025

Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes

May 28, 2025

AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization

Apr 02, 2025

ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning

Mar 13, 2025

Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images

Feb 23, 2025

Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

Nov 04, 2024