Xinlong Wang

MorphSAM: Learning the Morphological Prompts from Atlases for Spine Image Segmentation

Jun 16, 2025
Via arXiv

Audio-Sync Video Generation with Multi-Stream Temporal Control

Jun 09, 2025
Via arXiv

End-to-End Vision Tokenizer Tuning

May 15, 2025
Via arXiv

Image Difference Grounding with Natural Language

Apr 02, 2025
Via arXiv

Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities

Apr 02, 2025
Via arXiv

EVEv2: Improved Baselines for Encoder-Free Vision-Language Models

Feb 10, 2025
Via arXiv

Autoregressive Video Generation without Vector Quantization

Dec 18, 2024
Via arXiv

Falcon-UI: Understanding GUI Before Following User Instructions

Dec 12, 2024
Via arXiv

You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale

Dec 09, 2024
Via arXiv

A Simple Image Segmentation Framework via In-Context Examples

Oct 07, 2024
Via arXiv