Long Lian

VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents

Jan 23, 2026

Visually Prompted Benchmarks Are Surprisingly Fragile

Dec 19, 2025

Describe Anything: Detailed Localized Image and Video Captioning

Apr 22, 2025

Learning Adaptive Parallel Reasoning with Language Models

Apr 21, 2025

TULIP: Towards Unified Language-Image Pretraining

Mar 19, 2025

Atlas: Multi-Scale Attention Improves Long Context Image Modeling

Mar 16, 2025

Rethinking Patch Dependence for Masked Autoencoders

Jan 25, 2024

Unsupervised Universal Image Segmentation

Dec 28, 2023

Self-correcting LLM-controlled Diffusion Models

Nov 27, 2023

LLM-grounded Video Diffusion Models

Oct 02, 2023