David M. Chan

Stateful Visual Encoders for Vision-Language Models

Jun 03, 2026

PitchBench: Measuring Pitch Hearing in Audio-Language Models

May 25, 2026

Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing

Mar 12, 2026

VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents

Jan 23, 2026

Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint

May 29, 2025

REOrdering Patches Improves Vision Models

May 29, 2025

LISAT: Language-Instructed Segmentation Assistant for Satellite Imagery

May 05, 2025

Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling

Apr 17, 2025

Higher-Order Binding of Language Model Virtual Personas: a Study on Approximating Political Partisan Misperceptions

Apr 16, 2025

TULIP: Towards Unified Language-Image Pretraining

Mar 19, 2025