Jason Kuen

ViT-AdaLA: Adapting Vision Transformers with Linear Attention

Mar 17, 2026 (arXiv)

SNCE: Geometry-Aware Supervision for Scalable Discrete Image Generation

Mar 16, 2026 (arXiv)

LaViDa-R1: Advancing Reasoning for Unified Multimodal Diffusion Language Models

Feb 15, 2026 (arXiv)

Sparse-LaViDa: Sparse Multimodal Discrete Diffusion Language Models

Dec 16, 2025 (arXiv)

VGent: Visual Grounding via Modular Design for Disentangling Reasoning and Prediction

Dec 11, 2025 (arXiv)

OIDA-QA: A Multimodal Benchmark for Analyzing the Opioid Industry Documents Archive

Nov 14, 2025 (arXiv)

Image Tokenizer Needs Post-Training

Sep 15, 2025 (arXiv)

Refer to Anything with Vision-Language Prompts

Jun 05, 2025 (arXiv)

LaViDa: A Large Diffusion Language Model for Multimodal Understanding

May 22, 2025 (arXiv)

Robust Latent Matters: Boosting Image Generation with Sampling Error

Mar 11, 2025 (arXiv)