Zuyao You

Learning Accurate Segmentation Purely from Self-Supervision

Feb 27, 2026
Via arXiv

VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding

Jan 12, 2026
Via arXiv

Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning

Jan 23, 2025
Via arXiv

FOCUS: Towards Universal Foreground Segmentation

Jan 09, 2025
Via arXiv

Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding

Nov 30, 2023
Via arXiv