Kaiyu Yue

Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning

Jul 22, 2025

Zero-Shot Vision Encoder Grafting via LLM Surrogates

May 28, 2025

From Pixels to Prose: A Large Dataset of Dense Image Captions

Jun 14, 2024

Object Recognition as Next Token Prediction

Dec 04, 2023

Visible Feature Guidance for Crowd Pedestrian Detection

Sep 16, 2020

Matching Guided Distillation

Aug 23, 2020

Compact Generalized Non-local Network

Nov 01, 2018

Fine-grained Video Categorization with Redundancy Reduction Attention

Oct 26, 2018