Kaiyang Zhou

Streaming Video Instruction Tuning

Dec 24, 2025
Via arXiv

Measuring Epistemic Humility in Multimodal Large Language Models

Sep 11, 2025
Via arXiv

Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation

Jul 03, 2025
Via arXiv

Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning

May 20, 2025
Via arXiv

Training-Free Watermarking for Autoregressive Image Generation

May 20, 2025
Via arXiv

Fine-tuning Quantized Neural Networks with Zeroth-order Optimization

May 19, 2025
Via arXiv

Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models

Jan 30, 2025
Via arXiv

4D Panoptic Scene Graph Generation

May 16, 2024
Via arXiv

Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models

Mar 26, 2024
Via arXiv

Open-Vocabulary Calibration for Vision-Language Models

Feb 15, 2024
Via arXiv