Mike Zheng Shou

DD-Ranking: Rethinking the Evaluation of Dataset Distillation

May 19, 2025

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Apr 22, 2025

MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation

Apr 20, 2025

Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model

Apr 08, 2025

AssistPDA: An Online Video Surveillance Assistant for Video Anomaly Prediction, Detection, and Analysis

Mar 27, 2025

Long-Context Autoregressive Video Modeling with Next-Frame Prediction

Mar 25, 2025

Impossible Videos

Mar 18, 2025

VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning

Mar 17, 2025

Edit Transfer: Learning Image Editing via Vision In-Context Relations

Mar 17, 2025

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary

Mar 12, 2025