Picture for Yuanxing Zhang

Yuanxing Zhang

TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

Add code
Apr 24, 2025
Viaarxiv icon

IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs

Add code
Apr 21, 2025
Viaarxiv icon

Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Add code
Apr 14, 2025
Viaarxiv icon

Generative Frame Sampler for Long Video Understanding

Add code
Mar 12, 2025
Viaarxiv icon

HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models

Add code
Feb 28, 2025
Viaarxiv icon

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

Add code
Feb 23, 2025
Viaarxiv icon

VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation

Add code
Feb 18, 2025
Viaarxiv icon

DMQR-RAG: Diverse Multi-Query Rewriting for RAG

Add code
Nov 20, 2024
Viaarxiv icon

MIO: A Foundation Model on Multimodal Tokens

Add code
Sep 26, 2024
Figure 1 for MIO: A Foundation Model on Multimodal Tokens
Figure 2 for MIO: A Foundation Model on Multimodal Tokens
Figure 3 for MIO: A Foundation Model on Multimodal Tokens
Figure 4 for MIO: A Foundation Model on Multimodal Tokens
Viaarxiv icon

DDK: Distilling Domain Knowledge for Efficient Large Language Models

Add code
Jul 23, 2024
Figure 1 for DDK: Distilling Domain Knowledge for Efficient Large Language Models
Figure 2 for DDK: Distilling Domain Knowledge for Efficient Large Language Models
Figure 3 for DDK: Distilling Domain Knowledge for Efficient Large Language Models
Figure 4 for DDK: Distilling Domain Knowledge for Efficient Large Language Models
Viaarxiv icon