Keda Tao

LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs

Mar 19, 2026
Via arXiv

OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding

Dec 29, 2025
Via arXiv

StreamingAssistant: Efficient Visual Token Pruning for Accelerating Online Video Understanding

Dec 14, 2025
Via arXiv

OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models

Nov 18, 2025
Via arXiv

When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios

Jul 27, 2025
Via arXiv

PhotoArtAgent: Intelligent Photo Retouching with Language Model-Based Artist Agents

May 29, 2025
Via arXiv

HoliTom: Holistic Token Merging for Fast Video Large Language Models

May 28, 2025
Via arXiv

Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models

Mar 20, 2025
Via arXiv

Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs

Jan 31, 2025
Via arXiv

Is Oracle Pruning the True Oracle?

Nov 28, 2024
Via arXiv