
Keda Tao

OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding

Dec 29, 2025
Via arXiv

StreamingAssistant: Efficient Visual Token Pruning for Accelerating Online Video Understanding

Dec 14, 2025
Via arXiv

OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models

Nov 18, 2025
Via arXiv

When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios

Jul 27, 2025
Via arXiv

PhotoArtAgent: Intelligent Photo Retouching with Language Model-Based Artist Agents

May 29, 2025
Via arXiv

HoliTom: Holistic Token Merging for Fast Video Large Language Models

May 28, 2025
Via arXiv

Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models

Mar 20, 2025
Via arXiv

Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs

Jan 31, 2025
Via arXiv

Is Oracle Pruning the True Oracle?

Nov 28, 2024
Via arXiv

DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models

Nov 22, 2024
Via arXiv