Chen Chen

University of Central Florida, Institute of Artificial Intelligence, Orlando, FL, USA

A.I.R.: Enabling Adaptive, Iterative, and Reasoning-based Frame Selection For Video Question Answering

Oct 06, 2025
Via arXiv

From Frames to Clips: Efficient Key Clip Selection for Long-Form Video Understanding

Oct 02, 2025
Via arXiv

AToken: A Unified Tokenizer for Vision

Sep 19, 2025
Via arXiv

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Sep 19, 2025
Via arXiv

EvoEmpirBench: Dynamic Spatial Reasoning with Agent-ExpVer

Sep 16, 2025
Via arXiv

Lethe: Purifying Backdoored Large Language Models with Knowledge Dilution

Aug 28, 2025
Via arXiv

UNIFORM: Unifying Knowledge from Large-scale and Diverse Pre-trained Models

Aug 27, 2025
Via arXiv

Seeing Further on the Shoulders of Giants: Knowledge Inheritance for Vision Foundation Models

Aug 20, 2025
Via arXiv

FakeHunter: Multimodal Step-by-Step Reasoning for Explainable Video Forensics

Aug 20, 2025
Via arXiv

UniSTFormer: Unified Spatio-Temporal Lightweight Transformer for Efficient Skeleton-Based Action Recognition

Aug 12, 2025
Via arXiv