Kaitong Cai

Process-of-Thought Reasoning for Videos

Feb 07, 2026

Spectral Gating Networks

Feb 07, 2026

Why Keep Your Doubts to Yourself? Trading Visual Uncertainties in Multi-Agent Bandit Systems

Jan 26, 2026

Self-Rewarded Multimodal Coherent Reasoning Across Diverse Visual Domains

Dec 27, 2025

CoAgent: Collaborative Planning and Consistency Agent for Coherent Video Generation

Dec 27, 2025

RevFFN: Memory-Efficient Full-Parameter Fine-Tuning of Mixture-of-Experts LLMs with Reversible Blocks

Dec 24, 2025

SirenPose: Dynamic Scene Reconstruction via Geometric Supervision

Dec 23, 2025

FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models

Dec 23, 2025

LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction

Dec 21, 2025

PTTA: A Pure Text-to-Animation Framework for High-Quality Creation

Dec 21, 2025