Kaitong Cai

Aligning Multimodal Sequential Recommendations via Robust Direct Preference Optimization with Sparse MoE

Mar 31, 2026

AgriWorld: A World Tools Protocol Framework for Verifiable Agricultural Reasoning with Code-Executing LLM Agents

Feb 17, 2026

Spectral Gating Networks

Feb 07, 2026

Process-of-Thought Reasoning for Videos

Feb 07, 2026

Why Keep Your Doubts to Yourself? Trading Visual Uncertainties in Multi-Agent Bandit Systems

Jan 26, 2026

Self-Rewarded Multimodal Coherent Reasoning Across Diverse Visual Domains

Dec 27, 2025

CoAgent: Collaborative Planning and Consistency Agent for Coherent Video Generation

Dec 27, 2025

RevFFN: Memory-Efficient Full-Parameter Fine-Tuning of Mixture-of-Experts LLMs with Reversible Blocks

Dec 24, 2025

SirenPose: Dynamic Scene Reconstruction via Geometric Supervision

Dec 23, 2025

FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models

Dec 23, 2025