Tong Sun

ROD: RGB-Only Fast and Efficient Off-road Freespace Detection

Aug 12, 2025

StructVRM: Aligning Multimodal Reasoning with Structured and Verifiable Reward Models

Aug 07, 2025

Scaling Up Audio-Synchronized Visual Animation: An Efficient Training Paradigm

Aug 05, 2025

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

Aug 01, 2025

Towards Visual Text Grounding of Multimodal Large Language Model

Apr 07, 2025

MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding

Mar 18, 2025

Persona-SQ: A Personalized Suggested Question Generation Framework For Real-world Documents

Dec 17, 2024

Numerical Pruning for Efficient Autoregressive Models

Dec 17, 2024

SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner

Dec 13, 2024

LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding

Nov 02, 2024