Picture for Tong Lu

Tong Lu

Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision

Add code
Jun 06, 2025
Viaarxiv icon

AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs

Add code
Jun 05, 2025
Viaarxiv icon

Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

Add code
Apr 21, 2025
Viaarxiv icon

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Add code
Apr 15, 2025
Viaarxiv icon

CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving

Add code
Mar 28, 2025
Viaarxiv icon

ImagineMap: Enhanced HD Map Construction with SD Maps

Add code
Dec 22, 2024
Viaarxiv icon

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding

Add code
Dec 16, 2024
Figure 1 for CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
Figure 2 for CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
Figure 3 for CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
Figure 4 for CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
Viaarxiv icon

Driving with InternVL: Oustanding Champion in the Track on Driving with Language of the Autonomous Grand Challenge at CVPR 2024

Add code
Dec 10, 2024
Viaarxiv icon

LLM-Align: Utilizing Large Language Models for Entity Alignment in Knowledge Graphs

Add code
Dec 06, 2024
Viaarxiv icon

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Add code
Dec 06, 2024
Figure 1 for Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Figure 2 for Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Figure 3 for Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Figure 4 for Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Viaarxiv icon