Bo Zhao

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding

Sep 24, 2024
Via arXiv

Automated design of nonreciprocal thermal emitters via Bayesian optimization

Sep 13, 2024
Via arXiv

Enhancing Long Video Understanding via Hierarchical Event-Based Memory

Sep 10, 2024
Via arXiv

TC-LLaVA: Rethinking the Transfer from Image to Video Understanding with Temporal Considerations

Sep 05, 2024
Via arXiv

52B to 1T: Lessons Learned via Tele-FLM Series

Jul 03, 2024
Via arXiv

PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

Jun 24, 2024
Via arXiv

2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

Jun 20, 2024
Via arXiv

SpatialBot: Precise Spatial Understanding with Vision Language Models

Jun 19, 2024
Via arXiv

Seeing Clearly, Answering Incorrectly: A Multimodal Robustness Benchmark for Evaluating MLLMs on Leading Questions

Jun 15, 2024
Via arXiv

Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking

Jun 06, 2024
Via arXiv