Jianjian Sun

Focus Anywhere for Fine-grained Multi-page Document Understanding

May 23, 2024
Via arXiv

OneChart: Purify the Chart Structural Extraction via One Auxiliary Token

Apr 15, 2024
Via arXiv

Small Language Model Meets with Reinforced Vision Vocabulary

Jan 23, 2024
Via arXiv

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

Dec 11, 2023
Via arXiv

DreamLLM: Synergistic Multimodal Comprehension and Creation

Sep 20, 2023
Via arXiv

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning

Jul 18, 2023
Via arXiv

The 1st-place Solution for CVPR 2023 OpenLane Topology in Autonomous Driving Challenge

Jun 16, 2023
Via arXiv

BEVStereo++: Accurate Depth Estimation in Multi-view 3D Object Detection via Dynamic Temporal Stereo

Apr 09, 2023
Via arXiv

Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception

Mar 13, 2023
Via arXiv

Cross Modal Transformer via Coordinates Encoding for 3D Object Dectection

Jan 03, 2023
Via arXiv