Jinrong Yang

Self-supervised Pre-training for Transferable Multi-modal Perception

May 28, 2024

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

Dec 11, 2023

Merlin: Empowering Multimodal LLMs with Foresight Minds

Nov 30, 2023

DreamLLM: Synergistic Multimodal Comprehension and Creation

Sep 20, 2023

GroupLane: End-to-End 3D Lane Detection with Channel-wise Grouping

Jul 18, 2023

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning

Jul 18, 2023

GMM: Delving into Gradient Aware and Model Perceive Depth Mining for Monocular 3D Detection

Jun 30, 2023

BEVStereo++: Accurate Depth Estimation in Multi-view 3D Object Detection via Dynamic Temporal Stereo

Apr 09, 2023

Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception

Mar 13, 2023

Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Representation

Dec 03, 2022