Jifeng Dai

Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments

Mar 20, 2024

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

Mar 07, 2024

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

Feb 29, 2024

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

Feb 25, 2024

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

Jan 31, 2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

Jan 18, 2024

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Jan 15, 2024

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

Jan 11, 2024

A Survey of Reasoning with Foundation Models

Dec 26, 2023

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

Dec 25, 2023