Yingjie Cai

DualCoT-VLA: Visual-Linguistic Chain of Thought via Parallel Reasoning for Vision-Language-Action Models

Mar 23, 2026
Via arXiv

Lean Learning Beyond Clouds: Efficient Discrepancy-Conditioned Optical-SAR Fusion for Semantic Segmentation

Mar 21, 2026
Via arXiv

S-VAM: Shortcut Video-Action Model by Self-Distilling Geometric and Semantic Foresight

Mar 17, 2026
Via arXiv

Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models

Feb 10, 2025
Via arXiv

VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving

Nov 22, 2024
Via arXiv

OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction

Oct 07, 2024
Via arXiv

DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation

Oct 02, 2024
Via arXiv

DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception

Mar 20, 2024
Via arXiv

Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities

Jan 16, 2024
Via arXiv

NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space

Sep 27, 2023
Via arXiv