Siyu Zhang

Shape of Thought: Progressive Object Assembly via Visual Chain-of-Thought

Jan 28, 2026

Multimodal Interpretation of Remote Sensing Images: Dynamic Resolution Input Strategy and Multi-scale Vision-Language Alignment Mechanism

Dec 29, 2025

Toward Faithfulness-guided Ensemble Interpretation of Neural Network

Sep 04, 2025

Decoupling Continual Semantic Segmentation

Aug 07, 2025

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

Apr 11, 2025

Improving vision-language alignment with graph spiking hybrid Networks

Jan 31, 2025

RP-SLAM: Real-time Photorealistic SLAM with Efficient 3D Gaussian Splatting

Dec 13, 2024

StreamMOTP: Streaming and Unified Framework for Joint 3D Multi-Object Tracking and Trajectory Prediction

Jun 28, 2024

NeB-SLAM: Neural Blocks-based Salable RGB-D SLAM for Unknown Scenes

May 24, 2024

SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving

Apr 10, 2024