Siyu Zhang

Sparse but not Simpler: A Multi-Level Interpretability Analysis of Vision Transformers

Mar 16, 2026

Zero-Forgetting CISS via Dual-Phase Cognitive Cascades

Mar 14, 2026

Shape of Thought: Progressive Object Assembly via Visual Chain-of-Thought

Jan 28, 2026

Multimodal Interpretation of Remote Sensing Images: Dynamic Resolution Input Strategy and Multi-scale Vision-Language Alignment Mechanism

Dec 29, 2025

Toward Faithfulness-guided Ensemble Interpretation of Neural Network

Sep 04, 2025

Decoupling Continual Semantic Segmentation

Aug 07, 2025

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

Apr 11, 2025

Improving vision-language alignment with graph spiking hybrid Networks

Jan 31, 2025

RP-SLAM: Real-time Photorealistic SLAM with Efficient 3D Gaussian Splatting

Dec 13, 2024

StreamMOTP: Streaming and Unified Framework for Joint 3D Multi-Object Tracking and Trajectory Prediction

Jun 28, 2024