Guolei Sun

Video Understanding: From Geometry and Semantics to Unified Models

Mar 18, 2026
Via arXiv

EgoSound: Benchmarking Sound Understanding in Egocentric Videos

Feb 15, 2026
Via arXiv

DINO-Mix: Distilling Foundational Knowledge with Cross-Domain CutMix for Semi-supervised Class-imbalanced Medical Image Segmentation

Feb 08, 2026
Via arXiv

Revisiting Adaptive Rounding with Vectorized Reparameterization for LLM Quantization

Feb 02, 2026
Via arXiv

HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking

Jul 10, 2025
Via arXiv

A Comprehensive Survey on Video Scene Parsing: Advances, Challenges, and Prospects

Jun 16, 2025
Via arXiv

CamSAM2: Segment Anything Accurately in Camouflaged Videos

Mar 26, 2025
Via arXiv

Exploiting Temporal State Space Sharing for Video Semantic Segmentation

Mar 26, 2025
Via arXiv

Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model

Mar 20, 2025
Via arXiv

SAM-Aware Graph Prompt Reasoning Network for Cross-Domain Few-Shot Segmentation

Dec 31, 2024
Via arXiv