Nicu Sebe

Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks

Oct 29, 2025
Via arXiv

Riemannian Batch Normalization: A Gyro Approach

Sep 08, 2025
Via arXiv

H$_{2}$OT: Hierarchical Hourglass Tokenizer for Efficient Video Pose Transformers

Sep 08, 2025
Via arXiv

Organ-Agents: Virtual Human Physiology Simulator via LLMs

Aug 20, 2025
Via arXiv

Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation

Aug 12, 2025
Via arXiv

Masked Clustering Prediction for Unsupervised Point Cloud Pre-training

Aug 12, 2025
Via arXiv

Spatial-Temporal Graph Mamba for Music-Guided Dance Video Synthesis

Jul 09, 2025
Via arXiv

Orthogonal Projection Subspace to Aggregate Online Prior-knowledge for Continual Test-time Adaptation

Jun 23, 2025
Via arXiv

SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting

Jun 10, 2025
Via arXiv

When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding

Jun 05, 2025
Via arXiv