Roger Zimmermann

Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts

May 27, 2025

JointDistill: Adaptive Multi-Task Distillation for Joint Depth Estimation and Scene Segmentation

May 15, 2025

OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models

Apr 30, 2025

Reimagining Urban Science: Scaling Causal Inference with Large Language Models

Apr 15, 2025

TAIL: Text-Audio Incremental Learning

Mar 06, 2025

Facilitate Collaboration between Large Language Model and Task-specific Model for Time Series Anomaly Detection

Jan 10, 2025

Improving Multimodal LLMs Ability In Geometry Problem Solving, Reasoning, And Multistep Scoring

Dec 01, 2024

Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts

Oct 14, 2024

Manifold-Aware Local Feature Modeling for Semi-Supervised Medical Image Segmentation

Oct 14, 2024

Grounding is All You Need? Dual Temporal Grounding for Video Dialog

Oct 08, 2024