Cordelia Schmid

Thoth

CURVE: A Benchmark for Cultural and Multilingual Long Video Reasoning

Jan 15, 2026
Via arXiv

MetricNet: Recovering Metric Scale in Generative Navigation Policies

Sep 17, 2025
Via arXiv

CAViAR: Critic-Augmented Video Agentic Reasoning

Sep 09, 2025
Via arXiv

VoCap: Video Object Captioning and Segmentation from Any Prompt

Aug 29, 2025
Via arXiv

Gondola: Grounded Vision Language Planning for Generalizable Robotic Manipulation

Jun 12, 2025
Via arXiv

ComposeAnything: Composite Object Priors for Text-to-Image Generation

May 30, 2025
Via arXiv

LoFT: LoRA-fused Training Dataset Generation with Few-shot Guidance

May 16, 2025
Via arXiv

Feasibility with Language Models for Open-World Compositional Zero-Shot Learning

May 16, 2025
Via arXiv

MINERVA: Evaluating Complex Video Reasoning

May 01, 2025
Via arXiv

Memory-Modular Classification: Learning to Generalize with Memory Replacement

Apr 08, 2025
Via arXiv