Cordelia Schmid

Thoth

M&M Mix: A Multimodal Multiview Transformer Ensemble

Jun 20, 2022

Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

Jun 16, 2022

AVATAR: Unconstrained Audiovisual Speech Recognition

Jun 15, 2022

Weakly-supervised segmentation of referring expressions

May 12, 2022

Learning to Answer Visual Questions from Web Videos

May 11, 2022

Assembly Planning from Observations under Physical Constraints

Apr 20, 2022

Learning Audio-Video Modalities from Image Captions

Apr 01, 2022

TubeDETR: Spatio-Temporal Video Grounding with Transformers

Mar 30, 2022

Leveraging Randomized Smoothing for Optimal Control of Nonsmooth Dynamical Systems

Mar 11, 2022

The Right Spin: Learning Object Motion from Rotation-Compensated Flow Fields

Feb 28, 2022