Andrew Rouditchenko

Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

Dec 08, 2021

Routing with Self-Attention for Multimodal Capsule Networks

Dec 01, 2021

Cascaded Multilingual Audio-Visual Learning from Videos

Nov 08, 2021

Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset

Oct 14, 2021

Cross-Modal Discrete Representation Learning

Jun 10, 2021

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

May 05, 2021

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos

Jun 16, 2020

Label-efficient audio classification through multitask learning and self-supervision

Oct 19, 2019

Self-Supervised Audio-Visual Co-Segmentation

Apr 18, 2019

The Sound of Pixels

Oct 14, 2018