Andrew Rouditchenko

AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition

Sep 29, 2023
Andrew Rouditchenko, Ronan Collobert, Tatiana Likhomanenko

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

May 21, 2023
Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogerio Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James Glass

What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions

Mar 29, 2023
Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Daniel Kondermann, Samuel Thomas, Shih-Fu Chang, Rogerio Feris, James Glass, Hilde Kuehne

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

Oct 07, 2022
Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogerio Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James Glass

UAVM: A Unified Model for Audio-Visual Learning

Jul 29, 2022
Yuan Gong, Alexander H. Liu, Andrew Rouditchenko, James Glass

CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification

Mar 13, 2022
Yuan Gong, Sameer Khurana, Andrew Rouditchenko, James Glass

Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

Dec 08, 2021
Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Hilde Kuehne

Routing with Self-Attention for Multimodal Capsule Networks

Dec 01, 2021
Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander Liu, David Harwath, James Glass, Hilde Kuehne, Mubarak Shah

Cascaded Multilingual Audio-Visual Learning from Videos

Nov 08, 2021
Andrew Rouditchenko, Angie Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, James Glass

Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset

Oct 14, 2021
Ian Palmer, Andrew Rouditchenko, Andrei Barbu, Boris Katz, James Glass
