James Glass

MIT Computer Science and Artificial Intelligence Laboratory, MA, USA

What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions

Mar 29, 2023
Via arXiv

Logic Against Bias: Textual Entailment Mitigates Stereotypical Sentence Reasoning

Mar 10, 2023
Via arXiv

On the Blind Spots of Model-Based Evaluation Metrics for Text Generation

Dec 20, 2022
Via arXiv

On Unsupervised Uncertainty-Driven Speech Pseudo-Label Filtering and Model Calibration

Nov 14, 2022
Via arXiv

PCFG-based Natural Language Interface Improves Generalization for Controlled Text Generation

Oct 14, 2022
Via arXiv

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

Oct 07, 2022
Via arXiv

UAVM: A Unified Model for Audio-Visual Learning

Jul 29, 2022
Via arXiv

Developing a Series of AI Challenges for the United States Department of the Air Force

Jul 14, 2022
Via arXiv

SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation

May 17, 2022
Via arXiv

Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition

May 06, 2022
Via arXiv