James Glass

MIT Computer Science and Artificial Intelligence Laboratory, MA, USA

Transformer-Based Multi-Aspect Multi-Granularity Non-Native English Speaker Pronunciation Assessment

May 06, 2022

DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings

Apr 21, 2022

Simple and Effective Unsupervised Speech Synthesis

Apr 20, 2022

CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification

Mar 13, 2022

Controlling the Focus of Pretrained Language Generation Models

Mar 02, 2022

Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

Dec 08, 2021

Routing with Self-Attention for Multimodal Capsule Networks

Dec 01, 2021

Cascaded Multilingual Audio-Visual Learning from Videos

Nov 08, 2021

SSAST: Self-Supervised Audio Spectrogram Transformer

Oct 19, 2021

Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset

Oct 14, 2021