Erik Marchi

A Multimodal Approach to Device-Directed Speech Detection with Large Language Models

Mar 26, 2024
Via arXiv

Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models

Dec 06, 2023
Via arXiv

Improving Voice Trigger Detection with Metric Learning

Apr 05, 2022
Via arXiv

Device-Directed Speech Detection: Regularization via Distillation for Weakly-Supervised Models

Mar 30, 2022
Via arXiv

CALM: Contrastive Aligned Audio-Language Multirate and Multimodal Representations

Feb 08, 2022
Via arXiv

Whispered and Lombard Neural Speech Synthesis

Jan 13, 2021
Via arXiv

Progressive Voice Trigger Detection: Accuracy vs Latency

Oct 29, 2020
Via arXiv

Knowledge Transfer for Efficient On-device False Trigger Mitigation

Oct 20, 2020
Via arXiv

Self-supervised Learning of Visual Speech Features with Audiovisual Speech Enhancement

May 06, 2020
Via arXiv

Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data

Apr 10, 2020
Via arXiv