Mark Hasegawa-Johnson

Sound Tagging in Infant-centric Home Soundscapes

Jun 25, 2024

Towards Unsupervised Speech Recognition Without Pronunciation Models

Jun 12, 2024

C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion

Mar 31, 2024

AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition

Mar 18, 2024

Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations

Feb 10, 2024

HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models

Nov 30, 2023

Unsupervised Speech Recognition with N-Skipgram and Positional Unigram Matching

Oct 03, 2023

Enhancing Child Vocalization Classification in Multi-Channel Child-Adult Conversations Through Wav2vec2 Children ASR Features

Sep 13, 2023

Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction

Aug 16, 2023

Classification of Infant Sleep/Wake States: Cross-Attention among Large Scale Pretrained Transformer Networks using Audio, ECG, and IMU Data

Jun 27, 2023