Liangliang Cao

Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness

May 08, 2023

STAIR: Learning Sparse Text and Image Representation in Grounded Tokens

Feb 08, 2023

Exploiting Category Names for Few-Shot Classification with Vision-Language Models

Dec 04, 2022

SurFit: Learning to Fit Surfaces Improves Few Shot Learning on Point Clouds

Dec 27, 2021

Input Length Matters: An Empirical Study Of RNN-T And MWER Training For Long-form Telephony Speech Recognition

Oct 08, 2021

Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition

Oct 07, 2021

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

Oct 01, 2021

Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction

Apr 26, 2021

Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models

Apr 25, 2021

Exploring Targeted Universal Adversarial Perturbations to End-to-end ASR Models

Apr 06, 2021