Takahiro Shinozaki

Margin Calibration for Long-Tailed Visual Recognition

Dec 14, 2021

Yidong Wang, Bowen Zhang, Wenxin Hou, Zhen Wu, Jindong Wang, Takahiro Shinozaki

Abstract: The long-tailed class distribution in visual recognition tasks makes it difficult for neural networks to handle the biased predictions between head and tail classes; that is, the model tends to classify tail-class samples as head classes. While existing research has focused on data resampling and loss-function engineering, in this paper we take a different perspective: the classification margins. We study the relationship between the margins and the logits (classification scores) and empirically observe that the biased margins and the biased logits are positively correlated. We propose MARC, a simple yet effective MARgin Calibration function that dynamically calibrates the biased margins to obtain unbiased logits. We validate MARC through extensive experiments on common long-tailed benchmarks, including CIFAR-LT, ImageNet-LT, Places-LT, and iNaturalist-LT. Experimental results demonstrate that MARC achieves favorable results on these benchmarks. In addition, MARC is extremely easy to implement, requiring just three lines of code. We hope this simple method will motivate people to rethink the biased margins and biased logits in long-tailed visual recognition.

* Technical report; 9 pages
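
The abstract does not spell out the calibration function. The sketch below is one plausible reading, and an assumption on our part rather than the paper's definition: for a linear classifier, the margin of a sample to class j is its logit divided by the weight norm ||w_j||, a per-class affine map omega_j * margin_j + beta_j recalibrates that margin, and the result is mapped back to logit scale. The names `omega`, `beta`, `calibrate_logits` and `weight_norms` are hypothetical.

```python
import math
from typing import List, Sequence


def weight_norms(weights: Sequence[Sequence[float]]) -> List[float]:
    """L2 norm ||w_j|| of each class's weight vector (one row per class)."""
    return [math.sqrt(sum(v * v for v in row)) for row in weights]


def calibrate_logits(
    logits: Sequence[float],
    weights: Sequence[Sequence[float]],
    omega: Sequence[float],
    beta: Sequence[float],
) -> List[float]:
    """Hypothetical per-class margin calibration for a single sample.

    For a linear classifier with logit_j = w_j . x + b_j, the signed distance
    of x to the hyperplane logit_j = 0 is margin_j = logit_j / ||w_j||.
    The calibrated margin omega_j * margin_j + beta_j is mapped back to logit
    scale by multiplying with ||w_j||, i.e. omega_j * logit_j + beta_j * ||w_j||.
    """
    calibrated = []
    for z, norm, scale, shift in zip(logits, weight_norms(weights), omega, beta):
        margin = z / norm
        calibrated.append((scale * margin + shift) * norm)
    return calibrated


if __name__ == "__main__":
    W = [[3.0, 4.0], [0.6, 0.8]]  # head class weight norm 5, tail class about 1
    z = [2.0, 1.5]
    # Identity calibration (omega = 1, beta = 0) reproduces the logits up to rounding.
    print(calibrate_logits(z, W, omega=[1.0, 1.0], beta=[0.0, 0.0]))
    # A positive shift for the tail class raises its logit by beta_j * ||w_j||.
    print(calibrate_logits(z, W, omega=[1.0, 1.0], beta=[0.0, 1.0]))
```

In this reading, `omega` and `beta` would be learned per class (for example with the backbone frozen), which is also an assumption; a zero-norm weight row would need a guard before dividing.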

FlexMatch: Boosting Semi-Supervised Learning with Curriculum Pseudo Labeling

Oct 15, 2021

Bowen Zhang, Yidong Wang, Wenxin Hou, Hao Wu, Jindong Wang, Manabu Okumura, Takahiro Shinozaki

Abstract: The recently proposed FixMatch achieved state-of-the-art results on most semi-supervised learning (SSL) benchmarks. However, like other modern SSL algorithms, FixMatch uses a pre-defined constant threshold for all classes to select the unlabeled data that contribute to training, and thus fails to account for the different learning statuses and learning difficulties of different classes. To address this issue, we propose Curriculum Pseudo Labeling (CPL), a curriculum learning approach that leverages unlabeled data according to the model's learning status. The core of CPL is to flexibly adjust the threshold for each class at each time step so that informative unlabeled data and their pseudo labels can pass. CPL introduces no additional parameters or computations (forward or backward propagation). We apply CPL to FixMatch and call the improved algorithm FlexMatch. FlexMatch achieves state-of-the-art performance on a variety of SSL benchmarks, with especially strong performance when labeled data are extremely limited or when the task is challenging. For example, with only 4 labels per class, FlexMatch outperforms FixMatch by 14.32% on CIFAR-100 and by 24.55% on STL-10. CPL also significantly speeds up convergence: FlexMatch can reach even better performance than FixMatch using only 1/5 of its training time. Furthermore, we show that CPL can be easily adapted to other SSL algorithms and remarkably improves their performance. We open source our code at https://github.com/TorchSSL/TorchSSL.

* Accepted by NeurIPS 2021; 16 pages with appendix; code: https://github.com/TorchSSL/TorchSSL
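
The per-class threshold adjustment can be sketched as follows. This follows our reading of the FlexMatch paper and is not copied from the TorchSSL code: the learning effect sigma(c) counts unlabeled samples whose latest above-tau prediction is class c, it is normalized by the larger of the maximum count and the number of still-unused samples (a threshold warm-up), and the convex mapping M(x) = x / (2 - x) scales the fixed threshold tau. The function names and the `selected` list (where -1 marks a sample never predicted above tau) are hypothetical.

```python
from typing import List, Sequence


def flexible_thresholds(
    selected: Sequence[int], num_classes: int, tau: float = 0.95
) -> List[float]:
    """Per-class thresholds T(c) = M(beta(c)) * tau, a sketch of CPL.

    selected[i] is the class that unlabeled sample i was last predicted as
    with confidence above tau, or -1 if that has never happened.
    """
    sigma = [0] * num_classes  # learning effect per class
    for label in selected:
        if label >= 0:
            sigma[label] += 1
    unused = len(selected) - sum(sigma)
    # Warm-up: normalize by the unused count while it still dominates.
    denom = max(max(sigma), unused)
    if denom == 0:
        return [0.0] * num_classes  # no unlabeled data: nothing learned yet
    beta = [s / denom for s in sigma]
    # Convex mapping M(x) = x / (2 - x), with M(0) = 0 and M(1) = 1.
    return [b / (2.0 - b) * tau for b in beta]


def pseudo_label_mask(
    probs: Sequence[Sequence[float]], thresholds: Sequence[float]
) -> List[bool]:
    """Keep a sample if its top probability exceeds its predicted class's threshold."""
    mask = []
    for p in probs:
        c = max(range(len(p)), key=lambda k: p[k])
        mask.append(p[c] > thresholds[c])
    return mask


if __name__ == "__main__":
    selected = [0, 0, 0, 0, 1, 1, -1, -1, -1, -1]
    T = flexible_thresholds(selected, num_classes=2, tau=0.95)
    print(T)  # class 0 keeps tau = 0.95; the lagging class 1 gets 0.95 / 3
    print(pseudo_label_mask([[0.9, 0.1], [0.4, 0.6]], T))  # [False, True]
```

With the warm-up term, thresholds start near zero while few samples have been confidently predicted, so early training admits many pseudo labels; they rise toward tau as each class is learned.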

Exploiting Adapters for Cross-lingual Low-resource Speech Recognition

May 18, 2021

Wenxin Hou, Han Zhu, Yidong Wang, Jindong Wang, Tao Qin, Renjun Xu, Takahiro Shinozaki

Abstract: Cross-lingual speech adaptation aims to leverage multiple rich-resource languages to build models for a low-resource target language. Since the low-resource language has limited training data, speech recognition models can easily overfit. In this paper, we investigate the performance of multiple adapters for parameter-efficient cross-lingual speech adaptation. Building on our previous MetaAdapter, which implicitly leverages adapters, we propose a novel algorithm called SimAdapter that explicitly learns knowledge from adapters. Our algorithms leverage adapters, which can be easily integrated into the Transformer structure. MetaAdapter uses meta-learning to transfer general knowledge from the training data to the test language. SimAdapter learns the similarities between the source and target languages during fine-tuning using the adapters. We conduct extensive experiments on five low-resource languages in the Common Voice dataset. Results demonstrate that, compared to the strong full-model fine-tuning baseline, our MetaAdapter and SimAdapter methods reduce WER by 2.98% and 2.55% with only 2.5% and 15.5% of the trainable parameters, respectively. Moreover, we show that the two algorithms can be integrated for better performance, with up to 3.55% relative WER reduction.

* Technical report
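
To make "learning similarities between languages using adapters" concrete, here is a minimal sketch of attention-based fusion over several language adapters, in the spirit of AdapterFusion. It is an assumption, not the SimAdapter formulation: each adapter is a residual bottleneck (down-projection, ReLU, up-projection), the query is the layer's hidden state, the keys and values are the adapter outputs without learned projections, and `temperature` is a hypothetical parameter. The attention weights play the role of source-target language similarities.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Matrix-vector product for row-major lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def adapter(h: Sequence[float], down: Matrix, up: Matrix) -> Vector:
    """Residual bottleneck adapter: h + up(relu(down(h)))."""
    z = [max(0.0, x) for x in matvec(down, h)]
    return [a + b for a, b in zip(h, matvec(up, z))]


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_adapters(
    h: Sequence[float],
    adapters: Sequence[Tuple[Matrix, Matrix]],
    temperature: float = 1.0,
) -> Tuple[Vector, Vector]:
    """Attend over language adapters; return (fused vector, attention weights)."""
    outputs = [adapter(h, down, up) for down, up in adapters]
    # Dot-product score between the hidden state (query) and each adapter output (key).
    scores = [sum(a * b for a, b in zip(h, o)) / temperature for o in outputs]
    weights = softmax(scores)
    fused = [sum(w * o[i] for w, o in zip(weights, outputs)) for i in range(len(h))]
    return fused, weights


if __name__ == "__main__":
    h = [1.0, 0.5]
    # Two hypothetical source-language adapters with a bottleneck of size 1.
    lang_a = ([[1.0, 0.0]], [[0.5], [0.0]])
    lang_b = ([[0.0, 1.0]], [[0.0], [-0.5]])
    fused, weights = fuse_adapters(h, [lang_a, lang_b])
    print(weights)  # the adapter whose output scores higher against h gets more weight
    print(fused)
```

In adapter-based fine-tuning, typically only the adapters (and here the fusion step) are trained while the backbone stays frozen; with a zero-initialized up-projection, each adapter starts out as the identity.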

Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching

Apr 16, 2021

Wenxin Hou, Jindong Wang, Xu Tan, Tao Qin, Takahiro Shinozaki

Abstract: End-to-end automatic speech recognition (ASR) can achieve promising performance with large-scale training data. However, domain mismatch between training and testing data is known to often degrade recognition accuracy. In this work, we focus on unsupervised domain adaptation for ASR and propose CMatch, a Character-level distribution matching method that performs fine-grained adaptation between each character in the two domains. First, to obtain labels for the features belonging to each character, we perform frame-level label assignment using Connectionist Temporal Classification (CTC) pseudo labels. Then, we match the character-level distributions using Maximum Mean Discrepancy (MMD). We train our algorithm using the self-training technique. Experiments on the Libri-Adapt dataset show that our approach achieves 14.39% and 16.50% relative Word Error Rate (WER) reductions on cross-device and cross-environment ASR, respectively. We also comprehensively analyze different strategies for frame-level label assignment and Transformer adaptation.

* submitted to INTERSPEECH 2021; code available at https://github.com/jindongwang/transferlearning/tree/master/code/ASR/CMatch
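
The character-level matching step can be sketched as below, under assumptions of ours: each frame already carries a character label (for example, the per-frame argmax of CTC posteriors with blank frames dropped, which is one possible assignment strategy), features are plain vectors, and MMD uses a Gaussian kernel with a hypothetical bandwidth `sigma`. The loss averages a biased MMD² estimate over the characters present in both domains; all function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def mmd2(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two sets of vectors."""

    def mean_k(us: Sequence[Vector], vs: Sequence[Vector]) -> float:
        total = sum(gaussian_kernel(u, v, sigma) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def group_by_char(feats: Sequence[Vector], labels: Sequence[str]) -> Dict[str, List[Vector]]:
    """Collect frame features by their (pseudo) character label."""
    groups: Dict[str, List[Vector]] = defaultdict(list)
    for f, c in zip(feats, labels):
        groups[c].append(f)
    return groups


def char_level_mmd(src_feats, src_labels, tgt_feats, tgt_labels, sigma=1.0) -> float:
    """Average per-character MMD^2 over characters seen in both domains."""
    src = group_by_char(src_feats, src_labels)
    tgt = group_by_char(tgt_feats, tgt_labels)
    shared = sorted(set(src) & set(tgt))
    if not shared:
        return 0.0
    return sum(mmd2(src[c], tgt[c], sigma) for c in shared) / len(shared)


if __name__ == "__main__":
    src_x = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]]
    src_y = ["a", "a", "b"]
    tgt_x = [[0.5, 0.0], [0.6, 0.1], [1.0, 1.1]]
    tgt_y = ["a", "a", "b"]  # in CMatch the target labels would be CTC pseudo labels
    print(char_level_mmd(src_x, src_y, tgt_x, tgt_y))
```

Matching per character rather than over all frames at once is what makes the adaptation fine-grained: frames of different characters are never pulled toward each other.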

Reinforcement Learning of Speech Recognition System Based on Policy Gradient and Hypothesis Selection

Nov 10, 2017

Taku Kato, Takahiro Shinozaki

Abstract: Speech recognition systems have achieved high recognition performance on several tasks. However, their performance depends on tremendously costly development work: preparing vast amounts of task-matched transcribed speech data for supervised training. The key problem is the cost of transcribing speech data, which recurs for every new language and task. For a broad network service that transcribes speech data for many users, the system would become more self-sufficient and more useful if it could learn from very light user feedback without annoying the users. In this paper, we propose a general reinforcement learning framework for speech recognition systems based on the policy gradient method. As a particular instance of the framework, we also propose a hypothesis selection-based reinforcement learning method. The proposed framework offers a new view of several existing training and adaptation methods. Experimental results show that the proposed method improves recognition performance compared to unsupervised adaptation.

* 5 pages, 6 figures
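
A toy illustration of policy-gradient learning from hypothesis selection, under our own assumptions rather than the paper's model: the recognizer scores each N-best hypothesis with a log-linear policy p(h_i) proportional to exp(theta . phi_i) over hypothetical feature vectors phi_i, the user's light feedback is picking one hypothesis (reward 1 for it, 0 for the others), and each step ascends the exact gradient of the expected reward, sum_i p(h_i) r_i (phi_i - E_p[phi]). The feature vectors, learning rate `lr`, and function names are all hypothetical.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def hypothesis_probs(theta: Sequence[float], feats: Sequence[Sequence[float]]) -> List[float]:
    """Log-linear policy over the N-best list: p(h_i) proportional to exp(theta . phi_i)."""
    return softmax([sum(t * f for t, f in zip(theta, phi)) for phi in feats])


def policy_gradient_step(
    theta: Sequence[float],
    feats: Sequence[Sequence[float]],
    rewards: Sequence[float],
    lr: float = 0.5,
) -> List[float]:
    """One gradient-ascent step on the expected reward sum_i p(h_i) * r_i."""
    probs = hypothesis_probs(theta, feats)
    dim = len(theta)
    # Expected feature vector under the current policy.
    mean_phi = [sum(p * phi[k] for p, phi in zip(probs, feats)) for k in range(dim)]
    grad = [0.0] * dim
    for p, r, phi in zip(probs, rewards, feats):
        for k in range(dim):
            # p_i * r_i * d log p_i / d theta_k, where d log p_i / d theta = phi_i - mean_phi
            grad[k] += p * r * (phi[k] - mean_phi[k])
    return [t + lr * g for t, g in zip(theta, grad)]


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical N-best features
    rewards = [0.0, 1.0, 0.0]  # the user selected hypothesis 1
    theta = [0.0, 0.0]
    before = hypothesis_probs(theta, feats)[1]
    theta = policy_gradient_step(theta, feats, rewards)
    after = hypothesis_probs(theta, feats)[1]
    print(before, after)  # the selected hypothesis becomes more likely
```

Because the N-best list is small, the sketch sums over all hypotheses instead of sampling one, so the update is the exact expected-reward gradient rather than a single-sample REINFORCE estimate.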
