
Masashi Sugiyama

Positive-Unlabeled Classification under Class-Prior Shift: A Prior-invariant Approach Based on Density Ratio Estimation

Jul 11, 2021
Shota Nakajima, Masashi Sugiyama


Learning from positive and unlabeled (PU) data is an important problem in various applications. Most recent approaches to PU classification assume that the class-prior (the ratio of positive samples) in the unlabeled training data is identical to that of the test data, which does not hold in many practical cases. In addition, the class-priors of the training and test data are usually unknown, so it is unclear how to train a classifier without them. To address these problems, we propose a novel PU classification method based on density ratio estimation. A notable advantage of our proposed method is that it does not require the class-priors in the training phase; class-prior shift is incorporated only in the test phase. We theoretically justify our proposed method and experimentally demonstrate its effectiveness.
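
To illustrate how a density ratio can absorb class-prior shift at test time only, the sketch below applies Bayes' rule to a ratio r(x) = p(x | y=+1) / p_train(x). It is not the authors' algorithm: the function names are hypothetical, the ratio is assumed to have been estimated already, and both class-priors are assumed to be supplied at test time.

```python
def shifted_posterior(ratio, train_prior, test_prior):
    """Posterior p_test(y=+1 | x) computed from r(x) = p(x | y=+1) / p_train(x).

    Assumes p_train(x) = train_prior * p(x | +1) + (1 - train_prior) * p(x | -1).
    """
    # Negative-class density relative to p_train(x); clipped at zero because an
    # estimated ratio can slightly exceed 1 / train_prior.
    neg_ratio = max(1.0 - train_prior * ratio, 0.0) / (1.0 - train_prior)
    pos = test_prior * ratio
    neg = (1.0 - test_prior) * neg_ratio
    return pos / (pos + neg)


def ratio_threshold(train_prior, test_prior):
    """Value of r(x) at which the shifted posterior equals 1/2."""
    return (1.0 - test_prior) / (
        train_prior + test_prior - 2.0 * train_prior * test_prior
    )


def classify(ratio, train_prior, test_prior):
    """Predict +1 when the ratio exceeds the prior-dependent threshold."""
    return 1 if ratio > ratio_threshold(train_prior, test_prior) else -1
```

In this sketch the ratio itself never depends on the test prior; only the threshold moves, so the same trained ratio model can be reused under different test-time priors.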

* 18 pages, 4 figures


Seeing Differently, Acting Similarly: Imitation Learning with Heterogeneous Observations

Jun 17, 2021
Xin-Qiang Cai, Yao-Xiang Ding, Zi-Xuan Chen, Yuan Jiang, Masashi Sugiyama, Zhi-Hua Zhou


In many real-world imitation learning tasks, the demonstrator and the learner have to act in different but full observation spaces. This situation poses significant obstacles for existing imitation learning approaches, even when they are combined with traditional space adaptation techniques. The main challenge lies in bridging the expert's occupancy measures to the learner's dynamically changing occupancy measures under the different observation spaces. In this work, we model the above learning problem as Heterogeneous Observations Imitation Learning (HOIL). We propose the Importance Weighting with REjection (IWRE) algorithm based on the techniques of importance-weighting, learning with rejection, and active querying to solve the key challenge of occupancy measure matching. Experimental results show that IWRE can successfully solve HOIL tasks, including the challenging task of transforming vision-based demonstrations to random access memory (RAM)-based policies under the Atari domain.

* 17 pages, 25 figures


Multi-Class Classification from Single-Class Data with Confidences

Jun 16, 2021
Yuzhou Cao, Lei Feng, Senlin Shu, Yitian Xu, Bo An, Gang Niu, Masashi Sugiyama


Can we learn a multi-class classifier from only data of a single class? We show that without any assumptions on the loss functions, models, and optimizers, we can successfully learn a multi-class classifier from only data of a single class with a rigorous consistency guarantee when confidences (i.e., the class-posterior probabilities for all the classes) are available. Specifically, we propose an empirical risk minimization framework that is loss-/model-/optimizer-independent. Instead of constructing a boundary between the given class and other classes, our method can conduct discriminative classification between all the classes even if no data from the other classes are provided. We further theoretically and experimentally show that our method can be Bayes-consistent with a simple modification even if the provided confidences are highly noisy. Then, we provide an extension of our method for the case where data from a subset of all the classes are available. Experimental results demonstrate the effectiveness of our methods.
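
As a rough illustration of why confidences make single-class learning possible, note that Bayes' rule gives p(x) = π_s p(x | y=s) / r_s(x), where r_k(x) = p(y=k | x) and π_s is the prior of the observed class s. An expectation over all data can therefore be rewritten as a weighted expectation over class-s data. The sketch below computes such a weighted empirical risk; it is an assumption-laden example, not necessarily the exact estimator in the paper, and it uses the cross-entropy loss purely as an arbitrary choice.

```python
import math


def single_class_risk(probs, confidences, source_class, source_prior):
    """Weighted empirical risk using samples from one class only.

    probs[i][k]       -- model probability of class k for sample i
    confidences[i][k] -- given class-posterior p(y=k | x_i)
    source_class      -- index s of the class the samples were drawn from
    source_prior      -- class prior pi_s (a constant scale factor)
    """
    total = 0.0
    for p, r in zip(probs, confidences):
        r_s = r[source_class]
        # Weight r_k / r_s converts an expectation over p(x | y=s)
        # into one over the full marginal p(x).
        total += sum((r_k / r_s) * -math.log(p_k) for p_k, r_k in zip(p, r))
    return source_prior * total / len(probs)
```

Because π_s only rescales the objective, a minimizer of this quantity does not depend on its value.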

* 23 pages, 1 figure


Probabilistic Margins for Instance Reweighting in Adversarial Training

Jun 15, 2021
Qizhou Wang, Feng Liu, Bo Han, Tongliang Liu, Chen Gong, Gang Niu, Mingyuan Zhou, Masashi Sugiyama


Reweighting adversarial data during training has recently been shown to improve adversarial robustness, where data closer to the current decision boundaries are regarded as more critical and given larger weights. However, existing methods for measuring this closeness are not very reliable: they are discrete and can take only a few values, and they are path-dependent, i.e., they may change given the same start and end points with different attack paths. In this paper, we propose three types of probabilistic margin (PM), which are continuous and path-independent, for measuring the aforementioned closeness and reweighting adversarial data. Specifically, a PM is defined as the difference between two estimated class-posterior probabilities, e.g., the probability of the true label minus the probability of the most confusing label given some natural data. Though different PMs capture different geometric properties, all three PMs share a negative correlation with the vulnerability of data: data with larger/smaller PMs are safer/riskier and should have smaller/larger weights. Experiments demonstrate that PMs are reliable measurements and that PM-based reweighting methods outperform state-of-the-art methods.
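
The margin described above can be sketched in a few lines. The example computes one PM variant (true-label probability minus the largest other probability) and turns margins into weights. The exponential weighting function and its `sharpness` parameter are hypothetical choices made here so that smaller margins receive larger weights; the abstract does not specify the weighting used in the paper.

```python
import math


def probabilistic_margin(probs, true_label):
    """Probability of the true label minus that of the most confusing label."""
    others = [p for k, p in enumerate(probs) if k != true_label]
    return probs[true_label] - max(others)


def instance_weights(margins, sharpness=2.0):
    """Map margins to weights that decrease with the margin (mean weight 1).

    exp(-sharpness * margin) is an illustrative choice, not the paper's.
    """
    raw = [math.exp(-sharpness * m) for m in margins]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


# A confidently correct example and a misclassified one.
margins = [
    probabilistic_margin([0.7, 0.2, 0.1], true_label=0),
    probabilistic_margin([0.3, 0.6, 0.1], true_label=0),
]
weights = instance_weights(margins)
```

The margin is continuous in the predicted probabilities and lies in [-1, 1], with negative values for misclassified examples.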

* 17 pages, 4 figures


On the Robustness of Average Losses for Partial-Label Learning

Jun 11, 2021
Jiaqi Lv, Lei Feng, Miao Xu, Bo An, Gang Niu, Xin Geng, Masashi Sugiyama


Partial-label (PL) learning is a typical weakly supervised classification problem, where a PL of an instance is a set of candidate labels such that a fixed but unknown candidate is the true label. For PL learning, there are two lines of research: (a) the identification-based strategy (IBS) purifies each label set and extracts the true label; (b) the average-based strategy (ABS) treats all candidates equally for training. In the past two decades, IBS was a much hotter topic than ABS, since it was believed that IBS is more promising. In this paper, we theoretically analyze ABS and find it also promising in the sense of the robustness of its loss functions. Specifically, we consider five problem settings for the generation of clean or noisy PLs, and we prove that average PL losses with bounded multi-class losses are always robust under mild assumptions on the domination of true labels, while average PL losses with unbounded multi-class losses (e.g., the cross-entropy loss) may not be robust. We also conduct experiments to validate our theoretical findings. Note that IBS is heuristic, and we cannot prove its robustness by a similar proof technique; hence, ABS is more advantageous from a theoretical point of view, and it is worth paying attention to the design of more advanced PL learning methods following ABS.
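
The contrast between bounded and unbounded base losses can be seen in a minimal sketch of an average PL loss, which averages a multi-class loss over all candidate labels. The function names are illustrative, and the mean absolute error is used here only as one example of a bounded loss.

```python
import math


def cross_entropy(probs, label):
    """Unbounded loss: grows without limit as probs[label] approaches 0."""
    return -math.log(probs[label])


def mean_absolute_error(probs, label):
    """Bounded loss: L1 distance to the one-hot label, always in [0, 2]."""
    return sum(abs((1.0 if k == label else 0.0) - p) for k, p in enumerate(probs))


def average_pl_loss(probs, candidates, base_loss):
    """Average-based strategy: treat every candidate label equally."""
    return sum(base_loss(probs, y) for y in candidates) / len(candidates)


# The model is nearly certain of class 0, but the candidate set is {0, 1}.
probs = [1.0 - 2e-9, 1e-9, 1e-9]
ce_value = average_pl_loss(probs, [0, 1], cross_entropy)
mae_value = average_pl_loss(probs, [0, 1], mean_absolute_error)
```

With the cross-entropy loss, a single false-positive candidate can dominate the average as the model becomes confident; with the bounded loss, its contribution is capped.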


Loss function based second-order Jensen inequality and its application to particle variational inference

Jun 10, 2021
Futoshi Futami, Tomoharu Iwata, Naonori Ueda, Issei Sato, Masashi Sugiyama


Bayesian model averaging, obtained as the expectation of a likelihood function by a posterior distribution, has been widely used for prediction, evaluation of uncertainty, and model selection. Various approaches have been developed to efficiently capture the information in the posterior distribution; one such approach is the optimization of a set of models simultaneously with interaction to ensure the diversity of the individual models in the same way as ensemble learning. A representative approach is particle variational inference (PVI), which uses an ensemble of models as an empirical approximation for the posterior distribution. PVI iteratively updates each model with a repulsion force to ensure the diversity of the optimized models. However, despite its promising performance, a theoretical understanding of this repulsion and its association with the generalization ability is still lacking. In this paper, we tackle this problem in light of PAC-Bayesian analysis. First, we provide a new second-order Jensen inequality, which has a repulsion term based on the loss function. Thanks to the repulsion term, it is tighter than the standard Jensen inequality. Then, we derive a novel generalization error bound and show that it can be reduced by enhancing the diversity of models. Finally, we derive a new PVI that optimizes the generalization error bound directly. Numerical experiments demonstrate that the performance of the proposed PVI compares favorably with existing methods.


Instance Correction for Learning with Open-set Noisy Labels

Jun 01, 2021
Xiaobo Xia, Tongliang Liu, Bo Han, Mingming Gong, Jun Yu, Gang Niu, Masashi Sugiyama


The problem of open-set noisy labels refers to the setting where part of the training data has a different label space that does not contain the true class. Many approaches, e.g., loss correction and label correction, cannot handle such open-set noisy labels well, since they need training data and test data to share the same label space, which does not hold for learning with open-set noisy labels. The state-of-the-art methods thus employ the sample selection approach to handle open-set noisy labels, which tries to select clean data from noisy data for network parameter updates. The discarded data are regarded as mislabeled and do not participate in training. Such an approach is intuitive and reasonable at first glance. However, a natural question arises: "can such data only be discarded during training?" In this paper, we show that the answer is no. Specifically, we argue that the instances of discarded data may contain meaningful information for generalization. For this reason, we do not abandon such data, but use instance correction to modify the instances of the discarded data, which makes the predictions for the discarded data consistent with the given labels. Instance correction is performed by targeted adversarial attacks. The corrected data are then exploited for training to help generalization. In addition to the analytical results, a series of empirical results is provided to justify our claims.


Sample Selection with Uncertainty of Losses for Learning with Noisy Labels

Jun 01, 2021
Xiaobo Xia, Tongliang Liu, Bo Han, Mingming Gong, Jun Yu, Gang Niu, Masashi Sugiyama


In learning with noisy labels, the sample selection approach, which regards small-loss data as correctly labeled during training, is very popular. However, losses are generated on-the-fly based on the model being trained with noisy labels, and thus large-loss data are likely but not certain to be incorrect. There are actually two possibilities for a large-loss data point: (a) it is mislabeled, and then its loss decreases more slowly than that of other data, since deep neural networks "learn patterns first"; (b) it belongs to an underrepresented group of data and has not been selected yet. In this paper, we incorporate the uncertainty of losses by adopting interval estimation instead of point estimation of losses, where lower bounds of the confidence intervals of losses derived from distribution-free concentration inequalities, but not losses themselves, are used for sample selection. In this way, we also give large-loss but less selected data a try; then, we can better distinguish between the cases (a) and (b) by seeing if the losses effectively decrease with the uncertainty after the try. As a result, we can better explore underrepresented data that are correctly labeled but seem to be mislabeled at first glance. Experiments demonstrate that the proposed method is superior to baselines and robust to a broad range of label noise types.
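
The idea of selecting by a lower confidence bound rather than by the loss itself can be sketched as follows. The Hoeffding-style radius, the `scale` parameter, and the function names are assumptions made for illustration; the paper's concentration inequalities and bookkeeping may differ.

```python
import math


def loss_lower_bound(loss_history, total_rounds, scale=1.0):
    """Lower confidence bound on an example's loss.

    loss_history -- losses recorded on the rounds this example was selected
    total_rounds -- number of selection rounds so far (must be >= 1)
    """
    n = len(loss_history)
    mean_loss = sum(loss_history) / n
    # The radius shrinks as the example is selected more often, so rarely
    # selected examples get an optimistic (lower) bound.
    radius = scale * math.sqrt(math.log(total_rounds) / (2.0 * n))
    return mean_loss - radius


def select_small_bound(histories, total_rounds, keep):
    """Return the ids of the `keep` examples with the smallest lower bounds."""
    bounds = {i: loss_lower_bound(h, total_rounds) for i, h in histories.items()}
    return sorted(bounds, key=bounds.get)[:keep]


histories = {
    0: [0.2] * 20,  # small loss, selected often
    1: [0.9] * 20,  # large loss, selected often: likely mislabeled
    2: [1.0],       # large loss, selected once: still uncertain
}
selected = select_small_bound(histories, total_rounds=20, keep=2)
```

Under this rule, example 2 is given another try despite its large loss because its bound is wide, whereas example 1 is dropped because its loss has stayed large over many selections.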


A unified view of likelihood ratio and reparameterization gradients

May 31, 2021
Paavo Parmas, Masashi Sugiyama


Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used to estimate gradients of expectations throughout machine learning and reinforcement learning; however, they are usually explained as simple mathematical tricks, with no insight into their nature. We use a first-principles approach to explain that LR and RP are alternative methods of keeping track of the movement of probability mass, and that the two are connected via the divergence theorem. Moreover, we show that the space of all possible estimators combining LR and RP can be completely parameterized by a flow field $u(x)$ and an importance sampling distribution $q(x)$. We prove that there cannot exist a single-sample estimator of this type outside our characterized space, thus clarifying where we should be searching for better Monte Carlo gradient estimators.
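
For readers unfamiliar with the two estimators, the sketch below shows the textbook LR and RP estimators for the gradient of E[f(x)] with respect to the mean of a Gaussian, x ~ N(mu, sigma^2). It illustrates only the two standard endpoints, not the unified flow-field parameterization from the paper; the test function f(x) = x^2 and the sample count are arbitrary choices.

```python
import random


def lr_gradient(f, mu, sigma, n, rng):
    """Likelihood ratio estimate of d/dmu E[f(x)], x ~ N(mu, sigma^2)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        # f(x) times the score d/dmu log p(x; mu, sigma) = (x - mu) / sigma^2
        total += f(x) * (x - mu) / sigma**2
    return total / n


def rp_gradient(df, mu, sigma, n, rng):
    """Reparameterization estimate using x = mu + sigma * eps, eps ~ N(0, 1)."""
    total = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        # Chain rule: d f(x) / d mu = f'(x) * dx/dmu, and dx/dmu = 1
        total += df(mu + sigma * eps)
    return total / n


rng = random.Random(0)
# For f(x) = x^2, E[f(x)] = mu^2 + sigma^2, so the exact gradient is 2 * mu.
lr_estimate = lr_gradient(lambda x: x * x, 1.0, 1.0, 200_000, rng)
rp_estimate = rp_gradient(lambda x: 2.0 * x, 1.0, 1.0, 200_000, rng)
```

Both estimators are unbiased for the same gradient; LR needs only function values and the density's score, while RP needs the derivative of f and a differentiable sampling path.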

* In International Conference on Artificial Intelligence and Statistics (pp. 4078-4086). PMLR (2021, March)
* AISTATS2021; Earlier paper was split in two (arXiv:1910.06419). Refer to the current paper for the unified view, but see the earlier paper for discussion on an importance sampling technique
