Annotating a large-scale image dataset is tedious, yet necessary for training person re-identification (re-ID) models. To alleviate this problem, we present an active hard sample mining framework that trains an effective re-ID model with minimal labeling effort. Considering that hard samples can provide informative patterns, we first formulate an uncertainty estimation to actively select hard samples and iteratively train a re-ID model from scratch. Then, an intra-diversity estimation is designed to reduce redundancy among the selected hard samples by maximizing their diversity. Moreover, we propose a computer-assisted identity recommendation module, embedded in the active hard sample mining framework, to help human annotators label the selected samples rapidly and accurately. Extensive experiments on several public datasets demonstrate the effectiveness of our method. Experimental results indicate that our method can reduce annotation effort by 57%, 63%, and 49% on Market1501, MSMT17, and CUHK03, respectively, while maximizing the performance of the re-ID model.

Tomb biographies of the Tang dynasty provide invaluable information about Chinese history. The original biographies are classical Chinese texts which contain neither word boundaries nor sentence boundaries. Relying on three published books of tomb biographies of the Tang dynasty, we investigated the effectiveness of employing machine-learning methods for algorithmically identifying the pauses and terminals of sentences in the biographies. We consider the segmentation task as a classification problem. Chinese characters that are and are not followed by a punctuation mark are classified into two categories. We applied a machine-learning-based mechanism, the conditional random fields (CRF), to classify the characters (and words) in the texts, and we studied the contributions of selected types of lexical information to the resulting quality of the segmentation recommendations. This proposal presented at the DH 2018 conference discussed some of the basic experiments and their evaluations. By considering the contextual information and employing the heuristics provided by experts of Chinese literature, we achieved F1 measures that were better than 80%. More complex experiments that employ deep neural networks helped us further improve the results in recent work.
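The classification framing above can be sketched as follows. This is a minimal illustration only: the punctuation set, the feature names, and the example sentence (a well-known line from the Analects, not a tomb biography) are assumptions, and the feature dictionaries are merely the kind of per-character input a CRF toolkit typically accepts.

```python
# Assumed set of punctuation marks that signal a pause or a sentence end.
PUNCTUATION = set("，。；：？！、")


def label_characters(punctuated_text):
    """Strip punctuation and label each remaining character:
    1 if it was followed by a punctuation mark, 0 otherwise."""
    chars, labels = [], []
    for ch in punctuated_text:
        if ch in PUNCTUATION:
            if labels:
                labels[-1] = 1  # the previous character precedes a mark
        else:
            chars.append(ch)
            labels.append(0)
    return chars, labels


def context_features(chars, i, window=1):
    """Hypothetical contextual features for character i (neighbors in a window)."""
    feats = {"char": chars[i]}
    for k in range(1, window + 1):
        feats[f"prev{k}"] = chars[i - k] if i - k >= 0 else "<BOS>"
        feats[f"next{k}"] = chars[i + k] if i + k < len(chars) else "<EOS>"
    return feats


# Illustrative input only.
chars, labels = label_characters("學而時習之，不亦說乎。")
features = [context_features(chars, i) for i in range(len(chars))]
```

A sequence model such as a CRF would then be trained on `features` and `labels`, and its predictions on unpunctuated text would serve as segmentation recommendations.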

Most previous works explain adversarial examples from a few specific perspectives and lack a comprehensive understanding of the problem. In this paper, we present a systematic study of adversarial examples from three aspects: the amount of training data, task-dependent factors, and model-specific factors. In particular, we show that adversarial generalization (i.e., test accuracy on adversarial examples) under standard training requires more data than standard generalization (i.e., test accuracy on clean examples), and we uncover the global relationship between generalization and robustness with respect to the data size, especially when the data is augmented by generative models. This reveals a trade-off between standard generalization and robustness in the limited training data regime, and their consistency when the data size is large enough. Furthermore, we explore how different task-dependent and model-specific factors influence the vulnerability of deep neural networks through extensive empirical analysis. Relevant recommendations on defense against adversarial attacks are provided as well. Our results outline a potential path towards a clear and systematic understanding of adversarial examples.

In recent years, with the popularization of deep learning frameworks and large datasets, researchers have started parallelizing their models in order to train faster. This is crucially important, because they typically explore many hyperparameters in order to find the best ones for their applications. This process is time consuming and, consequently, speeding up training improves productivity. One approach to parallelizing deep learning models, followed by many researchers, is based on weak scaling: the minibatches increase in size as new GPUs are added to the system. In addition, new learning rate schedules have been proposed to fix optimization issues that occur with large minibatch sizes. In this paper, however, we show that the recommendations provided by recent work do not apply to models that lack large datasets. In fact, we argue in favor of using strong scaling to achieve reliable performance in such cases. We evaluated our approach with up to 32 GPUs and show that weak scaling not only fails to match the accuracy of the sequential model, it also fails to converge most of the time. Meanwhile, strong scaling has good scalability while having exactly the same accuracy as a sequential implementation.
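The difference between the two scaling regimes comes down to simple batch-size arithmetic, sketched below. The function names and the base minibatch size of 256 are hypothetical and chosen only for illustration; the abstract does not state the batch sizes used.

```python
def weak_scaling(per_gpu_batch, num_gpus):
    """Weak scaling: each GPU keeps the same minibatch, so the
    global minibatch grows linearly with the number of GPUs."""
    return {"per_gpu": per_gpu_batch, "global": per_gpu_batch * num_gpus}


def strong_scaling(global_batch, num_gpus):
    """Strong scaling: the global minibatch is fixed and is split
    evenly across GPUs, so the optimization problem is unchanged."""
    if global_batch % num_gpus != 0:
        raise ValueError("global batch must divide evenly across GPUs")
    return {"per_gpu": global_batch // num_gpus, "global": global_batch}


# Hypothetical base minibatch of 256 samples on 32 GPUs.
weak = weak_scaling(per_gpu_batch=256, num_gpus=32)
strong = strong_scaling(global_batch=256, num_gpus=32)
```

With these assumed numbers, weak scaling trains with a global minibatch of 8192 samples, while strong scaling keeps the sequential minibatch of 256 and gives each GPU 8 samples per step.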

Softmax is the most commonly used output function for multiclass problems and is widely used in areas such as vision, natural language processing, and recommendation. A softmax model has costs that are linear in the number of classes, which makes it too expensive for many real-world problems. A common approach to speeding up training involves sampling only some of the classes at each training step. It is known that this method is biased and that the bias increases the more the sampling distribution deviates from the output distribution. Nevertheless, almost all recent work uses simple sampling distributions that require a large sample size to mitigate the bias. In this work, we propose a new class of kernel based sampling methods and develop an efficient sampling algorithm. Kernel based sampling adapts to the model as it is trained, thus resulting in low bias. Kernel based sampling can be easily applied to many models because it relies only on the model's last hidden layer. We empirically study the trade-off between bias, sampling distribution, and sample size, and show that kernel based sampling results in low bias with few samples.
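The relationship between a sampling distribution and the softmax output can be illustrated with a toy computation. This is a sketch under stated assumptions: the quadratic kernel `o**2 + 1`, the vectors `h` and `W`, and the use of total variation distance as a measure of deviation are all illustrative choices, not taken from the abstract, and the efficient sampling algorithm itself is not shown.

```python
import math


def scores(h, W):
    """Logits o_i = <h, w_i> from the last hidden layer h and class embeddings W."""
    return [sum(hj * wj for hj, wj in zip(h, w)) for w in W]


def softmax(o):
    """Numerically stable softmax over a list of logits."""
    m = max(o)
    e = [math.exp(x - m) for x in o]
    s = sum(e)
    return [x / s for x in e]


def kernel_sampling_distribution(h, W, kernel):
    """Sampling distribution q_i proportional to kernel(<h, w_i>)."""
    k = [kernel(o) for o in scores(h, W)]
    total = sum(k)
    return [x / total for x in k]


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Toy values, assumed for illustration.
h = [1.0, 2.0]
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]

p = softmax(scores(h, W))
q_uniform = kernel_sampling_distribution(h, W, lambda o: 1.0)
q_quadratic = kernel_sampling_distribution(h, W, lambda o: o * o + 1.0)

tv_uniform = total_variation(p, q_uniform)
tv_quadratic = total_variation(p, q_quadratic)
```

In this toy case the quadratic kernel yields a sampling distribution closer to the softmax output than uniform sampling does, and because it is computed from `h`, it changes whenever the model's hidden representation changes. Using `math.exp` as the kernel recovers the softmax distribution exactly, which is the zero-deviation case but costs as much as the full softmax.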

This work presents PESMOC, Predictive Entropy Search for Multi-objective Bayesian Optimization with Constraints, an information-based strategy for the simultaneous optimization of multiple expensive-to-evaluate black-box functions in the presence of several constraints. PESMOC can hence be used to solve a wide range of optimization problems. At each iteration, PESMOC chooses an input location at which to evaluate the objective functions and the constraints so as to maximally reduce the entropy of the Pareto set of the corresponding optimization problem. The constraints considered in PESMOC are assumed to have properties similar to those of the objective functions in typical Bayesian optimization problems. That is, they do not have a known expression (which prevents gradient computation), their evaluation is considered to be very expensive, and the resulting observations may be corrupted by noise. Such constraints arise in a plethora of expensive black-box optimization problems. We carry out synthetic experiments to illustrate the effectiveness of PESMOC, in which we sample both the objectives and the constraints from a Gaussian process prior. The results show that PESMOC is able to provide better recommendations with a smaller number of evaluations than a strategy based on random search.
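The object whose entropy is reduced, the Pareto set of a constrained multi-objective problem, can be illustrated on a finite set of evaluated points. The sketch below assumes that all objectives are minimized and that a constraint is satisfied when its value is non-negative; both conventions, as well as the data, are assumptions made for illustration. It does not implement the entropy-based acquisition itself.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere
    and strictly better somewhere (minimization assumed)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def feasible_pareto_set(evaluations):
    """Keep feasible points that no other feasible point dominates.
    Assumed convention: a constraint is satisfied when its value is >= 0."""
    feasible = [e for e in evaluations if all(c >= 0 for c in e["constraints"])]
    return [
        e for e in feasible
        if not any(dominates(o["objectives"], e["objectives"]) for o in feasible)
    ]


# Hypothetical evaluations of two objectives and one constraint.
evaluations = [
    {"name": "A", "objectives": (1.0, 4.0), "constraints": (0.5,)},
    {"name": "B", "objectives": (2.0, 2.0), "constraints": (0.1,)},
    {"name": "C", "objectives": (3.0, 3.0), "constraints": (1.0,)},   # dominated by B
    {"name": "D", "objectives": (0.0, 0.0), "constraints": (-1.0,)},  # infeasible
]
pareto_names = [e["name"] for e in feasible_pareto_set(evaluations)]
```

Here point D has the best objective values but violates the constraint, so the feasible Pareto set consists of A and B only.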

Inductive Matrix Completion (IMC) is an important class of matrix completion problems that allows direct inclusion of available features to enhance estimation capabilities. These models have found applications in personalized recommendation systems, multilabel learning, dictionary learning, etc. This paper examines a general class of noisy matrix completion tasks where the underlying matrix follows an IMC model, i.e., it is formed by a mixing matrix (a priori unknown) sandwiched between two known feature matrices. The mixing matrix here is assumed to be well approximated by the product of two sparse matrices, referred to here as "sparse factor models." We leverage the main theorem of Soni:2016:NMC and extend it to provide theoretical error bounds for the sparsity-regularized maximum likelihood estimators for the class of problems discussed in this paper. The main result is general in the sense that it can be used to derive error bounds for various noise models. In this paper, we instantiate our main result for the case of Gaussian noise and provide corresponding error bounds in terms of squared loss.
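The model described above can be written out as follows. All symbols ($A$, $B$, $X^*$, $D^*$, $C^*$, $\mathcal{S}$, $\sigma$, and the dimensions) are notation assumed here for illustration; the abstract does not fix a notation, and the displayed loss is only one common way to normalize a squared-error criterion.

```latex
% Known feature matrices A and B, unknown mixing matrix X^*:
M^* = A \, X^* B^{\top},
\qquad A \in \mathbb{R}^{n_1 \times r_1},\;
       X^* \in \mathbb{R}^{r_1 \times r_2},\;
       B \in \mathbb{R}^{n_2 \times r_2}.

% Sparse factor model: X^* is (approximately) a product of two sparse matrices:
X^* \approx D^* C^*,
\qquad D^* \in \mathbb{R}^{r_1 \times r},\;
       C^* \in \mathbb{R}^{r \times r_2}.

% Noisy observations on a subset S of entries, Gaussian case:
Y_{ij} = M^*_{ij} + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2),
\qquad (i,j) \in \mathcal{S}.

% Squared loss of an estimate \widehat{M}:
\frac{1}{n_1 n_2} \, \bigl\| \widehat{M} - M^* \bigr\|_F^2 .
```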

Motivated by an application of eliciting users' preferences, we investigate the problem of learning hemimetrics, i.e., pairwise distances among a set of $n$ items that satisfy triangle inequalities and non-negativity constraints. In our application, the (asymmetric) distances quantify private costs a user incurs when substituting one item by another. We aim to learn these distances (costs) by asking the users whether they are willing to switch from one item to another for a given incentive offer. Without exploiting structural constraints of the hemimetric polytope, learning the distances between each pair of items requires $\Theta(n^2)$ queries. We propose an active learning algorithm that substantially reduces this sample complexity by exploiting the structural constraints on the version space of hemimetrics. Our proposed algorithm achieves provably-optimal sample complexity for various instances of the task. For example, when the items are embedded into $K$ tight clusters, the sample complexity of our algorithm reduces to $O(n K)$. Extensive experiments on a restaurant recommendation data set support the conclusions of our theoretical analysis.
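The structural constraints mentioned above can be made concrete with two small helpers. This is an illustrative sketch, not the active learning algorithm proposed in the paper: one function checks the hemimetric conditions on a full distance matrix, and the other shows how triangle inequalities tighten upper bounds on unqueried pairs (a Floyd-Warshall style pass). The zero diagonal and all numeric values are assumptions.

```python
import math


def is_hemimetric(D):
    """Check non-negativity and the triangle inequality
    D[i][j] <= D[i][k] + D[k][j]; symmetry is NOT required."""
    n = len(D)
    if any(D[i][j] < 0 for i in range(n) for j in range(n)):
        return False
    return all(
        D[i][j] <= D[i][k] + D[k][j]
        for i in range(n) for j in range(n) for k in range(n)
    )


def tighten_upper_bounds(U):
    """Propagate upper bounds through triangle inequalities:
    d(i, j) <= u(i, k) + u(k, j). math.inf marks an unknown bound."""
    n = len(U)
    U = [row[:] for row in U]  # do not modify the caller's matrix
    for k in range(n):
        for i in range(n):
            for j in range(n):
                U[i][j] = min(U[i][j], U[i][k] + U[k][j])
    return U


# An asymmetric example that satisfies all constraints.
D = [[0, 2, 5],
     [1, 0, 3],
     [2, 1, 0]]

# Only d(0,1) <= 2 and d(1,2) <= 3 are known (assumed zero diagonal).
inf = math.inf
U = tighten_upper_bounds([[0, 2, inf],
                          [inf, 0, 3],
                          [inf, inf, 0]])
```

After propagation, the bound on the unqueried pair (0, 2) is 5 without asking the user anything, while the reverse direction (2, 0) stays unknown because the distances are asymmetric. Exploiting this kind of implied information is what lets a learner avoid querying all $\Theta(n^2)$ pairs.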

Embedding learning, a.k.a. representation learning, has been shown to be able to model large-scale semantic knowledge graphs. A key concept is a mapping of the knowledge graph to a tensor representation whose entries are predicted by models using latent representations of generalized entities. Knowledge graphs are typically treated as static: a knowledge graph gains more links as more facts become available, but the ground truth values associated with links are considered time invariant. In this paper we address the issue of knowledge graphs where triple states depend on time. We assume that changes in the knowledge graph always arrive in the form of events, in the sense that the events are the gateway to the knowledge graph. We train an event prediction model which uses both knowledge graph background information and information on recent events. By predicting future events, we also predict likely changes in the knowledge graph and thus obtain a model for the evolution of the knowledge graph as well. Our experiments demonstrate that our approach performs well in a clinical application, a recommendation engine, and a sensor network application.
