Deep neural networks have demonstrated advanced abilities on various visual classification tasks, which heavily rely on the large-scale training samples with annotated ground-truth. However, it is unrealistic always to require such annotation in real-world applications. Recently, Few-Shot learning (FS), as an attempt to address the shortage of training samples, has made significant progress in generic classification tasks. Nonetheless, it is still challenging for current FS models to distinguish the subtle differences between fine-grained categories given limited training data. To filling the classification gap, in this paper, we address the Few-Shot Fine-Grained (FSFG) classification problem, which focuses on tackling the fine-grained classification under the challenging few-shot learning setting. A novel low-rank pairwise bilinear pooling operation is proposed to capture the nuanced differences between the support and query images for learning an effective distance metric. Moreover, a feature alignment layer is designed to match the support image features with query ones before the comparison. We name the proposed model Low-Rank Pairwise Alignment Bilinear Network (LRPABN), which is trained in an end-to-end fashion. Comprehensive experimental results on four widely used fine-grained classification datasets demonstrate that our LRPABN model achieves the superior performances compared to state-of-the-art methods.
This paper explores two classes of model adaptation methods for Web search ranking: Model Interpolation and error-driven learning approaches based on a boosting algorithm. The results show that model interpolation, though simple, achieves the best results on all the open test sets where the test data is very different from the training data. The tree-based boosting algorithm achieves the best performance on most of the closed test sets where the test data and the training data are similar, but its performance drops significantly on the open test sets due to the instability of trees. Several methods are explored to improve the robustness of the algorithm, with limited success.
Railway points are among the key components of railway infrastructure. As a part of signal equipment, points control the routes of trains at railway junctions, having a significant impact on the reliability, capacity, and punctuality of rail transport. Traditionally, maintenance of points is based on a fixed time interval or raised after the equipment failures. Instead, it would be of great value if we could forecast points' failures and take action beforehand, minimising any negative effect. To date, most of the existing prediction methods are either lab-based or relying on specially installed sensors which makes them infeasible for large-scale implementation. Besides, they often use data from only one source. We, therefore, explore a new way that integrates multi-source data which are ready to hand to fulfil this task. We conducted our case study based on Sydney Trains rail network which is an extensive network of passenger and freight railways. Unfortunately, the real-world data are usually incomplete due to various reasons, e.g., faults in the database, operational errors or transmission faults. Besides, railway points differ in their locations, types and some other properties, which means it is hard to use a unified model to predict their failures. Aiming at this challenging task, we firstly constructed a dataset from multiple sources and selected key features with the help of domain experts. In this paper, we formulate our prediction task as a multiple kernel learning problem with missing kernels. We present a robust multiple kernel learning algorithm for predicting points failures. Our model takes into account the missing pattern of data as well as the inherent variance on different sets of railway points. Extensive experiments demonstrate the superiority of our algorithm compared with other state-of-the-art methods.
The recognition ability of human beings is developed in a progressive way. Usually, children learn to discriminate various objects from coarse to fine-grained with limited supervision. Inspired by this learning process, we propose a simple yet effective model for the Few-Shot Fine-Grained (FSFG) recognition, which tries to tackle the challenging fine-grained recognition task using meta-learning. The proposed method, named Pairwise Alignment Bilinear Network (PABN), is an end-to-end deep neural network. Unlike traditional deep bilinear networks for fine-grained classification, which adopt the self-bilinear pooling to capture the subtle features of images, the proposed model uses a novel pairwise bilinear pooling to compare the nuanced differences between base images and query images for learning a deep distance metric. In order to match base image features with query image features, we design feature alignment losses before the proposed pairwise bilinear pooling. Experiment results on four fine-grained classification datasets and one generic few-shot dataset demonstrate that the proposed model outperforms both the state-ofthe-art few-shot fine-grained and general few-shot methods.
Many types of 3D acquisition sensors have emerged in recent years and point cloud has been widely used in many areas. Accurate and fast registration of cross-source 3D point clouds from different sensors is an emerged research problem in computer vision. This problem is extremely challenging because cross-source point clouds contain a mixture of various variances, such as density, partial overlap, large noise and outliers, viewpoint changing. In this paper, an algorithm is proposed to align cross-source point clouds with both high accuracy and high efficiency. There are two main contributions: firstly, two components, the weak region affinity and pixel-wise refinement, are proposed to maintain the global and local information of 3D point clouds. Then, these two components are integrated into an iterative tensor-based registration algorithm to solve the cross-source point cloud registration problem. We conduct experiments on synthetic cross-source benchmark dataset and real cross-source datasets. Comparison with six state-of-the-art methods, the proposed method obtains both higher efficiency and accuracy.
Sufficient training data normally is required to train deeply learned models. However, due to the expensive manual process for labelling large number of images, the amount of available training data is always limited. To produce more data for training a deep network, Generative Adversarial Network (GAN) can be used to generate artificial sample data. However, the generated data usually does not have annotation labels. To solve this problem, in this paper, we propose a virtual label called Multi-pseudo Regularized Label (MpRL) and assign it to the generated data. With MpRL, the generated data will be used as the supplementary of real training data to train a deep neural network in a semi-supervised learning fashion. To build the corresponding relationship between the real data and generated data, MpRL assigns each generated data a proper virtual label which reflects the likelihood of the affiliation of the generated data to pre-defined training classes in the real data domain. Unlike the traditional label which usually is a single integral number, the virtual label proposed in this work is a set of weight-based values each individual of which is a number in (0,1] called multi-pseudo label and reflects the degree of relation between each generated data to every pre-defined class of real data. A comprehensive evaluation is carried out by adopting two state-of-the-art convolutional neural networks (CNNs) in our experiments to verify the effectiveness of MpRL. Experiments demonstrate that by assigning MpRL to generated data, we can further improve the person re-ID performance on five re-ID datasets, i.e., Market-1501, DukeMTMC-reID, CUHK03, VIPeR, and CUHK01. The proposed method obtains +6.29%, +6.30%, +5.58%, +5.84%, and +3.48% improvements in rank-1 accuracy over a strong CNN baseline on the five datasets respectively, and outperforms state-of-the-art methods.
Sliced inverse regression (SIR) is a pioneer tool for supervised dimension reduction. It identifies the effective dimension reduction space, the subspace of significant factors with intrinsic lower dimensionality. In this paper, we propose to refine the SIR algorithm through an overlapping slicing scheme. The new algorithm, called overlapping sliced inverse regression (OSIR), is able to estimate the effective dimension reduction space and determine the number of effective factors more accurately. We show that such overlapping procedure has the potential to identify the information contained in the derivatives of the inverse regression curve, which helps to explain the superiority of OSIR. We also prove that OSIR algorithm is $\sqrt n $-consistent and verify its effectiveness by simulations and real applications.
Distributed learning is an effective way to analyze big data. In distributed regression, a typical approach is to divide the big data into multiple blocks, apply a base regression algorithm on each of them, and then simply average the output functions learnt from these blocks. Since the average process will decrease the variance, not the bias, bias correction is expected to improve the learning performance if the base regression algorithm is a biased one. Regularization kernel network is an effective and widely used method for nonlinear regression analysis. In this paper we will investigate a bias corrected version of regularization kernel network. We derive the error bounds when it is applied to a single data set and when it is applied as a base algorithm in distributed regression. We show that, under certain appropriate conditions, the optimal learning rates can be reached in both situations.
The process of using one image to guide the filtering process of another one is called Guided Image Filtering (GIF). The main challenge of GIF is the structure inconsistency between the guidance image and the target image. Besides, noise in the target image is also a challenging issue especially when it is heavy. In this paper, we propose a general framework for Robust Guided Image Filtering (RGIF), which contains a data term and a smoothness term, to solve the two issues mentioned above. The data term makes our model simultaneously denoise the target image and perform GIF which is robust against the heavy noise. The smoothness term is able to make use of the property of both the guidance image and the target image which is robust against the structure inconsistency. While the resulting model is highly non-convex, it can be solved through the proposed Iteratively Re-weighted Least Squares (IRLS) in an efficient manner. For challenging applications such as guided depth map upsampling, we further develop a data-driven parameter optimization scheme to properly determine the parameter in our model. This optimization scheme can help to preserve small structures and sharp depth edges even for a large upsampling factor (8x for example). Moreover, the specially designed structure of the data term and the smoothness term makes our model perform well in edge-preserving smoothing for single-image tasks (i.e., the guidance image is the target image itself). This paper is an extension of our previous work [1], [2].
With the development of numerous 3D sensing technologies, object registration on cross-source point cloud has aroused researchers' interests. When the point clouds are captured from different kinds of sensors, there are large and different kinds of variations. In this study, we address an even more challenging case in which the differently-source point clouds are acquired from a real street view. One is produced directly by the LiDAR system and the other is generated by using VSFM software on image sequence captured from RGB cameras. When it confronts to large scale point clouds, previous methods mostly focus on point-to-point level registration, and the methods have many limitations.The reason is that the least mean error strategy shows poor ability in registering large variable cross-source point clouds. In this paper, different from previous ICP-based methods, and from a statistic view, we propose a effective coarse-to-fine algorithm to detect and register a small scale SFM point cloud in a large scale Lidar point cloud. Seen from the experimental results, the model can successfully run on LiDAR and SFM point clouds, hence it can make a contribution to many applications, such as robotics and smart city development.