This paper studies few-shot segmentation, which is a task of predicting foreground mask of unseen classes by a few of annotations only, aided by a set of rich annotations already existed. The existing methods mainly focus the task on "\textit{how to transfer segmentation cues from support images (labeled images) to query images (unlabeled images)}", and try to learn efficient and general transfer module that can be easily extended to unseen classes. However, it is proved to be a challenging task to learn the transfer module that is general to various classes. This paper solves few-shot segmentation in a new perspective of "\textit{how to represent unseen classes by existing classes}", and formulates few-shot segmentation as the representation process that represents unseen classes (in terms of forming the foreground prior) by existing classes precisely. Based on such idea, we propose a new class representation based few-shot segmentation framework, which firstly generates class activation map of unseen class based on the knowledge of existing classes, and then uses the map as foreground probability map to extract the foregrounds from query image. A new two-branch based few-shot segmentation network is proposed. Moreover, a new CAM generation module that extracts the CAM of unseen classes rather than the classical training classes is raised. We validate the effectiveness of our method on Pascal VOC 2012 dataset, the value FB-IoU of one-shot and five-shot arrives at 69.2\% and 70.1\% respectively, which outperforms the state-of-the-art method.
Existing method generates class activation map (CAM) by a set of fixed classes (i.e., using all the classes), while the discriminative cues between class pairs are not considered. Note that activation maps by considering different class pair are complementary, and therefore can provide more discriminative cues to overcome the shortcoming of the existing CAM generation that the highlighted regions are usually local part regions rather than global object regions due to the lack of object cues. In this paper, we generate CAM by using a few of representative classes, with aim of extracting more discriminative cues by considering each class pair to obtain CAM more globally. The advantages are twofold. Firstly, the representative classes are able to obtain activation regions that are complementary to each other, and therefore leads to generating activation map more accurately. Secondly, we only need to consider a small number of representative classes, making the CAM generation suitable for small networks. We propose a clustering based method to select the representative classes. Multiple binary classification models rather than a multiple class classification model are used to generate the CAM. Moreover, we propose a multi-layer fusion based CAM generation method to simultaneously combine high-level semantic features and low-level detail features. We validate the proposed method on the PASCAL VOC and COCO database in terms of segmentation groundtruth. Various networks such as classical network (Resnet-50, Resent-101 and Resnet-152) and small network (VGG-19, Resnet-18 and Mobilenet) are considered. Experimental results show that the proposed method improves the CAM generation obviously.
Recently, deep supervised hashing methods have become popular for large-scale image retrieval task. To preserve the semantic similarity notion between examples, they typically utilize the pairwise supervision or the triplet supervised information for hash learning. However, these methods usually ignore the semantic class information which can help the improvement of the semantic discriminative ability of hash codes. In this paper, we propose a novel hierarchy neighborhood discriminative hashing method. Specifically, we construct a bipartite graph to build coarse semantic neighbourhood relationship between the sub-class feature centers and the embeddings features. Moreover, we utilize the pairwise supervised information to construct the fined semantic neighbourhood relationship between embeddings features. Finally, we propose a hierarchy neighborhood discriminative hashing loss to unify the single-label and multilabel image retrieval problem with a one-stream deep neural network architecture. Experimental results on two largescale datasets demonstrate that the proposed method can outperform the state-of-the-art hashing methods.
In the field of objective image quality assessment (IQA), the Spearman's $\rho$ and Kendall's $\tau$ are two most popular rank correlation indicators, which straightforwardly assign uniform weight to all quality levels and assume each pair of images are sortable. They are successful for measuring the average accuracy of an IQA metric in ranking multiple processed images. However, two important perceptual properties are ignored by them as well. Firstly, the sorting accuracy (SA) of high quality images are usually more important than the poor quality ones in many real world applications, where only the top-ranked images would be pushed to the users. Secondly, due to the subjective uncertainty in making judgement, two perceptually similar images are usually hardly sortable, whose ranks do not contribute to the evaluation of an IQA metric. To more accurately compare different IQA algorithms, we explore a perceptually weighted rank correlation indicator in this paper, which rewards the capability of correctly ranking high quality images, and suppresses the attention towards insensitive rank mistakes. More specifically, we focus on activating `valid' pairwise comparison towards image quality, whose difference exceeds a given sensory threshold (ST). Meanwhile, each image pair is assigned an unique weight, which is determined by both the quality level and rank deviation. By modifying the perception threshold, we can illustrate the sorting accuracy with a more sophisticated SA-ST curve, rather than a single rank correlation coefficient. The proposed indicator offers a new insight for interpreting visual perception behaviors. Furthermore, the applicability of our indicator is validated in recommending robust IQA metrics for both the degraded and enhanced image data.
Image segmentation refers to the process to divide an image into nonoverlapping meaningful regions according to human perception, which has become a classic topic since the early ages of computer vision. A lot of research has been conducted and has resulted in many applications. However, while many segmentation algorithms exist, yet there are only a few sparse and outdated summarizations available, an overview of the recent achievements and issues is lacking. We aim to provide a comprehensive review of the recent progress in this field. Covering 180 publications, we give an overview of broad areas of segmentation topics including not only the classic bottom-up approaches, but also the recent development in superpixel, interactive methods, object proposals, semantic image parsing and image cosegmentation. In addition, we also review the existing influential datasets and evaluation metrics. Finally, we suggest some design flavors and research directions for future research in image segmentation.