With the recent outbreak of COVID-19, finding a means to stop its spread and eventually developing a vaccine are the most important and challenging tasks that the scientific community is facing right now. The first step towards these goals is to correctly identify patients who are infected with the virus. Our group applied an unsupervised machine learning technique to identify COVID-19 cases. This is an important topic, as COVID-19 is a novel disease currently being studied in detail, and our methodology has the potential to reveal important differences between it and other viral pneumonias. This could, in turn, enable doctors to help each patient more confidently. Our experiments utilize Principal Component Analysis (PCA), t-distributed Stochastic Neighbor Embedding (t-SNE), and the recently developed Robust Continuous Clustering (RCC) algorithm. We evaluate the performance of RCC in identifying COVID-19 patients and its ability to compete with other unsupervised algorithms, namely K-Means++ (KM++). We used the Adjusted Mutual Information (AMI) score to measure the effectiveness of both algorithms and found that, on a COVID-19 Radiography dataset, RCC outperformed KM++. The AMI scores of KM++ for the two-class and three-class cases were 0.0250 and 0.054, respectively. In comparison, RCC scored 0.5044 in the two-class case and 0.267 in the three-class case, clearly showing RCC to be the superior algorithm. This not only opens new possible applications of RCC, but could also aid in the creation of a new tool for COVID-19 identification.
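For readers unfamiliar with the AMI score used above, the following is a minimal pure-Python sketch of how such a score can be computed from two labelings. It is not the authors' evaluation code. The function names are hypothetical, and normalizing by the arithmetic mean of the two entropies is an assumption, since the abstract does not say which AMI variant was used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(u)
    a, b = Counter(u), Counter(v)
    joint = Counter(zip(u, v))
    return sum((nij / n) * math.log(n * nij / (a[i] * b[j]))
               for (i, j), nij in joint.items())


def expected_mutual_information(u, v):
    """Expected MI of random labelings with the same cluster sizes
    (hypergeometric model of the contingency table)."""
    n = len(u)
    a, b = Counter(u), Counter(v)

    def log_fact(k):
        return math.lgamma(k + 1)  # log(k!)

    emi = 0.0
    for ai in a.values():
        for bj in b.values():
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                # Log-probability of observing nij shared items by chance.
                log_p = (log_fact(ai) + log_fact(bj)
                         + log_fact(n - ai) + log_fact(n - bj)
                         - log_fact(n) - log_fact(nij)
                         - log_fact(ai - nij) - log_fact(bj - nij)
                         - log_fact(n - ai - bj + nij))
                emi += (nij / n) * math.log(n * nij / (ai * bj)) * math.exp(log_p)
    return emi


def adjusted_mutual_information(u, v):
    """AMI with arithmetic-mean normalization (an assumption of this sketch).
    Undefined when the denominator is zero, e.g. for single-cluster labelings."""
    mi = mutual_information(u, v)
    emi = expected_mutual_information(u, v)
    mean_h = 0.5 * (entropy(u) + entropy(v))
    return (mi - emi) / (mean_h - emi)
```

A clustering that matches the reference labels up to a renaming of the clusters scores 1, while a clustering that is no better than chance scores around 0 or below. This is why the reported gap between KM++ and RCC is meaningful even though the cluster identifiers themselves are arbitrary.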
3D hand estimation has been a long-standing research topic in computer vision. A recent trend aims not only to estimate the 3D hand joint locations but also to recover the mesh model. However, achieving those goals from a single RGB image remains challenging. In this paper, we introduce an end-to-end learnable model, BiHand, which consists of three cascaded stages: a 2D seeding stage, a 3D lifting stage, and a mesh generation stage. At the output of BiHand, the full hand mesh is recovered using the joint rotations and shape parameters predicted by the network. Inside each stage, BiHand adopts a novel bisecting design which allows the networks to encapsulate two closely related pieces of information (e.g., 2D keypoints and silhouette in the 2D seeding stage, 3D joints and depth map in the 3D lifting stage, joint rotations and shape parameters in the mesh generation stage) in a single forward pass. As these pieces of information represent different geometric or structural details, bisecting the data flow can facilitate optimization and increase robustness. For quantitative evaluation, we conduct experiments on two public benchmarks, namely the Rendered Hand Dataset (RHD) and the Stereo Hand Pose Tracking Benchmark (STB). Extensive experiments show that our model can achieve superior accuracy in comparison with state-of-the-art methods, and can produce appealing 3D hand meshes in several severe conditions.
Super-resolution is a classical problem in the image restoration field. In recent years, deep learning methods have achieved significant success in super-resolution by concentrating on elaborate network designs that exploit image features more effectively. However, most networks focus on increasing depth or width to gain capacity, which requires a large number of parameters and incurs high computational complexity, and they seldom consider the inherent correlation between different features. This paper proposes a progressive multi-scale residual network (PMRN) for the single image super-resolution problem that sequentially exploits features with a restricted number of parameters. Specifically, we design a progressive multi-scale residual block (PMRB) to progressively explore multi-scale features with different layer combinations, aiming to consider the correlations between different scales. The combinations for feature exploitation are defined in a recursive fashion to introduce non-linearity and better feature representation with limited parameters. Furthermore, we investigate a joint channel-wise and pixel-wise attention mechanism for comprehensive correlation exploration, termed CPA, which is used in PMRB and considers both scale and bias factors for features in parallel. Experimental results show that the proposed PMRN recovers structural textures more effectively, with better PSNR/SSIM results, than other lightweight works. The extended model PMRN+ with self-ensemble achieves competitive or better results than large networks with far fewer parameters and lower computational complexity.
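As a reminder of what the PSNR figure in such comparisons measures, here is a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE), for grayscale images stored as nested lists. The function name and the 8-bit peak value of 255 are assumptions of this sketch. The abstract does not describe the evaluation protocol (for example, color space or border cropping), so this is not a reproduction of it.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized
    grayscale images given as lists of rows of pixel intensities."""
    if len(reference) != len(restored) or any(
            len(r) != len(s) for r, s in zip(reference, restored)):
        raise ValueError("images must have the same shape")
    squared_error = sum((a - b) ** 2
                        for row_r, row_s in zip(reference, restored)
                        for a, b in zip(row_r, row_s))
    pixel_count = sum(len(row) for row in reference)
    mse = squared_error / pixel_count
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a restored image that is closer, pixel by pixel, to the ground truth. SSIM, the other metric named above, instead compares local structure and is not covered by this sketch.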
Large-scale network embedding aims to learn a latent representation for each node in an unsupervised manner, one that captures inherent properties and structural information of the underlying graph. In this field, many popular approaches are influenced by the skip-gram model from natural language processing. Most of them use a contrastive objective to train an encoder which forces the embeddings of similar pairs to be close and the embeddings of negative samples to be far apart. A key to the success of such contrastive learning methods is how positive and negative samples are drawn. While negative samples generated by straightforward random sampling are often satisfactory, methods for drawing positive examples remain a hot topic. In this paper, we propose SCE, a method for unsupervised network embedding that uses only negative samples for training. Our method is based on a new contrastive objective inspired by the well-known sparsest cut problem. To solve the underlying optimization problem, we introduce a Laplacian smoothing trick, which uses graph convolutional operators as low-pass filters for smoothing node representations. The resulting model consists of a GCN-type structure as the encoder and a simple loss function. Notably, our model does not use positive samples but only negative samples for training, which not only makes implementation and tuning much easier, but also reduces the training time significantly. Finally, we conduct extensive experimental studies on real-world data sets. The results clearly demonstrate the advantages of our new model in both accuracy and scalability compared to strong baselines such as GraphSAGE, G2G and DGI.
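The Laplacian smoothing idea can be illustrated with a small sketch. It assumes the widely used GCN-style operator S = D^{-1/2} (A + I) D^{-1/2}, where A is the adjacency matrix and D the degree matrix of A + I. The abstract does not specify the exact operator or the number of smoothing steps SCE uses, so the operator choice, the function names and the parameter k are all assumptions made for illustration.

```python
import math


def normalized_adjacency(edges, n):
    """Dense S = D^{-1/2} (A + I) D^{-1/2} for an undirected graph on n nodes."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0  # self-loops
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def smooth(s, features, k=1):
    """Apply the operator k times: X <- S X. Each application mixes every
    node's features with those of its neighbours, damping components that
    vary quickly across edges (a low-pass filter on the graph)."""
    n = len(features)
    dim = len(features[0])
    x = features
    for _ in range(k):
        x = [[sum(s[i][j] * x[j][d] for j in range(n)) for d in range(dim)]
             for i in range(n)]
    return x


# Usage: a 4-cycle with a one-dimensional "spike" feature on node 0.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
spike = [[1.0], [0.0], [0.0], [0.0]]
smoothed = smooth(normalized_adjacency(cycle, 4), spike, k=1)
```

On this regular graph a single step spreads the spike evenly over node 0 and its two neighbours. Repeated steps pull connected nodes' representations closer together, which is the smoothing effect the abstract refers to.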
With the widespread application of artificial intelligence (AI), the perception, understanding, decision-making and control capabilities of autonomous systems have improved significantly in recent years. When autonomous systems must achieve accuracy and transferability simultaneously, several AI methods, such as adversarial learning, reinforcement learning (RL) and meta-learning, show powerful performance. Here, we review the learning-based approaches in autonomous systems from the perspectives of accuracy and transferability. Accuracy means that a well-trained model shows good results during the testing phase, in which the testing set shares the same task or data distribution as the training set. Transferability means that when a trained model is transferred to other testing domains, the accuracy is still good. First, we introduce some basic concepts of transfer learning and then present some preliminaries of adversarial learning, RL and meta-learning. Second, we review accuracy and transferability to show the advantages of adversarial learning, such as generative adversarial networks (GANs), in typical computer vision tasks in autonomous systems, including image style transfer, image super-resolution, image deblurring/dehazing/rain removal, semantic segmentation, depth estimation and person re-identification. Then, we further review the performance of RL and meta-learning in terms of accuracy and transferability in autonomous systems, covering robot navigation and robotic manipulation. Finally, we discuss several challenges and future topics for using adversarial learning, RL and meta-learning in autonomous systems.
Small data challenges have emerged in many learning problems, since the success of deep neural networks often relies on the availability of a huge amount of labeled data that is expensive to collect. To address them, many efforts have been made to train complex models with small data in an unsupervised and semi-supervised fashion. In this paper, we will review the recent progress on these two major categories of methods. A wide spectrum of small data models will be categorized in a big picture, where we will show how they interplay with each other to motivate explorations of new ideas. We will review the criteria for learning transformation-equivariant, disentangled, self-supervised and semi-supervised representations, which underpin the foundations of recent developments. Many instantiations of unsupervised and semi-supervised generative models have been developed on the basis of these criteria, greatly expanding the territory of existing autoencoders, generative adversarial nets (GANs) and other deep networks by exploring the distribution of unlabeled data for more powerful representations. While we focus on the unsupervised and semi-supervised methods, we will also provide a broader review of other emerging topics, from unsupervised and semi-supervised domain adaptation to the fundamental roles of transformation equivariance and invariance in training a wide spectrum of deep networks. It is impossible for us to write an exhaustive encyclopedia that includes all related works. Instead, we aim to explore the main ideas, principles and methods in this area to reveal where we are heading on the journey towards addressing the small data challenges in this big data era.
It is well known that the P3P problem can have 1, 2, 3 or at most 4 positive solutions, depending on the configuration of its 3 control points and the position of the optical center. Since in any real application knowing the exact number of possible solutions is a prerequisite for selecting the right one among them, the study of the multiple-solution phenomenon in the P3P problem has been an active topic. In this work, we provide some new geometric interpretations of the multi-solution phenomenon in the P3P problem. Our main results include: (1) The necessary and sufficient condition for the P3P problem to have a pair of side-sharing solutions is that the two optical centers of the solutions both lie on one of the 3 vertical planes to the base plane of control points; (2) The necessary and sufficient condition for the P3P problem to have a pair of point-sharing solutions is that the two optical centers of the solutions both lie on one of the 3 so-called skewed danger cylinders; (3) If the P3P problem has other solutions in addition to a pair of side-sharing (point-sharing) solutions, these remaining solutions must be a point-sharing (side-sharing) pair. In a sense, the side-sharing pair and the point-sharing pair are companion pairs. In sum, our results provide some new insights into the nature of the multi-solution phenomenon in the P3P problem. In addition to their academic value, they could also serve as theoretical guidance for practitioners in real applications to avoid the occurrence of multiple solutions by properly arranging the control points.
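To make concrete what a "positive solution" of the P3P problem is, the sketch below writes out the classical law-of-cosines constraints. Given control points A, B, C and the viewing angles between their rays from the optical center, a solution is a triple of positive distances (s_A, s_B, s_C) satisfying s_i^2 + s_j^2 - 2 s_i s_j cos(theta_ij) = |P_i P_j|^2 for each pair. All function names are hypothetical, and the sketch only checks candidate solutions; it does not enumerate them or implement the paper's geometric criteria.

```python
import math


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


PAIRS = [(0, 1), (1, 2), (0, 2)]  # (A,B), (B,C), (A,C)


def side_lengths(points):
    """Triangle side lengths |AB|, |BC|, |AC| of the control points."""
    return [distance(points[i], points[j]) for i, j in PAIRS]


def view_cosines(center, points):
    """Cosines of the angles at the optical center between the rays
    towards each pair of control points, in the order of PAIRS."""
    rays = [[p[k] - center[k] for k in range(3)] for p in points]
    norms = [math.sqrt(sum(x * x for x in r)) for r in rays]
    return [sum(a * b for a, b in zip(rays[i], rays[j])) / (norms[i] * norms[j])
            for i, j in PAIRS]


def p3p_residuals(s, sides, cosines):
    """Residuals of the three law-of-cosines constraints for candidate
    distances s = (s_A, s_B, s_C); all three vanish at a solution."""
    return [s[i] ** 2 + s[j] ** 2 - 2.0 * s[i] * s[j] * c - d ** 2
            for (i, j), d, c in zip(PAIRS, sides, cosines)]
```

Because the measured angles and side lengths only enter through these three quadratic equations, several distinct positive triples can satisfy them at once, which is the multi-solution phenomenon the abstract studies.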
The eigendecomposition of quadratic forms (symmetric matrices) guaranteed by the spectral theorem is a foundational result in applied mathematics. Motivated by a shared structure found in inferential problems of recent interest (namely orthogonal tensor decompositions, Independent Component Analysis (ICA), topic models, spectral clustering, and Gaussian mixture learning), we generalize the eigendecomposition from quadratic forms to a broad class of "orthogonally decomposable" functions. We identify a key role of convexity in our extension, and we generalize two traditional characterizations of eigenvectors. First, the eigenvectors of a quadratic form arise from the structure of the optima of the quadratic form on the sphere. Second, the eigenvectors are the fixed points of the power iteration. In our setting, we consider a simple first-order generalization of the power method which we call gradient iteration. It leads to efficient and easily implementable methods for basis recovery. It includes influential machine learning methods such as cumulant-based FastICA and the tensor power iteration for orthogonally decomposable tensors as special cases. We provide a complete theoretical analysis of gradient iteration using the structure theory of discrete dynamical systems to show almost sure convergence and fast (super-linear) convergence rates. The analysis also extends to the case when the observed function is only approximately orthogonally decomposable, with bounds that are polynomial in dimension and other relevant parameters, such as perturbation size. Our perturbation results can be considered a non-linear version of the classical Davis-Kahan theorem for perturbations of eigenvectors of symmetric matrices.
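A toy sketch may help show how a gradient-based update generalizes the power method. It assumes the update u <- grad f(u) / ||grad f(u)||, which for a quadratic form f(u) = u^T A u (gradient 2Au) reduces to ordinary power iteration. The example function f(u) = sum_i w_i <u, z_i>^4 with an orthonormal basis z_i, and all names used here, are illustrative assumptions and are not taken from the paper.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(u):
    norm = math.sqrt(dot(u, u))
    return [x / norm for x in u]


def gradient(u, basis, weights):
    """Gradient of f(u) = sum_i w_i * <u, z_i>^4 for orthonormal z_i."""
    g = [0.0] * len(u)
    for z, w in zip(basis, weights):
        coeff = 4.0 * w * dot(u, z) ** 3
        for k in range(len(u)):
            g[k] += coeff * z[k]
    return g


def gradient_iteration(u0, basis, weights, steps=20):
    """Repeat u <- grad f(u) / ||grad f(u)|| starting from u0."""
    u = normalize(u0)
    for _ in range(steps):
        u = normalize(gradient(u, basis, weights))
    return u


# Usage: a hidden orthonormal basis in the plane, rotated by 45 degrees.
r = 1.0 / math.sqrt(2.0)
hidden_basis = [[r, r], [r, -r]]
recovered = gradient_iteration([1.0, 0.2], hidden_basis, [1.0, 1.0])
```

In the coordinates c_i = <u, z_i>, each step maps c_i to a multiple of w_i c_i^3, so the ratio between the smaller and larger coordinate is cubed at every step. This toy case therefore shows the kind of super-linear convergence to a basis direction that the abstract describes.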
Distributed word representations have been shown to be very useful in various natural language processing (NLP) application tasks. These word vectors, learned from huge corpora, very often carry both semantic and syntactic information about words. However, it is well known that each individual user has their own language patterns because of factors such as topics of interest, friend groups, social activities, wording habits, etc., which may imply some kind of personalized semantics. With such personalized semantics, the same word may carry slightly different meanings for different users. For example, the word "Cappuccino" may imply "Leisure", "Joy", "Excellent" for a user who enjoys coffee, but only a kind of drink for someone else. Such personalized semantics of course cannot be carried by the standard universal word vectors trained on huge corpora produced by many people. In this paper, we propose a framework to train different personalized word vectors for different users, based on the very successful continuous skip-gram model and using social network data posted by many individual users. In this framework, universal background word vectors are first learned from the background corpora, and then adapted with the personalized corpus of each individual user to learn the personalized word vectors. We use two application tasks to evaluate the quality of the personalized word vectors obtained in this way: the user prediction task and the sentence completion task. These personalized word vectors were shown to carry some personalized semantics and to offer improved performance on these two evaluation tasks.