Abstract:Visualizing high-dimensional datasets through a generalized embedding has been a challenge for a long time. Several methods have shown up for the same, but still, they have not been able to generate a generalized embedding, which not only can reveal the hidden patterns present in the data but also generate realistic high-dimensional samples from it. Motivated by this aspect, in this study, a novel generative model, called G-NeuroDAVIS, has been developed, which is capable of visualizing high-dimensional data through a generalized embedding, and thereby generating new samples. The model leverages advanced generative techniques to produce high-quality embedding that captures the underlying structure of the data more effectively than existing methods. G-NeuroDAVIS can be trained in both supervised and unsupervised settings. We rigorously evaluated our model through a series of experiments, demonstrating superior performance in classification tasks, which highlights the robustness of the learned representations. Furthermore, the conditional sample generation capability of the model has been described through qualitative assessments, revealing a marked improvement in generating realistic and diverse samples. G-NeuroDAVIS has outperformed the Variational Autoencoder (VAE) significantly in multiple key aspects, including embedding quality, classification performance, and sample generation capability. These results underscore the potential of our generative model to serve as a powerful tool in various applications requiring high-quality data generation and representation learning.
Abstract:The task of dimensionality reduction and visualization of high-dimensional datasets remains a challenging problem since long. Modern high-throughput technologies produce newer high-dimensional datasets having multiple views with relatively new data types. Visualization of these datasets require proper methodology that can uncover hidden patterns in the data without affecting the local and global structures within the data. To this end, however, very few such methodology exist, which can realise this task. In this work, we have introduced a novel unsupervised deep neural network model, called NeuroDAVIS, for data visualization. NeuroDAVIS is capable of extracting important features from the data, without assuming any data distribution, and visualize effectively in lower dimension. It has been shown theoritically that neighbourhood relationship of the data in high dimension remains preserved in lower dimension. The performance of NeuroDAVIS has been evaluated on a wide variety of synthetic and real high-dimensional datasets including numeric, textual, image and biological data. NeuroDAVIS has been highly competitive against both t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) with respect to visualization quality, and preservation of data size, shape, and both local and global structure. It has outperformed Fast interpolation-based t-SNE (Fit-SNE), a variant of t-SNE, for most of the high-dimensional datasets as well. For the biological datasets, besides t-SNE, UMAP and Fit-SNE, NeuroDAVIS has also performed well compared to other state-of-the-art algorithms, like Potential of Heat-diffusion for Affinity-based Trajectory Embedding (PHATE) and the siamese neural network-based method, called IVIS. Downstream classification and clustering analyses have also revealed favourable results for NeuroDAVIS-generated embeddings.