Hyperspectral imaging is a crucial tool in remote sensing that captures far more spectral information than standard color images. However, the increase in spectral information comes at the cost of spatial resolution. Super-resolution is a popular technique whose goal is to generate a high-resolution version of a given low-resolution input. The majority of modern super-resolution approaches use convolutional neural networks. However, convolution itself is a linear operation, and these networks rely on the non-linear activation functions after each layer to provide the non-linearity needed to learn the complex underlying function. As a result, convolutional neural networks tend to be very deep to achieve the desired results. Recently, self-organized operational neural networks have been proposed that aim to overcome this limitation by replacing the convolutional filters with learnable non-linear functions through the use of Maclaurin series expansions. This work focuses on extending the convolutional filters of a popular super-resolution model to more powerful operational filters to enhance the model's performance on hyperspectral images. We also investigate the effects that residual connections and different normalization types have on this type of enhanced network. Despite having fewer parameters than their convolutional network equivalents, our results show that operational neural networks achieve superior super-resolution performance on small hyperspectral image datasets.
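To illustrate the idea of a filter built from a Maclaurin series, the following is a minimal one-dimensional sketch, not the model used in the work above. It assumes that an operational filter of order Q applies a separate learnable kernel to each power x, x^2, ..., x^Q of the input and sums the results, with the constant term of the series folded into the bias. The function name `self_onn_conv1d` and its parameters are hypothetical.

```python
import numpy as np

def self_onn_conv1d(x, weights, bias=0.0):
    """Order-Q operational filter on a 1D signal (illustrative sketch).

    x:       input signal, shape (n,)
    weights: one kernel per power of the input, shape (Q, k);
             row q is applied to x ** (q + 1)
    bias:    scalar offset (absorbs the constant Maclaurin term)
    """
    Q, k = weights.shape
    n_out = x.shape[0] - k + 1
    out = np.full(n_out, bias, dtype=float)
    for q in range(Q):
        powered = x ** (q + 1)
        # "valid" cross-correlation, as in a standard convolutional layer
        out += np.correlate(powered, weights[q], mode="valid")
    return out

# With Q = 1 the filter reduces to an ordinary (linear) convolution.
x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([[1.0, 0.0],   # kernel applied to x
              [0.0, 1.0]])  # kernel applied to x ** 2
y = self_onn_conv1d(x, w)
```

In this sketch the non-linearity lives inside the filter itself, so a single layer can represent a polynomial of its input rather than only a weighted sum.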
Device monitoring services have grown in popularity with recent technological advances and the continuously increasing number of Internet of Things (IoT) devices. Among the most popular are services that use device location information. However, these services run into privacy issues due to the nature of data collection and transmission. In this work, we introduce a platform incorporating a Federated Kalman Filter (FKF) with a federated learning approach and private blockchain technology for privacy preservation. We analyze the accuracy of the proposed design against a standard Kalman Filter (KF) implementation of localization based on the Received Signal Strength Indicator (RSSI). The experimental results reveal significant potential for improved data estimation for RSSI-based localization in device monitoring.
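As a point of reference for the standard KF baseline mentioned above, here is a minimal sketch of a scalar Kalman filter smoothing RSSI readings, followed by a conversion to distance. It does not cover the federated or blockchain components. The constant-state model, the noise variances, the log-distance path-loss model, and the names `kalman_filter_rssi`, `rssi_to_distance`, `tx_power_dbm`, and `path_loss_exponent` are all assumptions made for illustration.

```python
def kalman_filter_rssi(measurements, process_var=1e-2, meas_var=4.0, p0=1.0):
    """Smooth a sequence of RSSI readings (dBm) with a scalar Kalman filter.

    Assumes the true RSSI is roughly constant between readings.
    """
    x = measurements[0]  # initial state estimate
    p = p0               # initial estimate variance
    estimates = []
    for z in measurements:
        p = p + process_var      # predict: uncertainty grows
        k = p / (p + meas_var)   # Kalman gain
        x = x + k * (z - x)      # update with the new reading
        p = (1.0 - k) * p        # updated estimate variance
        estimates.append(x)
    return estimates

def rssi_to_distance(rssi_dbm, tx_power_dbm=-59.0, path_loss_exponent=2.0):
    """Log-distance path-loss model; tx_power_dbm is the assumed RSSI at 1 m."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10.0 * path_loss_exponent))

readings = [-60.0, -70.0, -50.0, -65.0]
smoothed = kalman_filter_rssi(readings)
distances = [rssi_to_distance(r) for r in smoothed]
```

The ratio of `process_var` to `meas_var` controls how strongly the filter trusts new readings; both would need to be tuned to the actual radio environment.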
Visual question answering for remote sensing data (RSVQA), which aims to answer questions based on the content of remotely sensed images, has recently attracted much attention. However, previous work has paid little attention to the robustness of RSVQA models. To enhance the reliability of these models, the key challenge is learning representations that are robust to new words and to different question templates with the same meaning. The proposed augmented dataset provides additional questions that have the same meaning as the original ones. To make better use of this information, we propose a contrastive learning strategy for training RSVQA models that are robust to diverse question templates and words. Experimental results demonstrate that the proposed augmented dataset is effective in improving the robustness of the RSVQA model. In addition, the contrastive learning strategy performs well on the low-resolution (LR) dataset.
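The abstract above does not state which contrastive objective is used. As one common possibility, the sketch below computes an InfoNCE-style loss in which the embedding of an original question is pulled toward the embedding of its paraphrase and pushed away from the other paraphrases in the batch. The function name `info_nce_loss`, the temperature value, and the use of cosine similarity are assumptions.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style loss over a batch of paired embeddings.

    anchors:   embeddings of the original questions, shape (B, D)
    positives: embeddings of same-meaning paraphrases, shape (B, D);
               row i of positives is the match for row i of anchors
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching paraphrase for each anchor sits on the diagonal.
    return -np.mean(np.diag(log_prob))

aligned = info_nce_loss(np.eye(3), np.eye(3))
mismatched = info_nce_loss(np.eye(3), np.roll(np.eye(3), 1, axis=0))
```

The loss is small when each question is closest to its own paraphrase and large when paraphrases are mismatched, which is the behaviour a robustness-oriented training signal would need.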
With the recent interest in deep learning for scientific computation, the Physics-Informed Neural Networks (PINNs) method has drawn widespread attention for solving Partial Differential Equations (PDEs). Compared to traditional methods, PINNs can efficiently handle high-dimensional problems, but their accuracy is relatively low, especially for highly irregular problems. Inspired by the idea of adaptive finite element methods and incremental learning, we propose GAS, a Gaussian mixture distribution-based adaptive sampling method for PINNs. During the training procedure, GAS uses the current residual information to generate a Gaussian mixture distribution for the sampling of additional points, which are then used for training together with historical data to speed up the convergence of the loss and achieve higher accuracy. Several numerical simulations on 2D and 10D problems show that GAS is a promising method that achieves state-of-the-art accuracy among deep solvers, while being comparable with traditional numerical solvers.
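The sampling step described above can be sketched as follows. This is one plausible reading, not the construction used by GAS itself: it assumes that the mixture components are centred on the candidate points with the largest residuals, weighted by residual magnitude, and share an isotropic standard deviation `sigma`. The function name and all parameters are hypothetical.

```python
import numpy as np

def sample_from_residual_mixture(points, residuals, n_components,
                                 n_samples, sigma, rng):
    """Draw new collocation points from a residual-driven Gaussian mixture.

    points:    candidate points, shape (N, d)
    residuals: PDE residual evaluated at each candidate, shape (N,)
    """
    mags = np.abs(residuals)
    k = min(n_components, len(points))
    top = np.argsort(mags)[-k:]   # indices of the k largest residuals
    means = points[top]
    total = mags[top].sum()
    # Weight components by residual size; fall back to uniform if all are zero.
    weights = mags[top] / total if total > 0 else np.full(k, 1.0 / k)
    comp = rng.choice(k, size=n_samples, p=weights)
    noise = rng.normal(scale=sigma, size=(n_samples, points.shape[1]))
    return means[comp] + noise

rng = np.random.default_rng(0)
candidates = rng.uniform(-1.0, 1.0, size=(200, 2))
residuals = np.exp(-10.0 * np.sum((candidates - 0.5) ** 2, axis=1))
new_points = sample_from_residual_mixture(candidates, residuals, 5, 100, 0.05, rng)
```

In a training loop, `new_points` would be appended to the existing collocation set so that later iterations concentrate on regions where the residual is still large.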
The rapid evolution of deepfake creation technologies is seriously threatening the trustworthiness of media information. The consequences for targeted individuals and institutions can be dire. In this work, we study the evolution of deep learning architectures, particularly CNNs and Transformers. We identified eight promising deep learning architectures, designed and developed our deepfake detection models, and conducted experiments on well-established deepfake datasets. These datasets included the latest second- and third-generation deepfake datasets. We evaluated the effectiveness of our single-model detectors in deepfake detection and cross-dataset evaluations. We achieved 88.74%, 99.53%, 97.68%, 99.73% and 92.02% accuracy and 99.95%, 100%, 99.88%, 99.99% and 97.61% AUC in the detection of FF++ 2020, Google DFD, Celeb-DF, Deeper Forensics and DFDC deepfakes, respectively. We also identified and showed the unique strengths of CNN and Transformer models and analysed the observed relationships among the different deepfake datasets, to aid future developments in this area.
In neural networks, task-relevant information is represented jointly by groups of neurons. However, the specific way in which the information is distributed among the individual neurons is not well understood: While parts of it may only be obtainable from specific single neurons, other parts are carried redundantly or synergistically by multiple neurons. We show how Partial Information Decomposition (PID), a recent extension of information theory, can disentangle these contributions. From this, we introduce the measure of "Representational Complexity", which quantifies the difficulty of accessing information spread across multiple neurons. We show how this complexity is directly computable for smaller layers. For larger layers, we propose subsampling and coarse-graining procedures and prove corresponding bounds on the latter. Empirically, for quantized deep neural networks solving the MNIST task, we observe that representational complexity decreases both through successive hidden layers and over training. Overall, we propose representational complexity as a principled and interpretable summary statistic for analyzing the structure of neural representations.
As a newly emerged asset class, cryptocurrency is evidently more volatile than traditional equity markets. Due to its mostly unregulated nature and often low liquidity, the price of crypto assets can change significantly within minutes, which in turn might result in considerable losses. In this paper, we encode market information into images and predict short-term realized volatility using Convolutional Neural Networks. We then compare the performance of the proposed encoding and corresponding model with other benchmark models. The experimental results demonstrate that this representation of market data, with a Convolutional Neural Network as the predictive model, has the potential to better capture market dynamics and to yield better volatility predictions.
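For reference, the prediction target can be made concrete with the standard definition of realized volatility over a window: the square root of the sum of squared log returns. The sketch below uses that definition; the sampling interval and window length used in the work above are not stated, and the function name `realized_volatility` is hypothetical.

```python
import math

def realized_volatility(prices):
    """Realized volatility of a price window: sqrt of summed squared log returns."""
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return math.sqrt(sum(r * r for r in log_returns))

# Example window of hypothetical prices sampled at a fixed interval.
window = [100.0, 101.5, 99.8, 102.3, 101.0]
rv = realized_volatility(window)
```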
In this article, physical layer security (PLS) in an intelligent reflecting surface (IRS) assisted multiple-input multiple-output multiple-antenna eavesdropper (MIMOME) system is studied. In particular, we consider a practical scenario without instantaneous channel state information (CSI) of the eavesdropper and assume that the eavesdropping channel is a Rayleigh channel. To reduce the complexity of currently available IRS-assisted PLS schemes, we propose a low-complexity deep learning (DL) based approach to jointly design the transmitter beamforming and the IRS, where the precoding vector and phase shift matrix are designed to minimize the secrecy outage probability. Simulation results demonstrate that the proposed DL-based approach can achieve performance similar to that of conventional alternating optimization (AO) algorithms with a significant reduction in computational complexity.
The dynamics of neuron populations during diverse tasks often evolve on low-dimensional manifolds. However, it remains challenging to discern the contributions of geometry and dynamics for encoding relevant behavioural variables. Here, we introduce an unsupervised geometric deep learning framework for representing non-linear dynamical systems based on statistical distributions of local phase portrait features. Our method provides robust geometry-aware or geometry-agnostic representations for the unbiased comparison of dynamics based on measured trajectories. We demonstrate that our statistical representation can generalise across neural network instances to discriminate computational mechanisms, obtain interpretable embeddings of neural dynamics in a primate reaching task with geometric correspondence to hand kinematics, and develop a decoding algorithm with state-of-the-art accuracy. Our results highlight the importance of using the intrinsic manifold structure over temporal information to develop better decoding algorithms and assimilate data across experiments.
Body movements carry important information about a person's emotions or mental state and are essential in daily communication. Enhancing the ability of machines to understand emotions expressed through body language can improve the communication of assistive robots with children and elderly users, provide psychiatric professionals with quantitative diagnostic and prognostic assistance, and aid law enforcement in identifying deception. This study develops a high-quality human motor element dataset based on the Laban Movement Analysis movement coding system and uses it to jointly learn about motor elements and emotions. Our long-term ambition is to integrate knowledge from computing, psychology, and performing arts to enable automated understanding and analysis of emotion and mental state through body language. This work serves as a launchpad for further research into recognizing emotions through the analysis of human movement.