In this paper, an unrolling algorithm of the iterative subspace-based optimization method (SOM) is proposed for solving full-wave inverse scattering problems (ISPs). The unrolling network, named SOM-Net, inherently embeds the Lippmann- Schwinger physical model into the design of network structures. The SOM-Net takes the deterministic induced current and the raw permittivity image obtained from back-propagation (BP) as the input. It then updates the induced current and the permittivity successively in sub-network blocks of the SOM- Net by imitating iterations of the SOM. The final output of the SOM-Net is the full predicted induced current, from which the scattered field and the permittivity image can also be deduced analytically. The parameters of the SOM-Net are optimized in a supervised manner with the total loss to simultaneously ensure the consistency of the induced current, the scattered field, and the permittivity in the governing equations. Numerical tests on both synthetic and experimental data verify the superior performance of the proposed SOM-Net over typical ones. The results on challenging examples like scatterers with tough profiles or high permittivity demonstrate the good generalization ability of the SOM-Net. With the use of deep unrolling technology, this work builds a bridge between traditional iterative methods and deep learning methods for solving ISPs.
Graph neural networks have emerged as a leading architecture for many graph-level tasks such as graph classification and graph generation with a notable improvement. Among these tasks, graph pooling is an essential component of graph neural network architectures for obtaining a holistic graph-level representation of the entire graph. Although a great variety of methods have been proposed in this promising and fast-developing research field, to the best of our knowledge, little effort has been made to systematically summarize these methods. To set the stage for the development of future works, in this paper, we attempt to fill this gap by providing a broad review of recent methods on graph pooling. Specifically, 1) we first propose a taxonomy of existing graph pooling methods and provide a mathematical summary for each category; 2) next, we provide an overview of the libraries related to graph pooling, including the commonly used datasets, model architectures for downstream tasks, and open-source implementations; 3) then, we further outline in brief the applications that incorporate the idea of graph pooling in a number of domains; 4) and finally, we discuss some critical challenges faced by the current studies and share our insights on potential directions for improving graph pooling in the future.
In this paper, a deformable object is considered for cameras deployment with the aim of visual coverage. The object contour is discretized into sampled points as meshes, and the deformation is represented as continuous trajectories for the sampled points. To reduce the computational complexity, some feature points are carefully selected representing the continuous deformation process, and the visual coverage for the deformable object is transferred to cover the specific feature points. In particular, the vertexes of a rectangle that can contain the entire deformation trajectory of every sampled point on the object contour are chosen as the feature points. An improved wolf pack algorithm is then proposed to solve the optimization problem. Finally, simulation results are given to demonstrate the effectiveness of the proposed deployment method of camera network.
Electroencephalogram (EEG) decoding aims to identify the perceptual, semantic, and cognitive content of neural processing based on non-invasively measured brain activity. Traditional EEG decoding methods have achieved moderate success when applied to data acquired in static, well-controlled lab environments. However, an open-world environment is a more realistic setting, where situations affecting EEG recordings can emerge unexpectedly, significantly weakening the robustness of existing methods. In recent years, deep learning (DL) has emerged as a potential solution for such problems due to its superior capacity in feature extraction. It overcomes the limitations of defining `handcrafted' features or features extracted using shallow architectures, but typically requires large amounts of costly, expertly-labelled data - something not always obtainable. Combining DL with domain-specific knowledge may allow for development of robust approaches to decode brain activity even with small-sample data. Although various DL methods have been proposed to tackle some of the challenges in EEG decoding, a systematic tutorial overview, particularly for open-world applications, is currently lacking. This article therefore provides a comprehensive survey of DL methods for open-world EEG decoding, and identifies promising research directions to inspire future studies for EEG decoding in real-world applications.
Electroencephalogram (EEG) decoding aims to identify the perceptual, semantic, and cognitive content of neural processing based on non-invasively measured brain activity. Traditional EEG decoding methods have achieved moderate success when applied to data acquired in static, well-controlled lab environments. However, an open-world environment is a more realistic setting, where situations affecting EEG recordings can emerge unexpectedly, significantly weakening the robustness of existing methods. In recent years, deep learning (DL) has emerged as a potential solution for such problems due to its superior capacity in feature extraction. It overcomes the limitations of defining `handcrafted' features or features extracted using shallow architectures, but typically requires large amounts of costly, expertly-labelled data - something not always obtainable. Combining DL with domain-specific knowledge may allow for development of robust approaches to decode brain activity even with small-sample data. Although various DL methods have been proposed to tackle some of the challenges in EEG decoding, a systematic tutorial overview, particularly for open-world applications, is currently lacking. This article therefore provides a comprehensive survey of DL methods for open-world EEG decoding, and identifies promising research directions to inspire future studies for EEG decoding in real-world applications.
Ranking system is the core part of modern retrieval and recommender systems, where the goal is to rank candidate items given user contexts. Optimizing ranking systems online means that the deployed system can serve user requests, e.g., queries in the web search, and optimize the ranking policy by learning from user interactions, e.g., clicks. Bandit is a general online learning framework and can be used in our optimization task. However, due to the unique features of ranking, there are several challenges in designing bandit algorithms for ranking system optimization. In this dissertation, we study and propose solutions for four challenges in optimizing ranking systems online: effectiveness, safety, nonstationarity, and diversification. First, the effectiveness is related to how fast the algorithm learns from interactions. We study the effective online ranker evaluation task and propose the MergeDTS algorithm to solve the problem effectively. Second, the deployed algorithm should be safe, which means the algorithm only displays reasonable content to user requests. To solve the safe online learning to rank problem, we propose the BubbleRank algorithm. Third, as users change their preferences constantly, the algorithm should handle the nonstationarity. We formulate this nonstationary online learning to rank problem as cascade non-stationary bandits and propose CascadeDUCB and CascadeSWUCB algorithms to solve the problem. Finally, the contents in ranked lists should be diverse. We consider the results diversification task and propose the CascadeHybird algorithm that considers both the item relevance and results diversification when learning from user interactions.
Tactile sensing plays an important role in robotic perception and manipulation tasks. To overcome the real-world limitations of data collection, simulating tactile response in a virtual environment comes as a desirable direction of robotic research. In this paper, we propose Elastic Interaction of Particles (EIP) for tactile simulation. Most existing works model the tactile sensor as a rigid multi-body, which is incapable of reflecting the elastic property of the tactile sensor as well as characterizing the fine-grained physical interaction between the two objects. By contrast, EIP models the tactile sensor as a group of coordinated particles, and the elastic property is applied to regulate the deformation of particles during contact. With the tactile simulation by EIP, we further propose a tactile-visual perception network that enables information fusion between tactile data and visual images. The perception network is based on a global-to-local fusion mechanism where multi-scale tactile features are aggregated to the corresponding local region of the visual modality with the guidance of tactile positions and directions. The fusion method exhibits superiority regarding the 3D geometric reconstruction task.
Unbiased Learning to Rank (ULTR) studies the problem of learning a ranking function based on biased user interactions. In this framework, ULTR algorithms have to rely on a large amount of user data that are collected, stored, and aggregated by central servers. In this paper, we consider an on-device search setting, where users search against their personal corpora on their local devices, and the goal is to learn a ranking function from biased user interactions. Due to privacy constraints, users' queries, personal documents, results lists, and raw interaction data will not leave their devices, and ULTR has to be carried out via Federated Learning (FL). Directly applying existing ULTR algorithms on users' devices could suffer from insufficient training data due to the limited amount of local interactions. To address this problem, we propose the FedIPS algorithm, which learns from user interactions on-device under the coordination of a central server and uses click propensities to remove the position bias in user interactions. Our evaluation of FedIPS on the Yahoo and Istella datasets shows that FedIPS is robust over a range of position biases.
In addition to the high cost and complex setup, the main reason for the limitation of the three-dimensional (3D) display is the problem of accurately estimating the user's current point-of-gaze (PoG) in a 3D space. In this paper, we present a novel noncontact technique for the PoG estimation in a stereoscopic environment, which integrates a 3D stereoscopic display system and an eye-tracking system. The 3D stereoscopic display system can provide users with a friendly and immersive high-definition viewing experience without wearing any equipment. To accurately locate the user's 3D PoG in the field of view, we build a regression-based 3D eye-tracking model with the eye movement data and stereo stimulus videos as input. Besides, to train an optimal regression model, we also design and annotate a dataset that contains 30 users' eye-tracking data corresponding to two designed stereo test scenes. Innovatively, this dataset introduces feature vectors between eye region landmarks for the gaze vector estimation and a combined feature set for the gaze depth estimation. Moreover, five traditional regression models are trained and evaluated based on this dataset. Experimental results show that the average errors of the 3D PoG are about 0.90~cm on the X-axis, 0.83~cm on the Y-axis, and 1.48~cm$/$0.12~m along the Z-axis with the scene-depth range in 75~cm$/$8~m, respectively.
Human action recognition is an active research area in computer vision. Although great process has been made, previous methods mostly recognize actions based on depth data at only one scale, and thus they often neglect multi-scale features that provide additional information action recognition in practical application scenarios. In this paper, we present a novel framework focusing on multi-scale motion information to recognize human actions from depth video sequences. We propose a multi-scale feature map called Laplacian pyramid depth motion images(LP-DMI). We employ depth motion images (DMI) as the templates to generate the multi-scale static representation of actions. Then, we caculate LP-DMI to enhance multi-scale dynamic information of motions and reduces redundant static information in human bodies. We further extract the multi-granularity descriptor called LP-DMI-HOG to provide more discriminative features. Finally, we utilize extreme learning machine (ELM) for action classification. The proposed method yeilds the recognition accuracy of 93.41%, 85.12%, 91.94% on public MSRAction3D dataset, UTD-MHAD and DHA dataset. Through extensive experiments, we prove that our method outperforms state-of-the-art benchmarks.