Disentangled and invariant representations are two critical goals of representation learning and many approaches have been proposed to achieve either one of them. However, those two goals are actually complementary to each other so that we propose a framework to accomplish both of them simultaneously. We introduce a weakly supervised signal to learn disentangled representation which consists of three splits containing predictive, known nuisance and unknown nuisance information respectively. Furthermore, we incorporate contrastive method to enforce representation invariance. Experiments shows that the proposed method outperforms state-of-the-art (SOTA) methods on four standard benchmarks and shows that the proposed method can have better adversarial defense ability comparing to other methods without adversarial training.
Although the Conditional Variational AutoEncoder (CVAE) model can generate more diversified responses than the traditional Seq2Seq model, the responses often have low relevance with the input words or are illogical with the question. A causal analysis is carried out to study the reasons behind, and a methodology of searching for the mediators and mitigating the confounding bias in dialogues is provided. Specifically, we propose to predict the mediators to preserve relevant information and auto-regressively incorporate the mediators into generating process. Besides, a dynamic topic graph guided conditional variational autoencoder (TGG-CVAE) model is utilized to complement the semantic space and reduce the confounding bias in responses. Extensive experiments demonstrate that the proposed model is able to generate both relevant and informative responses, and outperforms the state-of-the-art in terms of automatic metrics and human evaluations.
In this paper, we study the application of DRL algorithms in the context of local navigation problems, in which a robot moves towards a goal location in unknown and cluttered workspaces equipped only with limited-range exteroceptive sensors, such as LiDAR. Collision avoidance policies based on DRL present some advantages, but they are quite susceptible to local minima, once their capacity to learn suitable actions is limited to the sensor range. Since most robots perform tasks in unstructured environments, it is of great interest to seek generalized local navigation policies capable of avoiding local minima, especially in untrained scenarios. To do so, we propose a novel reward function that incorporates map information gained in the training stage, increasing the agent's capacity to deliberate about the best course of action. Also, we use the SAC algorithm for training our ANN, which shows to be more effective than others in the state-of-the-art literature. A set of sim-to-sim and sim-to-real experiments illustrate that our proposed reward combined with the SAC outperforms the compared methods in terms of local minima and collision avoidance.
The statistical supervised learning framework assumes an input-output set with a joint probability distribution that is reliably represented by the training dataset. The learner is then required to output a prediction rule learned from the training dataset's input-output pairs. In this work, we provide meaningful insights into the asymptotic equipartition property (AEP) \citep{Shannon:1948} in the context of machine learning, and illuminate some of its potential ramifications for few-shot learning. We provide theoretical guarantees for reliable learning under the information-theoretic AEP, and for the generalization error with respect to the sample size. We then focus on a highly efficient recurrent neural net (RNN) framework and propose a reduced-entropy algorithm for few-shot learning. We also propose a mathematical intuition for the RNN as an approximation of a sparse coding solver. We verify the applicability, robustness, and computational efficiency of the proposed approach with image deblurring and optical coherence tomography (OCT) speckle suppression. Our experimental results demonstrate significant potential for improving learning models' sample efficiency, generalization, and time complexity, that can therefore be leveraged for practical real-time applications.
In this paper, we lay out a novel model of neuroplasticity in the form of a horizontal-vertical integration model of neural processing. We believe a new approach to neural modeling will benefit the 3rd wave of AI. The horizontal plane consists of an adaptive network of neurons connected by transmission links which generates spatio-temporal spike patterns. This fits with standard computational neuroscience approaches. Additionally for each individual neuron there is a vertical part consisting of internal adaptive parameters steering the external membrane-expressed parameters which are involved in neural transmission. Each neuron has a vertical modular system of parameters corresponding to (a) external parameters at the membrane layer, divided into compartments (spines, boutons) (b) internal parameters in the submembrane zone and the cytoplasm with its protein signaling network and (c) core parameters in the nucleus for genetic and epigenetic information. In such models, each node (=neuron) in the horizontal network has its own internal memory. Neural transmission and information storage are systematically separated, an important conceptual advance over synaptic weight models. We discuss the membrane-based (external) filtering and selection of outside signals for processing vs. signal loss by fast fluctuations and the neuron-internal computing strategies from intracellular protein signaling to the nucleus as the core system. We want to show that the individual neuron has an important role in the computation of signals and that many assumptions derived from the synaptic weight adjustment hypothesis of memory may not hold in a real brain. Not every transmission event leaves a trace and the neuron is a self-programming device, rather than passively determined by current input. Ultimately we strive to build a flexible memory system that processes facts and events automatically.
Predictive beamforming design is an essential task in realizing high-mobility integrated sensing and communication (ISAC), which highly depends on the accuracy of the channel prediction (CP), i.e., predicting the angular parameters of users. However, the performance of CP highly depends on the estimated historical channel stated information (CSI) with estimation errors, resulting in the performance degradation for most traditional CP methods. To further improve the prediction accuracy, in this paper, we focus on the ISAC in vehicle networks and propose a convolutional long-short term (CLSTM) recurrent neural network (CLRNet) to predict the angle of vehicles for the design of predictive beamforming. In the developed CLRNet, both the convolutional neural network (CNN) module and the LSTM module are adopted to exploit the spatial features and the temporal dependency from the estimated historical angles of vehicles to facilitate the angle prediction. Finally, numerical results demonstrate that the developed CLRNet-based method is robust to the estimation error and can significantly outperform the state-of-the-art benchmarks, achieving an excellent sum-rate performance for ISAC systems.
Heterogeneous graph learning has drawn significant attentions in recent years, due to the success of graph neural networks (GNNs) and the broad applications of heterogeneous information networks. Various heterogeneous graph neural networks have been proposed to generalize GNNs for processing the heterogeneous graphs. Unfortunately, these approaches model the heterogeneity via various complicated modules. This paper aims to propose a simple yet efficient framework to make the homogeneous GNNs have adequate ability to handle heterogeneous graphs. Specifically, we propose Relation Embedding based Graph Neural Networks (RE-GNNs), which employ only one parameter per relation to embed the importance of edge type relations and self-loop connections. To optimize these relation embeddings and the other parameters simultaneously, a gradient scaling factor is proposed to constrain the embeddings to converge to suitable values. Besides, we theoretically demonstrate that our RE-GNNs have more expressive power than the meta-path based heterogeneous GNNs. Extensive experiments on the node classification tasks validate the effectiveness of our proposed method.
We introduce a method to learn unsupervised semantic visual information based on the premise that complex events (e.g., minutes) can be decomposed into simpler events (e.g., a few seconds), and that these simple events are shared across several complex events. We split a long video into short frame sequences to extract their latent representation with three-dimensional convolutional neural networks. A clustering method is used to group representations producing a visual codebook (i.e., a long video is represented by a sequence of integers given by the cluster labels). A dense representation is learned by encoding the co-occurrence probability matrix for the codebook entries. We demonstrate how this representation can leverage the performance of the dense video captioning task in a scenario with only visual features. As a result of this approach, we are able to replace the audio signal in the Bi-Modal Transformer (BMT) method and produce temporal proposals with comparable performance. Furthermore, we concatenate the visual signal with our descriptor in a vanilla transformer method to achieve state-of-the-art performance in captioning compared to the methods that explore only visual features, as well as a competitive performance with multi-modal methods. Our code is available at https://github.com/valterlej/dvcusi.
In addition to being familiar with policies, high investment returns also require extensive knowledge of relevant industry knowledge and news. In addition, it is necessary to leverage relevant theories for investment to make decisions, thereby amplifying investment returns. A effective investment return estimate can feedback the future rate of return of investment behavior. In recent years, deep learning are developing rapidly, and investment return prediction based on deep learning has become an emerging research topic. This paper proposes an embedding-based dual branch approach to predict an investment's return. This approach leverages embedding to encode the investment id into a low-dimensional dense vector, thereby mapping high-dimensional data to a low-dimensional manifold, so that highdimensional features can be represented competitively. In addition, the dual branch model realizes the decoupling of features by separately encoding different information in the two branches. In addition, the swish activation function further improves the model performance. Our approach are validated on the Ubiquant Market Prediction dataset. The results demonstrate the superiority of our approach compared to Xgboost, Lightgbm and Catboost.
Quantitative assessment of the abdominal region from clinically acquired CT scans requires the simultaneous segmentation of abdominal organs. Thanks to the availability of high-performance computational resources, deep learning-based methods have resulted in state-of-the-art performance for the segmentation of 3D abdominal CT scans. However, the complex characterization of organs with fuzzy boundaries prevents the deep learning methods from accurately segmenting these anatomical organs. Specifically, the voxels on the boundary of organs are more vulnerable to misprediction due to the highly-varying intensity of inter-organ boundaries. This paper investigates the possibility of improving the abdominal image segmentation performance of the existing 3D encoder-decoder networks by leveraging organ-boundary prediction as a complementary task. To address the problem of abdominal multi-organ segmentation, we train the 3D encoder-decoder network to simultaneously segment the abdominal organs and their corresponding boundaries in CT scans via multi-task learning. The network is trained end-to-end using a loss function that combines two task-specific losses, i.e., complete organ segmentation loss and boundary prediction loss. We explore two different network topologies based on the extent of weights shared between the two tasks within a unified multi-task framework. To evaluate the utilization of complementary boundary prediction task in improving the abdominal multi-organ segmentation, we use three state-of-the-art encoder-decoder networks: 3D UNet, 3D UNet++, and 3D Attention-UNet. The effectiveness of utilizing the organs' boundary information for abdominal multi-organ segmentation is evaluated on two publically available abdominal CT datasets. A maximum relative improvement of 3.5% and 3.6% is observed in Mean Dice Score for Pancreas-CT and BTCV datasets, respectively.