Many joint entity relation extraction models setup two separated label spaces for the two sub-tasks (i.e., entity detection and relation classification). We argue that this setting may hinder the information interaction between entities and relations. In this work, we propose to eliminate the different treatment on the two sub-tasks' label spaces. The input of our model is a table containing all word pairs from a sentence. Entities and relations are represented by squares and rectangles in the table. We apply a unified classifier to predict each cell's label, which unifies the learning of two sub-tasks. For testing, an effective (yet fast) approximate decoder is proposed for finding squares and rectangles from tables. Experiments on three benchmarks (ACE04, ACE05, SciERC) show that, using only half the number of parameters, our model achieves competitive accuracy with the best extractor, and is faster.
In this paper, we investigate the problem of reasoning over natural language statements. Prior neural based approaches do not explicitly consider the inter-dependency among answers and their proofs. In this paper, we propose PRobr, a novel approach for joint answer prediction and proof generation. PRobr defines a joint probabilistic distribution over all possible proof graphs and answers via an induced graphical model. We then optimize the model using variational approximation on top of neural textual representation. Experiments on multiple datasets under diverse settings (fully supervised, few-shot and zero-shot evaluation) verify the effectiveness of PRobr, e.g., achieving 10%-30% improvement on QA accuracy in few/zero-shot evaluation. Our codes and models can be found at https://github.com/changzhisun/PRobr/.
Appliance-level load forecasting plays a critical role in residential energy management, besides having significant importance for ancillary services performed by the utilities. In this paper, we propose to use an LSTM-based sequence-to-sequence (seq2seq) learning model that can capture the load profiles of appliances. We use a real dataset collected fromfour residential buildings and compare our proposed schemewith three other techniques, namely VARMA, Dilated One Dimensional Convolutional Neural Network, and an LSTM model.The results show that the proposed LSTM-based seq2seq model outperforms other techniques in terms of prediction error in most cases.
Despite existing pioneering works on sign language translation (SLT), there is a non-trivial obstacle, i.e., the limited quantity of parallel sign-text data. To tackle this parallel data bottleneck, we propose a sign back-translation (SignBT) approach, which incorporates massive spoken language texts into SLT training. With a text-to-gloss translation model, we first back-translate the monolingual text to its gloss sequence. Then, the paired sign sequence is generated by splicing pieces from an estimated gloss-to-sign bank at the feature level. Finally, the synthetic parallel data serves as a strong supplement for the end-to-end training of the encoder-decoder SLT framework. To promote the SLT research, we further contribute CSL-Daily, a large-scale continuous SLT dataset. It provides both spoken language translations and gloss-level annotations. The topic revolves around people's daily lives (e.g., travel, shopping, medical care), the most likely SLT application scenario. Extensive experimental results and analysis of SLT methods are reported on CSL-Daily. With the proposed sign back-translation method, we obtain a substantial improvement over previous state-of-the-art SLT methods.
We tackle the problem of visual search under resource constraints. Existing systems use the same embedding model to compute representations (embeddings) for the query and gallery images. Such systems inherently face a hard accuracy-efficiency trade-off: the embedding model needs to be large enough to ensure high accuracy, yet small enough to enable query-embedding computation on resource-constrained platforms. This trade-off could be mitigated if gallery embeddings are generated from a large model and query embeddings are extracted using a compact model. The key to building such a system is to ensure representation compatibility between the query and gallery models. In this paper, we address two forms of compatibility: One enforced by modifying the parameters of each model that computes the embeddings. The other by modifying the architectures that compute the embeddings, leading to compatibility-aware neural architecture search (CMP-NAS). We test CMP-NAS on challenging retrieval tasks for fashion images (DeepFashion2), and face images (IJB-C). Compared to ordinary (homogeneous) visual search using the largest embedding model (paragon), CMP-NAS achieves 80-fold and 23-fold cost reduction while maintaining accuracy within 0.3% and 1.6% of the paragon on DeepFashion2 and IJB-C respectively.
Sequence-to-sequence (seq2seq) problems such as machine translation are bidirectional, which naturally derive a pair of directional tasks and two directional learning signals. However, typical seq2seq neural networks are {\em simplex} that only model one unidirectional task, which cannot fully exploit the potential of bidirectional learning signals from parallel data. To address this issue, we propose a {\em duplex} seq2seq neural network, REDER (Reversible Duplex Transformer), and apply it to machine translation. The architecture of REDER has two ends, each of which specializes in a language so as to read and yield sequences in that language. As a result, REDER can simultaneously learn from the bidirectional signals, and enables {\em reversible machine translation} by simply flipping the input and output ends, Experiments on widely-used machine translation benchmarks verify that REDER achieves the first success of reversible machine translation, which helps obtain considerable gains over several strong baselines.
Temporal grounding aims to localize temporal boundaries within untrimmed videos by language queries, but it faces the challenge of two types of inevitable human uncertainties: query uncertainty and label uncertainty. The two uncertainties stem from human subjectivity, leading to limited generalization ability of temporal grounding. In this work, we propose a novel DeNet (Decoupling and De-bias) to embrace human uncertainty: Decoupling - We explicitly disentangle each query into a relation feature and a modified feature. The relation feature, which is mainly based on skeleton-like words (including nouns and verbs), aims to extract basic and consistent information in the presence of query uncertainty. Meanwhile, modified feature assigned with style-like words (including adjectives, adverbs, etc) represents the subjective information, and thus brings personalized predictions; De-bias - We propose a de-bias mechanism to generate diverse predictions, aim to alleviate the bias caused by single-style annotations in the presence of label uncertainty. Moreover, we put forward new multi-label metrics to diversify the performance evaluation. Extensive experiments show that our approach is more effective and robust than state-of-the-arts on Charades-STA and ActivityNet Captions datasets.
Searching for novel molecules with desired chemical properties is crucial in drug discovery. Existing work focuses on developing neural models to generate either molecular sequences or chemical graphs. However, it remains a big challenge to find novel and diverse compounds satisfying several properties. In this paper, we propose MARS, a method for multi-objective drug molecule discovery. MARS is based on the idea of generating the chemical candidates by iteratively editing fragments of molecular graphs. To search for high-quality candidates, it employs Markov chain Monte Carlo sampling (MCMC) on molecules with an annealing scheme and an adaptive proposal. To further improve sample efficiency, MARS uses a graph neural network (GNN) to represent and select candidate edits, where the GNN is trained on-the-fly with samples from MCMC. Experiments show that MARS achieves state-of-the-art performance in various multi-objective settings where molecular bio-activity, drug-likeness, and synthesizability are considered. Remarkably, in the most challenging setting where all four objectives are simultaneously optimized, our approach outperforms previous methods significantly in comprehensive evaluations. The code is available at https://github.com/yutxie/mars.
Accurate pedestrian classification and localization have received considerable attention due to their wide applications such as security monitoring, autonomous driving, etc. Although pedestrian detectors have made great progress in recent years, the fixed Intersection over Union (IoU) based assignment-regression manner still limits their performance. Two main factors are responsible for this: 1) the IoU threshold faces a dilemma that a lower one will result in more false positives, while a higher one will filter out the matched positives; 2) the IoU-based GT-Proposal assignment suffers from the inconsistent supervision problem that spatially adjacent proposals with similar features are assigned to different ground-truth boxes, which means some very similar proposals may be forced to regress towards different targets, and thus confuses the bounding-box regression when predicting the location results. In this paper, we first put forward the question that \textbf{Regression Direction} would affect the performance for pedestrian detection. Consequently, we address the weakness of IoU by introducing one geometric sensitive search algorithm as a new assignment and regression metric. Different from the previous IoU-based \textbf{one-to-one} assignment manner of one proposal to one ground-truth box, the proposed method attempts to seek a reasonable matching between the sets of proposals and ground-truth boxes. Specifically, we boost the MR-FPPI under R$_{75}$ by 8.8\% on Citypersons dataset. Furthermore, by incorporating this method as a metric into the state-of-the-art pedestrian detectors, we show a consistent improvement.
Microgrid (MG) energy management is an important part of MG operation. Various entities are generally involved in the energy management of an MG, e.g., energy storage system (ESS), renewable energy resources (RER) and the load of users, and it is crucial to coordinate these entities. Considering the significant potential of machine learning techniques, this paper proposes a correlated deep Q-learning (CDQN) based technique for the MG energy management. Each electrical entity is modeled as an agent which has a neural network to predict its own Q-values, after which the correlated Q-equilibrium is used to coordinate the operation among agents. In this paper, the Long Short Term Memory networks (LSTM) based deep Q-learning algorithm is introduced and the correlated equilibrium is proposed to coordinate agents. The simulation result shows 40.9% and 9.62% higher profit for ESS agent and photovoltaic (PV) agent, respectively.