Addressing the challenge of high annotation costs in solving Math Word Problems (MWPs) through full supervision with intermediate equations, recent works have proposed weakly supervised task settings that rely solely on the final answer as a supervised signal. Existing leading approaches typically employ various search techniques to infer intermediate equations, but cannot ensure their semantic consistency with natural language descriptions. The rise of Large Language Models (LLMs) like ChatGPT has opened up new possibilities for addressing MWPs directly. However, the computational demands of LLMs make them less than ideal for use in settings where resources are tight. In light of these challenges, we introduce an innovative two-stage framework that adeptly transfers mathematical Expertise from large to tiny language models. In \emph{Distillation Stage}, we propose a series of extraction processes that satisfy the properties of MWPs to distill mathematical knowledge from LLMs to construct problem-equation pairs required for supervised training. In \emph{Refinement Stage}, Due to Knowledge distilling method cannot guarantee the full utilization of all data, we further utilize the unsuccessfully searched data effectively by Knowledge Refine method. Finally, We train a small model using distilled data generated through two-stage methods. As our method fully leverages the semantic understanding capabilities during the searching 'problem-equation' pair, it demonstrates significantly improved performance on the Math23K and Weak12K datasets compared to existing small model methods, while maintaining a much lower computational cost than ChatGPT.
The Text-to-SQL task, aiming to translate the natural language of the questions into SQL queries, has drawn much attention recently. One of the most challenging problems of Text-to-SQL is how to generalize the trained model to the unseen database schemas, also known as the cross-domain Text-to-SQL task. The key lies in the generalizability of (i) the encoding method to model the question and the database schema and (ii) the question-schema linking method to learn the mapping between words in the question and tables/columns in the database schema. Focusing on the above two key issues, we propose a Structure-Aware Dual Graph Aggregation Network (SADGA) for cross-domain Text-to-SQL. In SADGA, we adopt the graph structure to provide a unified encoding model for both the natural language question and database schema. Based on the proposed unified modeling, we further devise a structure-aware aggregation method to learn the mapping between the question-graph and schema-graph. The structure-aware aggregation method is featured with Global Graph Linking, Local Graph Linking, and Dual-Graph Aggregation Mechanism. We not only study the performance of our proposal empirically but also achieved 3rd place on the challenging Text-to-SQL benchmark Spider at the time of writing.
Graph convolutional networks (GCNs) have achieved great success in dealing with data of non-Euclidean structures. Their success directly attributes to fitting graph structures effectively to data such as in social media and knowledge databases. For image processing applications, the use of graph structures and GCNs have not been fully explored. In this paper, we propose a novel encoder-decoder network with added graph convolutions by converting feature maps to vertexes of a pre-generated graph to synthetically construct graph-structured data. By doing this, we inexplicitly apply graph Laplacian regularization to the feature maps, making them more structured. The experiments show that it significantly boosts performance for image restoration tasks, including deblurring and super-resolution. We believe it opens up opportunities for GCN-based approaches in more applications.
Named entity recognition (NER) for identifying proper nouns in unstructured text is one of the most important and fundamental tasks in natural language processing. However, despite the widespread use of NER models, they still require a large-scale labeled data set, which incurs a heavy burden due to manual annotation. Domain adaptation is one of the most promising solutions to this problem, where rich labeled data from the relevant source domain are utilized to strengthen the generalizability of a model based on the target domain. However, the mainstream cross-domain NER models are still affected by the following two challenges (1) Extracting domain-invariant information such as syntactic information for cross-domain transfer. (2) Integrating domain-specific information such as semantic information into the model to improve the performance of NER. In this study, we present a semi-supervised framework for transferable NER, which disentangles the domain-invariant latent variables and domain-specific latent variables. In the proposed framework, the domain-specific information is integrated with the domain-specific latent variables by using a domain predictor. The domain-specific and domain-invariant latent variables are disentangled using three mutual information regularization terms, i.e., maximizing the mutual information between the domain-specific latent variables and the original embedding, maximizing the mutual information between the domain-invariant latent variables and the original embedding, and minimizing the mutual information between the domain-specific and domain-invariant latent variables. Extensive experiments demonstrated that our model can obtain state-of-the-art performance with cross-domain and cross-lingual NER benchmark data sets.
Existing leading code comment generation approaches with the structure-to-sequence framework ignores the type information of the interpretation of the code, e.g., operator, string, etc. However, introducing the type information into the existing framework is non-trivial due to the hierarchical dependence among the type information. In order to address the issues above, we propose a Type Auxiliary Guiding encoder-decoder framework for the code comment generation task which considers the source code as an N-ary tree with type information associated with each node. Specifically, our framework is featured with a Type-associated Encoder and a Type-restricted Decoder which enables adaptive summarization of the source code. We further propose a hierarchical reinforcement learning method to resolve the training difficulties of our proposed framework. Extensive evaluations demonstrate the state-of-the-art performance of our framework with both the auto-evaluated metrics and case studies.
The challenge of learning disentangled representation has recently attracted much attention and boils down to a competition using a new real world disentanglement dataset (Gondal et al., 2019). Various methods based on variational auto-encoder have been proposed to solve this problem, by enforcing the independence between the representation and modifying the regularization term in the variational lower bound. However recent work by Locatello et al. (2018) has demonstrated that the proposed methods are heavily influenced by randomness and the choice of the hyper-parameter. In this work, instead of designing a new regularization term, we adopt the FactorVAE but improve the reconstruction performance and increase the capacity of network and the training step. The strategy turns out to be very effective and achieve the 1st place in the challenge.
Data-driven models are becoming essential parts in modern mechanical systems, commonly used to capture the behavior of various equipment and varying environmental characteristics. Despite the advantages of these data-driven models on excellent adaptivity to high dynamics and aging equipment, they are usually hungry to massive labels over historical data, mostly contributed by human engineers at an extremely high cost. The label demand is now the major limiting factor to modeling accuracy, hindering the fulfillment of visions for applications. Fortunately, domain adaptation enhances the model generalization by utilizing the labelled source data as well as the unlabelled target data and then we can reuse the model on different domains. However, the mainstream domain adaptation methods cannot achieve ideal performance on time series data, because most of them focus on static samples and even the existing time series domain adaptation methods ignore the properties of time series data, such as temporal causal mechanism. In this paper, we assume that causal mechanism is invariant and present our Causal Mechanism Transfer Network(CMTN) for time series domain adaptation. By capturing and transferring the dynamic and temporal causal mechanism of multivariate time series data and alleviating the time lags and different value ranges among different machines, CMTN allows the data-driven models to exploit existing data and labels from similar systems, such that the resulting model on a new system is highly reliable even with very limited data. We report our empirical results and lessons learned from two real-world case studies, on chiller plant energy optimization and boiler fault detection, which outperforms the existing state-of-the-art method.
Refrigeration and chiller optimization is an important and well studied topic in mechanical engineering, mostly taking advantage of physical models, designed on top of over-simplified assumptions, over the equipments. Conventional optimization techniques using physical models make decisions of online parameter tuning, based on very limited information of hardware specifications and external conditions, e.g., outdoor weather. In recent years, new generation of sensors is becoming essential part of new chiller plants, for the first time allowing the system administrators to continuously monitor the running status of all equipments in a timely and accurate way. The explosive growth of data flowing to databases, driven by the increasing analytical power by machine learning and data mining, unveils new possibilities of data-driven approaches for real-time chiller plant optimization. This paper presents our research and industrial experience on the adoption of data models and optimizations on chiller plant and discusses the lessons learnt from our practice on real world plants. Instead of employing complex machine learning models, we emphasize the incorporation of appropriate domain knowledge into data analysis tools, which turns out to be the key performance improver over state-of-the-art deep learning techniques by a significant margin. Our empirical evaluation on a real world chiller plant achieves savings by more than 7% on daily power consumption.
Machine translation is going through a radical revolution, driven by the explosive development of deep learning techniques using Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). In this paper, we consider a special case in machine translation problems, targeting to convert natural language into Structured Query Language (SQL) for data retrieval over relational database. Although generic CNN and RNN learn the grammar structure of SQL when trained with sufficient samples, the accuracy and training efficiency of the model could be dramatically improved, when the translation model is deeply integrated with the grammar rules of SQL. We present a new encoder-decoder framework, with a suite of new approaches, including new semantic features fed into the encoder, grammar-aware states injected into the memory of decoder, as well as recursive state management for sub-queries. These techniques help the neural network better focus on understanding semantics of operations in natural language and save the efforts on SQL grammar learning. The empirical evaluation on real world database and queries show that our approach outperform state-of-the-art solution by a significant margin.