Yuchen Zhuang

DF2: Distribution-Free Decision-Focused Learning

Aug 11, 2023
Lingkai Kong, Wenhao Mu, Jiaming Cui, Yuchen Zhuang, B. Aditya Prakash, Bo Dai, Chao Zhang

Decision-focused learning (DFL) has recently emerged as a powerful approach for predict-then-optimize problems by customizing a predictive model to a downstream optimization task. However, existing end-to-end DFL methods are hindered by three significant bottlenecks: model mismatch error, sample average approximation error, and gradient approximation error. Model mismatch error stems from the misalignment between the model's parameterized predictive distribution and the true probability distribution. Sample average approximation error arises when using finite samples to approximate the expected optimization objective. Gradient approximation error occurs as DFL relies on the KKT condition for exact gradient computation, while most methods approximate the gradient for backpropagation in non-convex objectives. In this paper, we present DF2 -- the first distribution-free decision-focused learning method explicitly designed to address these three bottlenecks. Rather than depending on a task-specific forecaster that requires precise model assumptions, our method directly learns the expected optimization function during training. To efficiently learn the function in a data-driven manner, we devise an attention-based model architecture inspired by the distribution-based parameterization of the expected objective. Our method is, to the best of our knowledge, the first to address all three bottlenecks within a single model. We evaluate DF2 on a synthetic problem, a wind power bidding problem, and a non-convex vaccine distribution problem, demonstrating the effectiveness of DF2.
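
The core idea of learning the expected objective directly, rather than fitting a forecaster, can be illustrated with a small sketch. The code below is a hypothetical simplification, not the authors' implementation: an attention layer over learned anchor embeddings stands in for the distribution-based parameterization, and the quadratic cost `f`, the dimensions, and the training loop are illustrative assumptions.

```python
# Minimal sketch: learn a surrogate g(x, w) ≈ E_{y~p(y|x)}[f(w, y)] directly from
# (x, y) pairs, then choose decisions by minimizing the surrogate over w.
import torch
import torch.nn as nn

def f(w, y):                          # hypothetical downstream objective (quadratic cost)
    return ((w - y) ** 2).sum(-1)

class ExpectedObjective(nn.Module):
    def __init__(self, x_dim=8, w_dim=4, d=64, n_anchors=16):
        super().__init__()
        self.anchors = nn.Parameter(torch.randn(n_anchors, d))   # learned "support points"
        self.enc_x = nn.Linear(x_dim, d)
        self.enc_w = nn.Linear(w_dim, d)
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.head = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, 1))

    def forward(self, x, w):
        q = (self.enc_x(x) + self.enc_w(w)).unsqueeze(1)          # query: context + decision
        kv = self.anchors.unsqueeze(0).expand(x.size(0), -1, -1)  # keys/values: anchors
        h, _ = self.attn(q, kv, kv)
        return self.head(h.squeeze(1)).squeeze(-1)                # scalar surrogate value

model = ExpectedObjective()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):                                           # toy training loop
    x = torch.randn(128, 8)
    y = x[:, :4] + 0.1 * torch.randn(128, 4)                      # synthetic p(y|x)
    w = torch.randn(128, 4)                                       # random probe decisions
    loss = ((model(x, w) - f(w, y)) ** 2).mean()                  # regress observed costs
    opt.zero_grad(); loss.backward(); opt.step()

# At decision time, optimize w against the learned surrogate for a new context.
model.requires_grad_(False)
x_new = torch.randn(1, 8)
w = torch.zeros(1, 4, requires_grad=True)
w_opt = torch.optim.Adam([w], lr=0.05)
for _ in range(100):
    obj = model(x_new, w).sum()
    w_opt.zero_grad(); obj.backward(); w_opt.step()
```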

* 24 pages 

Autoregressive Diffusion Model for Graph Generation

Jul 17, 2023
Lingkai Kong, Jiaming Cui, Haotian Sun, Yuchen Zhuang, B. Aditya Prakash, Chao Zhang

Diffusion-based graph generative models have recently obtained promising results for graph generation. However, existing diffusion-based graph generative models are mostly one-shot generative models that apply Gaussian diffusion in the dequantized adjacency matrix space. Such a strategy can suffer from difficulty in model training, slow sampling speed, and incapability of incorporating constraints. We propose an autoregressive diffusion model for graph generation. Unlike existing methods, we define a node-absorbing diffusion process that operates directly in the discrete graph space. For forward diffusion, we design a diffusion ordering network, which learns a data-dependent node absorbing ordering from graph topology. For reverse generation, we design a denoising network that uses the reverse node ordering to efficiently reconstruct the graph, predicting one new node at a time along with its type and its edges to previously denoised nodes. Based on the permutation invariance of graphs, we show that the two networks can be jointly trained by optimizing a simple lower bound of the data likelihood. Our experiments on six diverse generic graph datasets and two molecule datasets show that our model achieves generation performance better than or comparable to the previous state of the art, while enjoying fast generation speed.
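
As a rough illustration of the reverse generation step described above, the sketch below adds one node at a time and predicts its type and its edges to previously generated nodes. The GRU-based denoiser, the number of node types, and the sampling details are assumptions made for illustration, not the paper's architecture.

```python
# Minimal sketch of the reverse loop: nodes are "de-absorbed" one at a time, and a
# denoising network predicts the new node's type and its edges to earlier nodes.
import torch
import torch.nn as nn

NUM_NODE_TYPES = 4

class Denoiser(nn.Module):
    def __init__(self, d=32):
        super().__init__()
        self.type_emb = nn.Embedding(NUM_NODE_TYPES, d)
        self.graph_rnn = nn.GRU(d, d, batch_first=True)     # summarizes existing nodes
        self.type_head = nn.Linear(d, NUM_NODE_TYPES)       # logits for the new node's type
        self.edge_head = nn.Bilinear(d, d, 1)               # edge logit vs. each earlier node

    def forward(self, node_types):                           # node_types: (1, t) ints
        h = self.type_emb(node_types)                        # (1, t, d)
        ctx, _ = self.graph_rnn(h)                           # ctx[:, -1] is the graph state
        type_logits = self.type_head(ctx[:, -1])             # (1, NUM_NODE_TYPES)
        edge_logits = self.edge_head(ctx, ctx[:, -1:].expand_as(ctx)).squeeze(-1)  # (1, t)
        return type_logits, edge_logits

@torch.no_grad()
def generate(model, num_nodes=6):
    types, edges = [torch.randint(NUM_NODE_TYPES, (1,)).item()], []
    for t in range(1, num_nodes):
        type_logits, edge_logits = model(torch.tensor([types]))
        new_type = torch.distributions.Categorical(logits=type_logits).sample().item()
        new_edges = torch.bernoulli(torch.sigmoid(edge_logits)).squeeze(0)  # links to nodes 0..t-1
        types.append(new_type)
        edges.append([(i, t) for i, on in enumerate(new_edges.tolist()) if on])
    return types, edges

model = Denoiser()
print(generate(model))
```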

* 18 pages 

Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias

Jun 28, 2023
Yue Yu, Yuchen Zhuang, Jieyu Zhang, Yu Meng, Alexander Ratner, Ranjay Krishna, Jiaming Shen, Chao Zhang

Large language models (LLMs) have recently been leveraged as training data generators for various natural language processing (NLP) tasks. While previous research has explored different approaches to training models using generated data, such approaches generally rely on simple class-conditional prompts, which may limit the diversity of the generated data and inherit the systematic biases of the LLM. Thus, we investigate training data generation with diversely attributed prompts (e.g., specifying attributes like length and style), which have the potential to yield diverse and attributed generated data. Our investigation focuses on datasets with high cardinality and diverse domains, wherein we demonstrate that attributed prompts outperform simple class-conditional prompts in terms of the resulting model's performance. Additionally, we present a comprehensive empirical study on data generation encompassing vital aspects like bias, diversity, and efficiency, and highlight three key observations: firstly, synthetic datasets generated by simple prompts exhibit significant biases, such as regional bias; secondly, attribute diversity plays a pivotal role in enhancing model performance; lastly, attributed prompts match the performance of simple class-conditional prompts while utilizing only 5% of the querying cost of ChatGPT associated with the latter. We release the generated dataset and the prompts used to facilitate future research. The data and code will be available at https://github.com/yueyu1030/AttrPrompt.
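
To make the contrast concrete, the sketch below builds attributed prompts by sampling attribute values, versus a single class-conditional template. The attribute names and values are illustrative assumptions; the paper's attribute dimensions are dataset-specific.

```python
# A toy comparison of a simple class-conditional prompt vs. attributed prompts.
import random

CLASS_PROMPT = "Write a news article about {label}."

ATTRIBUTES = {
    "length":   ["around 50 words", "around 150 words"],
    "style":    ["formal", "conversational", "analytical"],
    "subtopic": ["a recent event", "a long-term trend", "a local story"],
}

def attributed_prompt(label: str) -> str:
    """Sample one value per attribute and compose a diversified generation prompt."""
    picks = {name: random.choice(values) for name, values in ATTRIBUTES.items()}
    return (f"Write a {picks['style']} news article about {label}, "
            f"focusing on {picks['subtopic']}, {picks['length']}.")

label = "renewable energy"
print(CLASS_PROMPT.format(label=label))   # simple class-conditional prompt
for _ in range(3):
    print(attributed_prompt(label))       # diverse, attributed prompts
```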

* Work in progress. A shorter version is accepted to the ICML DMLR workshop 

G-STO: Sequential Main Shopping Intention Detection via Graph-Regularized Stochastic Transformer

Jun 25, 2023
Yuchen Zhuang, Xin Shen, Yan Zhao, Chaosheng Dong, Ming Wang, Jin Li, Chao Zhang

Sequential recommendation requires understanding the dynamic patterns of users' behaviors, contexts, and preferences from their historical interactions. Most existing works model user-item interactions only at the item level, ignoring that they are driven by latent shopping intentions (e.g., ballpoint pens, miniatures). Detecting users' underlying shopping intentions from their historical interactions is crucial for e-commerce platforms such as Amazon to enhance the convenience and efficiency of their customers' shopping experiences. Despite its significance, main shopping intention detection remains under-investigated in the academic literature. To fill this gap, we propose a graph-regularized stochastic Transformer method, G-STO. Considering intentions as sets of products and user preferences as compositions of intentions, we model both as stochastic Gaussian embeddings in the latent representation space. Instead of training the stochastic representations from scratch, we develop a global intention relational graph as prior knowledge for regularization, allowing relevant shopping intentions to be distributionally close. Finally, we feed the regularized stochastic embeddings into Transformer-based models to encode sequential information from the intention transitions. We evaluate our main shopping intention identification model on three different real-world datasets, where G-STO significantly outperforms the baselines, by 18.08% in Hit@1, 7.01% in Hit@10, and 6.11% in NDCG@10 on average.
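
A minimal sketch of the main ingredients follows, under several assumptions: toy dimensions, a random intention graph, and a 2-Wasserstein-style closeness regularizer standing in for the paper's exact choice.

```python
# Stochastic Gaussian intention embeddings + graph regularization + Transformer encoder.
import torch
import torch.nn as nn

N_INTENTIONS, D = 100, 32

class StochasticEmbedding(nn.Module):
    def __init__(self):
        super().__init__()
        self.mu = nn.Embedding(N_INTENTIONS, D)
        self.logvar = nn.Embedding(N_INTENTIONS, D)

    def forward(self, ids):                       # reparameterized sample per intention id
        mu, logvar = self.mu(ids), self.logvar(ids)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp(), mu, logvar

def graph_regularizer(mu, logvar, edges):
    """Pull connected intentions close as distributions (2-Wasserstein between diagonal Gaussians)."""
    i, j = edges[:, 0], edges[:, 1]
    mean_term = ((mu[i] - mu[j]) ** 2).sum(-1)
    std_term = (((0.5 * logvar[i]).exp() - (0.5 * logvar[j]).exp()) ** 2).sum(-1)
    return (mean_term + std_term).mean()

emb = StochasticEmbedding()
encoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(D, nhead=4, batch_first=True), num_layers=2)

seq = torch.randint(N_INTENTIONS, (8, 20))        # batch of intention-level interaction sequences
edges = torch.randint(N_INTENTIONS, (200, 2))     # toy prior relational graph

z, mu, logvar = emb(seq)
hidden = encoder(z)                               # sequence encoding for next-intention scoring
reg = graph_regularizer(emb.mu.weight, emb.logvar.weight, edges)
print(hidden.shape, reg.item())
```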

ToolQA: A Dataset for LLM Question Answering with External Tools

Jun 23, 2023
Yuchen Zhuang, Yue Yu, Kuan Wang, Haotian Sun, Chao Zhang

Large Language Models (LLMs) have demonstrated impressive performance in various NLP tasks, but they still suffer from challenges such as hallucination and weak numerical reasoning. To overcome these challenges, external tools can be used to enhance LLMs' question-answering abilities. However, current evaluation methods do not distinguish between questions that can be answered using LLMs' internal knowledge and those that require external information through tool use. To address this issue, we introduce a new dataset called ToolQA, which is designed to faithfully evaluate LLMs' ability to use external tools for question answering. Our development of ToolQA involved a scalable, automated process for dataset curation, along with 13 specialized tools designed for interaction with external knowledge in order to answer questions. Importantly, we strive to minimize the overlap between our benchmark data and LLMs' pre-training data, enabling a more precise evaluation of LLMs' tool-use reasoning abilities. We conducted an in-depth diagnosis of existing tool-use LLMs to highlight their strengths, weaknesses, and potential improvements. Our findings set a new benchmark for evaluating LLMs and suggest new directions for future advancements. Our data and code are freely available to the broader scientific community on GitHub.
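
A rough sketch of how one might score an agent on ToolQA-style records is shown below; the JSON field names, the exact-match metric, and the stubbed agent are assumptions made for illustration, not the benchmark's official evaluation code.

```python
# Toy evaluation harness over question/answer records requiring external knowledge.
import json

def exact_match(pred: str, gold: str) -> bool:
    return pred.strip().lower() == gold.strip().lower()

def evaluate(records, answer_fn):
    """`answer_fn(question, use_tools)` would wrap an LLM agent; here it is a stub."""
    hits = sum(exact_match(answer_fn(r["question"], use_tools=True), r["answer"]) for r in records)
    return hits / max(len(records), 1)

# Placeholder records in the rough shape such a benchmark might use.
records = [json.loads(line) for line in [
    '{"question": "Which airline had the most delayed flights at <airport> in <month>?", "answer": "<placeholder>"}',
]]
print(evaluate(records, answer_fn=lambda q, use_tools: "<placeholder>"))   # stubbed agent
```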

MUBen: Benchmarking the Uncertainty of Pre-Trained Models for Molecular Property Prediction

Jun 14, 2023
Yinghao Li, Lingkai Kong, Yuanqi Du, Yue Yu, Yuchen Zhuang, Wenhao Mu, Chao Zhang

Large Transformer models pre-trained on massive unlabeled molecular data have shown great success in predicting molecular properties. However, these models can be prone to overfitting during fine-tuning, resulting in over-confident predictions on test data that fall outside of the training distribution. To address this issue, uncertainty quantification (UQ) methods can be used to improve the calibration of the models' predictions. Although many UQ approaches exist, not all of them lead to improved performance. While some studies have used UQ to improve molecular pre-trained models, the process of selecting suitable backbone and UQ methods for reliable molecular uncertainty estimation remains underexplored. To address this gap, we present MUBen, which evaluates different combinations of backbone and UQ models to quantify their performance for both property prediction and uncertainty estimation. By fine-tuning various backbone molecular representation models using different molecular descriptors as inputs with UQ methods from different categories, we critically assess the influence of architectural decisions and training strategies. Our study offers insights for selecting UQ and backbone models, which can facilitate research on uncertainty-critical applications in fields such as materials science and drug discovery.
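
As an example of one backbone/UQ pairing of the kind the benchmark compares, the sketch below applies MC Dropout to a placeholder regressor and scores it with predictive negative log-likelihood; the tiny MLP and synthetic data are stand-ins, not MUBen's actual backbones or evaluation protocol.

```python
# MC Dropout as a simple UQ method: keep dropout active at inference and average samples.
import math
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, 1))

def mc_dropout_predict(model, x, n_samples=30):
    model.train()                                  # keep dropout active at inference time
    preds = torch.stack([model(x).squeeze(-1) for _ in range(n_samples)])
    return preds.mean(0), preds.var(0) + 1e-6      # predictive mean and variance

def gaussian_nll(mean, var, y):
    return 0.5 * (torch.log(2 * math.pi * var) + (y - mean) ** 2 / var).mean()

x, y = torch.randn(256, 16), torch.randn(256)      # stand-in molecular features / property
mean, var = mc_dropout_predict(backbone, x)
print("predictive NLL:", gaussian_nll(mean, var, y).item())
```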

DyGen: Learning from Noisy Labels via Dynamics-Enhanced Generative Modeling

Jun 13, 2023
Yuchen Zhuang, Yue Yu, Lingkai Kong, Xiang Chen, Chao Zhang

Learning from noisy labels is a challenge that arises in many real-world applications where training data can contain incorrect or corrupted labels. When fine-tuning language models with noisy labels, models can easily overfit the label noise, leading to decreased performance. Most existing methods for learning from noisy labels use static input features for denoising, but these methods are limited by the information they can provide on true label distributions and can result in biased or incorrect predictions. In this work, we propose the Dynamics-Enhanced Generative Model (DyGen), which uses dynamic patterns in the embedding space during the fine-tuning process of language models to improve noisy label predictions. DyGen uses the variational auto-encoding framework to infer the posterior distributions of true labels from noisy labels and training dynamics. Additionally, a co-regularization mechanism is used to minimize the impact of potentially noisy labels and priors. DyGen demonstrates an average accuracy improvement of 3.10% on two synthetic noise datasets and 1.48% on three real-world noise datasets compared to the previous state-of-the-art. Extensive experiments and analyses show the effectiveness of each component in DyGen. Our code is available for reproducibility on GitHub.
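
A simplified sketch of the two ingredients described above: record training dynamics while fine-tuning on noisy labels, then train a small inference network on those trajectories. The plain classifier head below replaces DyGen's variational auto-encoding and co-regularization machinery and is only meant to show the data flow.

```python
# (1) Record per-epoch predicted class probabilities as "training dynamics";
# (2) train a denoising network on the trajectories to estimate true-label posteriors.
import torch
import torch.nn as nn

N, C, EPOCHS = 512, 5, 6
features = torch.randn(N, 32)
noisy_labels = torch.randint(C, (N,))

classifier = nn.Linear(32, C)
opt = torch.optim.Adam(classifier.parameters(), lr=1e-2)
dynamics = []                                             # (epoch, example, class) probabilities
for _ in range(EPOCHS):
    logits = classifier(features)
    loss = nn.functional.cross_entropy(logits, noisy_labels)
    opt.zero_grad(); loss.backward(); opt.step()
    dynamics.append(logits.softmax(-1).detach())
dynamics = torch.stack(dynamics, dim=1).reshape(N, -1)    # one trajectory per example

denoiser = nn.Sequential(nn.Linear(EPOCHS * C, 64), nn.ReLU(), nn.Linear(64, C))
opt2 = torch.optim.Adam(denoiser.parameters(), lr=1e-2)
for _ in range(50):                                       # trained with noisy supervision here
    loss = nn.functional.cross_entropy(denoiser(dynamics), noisy_labels)
    opt2.zero_grad(); loss.backward(); opt2.step()
posterior = denoiser(dynamics).softmax(-1)                # estimated true-label distribution
print(posterior.shape)
```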

* Accepted by KDD 2023 research track 

AdaPlanner: Adaptive Planning from Feedback with Language Models

May 26, 2023
Haotian Sun, Yuchen Zhuang, Lingkai Kong, Bo Dai, Chao Zhang

Large language models (LLMs) have recently demonstrated potential as autonomous agents for sequential decision-making tasks. However, most existing methods either take actions greedily without planning or rely on static plans that are not adaptable to environmental feedback. Consequently, the sequential decision-making performance of LLM agents degrades as problem complexity and plan horizon increase. We propose a closed-loop approach, AdaPlanner, which allows the LLM agent to refine its self-generated plan adaptively in response to environmental feedback. In AdaPlanner, the LLM agent adaptively refines its plan from feedback with both in-plan and out-of-plan refinement strategies. To mitigate hallucination, we develop a code-style LLM prompt structure that facilitates plan generation across a variety of tasks, environments, and agent capabilities. Furthermore, we propose a skill discovery mechanism that leverages successful plans as few-shot exemplars, enabling the agent to plan and refine with fewer task demonstrations. Our experiments in the ALFWorld and MiniWoB++ environments demonstrate that AdaPlanner outperforms state-of-the-art baselines by 3.73% and 4.11%, respectively, while using 2x and 600x fewer samples.
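
The closed-loop structure can be sketched as follows, with a stubbed language model and a toy environment standing in for a real LLM and an ALFWorld/MiniWoB++-style environment; the prompt strings and plan format are illustrative assumptions, not AdaPlanner's actual prompts.

```python
# Plan -> act -> refine-from-feedback loop, with successful plans kept as exemplars.
def llm(prompt: str) -> str:
    if "Revise" in prompt:                   # pretend the model fixes the failing step
        return "def plan(env):\n    return ['look', 'open drawer', 'take object from drawer']"
    return "def plan(env):\n    return ['look', 'pick up object']"

class ToyEnv:
    def step(self, action: str):
        ok = "pick" not in action            # pretend the pick-up action fails
        return ok, ("" if ok else f"action '{action}' failed: object not found")

def run(plan_actions, env):
    for act in plan_actions:
        ok, feedback = env.step(act)
        if not ok:
            return False, feedback           # out-of-plan signal triggers refinement
    return True, ""

skills = []                                  # successful plans kept as few-shot exemplars
plan_src = llm("Generate a code-style plan for the task, using these exemplars: " + repr(skills))
for attempt in range(3):                     # closed loop
    namespace = {}
    exec(plan_src, namespace)                # the plan is generated as executable code
    actions = namespace["plan"](None)
    success, feedback = run(actions, ToyEnv())
    if success:
        skills.append(plan_src)              # skill discovery: store the working plan
        break
    plan_src = llm(f"Revise the plan given this environment feedback: {feedback}")
print("solved" if success else "failed")
```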

ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval

May 18, 2023
Yue Yu, Yuchen Zhuang, Rongzhi Zhang, Yu Meng, Jiaming Shen, Chao Zhang

With the development of large language models (LLMs), zero-shot learning has attracted much attention for various NLP tasks. Different from prior works that generate training data with billion-scale natural language generation (NLG) models, we propose a retrieval-enhanced framework to create training data from a general-domain unlabeled corpus. To realize this, we first conduct contrastive pretraining to learn an unsupervised dense retriever for extracting the most relevant documents using class-descriptive verbalizers. We then propose two simple strategies, Verbalizer Augmentation with Demonstrations and Self-consistency Guided Filtering, to improve the topic coverage of the dataset while removing noisy examples. Experiments on nine datasets demonstrate that ReGen achieves a 4.3% gain over the strongest baselines and saves around 70% of the time compared to baselines using large NLG models. Moreover, ReGen can be naturally integrated with recently proposed large language models to further boost performance.
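
A minimal sketch of the retrieval step follows: score an unlabeled corpus against class-descriptive verbalizers and keep the top matches per class as pseudo-labeled training data. TF-IDF stands in for the contrastively pretrained dense retriever, and the verbalizers and corpus are toy examples.

```python
# Retrieve class-relevant documents from an unlabeled corpus using class verbalizers.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

verbalizers = {
    "sports":   "news about sports, games, athletes, and tournaments",
    "business": "news about business, markets, companies, and the economy",
}
corpus = [
    "The striker scored twice as the team won the championship final.",
    "Shares rallied after the company reported record quarterly earnings.",
    "The central bank signaled it may hold interest rates steady.",
    "The marathon drew thousands of runners to the city streets.",
]

vec = TfidfVectorizer().fit(corpus + list(verbalizers.values()))
doc_mat = vec.transform(corpus)
for label, verbalizer in verbalizers.items():
    scores = cosine_similarity(vec.transform([verbalizer]), doc_mat)[0]
    top = np.argsort(scores)[::-1][:2]               # top-k retrieved documents per class
    print(label, [corpus[i] for i in top])
```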

* ACL 2023 Findings (Code: https://github.com/yueyu1030/ReGen) 

End-to-End Stochastic Optimization with Energy-Based Model

Nov 25, 2022
Lingkai Kong, Jiaming Cui, Yuchen Zhuang, Rui Feng, B. Aditya Prakash, Chao Zhang

Decision-focused learning (DFL) was recently proposed for stochastic optimization problems that involve unknown parameters. By integrating predictive modeling with an implicitly differentiable optimization layer, DFL has shown superior performance to the standard two-stage predict-then-optimize pipeline. However, most existing DFL methods are only applicable to convex problems or a subset of nonconvex problems that can be easily relaxed to convex ones. Further, they can be inefficient in training due to the requirement of solving and differentiating through the optimization problem in every training iteration. We propose SO-EBM, a general and efficient DFL method for stochastic optimization using energy-based models. Instead of relying on KKT conditions to induce an implicit optimization layer, SO-EBM explicitly parameterizes the original optimization problem using a differentiable optimization layer based on energy functions. To better approximate the optimization landscape, we propose a coupled training objective that uses a maximum likelihood loss to capture the optimum location and a distribution-based regularizer to capture the overall energy landscape. Finally, we propose an efficient training procedure for SO-EBM with a self-normalized importance sampler based on a Gaussian mixture proposal. We evaluate SO-EBM in three applications: power scheduling, COVID-19 resource allocation, and a non-convex adversarial security game, demonstrating the effectiveness and efficiency of SO-EBM.
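
The sampling component can be illustrated with a small sketch: estimate expectations under an energy-based decision distribution p(w|x) proportional to exp(-E(w; x)) by self-normalized importance sampling from a Gaussian-mixture proposal. The quadratic energy, mixture settings, and dimensions below are assumptions for illustration, not the paper's configuration.

```python
# Self-normalized importance sampling under an energy-based decision distribution,
# using a Gaussian-mixture proposal centered on a reference decision.
import torch
import torch.distributions as D

def energy(w, y):                                     # toy task cost playing the role of E(w; x)
    return ((w - y) ** 2).sum(-1)

def snis_expectation(fn, y, w_ref, n=2048, scales=(0.1, 0.5, 1.0)):
    """Self-normalized importance-sampling estimate of E_p[fn(w)] with a GMM proposal around w_ref."""
    mix = D.Categorical(torch.ones(len(scales)))
    comps = D.Independent(D.Normal(w_ref.expand(len(scales), -1),
                                   torch.tensor(scales).unsqueeze(-1) * torch.ones_like(w_ref)), 1)
    proposal = D.MixtureSameFamily(mix, comps)
    w = proposal.sample((n,))                          # (n, dim) candidate decisions
    log_w = -energy(w, y) - proposal.log_prob(w)       # unnormalized importance log-weights
    weights = torch.softmax(log_w, dim=0)              # self-normalization
    return (weights.unsqueeze(-1) * fn(w)).sum(0)

y = torch.tensor([1.0, -2.0, 0.5])
w_ref = torch.zeros(3)                                 # e.g., a point prediction as the mixture center
print(snis_expectation(lambda w: w, y, w_ref))         # roughly recovers a mean decision near y
```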

* NeurIPS 2022 Oral 