Yiming Yang

Kuaishou Technology

Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning with LLMs

May 19, 2023

Pranjal Aggarwal, Aman Madaan, Yiming Yang, Mausam

Abstract: A popular approach for improving the correctness of output from large language models (LLMs) is Self-Consistency: poll the LLM multiple times and output the most frequent solution. Existing Self-Consistency techniques always draw a constant number of samples per question, whereas a better approach would be to distribute the available budget non-uniformly based on the amount of agreement in the samples drawn so far. In response, we introduce Adaptive-Consistency, a cost-efficient, model-agnostic technique that dynamically adjusts the number of samples per question using a lightweight stopping criterion. Our experiments over 13 datasets and two LLMs demonstrate that Adaptive-Consistency reduces sample budget by up to 6.0 times with an average accuracy drop of less than 0.1%.
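
The sampling loop described in this abstract can be sketched as follows. The abstract does not spell out the stopping criterion, so the rule here is an assumption: stop once a Beta posterior over the top two answer counts puts at least `threshold` probability on the leading answer being the true majority. The names `sample_answer`, `max_samples`, and `threshold` are hypothetical, and `sample_answer` stands in for one LLM call.

```python
from collections import Counter
from math import comb


def majority_confidence(top, runner_up):
    """P(p > 0.5) for p ~ Beta(top + 1, runner_up + 1), in closed form.

    For integer parameters this equals P(Binomial(n, 0.5) <= top)
    with n = top + runner_up + 1.
    """
    n = top + runner_up + 1
    return sum(comb(n, j) for j in range(top + 1)) / 2 ** n


def adaptive_consistency(sample_answer, max_samples=40, threshold=0.95):
    """Draw answers one at a time and stop early once the leader is clear.

    Returns the most frequent answer and the number of samples drawn.
    The stopping rule is an illustrative assumption, not the paper's code.
    """
    counts = Counter()
    for drawn in range(1, max_samples + 1):
        counts[sample_answer()] += 1
        ranked = counts.most_common(2)
        top = ranked[0][1]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if majority_confidence(top, runner_up) >= threshold:
            break
    return counts.most_common(1)[0][0], drawn
```

With a sampler that always returns the same answer, this rule stops after 4 draws instead of spending the full budget of 40; questions with split answers keep sampling for longer.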

Active Retrieval Augmented Generation

May 11, 2023

Zhengbao Jiang, Frank F. Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, Graham Neubig

Abstract: Despite the remarkable ability of large language models (LMs) to comprehend and generate language, they have a tendency to hallucinate and create factually inaccurate output. Augmenting LMs by retrieving information from external knowledge resources is one promising solution. Most existing retrieval-augmented LMs employ a retrieve-and-generate setup that only retrieves information once based on the input. This is limiting, however, in more general scenarios involving generation of long texts, where continually gathering information throughout the generation process is essential. There have been some past efforts to retrieve information multiple times while generating outputs, which mostly retrieve documents at fixed intervals using the previous context as queries. In this work, we provide a generalized view of active retrieval augmented generation, methods that actively decide when and what to retrieve across the course of the generation. We propose Forward-Looking Active REtrieval augmented generation (FLARE), a generic retrieval-augmented generation method which iteratively uses a prediction of the upcoming sentence to anticipate future content, which is then utilized as a query to retrieve relevant documents to regenerate the sentence if it contains low-confidence tokens. We test FLARE along with baselines comprehensively over 4 long-form knowledge-intensive generation tasks/datasets. FLARE achieves superior or competitive performance on all tasks, demonstrating the effectiveness of our method. Code and datasets are available at https://github.com/jzbjyb/FLARE.
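
A minimal sketch of the loop this abstract describes: draft the next sentence, and if any token in it is low-confidence, use the draft as a retrieval query and regenerate. This is not the code from the FLARE repository. The callables `generate_sentence` and `retrieve`, the `min_prob` threshold, and the `max_sentences` cap are hypothetical placeholders for an LM call and a retriever.

```python
def flare_generate(question, generate_sentence, retrieve,
                   min_prob=0.6, max_sentences=8):
    """Forward-looking retrieval loop (illustrative, under stated assumptions).

    generate_sentence(context, docs=None) -> (sentence, token_probs)
    retrieve(query) -> list of documents
    Both callables are assumed interfaces, not a real library API.
    """
    answer = []
    for _ in range(max_sentences):
        context = " ".join([question] + answer)
        # Tentative next sentence, generated without retrieved documents.
        sentence, probs = generate_sentence(context, docs=None)
        if not sentence:
            break  # the model signalled that the answer is complete
        if min(probs) < min_prob:
            # Low-confidence token: query with the draft, then regenerate.
            docs = retrieve(sentence)
            sentence, _ = generate_sentence(context, docs=docs)
        answer.append(sentence)
    return " ".join(answer)
```

Retrieval is triggered only when the draft looks uncertain, which is the contrast the abstract draws with methods that retrieve once or at fixed intervals.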

Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

May 04, 2023

Zhiqing Sun, Yikang Shen, Qinhong Zhou, Hongxin Zhang, Zhenfang Chen, David Cox, Yiming Yang, Chuang Gan

Abstract: Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback (RLHF) to align the output of large language models (LLMs) with human intentions, ensuring they are helpful, ethical, and reliable. However, this dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision and the related issues of quality, reliability, diversity, self-consistency, and undesirable biases. To address these challenges, we propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision. Our approach encompasses four stages: first, we use an LLM to generate synthetic prompts, and a topic-guided method to augment the prompt diversity; second, we use a small set of human-written principles for AI models to follow, and guide the LLM through in-context learning from demonstrations (of applying the principles) to produce helpful, ethical, and reliable responses to users' queries; third, we fine-tune the original LLM with the high-quality self-aligned responses so that the resulting model can generate desirable responses for each query directly, without needing the principle set and the demonstrations; and finally, we offer a refinement step to address the issues of overly brief or indirect responses. Applying SELF-ALIGN to the LLaMA-65b base language model, we develop an AI assistant named Dromedary. With fewer than 300 lines of human annotations (including < 200 seed prompts, 16 generic principles, and 5 exemplars for in-context learning), Dromedary significantly surpasses the performance of several state-of-the-art AI systems, including Text-Davinci-003 and Alpaca, on benchmark datasets with various settings.

* Project page: https://mitibmdemos.draco.res.ibm.com/dromedary

Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-tuned GPT

Apr 24, 2023

Ruohong Zhang, Yau-Shian Wang, Yiming Yang

Abstract: Moreover, GPT-based zero-shot classification models tend to make independent predictions over test instances, which can be sub-optimal as the instance correlations and the decision boundaries in the target space are ignored. To address these difficulties and limitations, we propose a new approach to zero-shot text classification, namely GenCo, which leverages the strong generative power of GPT to assist in training a smaller, more adaptable, and efficient sentence encoder classifier with contrastive self-training. Specifically, GenCo applies GPT in two ways: firstly, it generates multiple augmented texts for each input instance to enhance the semantic embedding of the instance and improve the mapping to relevant labels; secondly, it generates augmented texts conditioned on the predicted label during self-training, which makes the generative process tailored to the decision boundaries in the target space. In our experiments, GenCo outperforms previous state-of-the-art methods on multiple benchmark datasets, even when only limited in-domain text data is available.

A Wideband Reconfigurable Intelligent Surface for 5G Millimeter-Wave Applications

Apr 23, 2023

Ruiqi Wang, Yiming Yang, Behrooz Makki, Atif Shamim

Abstract: Despite the growing interest in reconfigurable intelligent surfaces (RISs) for millimeter-wave (mm-wave) bands, and the considerable theoretical work reported by the communication community, there is a limited number of published works demonstrating practical implementations and experimental results. To the authors' knowledge, no published literature has reported experimental results for RISs covering the n257 and n258 mm-wave bands. In this work, we propose a novel wideband RIS design that covers the entire mm-wave 5G n257 and n258 bands. In simulations, the unit cell can maintain a phase difference of 180° ± 20° and a reflection magnitude greater than -2.8 dB within 22.7 to 30.5 GHz (29.3% bandwidth) using one-bit PIN switches. The proposed unit cell design with four circular cutouts and long vias could realize wideband performance by exciting two adjacent high-order resonances (2.5f and 3.5f). The periodic unit cells can maintain an angular stability of 30°. Based on the proposed unit cell, a 20 by 20 RIS array is designed and fabricated with a size of 7.1λ by 7.1λ. The measurement results demonstrate that the proposed RIS could maintain a 3 dB peak gain variation bandwidth among various array configurations within 22.5 to 29.5 GHz (26.9%) and with a beam scanning capability of 50°, making this design a good candidate for 5G mm-wave applications.
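
The two bandwidth percentages quoted in this abstract are consistent with the usual fractional-bandwidth definition, the band width divided by the centre frequency. The abstract does not state which definition was used, so treating it this way is an assumption; the function name below is illustrative.

```python
def fractional_bandwidth(f_low_ghz, f_high_ghz):
    """Fractional bandwidth in percent: (f_high - f_low) / centre frequency."""
    centre = (f_low_ghz + f_high_ghz) / 2
    return 100 * (f_high_ghz - f_low_ghz) / centre


# Simulated unit cell: 22.7 to 30.5 GHz
print(round(fractional_bandwidth(22.7, 30.5), 1))  # 29.3
# Measured array: 22.5 to 29.5 GHz
print(round(fractional_bandwidth(22.5, 29.5), 1))  # 26.9
```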

Self-Refine: Iterative Refinement with Self-Feedback

Mar 30, 2023

Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang (+5 more)

Abstract: Like people, LLMs do not always generate the best text for a given generation problem on their first try (e.g., summaries, answers, explanations). Just as people then refine their text, we introduce SELF-REFINE, a framework for similarly improving initial outputs from LLMs through iterative feedback and refinement. The main idea is to generate an output using an LLM, then allow the same model to provide multi-aspect feedback for its own output; finally, the same model refines its previously generated output given its own feedback. Unlike earlier work, our iterative refinement framework does not require supervised training data or reinforcement learning, and works with a single LLM. We experiment with 7 diverse tasks, ranging from review rewriting to math reasoning, demonstrating that our approach outperforms direct generation. In all tasks, outputs generated with SELF-REFINE are preferred by humans and by automated metrics over those generated directly with GPT-3.5 and GPT-4, improving by an absolute 20% on average across tasks.

* Code, data, and demo at https://selfrefine.info/
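
The generate, feedback, refine cycle from this abstract can be sketched as a short loop. This is an illustration rather than the released code: the three callables stand in for prompts sent to the same LLM, and the `stop_marker` string and `max_iters` cap are assumed conventions for deciding when to stop.

```python
def self_refine(task, generate, feedback, refine,
                max_iters=4, stop_marker="LOOKS GOOD"):
    """Iteratively improve an output using the model's own feedback.

    generate(task) -> initial output
    feedback(task, output) -> critique text
    refine(task, output, critique) -> improved output
    All three are hypothetical wrappers around a single LLM.
    """
    output = generate(task)
    for _ in range(max_iters):
        critique = feedback(task, output)
        if stop_marker in critique:
            break  # the model judges its own output to be good enough
        output = refine(task, output, critique)
    return output
```

No training step appears anywhere in the loop, which matches the abstract's point that the framework needs neither supervised data nor reinforcement learning.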

Learning Performance-Improving Code Edits

Feb 21, 2023

Aman Madaan, Alexander Shypula, Uri Alon, Milad Hashemi, Parthasarathy Ranganathan, Yiming Yang, Graham Neubig, Amir Yazdanbakhsh

Abstract: The waning of Moore's Law has shifted the focus of the tech industry towards alternative methods for continued performance gains. While optimizing compilers are a standard tool to help increase program efficiency, programmers continue to shoulder much responsibility in crafting and refactoring code with better performance characteristics. In this paper, we investigate the ability of large language models (LLMs) to suggest functionally correct, performance-improving code edits. We hypothesize that language models can suggest such edits in ways that would be impractical for static analysis alone. We investigate these questions by curating a large-scale dataset of Performance-Improving Edits, PIE. PIE contains trajectories of programs, where a programmer begins with an initial, slower version and iteratively makes changes to improve the program's performance. We use PIE to evaluate and improve the capacity of large language models. Specifically, we use examples from PIE to fine-tune multiple variants of CODEGEN, a billion-scale Transformer-decoder model. Additionally, we use examples from PIE to prompt OpenAI's CODEX using few-shot prompting. By leveraging PIE, we find that both CODEX and CODEGEN can generate performance-improving edits, with speedups of more than 2.5x for over 25% of the programs, for C++ and Python, even after the C++ programs were compiled using the O3 optimization level. Crucially, we show that PIE allows CODEGEN, an open-sourced and 10x smaller model than CODEX, to match the performance of CODEX on this challenging task. Overall, this work opens new doors for creating systems and methods that can help programmers write efficient code.

* Project website: https://pie4perf.com/. This version extends the related work and acknowledgements

DIFUSCO: Graph-based Diffusion Solvers for Combinatorial Optimization

Feb 16, 2023

Zhiqing Sun, Yiming Yang

Abstract: Neural network-based Combinatorial Optimization (CO) methods have shown promising results in solving various NP-complete (NPC) problems without relying on hand-crafted domain knowledge. This paper broadens the current scope of neural solvers for NPC problems by introducing a new graph-based diffusion framework, namely DIFUSCO. Our framework casts NPC problems as discrete {0, 1}-vector optimization problems and leverages graph-based denoising diffusion models to generate high-quality solutions. We investigate two types of diffusion models with Gaussian and Bernoulli noise, respectively, and devise an effective inference schedule to enhance the solution quality. We evaluate our methods on two well-studied NPC combinatorial optimization problems: Traveling Salesman Problem (TSP) and Maximal Independent Set (MIS). Experimental results show that DIFUSCO strongly outperforms the previous state-of-the-art neural solvers, narrowing the performance gap between ground-truth and neural solvers from 1.76% to 0.46% on TSP-500, from 2.46% to 1.17% on TSP-1000, and from 3.19% to 2.58% on TSP-10000. For the MIS problem, DIFUSCO outperforms the previous state-of-the-art neural solver on the challenging SATLIB benchmark. Our code is available at https://github.com/Edward-Sun/DIFUSCO.

A Neural PDE Solver with Temporal Stencil Modeling

Feb 16, 2023

Zhiqing Sun, Yiming Yang, Shinjae Yoo

Abstract: Numerical simulation of non-linear partial differential equations plays a crucial role in modeling physical science and engineering phenomena, such as weather, climate, and aerodynamics. Recent Machine Learning (ML) models trained on low-resolution spatio-temporal signals have shown new promise in capturing important dynamics in high-resolution signals, under the condition that the models can effectively recover the missing details. However, this study shows that significant information is often lost in the low-resolution down-sampled features. To address such issues, we propose a new approach, namely Temporal Stencil Modeling (TSM), which combines the strengths of advanced time-series sequence modeling (with the HiPPO features) and state-of-the-art neural PDE solvers (with learnable stencil modeling). TSM aims to recover the lost information from the PDE trajectories and can be regarded as a temporal generalization of classic finite volume methods such as WENO. Our experimental results show that TSM achieves new state-of-the-art simulation accuracy for 2-D incompressible Navier-Stokes turbulent flows: it significantly outperforms the previously reported best results by 19.9% in terms of the highly-correlated duration time and reduces the inference latency to 80%. We also show a strong generalization ability of the proposed method to various out-of-distribution turbulent flow settings. Our code is available at https://github.com/Edward-Sun/TSM-PDE.

A Via-less Fully Screen-Printed Reconfigurable Intelligent Surface for 5G Millimeter Wave Communication

Feb 07, 2023

Yiming Yang, Ruiqi Wang, Mohammad Vaseem, Behrooz Makki, Atif Shamim

Abstract: In this paper, we propose a via-less fully screen-printed reconfigurable intelligent surface which can establish a second line-of-sight communication from 23.5 GHz to 29.5 GHz. By serially connecting the H-shaped resonator along the H field of the incident wave, we minimize the effect of the biasing lines and make a via-less design, which reduces the fabrication difficulty and cost. The unit-cell simulation of the array with screen-printed VO2 switches shows a 215° to 160° phase shift difference between the ON and OFF states within bandwidth. During the field testing of the ideal arrays, we verify that the array can redirect the 45° incident wave to 0° reflection with a signal enhancement of at least 10 dB as compared to the array which has all unit cells in the OFF condition.

* 2 pages, 3 figures, submitted to 2023 IEEE International Symposium on Antennas and Propagation
