Cheng Chang

Stanford University

Driving-RAG: Driving Scenarios Embedding, Search, and RAG Applications

Apr 06, 2025

Cheng Chang, Jingwei Ge, Jiazhe Guo, Zelin Guo, Binghong Jiang, Li Li

Abstract:Driving scenario data play an increasingly vital role in the development of intelligent vehicles and autonomous driving. Accurate and efficient scenario data search is critical for both online vehicle decision-making and planning, and offline scenario generation and simulations, as it allows for leveraging the scenario experiences to improve the overall performance. With the application of large language models (LLMs) and Retrieval-Augmented Generation (RAG) systems in autonomous driving, the need for such search has become urgent. In this paper, we introduce the Driving-RAG framework to address the challenges of efficient scenario data embedding, search, and applications for RAG systems. Our embedding model aligns fundamental scenario information and scenario distance metrics in the vector space. The typical scenario sampling method combined with hierarchical navigable small world (HNSW) can perform efficient scenario vector search without sacrificing accuracy. In addition, the reorganization mechanism by graph knowledge enhances the relevance to the prompt scenarios and augments LLM generation. We demonstrate the effectiveness of the proposed framework on a typical trajectory planning task for complex interactive scenarios such as ramps and intersections, showcasing its advantages for RAG applications.
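
The retrieval step described in this abstract can be pictured with a minimal sketch. It is not the Driving-RAG implementation: it uses exact brute-force cosine similarity, which is the baseline that an approximate index such as hierarchical navigable small world speeds up, and the scenario names, vectors, and function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query, index, k=2):
    """Return the names of the k stored scenarios most similar to the query."""
    scored = [(cosine(query, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Hypothetical scenario embeddings (made up for illustration).
index = {
    "ramp_merge": [0.9, 0.1, 0.0],
    "intersection_left_turn": [0.1, 0.9, 0.2],
    "highway_cruise": [0.7, 0.0, 0.7],
}
query = [1.0, 0.2, 0.1]
print(top_k(query, index, k=2))
```

Brute force scans every stored vector per query, so its cost grows linearly with the size of the scenario library; that cost is what motivates an approximate nearest-neighbor index.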

Strategic priorities for transformative progress in advancing biology with proteomics and artificial intelligence

Feb 21, 2025

Yingying Sun, Jun A, Zhiwei Liu, Rui Sun, Liujia Qian, Samuel H. Payne, Wout Bittremieux, Markus Ralser, Chen Li, Yi Chen(+52 more)

Abstract:Artificial intelligence (AI) is transforming scientific research, including proteomics. Advances in mass spectrometry (MS)-based proteomics data quality, diversity, and scale, combined with groundbreaking AI techniques, are unlocking new challenges and opportunities in biological discovery. Here, we highlight key areas where AI is driving innovation, from data analysis to new biological insights. These include developing an AI-friendly ecosystem for proteomics data generation, sharing, and analysis; improving peptide and protein identification and quantification; characterizing protein-protein interactions and protein complexes; advancing spatial and perturbation proteomics; integrating multi-omics data; and ultimately enabling AI-empowered virtual cells.

* 28 pages, 2 figures, perspective in AI proteomics

Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

Nov 10, 2024

Yu Gu, Boyuan Zheng, Boyu Gou, Kai Zhang, Cheng Chang, Sanjari Srivastava, Yanan Xie, Peng Qi, Huan Sun, Yu Su

Abstract:Language agents have demonstrated promising capabilities in automating web-based tasks, though their current reactive approaches still fall well short of human performance. While incorporating advanced planning algorithms, particularly tree search methods, could enhance these agents' performance, implementing tree search directly on live websites poses significant safety risks and practical constraints due to irreversible actions such as confirming a purchase. In this paper, we introduce a novel paradigm that augments language agents with model-based planning, pioneering the innovative use of large language models (LLMs) as world models in complex web environments. Our method, WebDreamer, builds on the key insight that LLMs inherently encode comprehensive knowledge about website structures and functionalities. Specifically, WebDreamer uses LLMs to simulate outcomes for each candidate action (e.g., "what would happen if I click this button?") using natural language descriptions, and then evaluates these imagined outcomes to determine the optimal action at each step. Empirical results on two representative web agent benchmarks with online interaction -- VisualWebArena and Mind2Web-live -- demonstrate that WebDreamer achieves substantial improvements over reactive baselines. By establishing the viability of LLMs as world models in web environments, this work lays the groundwork for a paradigm shift in automated web interaction. More broadly, our findings open exciting new avenues for future research into 1) optimizing LLMs specifically for world modeling in complex, dynamic environments, and 2) model-based speculative planning for language agents.

* 18 pages, 6 figures, 4 tables
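
The simulate-then-evaluate loop described in this abstract can be sketched as follows. This is an illustrative outline under stated assumptions, not WebDreamer's code: `plan_step`, `toy_simulate`, and `toy_score` are hypothetical names, and the two toy functions stand in for the LLM calls that would imagine and rate outcomes.

```python
from typing import Callable, List, Optional

def plan_step(state: str,
              candidates: List[str],
              simulate: Callable[[str, str], str],
              score: Callable[[str, str], float],
              goal: str) -> Optional[str]:
    """Pick the candidate action whose imagined outcome scores highest."""
    best_action, best_score = None, float("-inf")
    for action in candidates:
        imagined = simulate(state, action)   # world model: predict the outcome
        value = score(imagined, goal)        # evaluator: rate progress to goal
        if value > best_score:
            best_action, best_score = action, value
    return best_action

# Toy stand-ins for the LLM world model and evaluator (assumptions).
def toy_simulate(state: str, action: str) -> str:
    outcomes = {
        "click 'Add to cart'": "cart page showing 1 item",
        "click 'Home'": "home page with featured products",
    }
    return outcomes.get(action, "unknown page")

def toy_score(imagined: str, goal: str) -> float:
    goal_words = set(goal.lower().split())
    return len(goal_words & set(imagined.lower().split())) / len(goal_words)

chosen = plan_step("product page",
                   ["click 'Home'", "click 'Add to cart'"],
                   toy_simulate, toy_score, goal="item in cart")
print(chosen)
```

No action is executed on the site while candidates are compared; only the selected action would be carried out, which is how this style of planning sidesteps irreversible steps during search.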

AMSnet-KG: A Netlist Dataset for LLM-based AMS Circuit Auto-Design Using Knowledge Graph RAG

Nov 07, 2024

Yichen Shi, Zhuofu Tao, Yuhao Gao, Tianjia Zhou, Cheng Chang, Yaxing Wang, Bingyu Chen, Genhao Zhang, Alvin Liu, Zhiping Yu(+2 more)

Abstract:High-performance analog and mixed-signal (AMS) circuits are mainly full-custom designed, which is time-consuming and labor-intensive. A significant portion of the effort is experience-driven, which makes the automation of AMS circuit design a formidable challenge. Large language models (LLMs) have emerged as powerful tools for Electronic Design Automation (EDA) applications, fostering advancements in the automatic design process for large-scale AMS circuits. However, the absence of high-quality datasets has led to issues such as model hallucination, which undermines the robustness of automatically generated circuit designs. To address this issue, this paper introduces AMSnet-KG, a dataset encompassing various AMS circuit schematics and netlists. We construct a knowledge graph with annotations on detailed functional and performance characteristics. Facilitated by AMSnet-KG, we propose an automated AMS circuit generation framework that utilizes the comprehensive knowledge embedded in LLMs. We first formulate a design strategy (e.g., circuit architecture using a number of circuit components) based on required specifications. Next, matched circuit components are retrieved and assembled into a complete topology, and transistor sizing is obtained through Bayesian optimization. Simulation results of the netlist are fed back to the LLM for further topology refinement, ensuring the circuit design specifications are met. We perform case studies of operational amplifier and comparator design to verify the automatic design flow from specifications to netlists with minimal human effort. The dataset used in this paper will be open-sourced upon publishing of this paper.

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

Oct 07, 2024

Boyu Gou, Ruohan Wang, Boyuan Zheng, Yanan Xie, Cheng Chang, Yiheng Shu, Huan Sun, Yu Su

Abstract:Multimodal large language models (MLLMs) are transforming the capabilities of graphical user interface (GUI) agents, facilitating their transition from controlled simulations to complex, real-world applications across various platforms. However, the effectiveness of these agents hinges on the robustness of their grounding capability. Current GUI agents predominantly utilize text-based representations such as HTML or accessibility trees, which, despite their utility, often introduce noise, incompleteness, and increased computational overhead. In this paper, we advocate a human-like embodiment for GUI agents that perceive the environment entirely visually and directly perform pixel-level operations on the GUI. The key is visual grounding models that can accurately map diverse referring expressions of GUI elements to their coordinates on the GUI across different platforms. We show that a simple recipe, which includes web-based synthetic data and slight adaptation of the LLaVA architecture, is surprisingly effective for training such visual grounding models. We collect the largest dataset for GUI visual grounding so far, containing 10M GUI elements and their referring expressions over 1.3M screenshots, and use it to train UGround, a strong universal visual grounding model for GUI agents. Empirical results on six benchmarks spanning three categories (grounding, offline agent, and online agent) show that 1) UGround substantially outperforms existing visual grounding models for GUI agents, by up to 20% absolute, and 2) agents with UGround outperform state-of-the-art agents, despite the fact that existing agents use additional text-based input while ours only uses visual perception. These results provide strong support for the feasibility and promise of GUI agents that navigate the digital world as humans do.

Roleplay-doh: Enabling Domain-Experts to Create LLM-simulated Patients via Eliciting and Adhering to Principles

Jul 01, 2024

Ryan Louie, Ananjan Nandi, William Fang, Cheng Chang, Emma Brunskill, Diyi Yang

Abstract:Recent works leverage LLMs to roleplay realistic social scenarios, aiding novices in practicing their social skills. However, simulating sensitive interactions, such as in mental health, is challenging. Privacy concerns restrict data access, and collecting expert feedback, although vital, is laborious. To address this, we develop Roleplay-doh, a novel human-LLM collaboration pipeline that elicits qualitative feedback from a domain-expert, which is transformed into a set of principles, or natural language rules, that govern an LLM-prompted roleplay. We apply this pipeline to enable senior mental health supporters to create customized AI patients as simulated practice partners for novice counselors. After uncovering issues in GPT-4 simulations not adhering to expert-defined principles, we also introduce a novel principle-adherence prompting pipeline which shows 30% improvements in response quality and principle following for the downstream task. Via a user study with 25 counseling experts, we demonstrate that the pipeline makes it easy and effective to create AI patients that more faithfully resemble real patients, as judged by creators and third-party counselors.

* 34 pages, 24 figures, 11 Tables

Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models

May 21, 2024

Zhangyue Yin, Qiushi Sun, Qipeng Guo, Zhiyuan Zeng, Xiaonan Li, Tianxiang Sun, Cheng Chang, Qinyuan Cheng, Ding Wang, Xiaofeng Mou(+2 more)

Abstract:Recent advancements in Chain-of-Thought prompting have facilitated significant breakthroughs for Large Language Models (LLMs) in complex reasoning tasks. Current research enhances the reasoning performance of LLMs by sampling multiple reasoning chains and ensembling based on the answer frequency. However, this approach fails in scenarios where the correct answers are in the minority. We identify this as a primary factor constraining the reasoning capabilities of LLMs, a limitation that cannot be resolved solely based on the predicted answers. To address this shortcoming, we introduce a hierarchical reasoning aggregation framework AoR (Aggregation of Reasoning), which selects answers based on the evaluation of reasoning chains. Additionally, AoR incorporates dynamic sampling, adjusting the number of reasoning chains in accordance with the complexity of the task. Experimental results on a series of complex reasoning tasks show that AoR outperforms prominent ensemble methods. Further analysis reveals that AoR not only adapts to various LLMs but also achieves a superior performance ceiling when compared to current methods.

* 17 pages, 14 figures, accepted by LREC-COLING 2024
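
The failure mode named in this abstract, where frequency-based ensembling loses when the correct answer is in the minority, can be shown with a small sketch. It is an illustration under assumptions, not the AoR algorithm: the per-chain quality scores are hypothetical evaluator outputs, and the selection rule is a simplified stand-in for AoR's hierarchical evaluation.

```python
from collections import Counter

def majority_vote(chains):
    """Frequency-based ensembling: return the most common final answer."""
    counts = Counter(answer for answer, _ in chains)
    return counts.most_common(1)[0][0]

def select_by_chain_score(chains):
    """Return the answer backed by the highest-scoring reasoning chain."""
    best = {}
    for answer, quality in chains:
        best[answer] = max(quality, best.get(answer, float("-inf")))
    return max(best, key=best.get)

# (final answer, hypothetical reasoning-quality score) for five sampled chains.
# Suppose "15" is correct but appears in only two of the five chains.
chains = [("12", 0.3), ("12", 0.4), ("12", 0.2), ("15", 0.9), ("15", 0.8)]

print(majority_vote(chains))          # picks the frequent answer
print(select_by_chain_score(chains))  # picks the best-supported answer
```

Majority voting looks only at the final answers, so no number of votes can rescue a minority answer; scoring the reasoning itself brings in information that the answer counts do not contain.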

ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing

Dec 18, 2023

Zhi Jin, Sheng Xu, Xiang Zhang, Tianze Ling, Nanqing Dong, Wanli Ouyang, Zhiqiang Gao, Cheng Chang, Siqi Sun

Abstract:De novo peptide sequencing from mass spectrometry (MS) data is a critical task in proteomics research. Traditional de novo algorithms have encountered a bottleneck in accuracy due to the inherent complexity of proteomics data. While deep learning-based methods have shown progress, they reduce the problem to a translation task, potentially overlooking critical nuances between spectra and peptides. In our research, we present ContraNovo, a pioneering algorithm that leverages contrastive learning to extract the relationship between spectra and peptides and incorporates the mass information into peptide decoding, aiming to address these intricacies more efficiently. Through rigorous evaluations on two benchmark datasets, ContraNovo consistently outshines contemporary state-of-the-art solutions, underscoring its promising potential in enhancing de novo peptide sequencing. The source code is available at https://github.com/BEAM-Labs/ContraNovo.

* This paper has been accepted by AAAI 2024

A conservative hybrid physics-informed neural network method for Maxwell-Ampère-Nernst-Planck equations

Dec 10, 2023

Cheng Chang, Zhouping Xin, Tieyong Zeng

Abstract:Maxwell-Ampère-Nernst-Planck (MANP) equations were recently proposed to model the dynamics of charged particles. In this study, we enhance a numerical algorithm of this system with deep learning tools. The proposed hybrid algorithm provides an automated means to determine a proper approximation for the dummy variables, which can otherwise only be obtained through massive numerical tests. In addition, the original method is validated for 2-dimensional problems. However, when the spatial dimension is one, the original curl-free relaxation component is inapplicable, and the approximation formula for dummy variables, which works well in a 2-dimensional scenario, fails to provide a reasonable output in the 1-dimensional case. The proposed method can be readily generalised to cases with one spatial dimension. Experiments show numerical stability and good convergence to the steady-state solution obtained from Poisson-Boltzmann type equations in the 1-dimensional case. The experiments conducted in the 2-dimensional case indicate that the proposed method preserves the conservation properties.

Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication

Dec 04, 2023

Zhangyue Yin, Qiushi Sun, Cheng Chang, Qipeng Guo, Junqi Dai, Xuanjing Huang, Xipeng Qiu

Abstract:Large Language Models (LLMs) have recently made significant strides in complex reasoning tasks through the Chain-of-Thought technique. Despite this progress, their reasoning is often constrained by their intrinsic understanding, lacking external insights. To address this, we propose Exchange-of-Thought (EoT), a novel framework that enables cross-model communication during problem-solving. Drawing inspiration from network topology, EoT integrates four unique communication paradigms: Memory, Report, Relay, and Debate. This paper delves into the communication dynamics and volume associated with each paradigm. To counterbalance the risks of incorrect reasoning chains, we implement a robust confidence evaluation mechanism within these communications. Our experiments across diverse complex reasoning tasks demonstrate that EoT significantly surpasses established baselines, underscoring the value of external insights in enhancing LLM performance. Furthermore, we show that EoT achieves these superior results in a cost-effective manner, marking a promising advancement for efficient and collaborative AI problem-solving.

* 19 pages, 11 figures, accepted by EMNLP2023
