Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tingyu Zhu

Uncertainty-Aware Clarification in LLM Agents with Information Gain

Jun 02, 2026

Mengyi Deng, Zhiwei Li, Xin Li, Tingyu Zhu, Ying Zhao, Zhijiang Guo, Wei Wang

Abstract:Large Language Model (LLM) agents often operate under underspecified user instructions, where latent uncertainty over user intent leads to erroneous tool actions. To address this challenge, we propose a goal-oriented clarification framework that aligns clarification behavior with ambiguity resolution. Central to our approach is the Information Gain Reward, a metric that quantifies the utility of clarification questions by measuring the Bayesian belief update towards the ground-truth goal induced by the clarification exchange. We train the clarifier (LLM) using this reward to optimize for high information gain, ensuring that clarifications effectively reduce uncertainty and improve task completion within the agent-tool-user environment. We validate our framework within a clarification-enhanced $τ$-Bench environment, conducting cross-agent evaluations across five heterogeneous backbones. Empirical results demonstrate that our method consistently improves the success rate by 3.7\% over the no-clarification baseline, while adding only 0.3 total interaction steps on average.

* ICML 2026

Via

Access Paper or Ask Questions

DGPO: Beyond Pairwise Preferences with Directional Consistent Groupwise Optimization

May 11, 2026

Mengyi Deng, Zhiwei Li, Xin Li, Tingyu Zhu, Yulan Yuan, Zhijiang Guo, Wei Wang

Abstract:Although Large Language Models (LLMs) have made remarkable progress, current preference optimization methods still struggle to align directional consistency while preserving reasoning diversity. To address this limitation, we propose Directional-Groupwise Preference Optimization (DGPO), a lightweight framework that aggregates supervision signals at the group level and explicitly models direction-aware alignment through multi-candidate comparisons. DGPO organizes forward and reverse question-answer instances into structured sets and optimizes a margin-based likelihood objective that separates coherent reasoning paths from inconsistent alternatives. This group-wise formulation captures richer relative information than pairwise objectives and reinforces consistency across diverse reasoning pathways. Empirical results show that our constructed reverse data yields a 3.2% average improvement across five benchmarks, while DGPO further delivers consistent gains across multiple datasets and model families, achieving average accuracy improvements of up to 3.6%.

Via

Access Paper or Ask Questions

Selection of the Best Policy under Fairness Constraints for Subpopulations

May 11, 2026

Tingyu Zhu, Yuhang Wu, Zeyu Zheng

Abstract:Many high-stakes decisions in health care, public policy, and clinical development require committing to a single policy that will be applied uniformly across a heterogeneous population. Regulatory and fairness standards sometime requires that the chosen policy performs adequately in every pre-specified subpopulation, not only on average. We formalize this as a Selection of the Best with Fairness Constraints (SBFC) problem, in order to identify the policy with the highest average performance among those policies that meet a minimum per-subpopulation threshold. We establish an instance-specific lower bound on sample complexity of the SBFC problem. We then develop a Track-and-Stop with Constraints on Subpopulation (T-a-S-CS) algorithm that achieves the lower bound asymptotically. We extend the framework to general closed-set and penalty-based fairness specifications with matching guarantees. Numerical experiments and a case study using the International Stroke Trial demonstrate substantial efficiency gains over policy-level allocation baselines.

Via

Access Paper or Ask Questions

Spiral RoPE: Rotate Your Rotary Positional Embeddings in the 2D Plane

Feb 03, 2026

Haoyu Liu, Sucheng Ren, Tingyu Zhu, Peng Wang, Cihang Xie, Alan Yuille, Zeyu Zheng, Feng Wang

Abstract:Rotary Position Embedding (RoPE) is the de facto positional encoding in large language models due to its ability to encode relative positions and support length extrapolation. When adapted to vision transformers, the standard axial formulation decomposes two-dimensional spatial positions into horizontal and vertical components, implicitly restricting positional encoding to axis-aligned directions. We identify this directional constraint as a fundamental limitation of the standard axial 2D RoPE, which hinders the modeling of oblique spatial relationships that naturally exist in natural images. To overcome this limitation, we propose Spiral RoPE, a simple yet effective extension that enables multi-directional positional encoding by partitioning embedding channels into multiple groups associated with uniformly distributed directions. Each group is rotated according to the projection of the patch position onto its corresponding direction, allowing spatial relationships to be encoded beyond the horizontal and vertical axes. Across a wide range of vision tasks including classification, segmentation, and generation, Spiral RoPE consistently improves performance. Qualitative analysis of attention maps further show that Spiral RoPE exhibits more concentrated activations on semantically relevant objects and better respects local object boundaries, highlighting the importance of multi-directional positional encoding in vision transformers.

Via

Access Paper or Ask Questions

When Inverse Data Outperforms: Exploring the Pitfalls of Mixed Data in Multi-Stage Fine-Tuning

Sep 16, 2025

Mengyi Deng, Xin Li, Tingyu Zhu, Zhicheng Yang, Zhijiang Guo, Wei Wang

Abstract:Existing work has shown that o1-level performance can be achieved with limited data distillation, but most existing methods focus on unidirectional supervised fine-tuning (SFT), overlooking the intricate interplay between diverse reasoning patterns. In this paper, we construct r1k, a high-quality reverse reasoning dataset derived by inverting 1,000 forward examples from s1k, and examine how SFT and Direct Preference Optimization (DPO) affect alignment under bidirectional reasoning objectives. SFT on r1k yields a 1.6%--6.8% accuracy improvement over s1k across evaluated benchmarks. However, naively mixing forward and reverse data during SFT weakens the directional distinction. Although DPO can partially recover this distinction, it also suppresses less preferred reasoning paths by shifting the probability mass toward irrelevant outputs. These findings suggest that mixed reasoning data introduce conflicting supervision signals, underscoring the need for robust and direction-aware alignment strategies.

Via

Access Paper or Ask Questions

IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis

May 23, 2025

Hanyu Li, Haoyu Liu, Tingyu Zhu, Tianyu Guo, Zeyu Zheng, Xiaotie Deng, Michael I. Jordan

Abstract:Large Language Models (LLMs) show promise as data analysis agents, but existing benchmarks overlook the iterative nature of the field, where experts' decisions evolve with deeper insights of the dataset. To address this, we introduce IDA-Bench, a novel benchmark evaluating LLM agents in multi-round interactive scenarios. Derived from complex Kaggle notebooks, tasks are presented as sequential natural language instructions by an LLM-simulated user. Agent performance is judged by comparing its final numerical output to the human-derived baseline. Initial results show that even state-of-the-art coding agents (like Claude-3.7-thinking) succeed on < 50% of the tasks, highlighting limitations not evident in single-turn tests. This work underscores the need to improve LLMs' multi-round capabilities for building more reliable data analysis agents, highlighting the necessity of achieving a balance between instruction following and reasoning.

Via

Access Paper or Ask Questions

Structured Diffusion Models with Mixture of Gaussians as Prior Distribution

Oct 24, 2024

Nanshan Jia, Tingyu Zhu, Haoyu Liu, Zeyu Zheng

Figure 1 for Structured Diffusion Models with Mixture of Gaussians as Prior Distribution

Figure 2 for Structured Diffusion Models with Mixture of Gaussians as Prior Distribution

Figure 3 for Structured Diffusion Models with Mixture of Gaussians as Prior Distribution

Figure 4 for Structured Diffusion Models with Mixture of Gaussians as Prior Distribution

Abstract:We propose a class of structured diffusion models, in which the prior distribution is chosen as a mixture of Gaussians, rather than a standard Gaussian distribution. The specific mixed Gaussian distribution, as prior, can be chosen to incorporate certain structured information of the data. We develop a simple-to-implement training procedure that smoothly accommodates the use of mixed Gaussian as prior. Theory is provided to quantify the benefits of our proposed models, compared to the classical diffusion models. Numerical experiments with synthetic, image and operational data are conducted to show comparative advantages of our model. Our method is shown to be robust to mis-specifications and in particular suits situations where training resources are limited or faster training in real time is desired.

Via

Access Paper or Ask Questions

Symbolic Music Generation with Fine-grained Interactive Textural Guidance

Oct 11, 2024

Tingyu Zhu, Haoyu Liu, Zhimin Jiang, Zeyu Zheng

Abstract:The problem of symbolic music generation presents unique challenges due to the combination of limited data availability and the need for high precision in note pitch. To overcome these difficulties, we introduce Fine-grained Textural Guidance (FTG) within diffusion models to correct errors in the learned distributions. By incorporating FTG, the diffusion models improve the accuracy of music generation, which makes them well-suited for advanced tasks such as progressive music generation, improvisation and interactive music creation. We derive theoretical characterizations for both the challenges in symbolic music generation and the effect of the FTG approach. We provide numerical experiments and a demo page for interactive music generation with user input to showcase the effectiveness of our approach.

Via

Access Paper or Ask Questions

Best Arm Identification with Fairness Constraints on Subpopulations

Apr 08, 2023

Yuhang Wu, Zeyu Zheng, Tingyu Zhu

Figure 1 for Best Arm Identification with Fairness Constraints on Subpopulations

Figure 2 for Best Arm Identification with Fairness Constraints on Subpopulations

Figure 3 for Best Arm Identification with Fairness Constraints on Subpopulations

Abstract:We formulate, analyze and solve the problem of best arm identification with fairness constraints on subpopulations (BAICS). Standard best arm identification problems aim at selecting an arm that has the largest expected reward where the expectation is taken over the entire population. The BAICS problem requires that an selected arm must be fair to all subpopulations (e.g., different ethnic groups, age groups, or customer types) by satisfying constraints that the expected reward conditional on every subpopulation needs to be larger than some thresholds. The BAICS problem aims at correctly identify, with high confidence, the arm with the largest expected reward from all arms that satisfy subpopulation constraints. We analyze the complexity of the BAICS problem by proving a best achievable lower bound on the sample complexity with closed-form representation. We then design an algorithm and prove that the algorithm's sample complexity matches with the lower bound in terms of order. A brief account of numerical experiments are conducted to illustrate the theoretical findings.

Via

Access Paper or Ask Questions