Peiying Zhu

Sherman

When Aggregate Alignment Misleads: Auditing Policy Repair Without Per-State Expert Actions

Jul 03, 2026

Peiying Zhu, Sidi Chang

Abstract: Agentic AI systems are increasingly used to edit, refine, and repair decision policies, but evaluating these edits is difficult when per-state expert action labels are unavailable. We study this problem in a hotel-pricing simulator where an agentic policy editor receives only region-level diagnostic feedback: summaries of how its price distribution differs from a benchmark policy across time, inventory, and market regions. The editor cannot observe benchmark actions, benchmark source code, reward numbers, or held-out outcomes, and may only propose constrained edits to a target-action table. On 5,000 held-out episodes, a multi-restart LLM editor reaches RevPAR 108.47 (95% CI 107.61 - 109.34), close to the benchmark policy's 108.75 (107.81 - 109.68), with paired gap (LLM minus benchmark) -0.276 and 95% CI [-0.692, 0.146]. A cheap diagnostic projection already recovers much of the revenue (107.90), so the LLM editor's distinctive gain is not raw revenue lift alone: it also reduces episode composition distance from 1.153 to 0.609. This is the strongest non-benchmark repair result. This profile is not explained by restart search alone: non-semantic proposers with up to 2,500 evaluations fall 8.77 to 14.57 RevPAR points short. Nor is it explained by plausible prompt format: a shuffled-diagnostic control breaks region-error correspondence and falls to RevPAR 94.30. The match is genuine but partial. A tree editor achieves stronger pooled alignment, 0.214 versus 0.266, and stronger reference-state D1, 0.328 versus 1.197, yet revenue falls to 98.91. These results show that agentic policy repair should be evaluated by whether diagnostic feedback becomes a reliable closed-loop outcome, not by a single behavioral distance.
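The paired gap and its confidence interval can be illustrated with a percentile bootstrap over per-episode differences. This is a minimal sketch only: the abstract does not say how its interval was computed, so the bootstrap, the function name `paired_bootstrap_ci`, and the synthetic episode data below are all assumptions made for illustration.

```python
import random
import statistics


def paired_bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for mean(a - b) over paired episodes.

    Assumption: a[i] and b[i] are outcomes of two policies on the SAME
    held-out episode i, so only the differences are resampled.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(diffs), lo, hi


# Synthetic, hypothetical per-episode RevPAR values (not the paper's data).
data_rng = random.Random(1)
benchmark = [data_rng.gauss(108.0, 30.0) for _ in range(500)]
edited = [b + data_rng.gauss(-0.3, 5.0) for b in benchmark]

gap, lo, hi = paired_bootstrap_ci(edited, benchmark)
print(f"paired gap {gap:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Pairing matters because both policies face the same episodes: the interval on the difference can be much narrower than the two marginal intervals would suggest, which is why a gap can be statistically indistinguishable from zero even when the marginal intervals are about a point wide.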

When Outcome Looks Right But Discipline Fails: Trace-Based Evaluation Under Hidden Competitor State

May 18, 2026

Peiying Zhu, Sidi Chang

Abstract: Outcome-only evaluation can certify economically unsafe agents: a policy can hit a business KPI while violating deployable behavioral discipline. In hotel pricing with hidden competitor state, a learner can achieve plausible revenue per available room while failing to preserve the rate discipline of a rule-based revenue-management competitor. We introduce discipline stability, a trace-based evaluation paradigm: define the benchmark behavior, restrict observations to the deployment regime, induce trace diagnostics from failure, separate mechanisms with ablations, and test transfer and deployment. Across a two-hotel benchmark and a compact hidden-budget bidding task, reward-only PPO variants miss trace alignment; revealing hidden state reduces label uncertainty; deterministic copy collapses uncertainty; and trace-prior or corrected history policies better preserve price or bid distributions. Pure behavior cloning is nearly enough for symmetric imitation, while Trace-Prior RL adds bounded adaptation under capacity asymmetry. The contribution is an evaluation and benchmark paradigm, not a new optimizer or a universal claim about MARL.

Market-Alignment Risk in Pricing Agents: Trace Diagnostics and Trace-Prior RL under Hidden Competitor State

May 07, 2026

Peiying Zhu, Sidi Chang

Abstract: Outcome metrics can certify the wrong behavior. We study this failure in a two-hotel revenue-management simulator where Hotel A trains an agent against a fixed rule-based revenue-management competitor, Hotel B. A standard learning agent can obtain near-reference revenue per available room (RevPAR) while failing to learn market-like yield management: it sells too aggressively, undercuts, or collapses to modal price buckets. We diagnose this as a Goodhart-style failure under partial observability. Hotel A cannot observe the competitor's remaining inventory, booking curve, or pricing rule, so the same Hotel A-visible state maps to multiple plausible Hotel B prices. Deterministic value-based RL and deterministic copying collapse this unresolved uncertainty into shortcut behavior. We introduce a trace-level diagnostic protocol using RevPAR, occupancy, ADR, full price-bucket distributions, L1/JS distances, and seed-level confidence intervals. The verified repair is Trace-Prior RL: learn a distributional market prior from lagged market traces, then train a stochastic pricing policy with a RevPAR reward and a KL penalty to the learned prior. The final policy matches Hotel B's RevPAR, occupancy, ADR, and price distribution within seed-level uncertainty, while still optimizing Hotel A's own reward. We argue that the contribution is not a new optimizer and not a hotel-pricing leaderboard, but a reproducible failure-and-repair recipe for agentic systems where scalar rewards are easy to game and the intended behavior is only visible in traces. A key finding is that higher exact action accuracy can worsen aggregate trace alignment when the target is distributional.
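The L1 and JS distances between price-bucket distributions, and the closing remark about exact action accuracy, can be made concrete with a toy calculation. The sketch below is not the paper's protocol: the three-bucket distribution, the helper names, and the assumption that benchmark and policy actions are drawn independently in a state are all hypothetical.

```python
import math


def l1_distance(p, q):
    """Sum of absolute differences between two bucket distributions (0 to 2)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical, 1 for disjoint."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing by convention.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def expected_match(p, q):
    """Probability that independent draws from p and q pick the same bucket."""
    return sum(a * b for a, b in zip(p, q))


# Hypothetical benchmark price-bucket distribution for one visible state.
benchmark = [0.4, 0.3, 0.3]
modal_copy = [1.0, 0.0, 0.0]   # deterministic policy: always the modal bucket
stochastic = [0.4, 0.3, 0.3]   # stochastic policy: samples from the same mix

for name, policy in [("modal copy", modal_copy), ("stochastic", stochastic)]:
    print(
        name,
        "match:", round(expected_match(benchmark, policy), 3),
        "L1:", round(l1_distance(benchmark, policy), 3),
        "JS:", round(js_divergence(benchmark, policy), 3),
    )
```

Under these assumptions the deterministic modal copy has the higher chance of matching the benchmark action exactly, yet its distribution sits far from the benchmark in both L1 and JS terms, while the stochastic policy matches less often per state but reproduces the distribution.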

* 7 pages

ValueAlpha: Agreement-Gated Stress Testing of LLM-Judged Investment Rationales Before Returns Are Observable

Apr 28, 2026

Sidi Chang, Peiying Zhu, Yuxiao Chen

Abstract: Long-horizon investment decisions create a pre-realization evaluation problem: realized returns are the eventual arbiter of investment quality, but they arrive too late and are too noisy to guide many model-development and governance decisions. LLM judges offer a tempting substitute for pre-deployment evaluation of AI-finance systems, but unvalidated judges may reward verbosity, confidence, or rubric mimicry rather than financial judgment. This paper introduces ValueAlpha, a preregistered agreement-gated stress-test protocol for deciding when LLM-judged investment-rationale claims are publishable, qualified, or invalid. In a controlled market-state capital-allocation prototype with 1,000 honest decision cycles and 100 preregistered adversarial controls (1,100 trajectories, 5,500 judge calls), ValueAlpha clears the aggregate agreement gate at $\bar{\kappa}_w = 0.7168$ but prevents several overclaims. Lower-rank systems collapse into a tie-class, one rubric dimension fails the per-dimension gate (constraint_awareness, $\bar{\kappa}_w = 0.2022$), single-judge rankings are family-dependent, and terse-correct rationales receive a $\Delta = -2.81$ rubric-point penalty relative to honest rationales. A targeted anchor-specificity probe further shows that financial constructs such as constraint awareness are operationally load-bearing. The contribution is therefore not a leaderboard and not a claim to measure true investment skill. ValueAlpha is a pre-calibration metrology layer for AI-finance evaluation: it determines whether a proposed LLM-judge-based investment-rationale claim is stable enough, agreed enough, and uncontaminated enough to be reported at all.
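An agreement gate of this kind can be sketched with a weighted Cohen's kappa averaged over judge pairs. The abstract does not state the weighting scheme, how $\bar{\kappa}_w$ is averaged, or the gate threshold, so the quadratic weights, the pairwise mean, and the `threshold` argument below are assumptions, and the function names are hypothetical.

```python
from itertools import combinations


def weighted_kappa(r1, r2, k):
    """Quadratic-weighted Cohen's kappa for two raters.

    Ratings are integers in 0..k-1 (with k >= 2) on an ordinal rubric scale.
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(r1)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # disagreement weight
            num += w * observed[i][j]        # observed disagreement
            den += w * row[i] * col[j]       # disagreement expected by chance
    if den == 0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return 1 - num / den


def mean_pairwise_kappa(judges, k):
    """Average weighted kappa over all unordered pairs of judges."""
    pairs = list(combinations(judges, 2))
    return sum(weighted_kappa(a, b, k) for a, b in pairs) / len(pairs)


def passes_gate(judges, k, threshold):
    """Hypothetical gate: the caller supplies the preregistered threshold."""
    return mean_pairwise_kappa(judges, k) >= threshold
```

Applying the same function per rubric dimension, rather than only to the aggregate score, is what lets one dimension fail its gate while the aggregate passes.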

* 9 pages, Submitted to IEEE Computational Intelligence in Financial Engineering and Economics (CIFEr) 2026, Tokyo, Japan

The End of Rented Discovery: How AI Search Redistributes Power Between Hotels and Intermediaries

Mar 20, 2026

Peiying Zhu, Sidi Chang

Abstract: When a traveler asks an AI search engine to recommend a hotel, which sources get cited, and does query framing matter? We audit 1,357 grounding citations from Google Gemini across 156 hotel queries in Tokyo and document a systematic pattern we call the Intent-Source Divide. Experiential queries draw 55.9% of their citations from non-OTA sources, compared to 30.8% for transactional queries, a 25.1 percentage-point gap ($p < 5 \times 10^{-20}$). The effect is amplified in Japanese, where experiential queries draw 62.1% non-OTA citations compared to 50.0% in English, consistent with a more diverse Japanese non-OTA content ecosystem. For an industry in which hotels have long paid OTAs for demand acquisition, this pattern matters because it suggests that AI search may make hotel discovery less exclusively controlled by commission-based intermediaries.
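The percentage-point gap between two citation shares can be checked with a pooled two-proportion z-test. This is an illustrative sketch only: the abstract does not name its test or give per-group citation counts, so the test choice and the round counts below (chosen just to reproduce the 55.9% and 30.8% shares) are hypothetical.

```python
import math


def two_proportion_gap(x1, n1, x2, n2):
    """Return (gap in percentage points, z statistic, two-sided p-value).

    x1 of n1 and x2 of n2 are successes out of trials in each group;
    the standard error uses the pooled proportion under the null of no gap.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return (p1 - p2) * 100, z, p_value


# Hypothetical counts: 559 of 1,000 experiential citations and 308 of 1,000
# transactional citations are non-OTA. These are NOT the paper's counts.
gap_pp, z, p_value = two_proportion_gap(559, 1000, 308, 1000)
print(f"gap = {gap_pp:.1f} pp, z = {z:.2f}, p = {p_value:.2e}")
```

Reporting the gap in percentage points (55.9 minus 30.8) rather than as a relative change keeps the comparison independent of which group is treated as the baseline.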

* 13 pages, 10 tables, Submitted to the 10th Hospitality Finance & Economics Conference (HFE 2026), Tokyo, Japan

A Primer on Near-Field Communications for Next-Generation Multiple Access

Aug 05, 2024

Chongjun Ouyang, Zhaolin Wang, Yan Chen, Xidong Mu, Peiying Zhu

Abstract: Multiple-antenna technologies are advancing toward the development of extremely large aperture arrays and the utilization of extremely high frequencies, driving the progress of next-generation multiple access (NGMA). This evolution is accompanied by the emergence of near-field communications (NFC), characterized by spherical-wave propagation, which introduces additional range dimensions to the channel and enhances system throughput. In this context, a tutorial-based primer on NFC is presented, emphasizing its applications in multiuser communications and multiple access (MA). The following areas are investigated: i) The commonly used near-field channel models are reviewed along with their simplifications under various near-field conditions. ii) Building upon these models, the information-theoretic capacity limits of NFC-MA are analyzed, including the derivation of sum-rate capacity and capacity region, and their upper limits for both downlink and uplink scenarios. iii) A detailed investigation of near-field multiuser beamforming design is presented, offering low-complexity and effective NFC-MA design methodologies in both the spatial and wavenumber (angular) domains. Throughout these investigations, near-field MA is compared with its far-field counterpart to highlight its superiority and flexibility in terms of interference management, thereby laying the groundwork for achieving NGMA.

* 34 pages

Signal Processing and Learning for Next Generation Multiple Access in 6G

Sep 09, 2023

Wei Chen, Yuanwei Liu, Hamid Jafarkhani, Yonina C. Eldar, Peiying Zhu, Khaled B Letaief

Abstract: Wireless communication systems to date primarily rely on the orthogonality of resources to facilitate the design and implementation, from user access to data transmission. Emerging applications and scenarios in the sixth generation (6G) wireless systems will require massive connectivity and transmission of a deluge of data, which calls for more flexibility in the design concept that goes beyond orthogonality. Furthermore, recent advances in signal processing and learning have attracted considerable attention, as they provide promising approaches to various complex and previously intractable problems of signal processing in many fields. This article provides an overview of research efforts to date in the field of signal processing and learning for next-generation multiple access, with an emphasis on massive random access and non-orthogonal multiple access. The promising interplay with new technologies and the challenges in learning-based NGMA are discussed.

Sensiverse: A dataset for ISAC study

Aug 26, 2023

Jiajin Luo, Baojian Zhou, Yang Yu, Ping Zhang, Xiaohui Peng, Jianglei Ma, Peiying Zhu, Jianmin Lu, Wen Tong

Abstract: To address the lack of applicable channel models for integrated sensing and communication (ISAC) research and evaluation, we release Sensiverse, a dataset for ISAC research. In this paper, we present the method of generating Sensiverse, including the acquisition and formatting of the 3D scene models, the generation of the channel data, and their association with Tx/Rx deployment. The file structure and usage of the dataset are also described, and finally the use of the dataset is illustrated with examples through the evaluation of use cases such as 3D environment reconstruction and moving targets.

On the Road to 6G: Visions, Requirements, Key Technologies and Testbeds

Feb 28, 2023

Cheng-Xiang Wang, Xiaohu You, Xiqi Gao, Xiuming Zhu, Zixin Li, Chuan Zhang, Haiming Wang, Yongming Huang, Yunfei Chen, Harald Haas (+9 more)

Abstract: Fifth generation (5G) mobile communication systems have entered the stage of commercial development, providing users with new services and improved user experiences as well as offering a host of novel opportunities to various industries. However, 5G still faces many challenges. To address these challenges, international industrial, academic, and standards organizations have commenced research on sixth generation (6G) wireless communication systems. A series of white papers and survey papers have been published, which aim to define 6G in terms of requirements, application scenarios, key technologies, etc. Although ITU-R has been working on the 6G vision and it is expected to reach a consensus on what 6G will be by mid-2023, the related global discussions are still wide open and the existing literature has identified numerous open issues. This paper first provides a comprehensive portrayal of the 6G vision, technical requirements, and application scenarios, covering the current common understanding of 6G. Then, a critical appraisal of the 6G network architecture and key technologies is presented. Furthermore, existing testbeds and advanced 6G verification platforms are detailed for the first time. In addition, future research directions and open challenges are identified for stimulating the on-going global debate. Finally, lessons learned to date concerning 6G networks are discussed.

HAPS for 6G Networks: Potential Use Cases, Open Challenges, and Possible Solutions

Jan 21, 2023

Omid Abbasi, Animesh Yadav, Halim Yanikomeroglu, Ngoc Dung Dao, Gamini Senarath, Peiying Zhu

Abstract: The high altitude platform station (HAPS), which is deployed in the stratosphere at an altitude of 20-50 kilometres, has attracted much attention in recent years due to its large footprint, line-of-sight links, and fixed position relative to the Earth. Compared with existing network infrastructure, a HAPS has a much larger coverage area than terrestrial base stations and is much closer than satellites to the ground users. Besides small-cells and macro-cells, a HAPS can offer one mega-cell, which can complement legacy networks in 6G and beyond wireless systems. This paper explores potential use cases and discusses relevant open challenges of integrating HAPS into legacy networks, while also suggesting some solutions to these challenges. The cumulative distribution functions of the spectral efficiency of the integrated network and of cell-edge users are studied and compared with those of a terrestrial network. The results show that the capacity gains achieved by the integrated network are beneficial to cell-edge users. Furthermore, the advantages of a HAPS for backhauling aerial base stations are demonstrated by the simulation results.
