Abstract:Generative retrieval (GR) has emerged as a promising paradigm in recommendation systems by autoregressively decoding identifiers of target items. Despite its potential, current approaches typically rely on the next-token prediction schema, which treats each token of the next interacted items as the sole target. This narrow focus 1) limits their ability to capture the nuanced structure of user preferences, and 2) overlooks the deep interaction between decoded identifiers and user behavior sequences. In response to these challenges, we propose RankGR, a Rank-enhanced Generative Retrieval method that incorporates listwise direct preference optimization for recommendation. RankGR decomposes the retrieval process into two complementary stages: the Initial Assessment Phase (IAP) and the Refined Scoring Phase (RSP). In IAP, we incorporate a novel listwise direct preference optimization strategy into GR, thus facilitating a more comprehensive understanding of the hierarchical user preferences and more effective partial-order modeling. The RSP then refines the top-λ candidates generated by IAP with interactions towards input sequences using a lightweight scoring module, leading to more precise candidate evaluation. Both phases are jointly optimized under a unified GR model, ensuring consistency and efficiency. Additionally, we implement several practical improvements in training and deployment, ultimately achieving a real-time system capable of handling nearly ten thousand requests per second. Extensive offline performance on both research and industrial datasets, as well as the online gains on the "Guess You Like" section of Taobao, validate the effectiveness and scalability of RankGR.




Abstract:Large language models (LLMs) have demonstrated remarkable potential in transforming recommender systems from implicit behavioral pattern matching to explicit intent reasoning. While RecGPT-V1 successfully pioneered this paradigm by integrating LLM-based reasoning into user interest mining and item tag prediction, it suffers from four fundamental limitations: (1) computational inefficiency and cognitive redundancy across multiple reasoning routes; (2) insufficient explanation diversity in fixed-template generation; (3) limited generalization under supervised learning paradigms; and (4) simplistic outcome-focused evaluation that fails to match human standards. To address these challenges, we present RecGPT-V2 with four key innovations. First, a Hierarchical Multi-Agent System restructures intent reasoning through coordinated collaboration, eliminating cognitive duplication while enabling diverse intent coverage. Combined with Hybrid Representation Inference that compresses user-behavior contexts, our framework reduces GPU consumption by 60% and improves exclusive recall from 9.39% to 10.99%. Second, a Meta-Prompting framework dynamically generates contextually adaptive prompts, improving explanation diversity by +7.3%. Third, constrained reinforcement learning mitigates multi-reward conflicts, achieving +24.1% improvement in tag prediction and +13.0% in explanation acceptance. Fourth, an Agent-as-a-Judge framework decomposes assessment into multi-step reasoning, improving human preference alignment. Online A/B tests on Taobao demonstrate significant improvements: +2.98% CTR, +3.71% IPV, +2.19% TV, and +11.46% NER. RecGPT-V2 establishes both the technical feasibility and commercial viability of deploying LLM-powered intent reasoning at scale, bridging the gap between cognitive exploration and industrial utility.