Jun Yuan

Virtual Width Networks

Nov 17, 2025

Seed, Baisheng Li, Banggu Wu, Bole Ma, Bowen Xiao, Chaoyi Zhang, Cheng Li, Chengyi Wang, Chengyin Xu, Chi Zhang(+108 more)

Abstract:We introduce Virtual Width Networks (VWN), a framework that delivers the benefits of wider representations without incurring the quadratic cost of increasing the hidden size. VWN decouples representational width from backbone width, expanding the embedding space while keeping backbone compute nearly constant. In our large-scale experiment, an 8-times expansion accelerates optimization by over 2 times for next-token and 3 times for next-2-token prediction. The advantage amplifies over training as both the loss gap grows and the convergence-speedup ratio increases, showing that VWN is not only token-efficient but also increasingly effective with scale. Moreover, we identify an approximately log-linear scaling relation between virtual width and loss reduction, offering an initial empirical basis and motivation for exploring virtual-width scaling as a new dimension of large-model efficiency.

A Novel Mamba-based Sequential Recommendation Method

Apr 10, 2025

Jun Yuan

Abstract:Sequential recommendation (SR), which encodes user activity to predict the next action, has emerged as a widely adopted strategy in developing commercial personalized recommendation systems. Although Transformer-based models have proven effective for sequential recommendation, the complexity of the self-attention module in Transformers scales quadratically with the sequence length. Controlling model complexity is essential for large-scale recommendation systems, as these systems may need to handle billion-scale vocabularies that evolve continuously, as well as user behavior sequences that can exceed tens of thousands in length. In this paper, we propose a novel multi-head latent Mamba architecture, which employs multiple low-dimensional Mamba layers and fully connected layers coupled with positional encoding to simultaneously capture historical and item information within each latent subspace. Our proposed method not only enables scaling up to large-scale parameters but also extends to multi-domain recommendation by integrating and fine-tuning LLMs. Through extensive experiments on public datasets, we demonstrate how Hydra effectively addresses the effectiveness-efficiency dilemma, outperforming state-of-the-art sequential recommendation baselines with significantly fewer parameters and reduced training time.

A Contextual-Aware Position Encoding for Sequential Recommendation

Feb 13, 2025

Jun Yuan, Guohao Cai, Zhenhua Dong

Abstract:Sequential recommendation (SR), which encodes user activity to predict the next action, has emerged as a widely adopted strategy in developing commercial personalized recommendation systems. A critical component of modern SR models is the attention mechanism, which synthesizes users' historical activities. This mechanism is typically order-invariant and generally relies on position encoding (PE). Conventional SR models simply assign a learnable vector to each position, resulting in only modest gains compared to traditional recommendation models. Moreover, limited research has been conducted on position encoding tailored for sequential recommendation, leaving a significant gap in addressing its unique requirements. To bridge this gap, we propose a novel Contextual-Aware Position Encoding method for sequential recommendation, abbreviated as CAPE. To the best of our knowledge, CAPE is the first PE method specifically designed for sequential recommendation. Comprehensive experiments conducted on benchmark SR datasets demonstrate that CAPE consistently enhances multiple mainstream backbone models and achieves state-of-the-art performance across both small and large model sizes. Furthermore, we deployed CAPE in an industrial setting on a real-world commercial platform, clearly showcasing the effectiveness of our approach. Our source code is available at https://github.com/yjdy/CAPE.

* Accepted by WWW'25 Industry Track
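
The abstract's remark that attention is order-invariant and therefore relies on position encoding can be illustrated with a small sketch. This is not CAPE itself: it only shows the conventional per-position lookup the abstract describes, and the names `attention_pool`, `add_positions`, and all toy values are hypothetical.

```python
import math

def attention_pool(query, items):
    """Softmax-weighted average of item vectors, scored by dot product with the query."""
    scores = [sum(q * x for q, x in zip(query, item)) for item in items]
    peak = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(items[0])
    return [sum(w * item[d] for w, item in zip(weights, items)) / total
            for d in range(dim)]

def add_positions(items, position_table):
    """Add the vector stored for slot i to the i-th item (a plain per-position lookup)."""
    return [[x + p for x, p in zip(item, position_table[i])]
            for i, item in enumerate(items)]

# Hypothetical toy data: three 2-d item embeddings and one vector per position.
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shuffled = [history[2], history[0], history[1]]
position_table = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]]
query = [1.0, 0.5]

# Without position vectors, reordering the history leaves the pooled vector unchanged.
plain_a = attention_pool(query, history)
plain_b = attention_pool(query, shuffled)

# With position vectors added, the same reordering changes the pooled vector.
with_pe_a = attention_pool(query, add_positions(history, position_table))
with_pe_b = attention_pool(query, add_positions(shuffled, position_table))
```

In a trained model the entries of `position_table` would be learned parameters; here they are fixed by hand only to make the effect visible.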

MAJL: A Model-Agnostic Joint Learning Framework for Music Source Separation and Pitch Estimation

Jan 07, 2025

Haojie Wei, Jun Yuan, Rui Zhang, Quanyu Dai, Yueguo Chen

Abstract:Music source separation and pitch estimation are two vital tasks in music information retrieval. Typically, the input of pitch estimation is obtained from the output of music source separation. Therefore, existing methods have tried to perform these two tasks simultaneously, so as to leverage the mutually beneficial relationship between both tasks. However, these methods still face two critical challenges that limit the improvement of both tasks: the lack of labeled data and joint learning optimization. To address these challenges, we propose a Model-Agnostic Joint Learning (MAJL) framework for both tasks. MAJL is a generic framework and can use variant models for each task. It includes a two-stage training method and a dynamic weighting method named Dynamic Weights on Hard Samples (DWHS), which addresses the lack of labeled data and joint learning optimization, respectively. Experimental results on public music datasets show that MAJL outperforms state-of-the-art methods on both tasks, with significant improvements of 0.92 in Signal-to-Distortion Ratio (SDR) for music source separation and 2.71% in Raw Pitch Accuracy (RPA) for pitch estimation. Furthermore, comprehensive studies not only validate the effectiveness of each component of MAJL, but also indicate the great generality of MAJL in adapting to different model architectures.

A Parameter Update Balancing Algorithm for Multi-task Ranking Models in Recommendation Systems

Oct 08, 2024

Jun Yuan, Guohao Cai, Zhenhua Dong

Abstract:Multi-task ranking models have become essential for modern real-world recommendation systems. While most recommendation research focuses on designing sophisticated models for specific scenarios, achieving performance improvement for multi-task ranking models across various scenarios remains a significant challenge. Training all tasks naively can result in inconsistent learning, highlighting the need for multi-task optimization (MTO) methods to tackle this challenge. Conventional methods assume that the optimal joint gradient on shared parameters leads to optimal parameter updates. However, the actual update on model parameters may deviate significantly from the gradients when using momentum-based optimizers such as Adam, and we design and execute statistical experiments to support this observation. In this paper, we propose a novel Parameter Update Balancing algorithm for multi-task optimization, denoted as PUB. In contrast to traditional MTO methods, which are based on gradient-level or loss-level task fusion, PUB is the first work to optimize multiple tasks through parameter update balancing. Comprehensive experiments on benchmark multi-task ranking datasets demonstrate that PUB consistently improves several multi-task backbones and achieves state-of-the-art performance. Additionally, experiments on benchmark computer vision datasets show the great potential of PUB in various multi-task learning scenarios. Furthermore, we deployed our method for an industrial evaluation on the real-world commercial platform, HUAWEI AppGallery, where PUB significantly enhances the online multi-task ranking model, efficiently managing the primary traffic of a crucial channel.

* Accepted by ICDM'24
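
The abstract's observation that an Adam update can point in a different direction from the gradient can be seen in a minimal sketch. It implements only the standard first Adam step with common default hyperparameters, not the PUB algorithm, and the function names and the toy gradient are hypothetical.

```python
import math

def adam_first_step(grad, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Parameter change from the first Adam step, starting from zero moment estimates."""
    m = [(1 - beta1) * g for g in grad]          # first-moment estimate
    v = [(1 - beta2) * g * g for g in grad]      # second-moment estimate
    m_hat = [mi / (1 - beta1) for mi in m]       # bias correction at step t = 1
    v_hat = [vi / (1 - beta2) for vi in v]
    return [-lr * mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical joint gradient on two shared parameters with very different scales.
joint_grad = [1.0, 0.01]

adam_update = adam_first_step(joint_grad)
sgd_update = [-0.001 * g for g in joint_grad]    # plain gradient step, same learning rate

# Adam rescales each coordinate, so the two update directions are not parallel.
alignment = cosine(adam_update, sgd_update)
```

Here `alignment` comes out at about 0.71 rather than 1.0: the per-coordinate normalization gives both parameters nearly equal step sizes even though their gradients differ by a factor of 100.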

A Scalable Matrix Visualization for Understanding Tree Ensemble Classifiers

Sep 05, 2024

Zhen Li, Weikai Yang, Jun Yuan, Jing Wu, Changjian Chen, Yao Ming, Fan Yang, Hui Zhang, Shixia Liu

Abstract:The high performance of tree ensemble classifiers benefits from a large set of rules, which, in turn, makes the models hard to understand. To improve interpretability, existing methods extract a subset of rules for approximation using model reduction techniques. However, by focusing on the reduced rule set, these methods often lose fidelity and ignore anomalous rules that, despite their infrequency, play crucial roles in real-world applications. This paper introduces a scalable visual analysis method to explain tree ensemble classifiers that contain tens of thousands of rules. The key idea is to address the issue of losing fidelity by adaptively organizing the rules as a hierarchy rather than reducing them. To ensure the inclusion of anomalous rules, we develop an anomaly-biased model reduction method to prioritize these rules at each hierarchical level. Synergized with this hierarchical organization of rules, we develop a matrix-based hierarchical visualization to support exploration at different levels of detail. Our quantitative experiments and case studies demonstrate how our method fosters a deeper understanding of both common and anomalous rules, thereby enhancing interpretability without sacrificing comprehensiveness.

* 15 pages, 10 figures

Fooling SHAP with Output Shuffling Attacks

Aug 12, 2024

Jun Yuan, Aritra Dasgupta

Abstract:Explainable AI (XAI) methods such as SHAP can help discover feature attributions in black-box models. If the method reveals a significant attribution from a "protected feature" (e.g., gender, race) on the model output, the model is considered unfair. However, adversarial attacks can subvert the detection of XAI methods. Previous approaches to constructing such an adversarial model require access to the underlying data distribution, which may not be possible in many practical scenarios. We relax this constraint and propose a novel family of attacks, called shuffling attacks, that are data-agnostic. The proposed attack strategies can adapt any trained machine learning model to fool Shapley value-based explanations. We prove that Shapley values cannot detect shuffling attacks. However, algorithms that estimate Shapley values, such as linear SHAP and SHAP, can detect these attacks with varying degrees of effectiveness. We demonstrate the efficacy of the attack strategies by comparing the performance of linear SHAP and SHAP using real-world datasets.
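
For readers unfamiliar with the attribution being attacked, the sketch below computes exact Shapley values for a tiny model by averaging marginal contributions over all feature orderings. It does not implement the shuffling attacks or the SHAP library's estimators. Replacing absent features with a fixed baseline is an assumption made for simplicity, and `toy_model` and its coefficients are hypothetical.

```python
from itertools import permutations

def shapley_values(model, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings.

    Features not yet "revealed" take their value from `baseline` (an assumption here).
    """
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for feature in order:
            current[feature] = instance[feature]   # reveal this feature's real value
            output = model(current)
            totals[feature] += output - previous_output
            previous_output = output
    return [t / len(orderings) for t in totals]

def toy_model(x):
    # Hypothetical scoring function; index 2 plays the role of a protected feature.
    return 2.0 * x[0] + 1.0 * x[1] + 3.0 * x[2] + x[0] * x[1]

instance = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]

phi = shapley_values(toy_model, instance, baseline)
```

The attributions sum to `toy_model(instance) - toy_model(baseline)`, and the interaction term `x[0] * x[1]` is split equally between the two features involved. Enumeration costs grow factorially with the number of features, which is why practical tools estimate these values instead.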

RAT: Retrieval-Augmented Transformer for Click-Through Rate Prediction

Apr 05, 2024

Yushen Li, Jinpeng Wang, Tao Dai, Jieming Zhu, Jun Yuan, Rui Zhang, Shu-Tao Xia

Abstract:Predicting click-through rates (CTR) is a fundamental task for Web applications, where a key issue is to devise effective models for feature interactions. Current methodologies predominantly concentrate on modeling feature interactions within an individual sample, while overlooking the potential cross-sample relationships that can serve as a reference context to enhance the prediction. To make up for such deficiency, this paper develops a Retrieval-Augmented Transformer (RAT), aiming to acquire fine-grained feature interactions within and across samples. By retrieving similar samples, we construct augmented input for each target sample. We then build Transformer layers with cascaded attention to capture both intra- and cross-sample feature interactions, facilitating comprehensive reasoning for improved CTR prediction while retaining efficiency. Extensive experiments on real-world datasets substantiate the effectiveness of RAT and suggest its advantage in long-tail scenarios. The code has been open-sourced at https://github.com/YushenLi807/WWW24-RAT.

* Accepted to The ACM Web Conference 2024 (WWW'24, short paper). Data and code are available

TRIVEA: Transparent Ranking Interpretation using Visual Explanation of Black-Box Algorithmic Rankers

Aug 28, 2023

Jun Yuan, Kaustav Bhattacharjee, Akm Zahirul Islam, Aritra Dasgupta

Abstract:Ranking schemes drive many real-world decisions, such as where to study, whom to hire, and what to buy. Many of these decisions come with high consequences. For example, a university can be deemed less prestigious if not featured in a top-k list, and consumers might not even explore products that do not get recommended to buyers. At the heart of most of these decisions are opaque ranking schemes, which dictate the ordering of data entities, but their internal logic is inaccessible or proprietary. Drawing inferences about the ranking differences is like a guessing game to the stakeholders, such as the rankees (i.e., the entities who are ranked, like product companies) and the decision-makers (i.e., who use the rankings, like buyers). In this paper, we aim to enable transparency in ranking interpretation by using algorithmic rankers that learn from available data and by enabling human reasoning about the learned ranking differences using explainable AI (XAI) methods. To realize this aim, we leverage the exploration-explanation paradigm of human-data interaction to let human stakeholders explore subsets and groupings of complex multi-attribute ranking data using visual explanations of model fit and attribute influence on rankings. We realize this explanation paradigm for transparent ranking interpretation in TRIVEA, a visual analytic system that is fueled by: i) visualizations of model fit derived from algorithmic rankers that learn the associations between attributes and rankings from available data and ii) visual explanations derived from XAI methods that help abstract important patterns, such as the relative influence of attributes in different ranking ranges. Using TRIVEA, end users not trained in data science have the agency to transparently reason about the global and local behavior of the rankings without the need to open black-box ranking models and develop confidence in the resulting attribute-based inferences. We demonstrate the efficacy of TRIVEA using multiple usage scenarios and subjective feedback from researchers with diverse domain expertise. Keywords: Visual Analytics, Learning-to-Rank, Explainable ML, Ranking

* Accepted for publication in SpringerNature's Visual Computer Journal

MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation

Aug 22, 2023

Jinpeng Wang, Ziyun Zeng, Yunxiao Wang, Yuting Wang, Xingyu Lu, Tianxiang Li, Jun Yuan, Rui Zhang, Hai-Tao Zheng, Shu-Tao Xia

Abstract:The goal of sequential recommendation (SR) is to predict the items a user is likely to be interested in based on her/his historical interaction sequences. Most existing sequential recommenders are developed based on ID features, which, despite their widespread use, often underperform with sparse IDs and struggle with the cold-start problem. Besides, inconsistent ID mappings hinder the model's transferability, isolating similar recommendation domains that could have been co-optimized. This paper aims to address these issues by exploring the potential of multi-modal information in learning robust and generalizable sequence representations. We propose MISSRec, a multi-modal pre-training and transfer learning framework for SR. On the user side, we design a Transformer-based encoder-decoder model, where the contextual encoder learns to capture the sequence-level multi-modal synergy while a novel interest-aware decoder is developed to grasp item-modality-interest relations for better sequence representation. On the candidate item side, we adopt a dynamic fusion module to produce user-adaptive item representation, providing more precise matching between users and items. We pre-train the model with contrastive learning objectives and fine-tune it in an efficient manner. Extensive experiments demonstrate the effectiveness and flexibility of MISSRec, promising a practical solution for real-world recommendation scenarios.

* Accepted to ACM MM 2023
