Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bo Yang

Department of Automation, Shanghai Jiao Tong University, Shanghai, China, Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China, Shanghai Engineering Research Center of Intelligent Control and Management, Shanghai, China

Survey for Landing Generative AI in Social and E-commerce Recsys -- the Industry Perspectives

Jun 10, 2024

Da Xu, Danqing Zhang, Guangyu Yang, Bo Yang, Shuyuan Xu, Lingling Zheng, Cindy Liang

Abstract:Recently, generative AI (GAI), with their emerging capabilities, have presented unique opportunities for augmenting and revolutionizing industrial recommender systems (Recsys). Despite growing research efforts at the intersection of these fields, the integration of GAI into industrial Recsys remains in its infancy, largely due to the intricate nature of modern industrial Recsys infrastructure, operations, and product sophistication. Drawing upon our experiences in successfully integrating GAI into several major social and e-commerce platforms, this survey aims to comprehensively examine the underlying system and AI foundations, solution frameworks, connections to key research advancements, as well as summarize the practical insights and challenges encountered in the endeavor to integrate GAI into industrial Recsys. As pioneering work in this domain, we hope outline the representative developments of relevant fields, shed lights on practical GAI adoptions in the industry, and motivate future research.

Via

Access Paper or Ask Questions

Joint Association, Beamforming, and Resource Allocation for Multi-IRS Enabled MU-MISO Systems With RSMA

Jun 05, 2024

Chunjie Wang, Xuhui Zhang, Huijun Xing, Liang Xue, Shuqiang Wang, Yanyan Shen, Bo Yang, Xinping Guan

Figure 1 for Joint Association, Beamforming, and Resource Allocation for Multi-IRS Enabled MU-MISO Systems With RSMA

Figure 2 for Joint Association, Beamforming, and Resource Allocation for Multi-IRS Enabled MU-MISO Systems With RSMA

Figure 3 for Joint Association, Beamforming, and Resource Allocation for Multi-IRS Enabled MU-MISO Systems With RSMA

Figure 4 for Joint Association, Beamforming, and Resource Allocation for Multi-IRS Enabled MU-MISO Systems With RSMA

Abstract:Intelligent reflecting surface (IRS) and rate-splitting multiple access (RSMA) technologies are at the forefront of enhancing spectrum and energy efficiency in the next generation multi-antenna communication systems. This paper explores a RSMA system with multiple IRSs, and proposes two purpose-driven scheduling schemes, i.e., the exhaustive IRS-aided (EIA) and opportunistic IRS-aided (OIA) schemes. The aim is to optimize the system weighted energy efficiency (EE) under the above two schemes, respectively. Specifically, the Dinkelbach, branch and bound, successive convex approximation, and the semidefinite relaxation methods are exploited within the alternating optimization framework to obtain effective solutions to the considered problems. The numerical findings indicate that the EIA scheme exhibits better performance compared to the OIA scheme in diverse scenarios when considering the weighted EE, and the proposed algorithm demonstrates superior performance in comparison to the baseline algorithms.

Via

Access Paper or Ask Questions

In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought

May 31, 2024

Sili Huang, Jifeng Hu, Hechang Chen, Lichao Sun, Bo Yang

Figure 1 for In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought

Figure 2 for In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought

Figure 3 for In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought

Figure 4 for In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought

Abstract:In-context learning is a promising approach for offline reinforcement learning (RL) to handle online tasks, which can be achieved by providing task prompts. Recent works demonstrated that in-context RL could emerge with self-improvement in a trial-and-error manner when treating RL tasks as an across-episodic sequential prediction problem. Despite the self-improvement not requiring gradient updates, current works still suffer from high computational costs when the across-episodic sequence increases with task horizons. To this end, we propose an In-context Decision Transformer (IDT) to achieve self-improvement in a high-level trial-and-error manner. Specifically, IDT is inspired by the efficient hierarchical structure of human decision-making and thus reconstructs the sequence to consist of high-level decisions instead of low-level actions that interact with environments. As one high-level decision can guide multi-step low-level actions, IDT naturally avoids excessively long sequences and solves online tasks more efficiently. Experimental results show that IDT achieves state-of-the-art in long-horizon tasks over current in-context RL methods. In particular, the online evaluation time of our IDT is \textbf{36$\times$} times faster than baselines in the D4RL benchmark and \textbf{27$\times$} times faster in the Grid World benchmark.

Via

Access Paper or Ask Questions

The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

May 14, 2024

Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan(+81 more)

Figure 1 for The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

Figure 2 for The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

Figure 3 for The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

Figure 4 for The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

Abstract:In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that can withstand and adapt to these real-world variabilities. Focusing on four pivotal tasks -- BEV detection, map segmentation, semantic occupancy prediction, and multi-view depth estimation -- the competition laid down a gauntlet to innovate and enhance system resilience against typical and atypical disturbances. This year's challenge consisted of five distinct tracks and attracted 140 registered teams from 93 institutes across 11 countries, resulting in nearly one thousand submissions evaluated through our servers. The competition culminated in 15 top-performing solutions, which introduced a range of innovative approaches including advanced data augmentation, multi-sensor fusion, self-supervised learning for error correction, and new algorithmic strategies to enhance sensor robustness. These contributions significantly advanced the state of the art, particularly in handling sensor inconsistencies and environmental variability. Participants, through collaborative efforts, pushed the boundaries of current technologies, showcasing their potential in real-world scenarios. Extensive evaluations and analyses provided insights into the effectiveness of these solutions, highlighting key trends and successful strategies for improving the resilience of driving perception systems. This challenge has set a new benchmark in the field, providing a rich repository of techniques expected to guide future research in this field.

* ICRA 2024; 31 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

Via

Access Paper or Ask Questions

Federated Adaptation for Foundation Model-based Recommendations

May 08, 2024

Chunxu Zhang, Guodong Long, Hongkuan Guo, Xiao Fang, Yang Song, Zhaojie Liu, Guorui Zhou, Zijian Zhang, Yang Liu, Bo Yang

Figure 1 for Federated Adaptation for Foundation Model-based Recommendations

Figure 2 for Federated Adaptation for Foundation Model-based Recommendations

Figure 3 for Federated Adaptation for Foundation Model-based Recommendations

Figure 4 for Federated Adaptation for Foundation Model-based Recommendations

Abstract:With the recent success of large language models, particularly foundation models with generalization abilities, applying foundation models for recommendations becomes a new paradigm to improve existing recommendation systems. It becomes a new open challenge to enable the foundation model to capture user preference changes in a timely manner with reasonable communication and computation costs while preserving privacy. This paper proposes a novel federated adaptation mechanism to enhance the foundation model-based recommendation system in a privacy-preserving manner. Specifically, each client will learn a lightweight personalized adapter using its private data. The adapter then collaborates with pre-trained foundation models to provide recommendation service efficiently with fine-grained manners. Importantly, users' private behavioral data remains secure as it is not shared with the server. This data localization-based privacy preservation is embodied via the federated learning framework. The model can ensure that shared knowledge is incorporated into all adapters while simultaneously preserving each user's personal preferences. Experimental results on four benchmark datasets demonstrate our method's superior performance. Implementation code is available to ease reproducibility.

* Accepted as a regular paper of IJCAI'24

Via

Access Paper or Ask Questions

Adversarial example soups: averaging multiple adversarial examples improves transferability without increasing additional generation time

Feb 27, 2024

Bo Yang, Hengwei Zhang, Chenwei Li, Jindong Wang

Abstract:For transfer-based attacks, the adversarial examples are crafted on the surrogate model, which can be implemented to mislead the target model effectively. The conventional method for maximizing adversarial transferability involves: (1) fine-tuning hyperparameters to generate multiple batches of adversarial examples on the substitute model; (2) conserving the batch of adversarial examples that have the best comprehensive performance on substitute model and target model, and discarding the others. In this work, we revisit the second step of this process in the context of fine-tuning hyperparameters to craft adversarial examples, where multiple batches of fine-tuned adversarial examples often appear in a single high error hilltop. We demonstrate that averaging multiple batches of adversarial examples under different hyperparameter configurations, which refers to as "adversarial example soups", can often enhance adversarial transferability. Compared with traditional methods, the proposed method incurs no additional generation time and computational cost. Besides, our method is orthogonal to existing transfer-based methods and can be combined with them seamlessly to generate more transferable adversarial examples. Extensive experiments on the ImageNet dataset show that our methods achieve a higher attack success rate than the state-of-the-art attacks.

* 16 pages, 8 figures, 12 tables

Via

Access Paper or Ask Questions

Scalable Volt-VAR Optimization using RLlib-IMPALA Framework: A Reinforcement Learning Approach

Feb 24, 2024

Alaa Selim, Yanzhu Ye, Junbo Zhao, Bo Yang

Figure 1 for Scalable Volt-VAR Optimization using RLlib-IMPALA Framework: A Reinforcement Learning Approach

Figure 2 for Scalable Volt-VAR Optimization using RLlib-IMPALA Framework: A Reinforcement Learning Approach

Figure 3 for Scalable Volt-VAR Optimization using RLlib-IMPALA Framework: A Reinforcement Learning Approach

Figure 4 for Scalable Volt-VAR Optimization using RLlib-IMPALA Framework: A Reinforcement Learning Approach

Abstract:In the rapidly evolving domain of electrical power systems, the Volt-VAR optimization (VVO) is increasingly critical, especially with the burgeoning integration of renewable energy sources. Traditional approaches to learning-based VVO in expansive and dynamically changing power systems are often hindered by computational complexities. To address this challenge, our research presents a novel framework that harnesses the potential of Deep Reinforcement Learning (DRL), specifically utilizing the Importance Weighted Actor-Learner Architecture (IMPALA) algorithm, executed on the RAY platform. This framework, built upon RLlib-an industry-standard in Reinforcement Learning-ingeniously capitalizes on the distributed computing capabilities and advanced hyperparameter tuning offered by RAY. This design significantly expedites the exploration and exploitation phases in the VVO solution space. Our empirical results demonstrate that our approach not only surpasses existing DRL methods in achieving superior reward outcomes but also manifests a remarkable tenfold reduction in computational requirements. The integration of our DRL agent with the RAY platform facilitates the creation of RLlib-IMPALA, a novel framework that efficiently uses RAY's resources to improve system adaptability and control. RLlib-IMPALA leverages RAY's toolkit to enhance analytical capabilities and significantly speeds up training to become more than 10 times faster than other state-of-the-art DRL methods.

Via

Access Paper or Ask Questions

Low-Dose CT Reconstruction Using Dataset-free Learning

Feb 17, 2024

Feng Wang, Renfang Wang, Bo Yang, Hong Qiu

Abstract:Low-Dose computer tomography (LDCT) is an ideal alternative to reduce radiation risk in clinical applications. Although supervised-deep-learning-based reconstruction methods have demonstrated superior performance compared to conventional model-driven reconstruction algorithms, they require collecting massive pairs of low-dose and norm-dose CT images for neural network training, which limits their practical application in LDCT imaging. In this paper, we propose an unsupervised and training data-free learning reconstruction method for LDCT imaging that avoids the requirement for training data. The proposed method is a post-processing technique that aims to enhance the initial low-quality reconstruction results, and it reconstructs the high-quality images by neural work training that minimizes the $\ell_1$-norm distance between the CT measurements and their corresponding simulated sinogram data, as well as the total variation (TV) value of the reconstructed image. Moreover, the proposed method does not require to set the weights for both the data fidelity term and the plenty term. Experimental results on the AAPM challenge data and LoDoPab-CT data demonstrate that the proposed method is able to effectively suppress the noise and preserve the tiny structures. And these results also shows the proposed method's low computational cost and rapid convergence. The source code is available at \url{https://github.com/linfengyu77/IRLDCT}.

* submitted to Elsevier

Via

Access Paper or Ask Questions

Troublemaker Learning for Low-Light Image Enhancement

Feb 07, 2024

Yinghao Song, Zhiyuan Cao, Wanhong Xiang, Sifan Long, Bo Yang, Hongwei Ge, Yanchun Liang, Chunguo Wu

Figure 1 for Troublemaker Learning for Low-Light Image Enhancement

Figure 2 for Troublemaker Learning for Low-Light Image Enhancement

Figure 3 for Troublemaker Learning for Low-Light Image Enhancement

Figure 4 for Troublemaker Learning for Low-Light Image Enhancement

Abstract:Low-light image enhancement (LLIE) restores the color and brightness of underexposed images. Supervised methods suffer from high costs in collecting low/normal-light image pairs. Unsupervised methods invest substantial effort in crafting complex loss functions. We address these two challenges through the proposed TroubleMaker Learning (TML) strategy, which employs normal-light images as inputs for training. TML is simple: we first dim the input and then increase its brightness. TML is based on two core components. First, the troublemaker model (TM) constructs pseudo low-light images from normal images to relieve the cost of pairwise data. Second, the predicting model (PM) enhances the brightness of pseudo low-light images. Additionally, we incorporate an enhancing model (EM) to further improve the visual performance of PM outputs. Moreover, in LLIE tasks, characterizing global element correlations is important because more information on the same object can be captured. CNN cannot achieve this well, and self-attention has high time complexity. Accordingly, we propose Global Dynamic Convolution (GDC) with O(n) time complexity, which essentially imitates the partial calculation process of self-attention to formulate elementwise correlations. Based on the GDC module, we build the UGDC model. Extensive quantitative and qualitative experiments demonstrate that UGDC trained with TML can achieve competitive performance against state-of-the-art approaches on public datasets. The code is available at https://github.com/Rainbowman0/TML_LLIE.

Via

Access Paper or Ask Questions

Zero-shot sketch-based remote sensing image retrieval based on multi-level and attention-guided tokenization

Feb 03, 2024

Bo Yang, Chen Wang, Xiaoshuang Ma, Beiping Song, Zhuang Liu

Figure 1 for Zero-shot sketch-based remote sensing image retrieval based on multi-level and attention-guided tokenization

Figure 2 for Zero-shot sketch-based remote sensing image retrieval based on multi-level and attention-guided tokenization

Figure 3 for Zero-shot sketch-based remote sensing image retrieval based on multi-level and attention-guided tokenization

Figure 4 for Zero-shot sketch-based remote sensing image retrieval based on multi-level and attention-guided tokenization

Abstract:Effectively and efficiently retrieving images from remote sensing databases is a critical challenge in the realm of remote sensing big data. Utilizing hand-drawn sketches as retrieval inputs offers intuitive and user-friendly advantages, yet the potential of multi-level feature integration from sketches remains underexplored, leading to suboptimal retrieval performance. To address this gap, our study introduces a novel zero-shot, sketch-based retrieval method for remote sensing images, leveraging multi-level, attention-guided tokenization. This approach starts by employing multi-level self-attention feature extraction to tokenize the query sketches, as well as self-attention feature extraction to tokenize the candidate images. It then employs cross-attention mechanisms to establish token correspondence between these two modalities, facilitating the computation of sketch-to-image similarity. Our method demonstrates superior retrieval accuracy over existing sketch-based remote sensing image retrieval techniques, as evidenced by tests on four datasets. Notably, it also exhibits robust zero-shot learning capabilities and strong generalizability in handling unseen categories and novel remote sensing data. The method's scalability can be further enhanced by the pre-calculation of retrieval tokens for all candidate images in a database. This research underscores the significant potential of multi-level, attention-guided tokenization in cross-modal remote sensing image retrieval. For broader accessibility and research facilitation, we have made the code and dataset used in this study publicly available online. Code and dataset are available at https://github.com/Snowstormfly/Cross-modal-retrieval-MLAGT.

* 13 pages, 6 figures

Via

Access Paper or Ask Questions