Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shihao Li

Alibaba Inc

Constrained Optimal Planning to Minimize Battery Degradation of Autonomous Mobile Robots

Jun 16, 2025

Jiachen Li, Jian Chu, Feiyang Zhao, Shihao Li, Wei Li, Dongmei Chen

Abstract:This paper proposes an optimization framework that addresses both cycling degradation and calendar aging of batteries for autonomous mobile robot (AMR) to minimize battery degradation while ensuring task completion. A rectangle method of piecewise linear approximation is employed to linearize the bilinear optimization problem. We conduct a case study to validate the efficiency of the proposed framework in achieving an optimal path planning for AMRs while reducing battery aging.

Via

Access Paper or Ask Questions

Robust Optimal Task Planning to Maximize Battery Life

Jun 12, 2025

Jiachen Li, Chu Jian, Feiyang Zhao, Shihao Li, Wei Li, Dongmei Chen

Abstract:This paper proposes a control-oriented optimization platform for autonomous mobile robots (AMRs), focusing on extending battery life while ensuring task completion. The requirement of fast AMR task planning while maintaining minimum battery state of charge, thus maximizing the battery life, renders a bilinear optimization problem. McCormick envelop technique is proposed to linearize the bilinear term. A novel planning algorithm with relaxed constraints is also developed to handle parameter uncertainties robustly with high efficiency ensured. Simulation results are provided to demonstrate the utility of the proposed methods in reducing battery degradation while satisfying task completion requirements.

Via

Access Paper or Ask Questions

NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID

May 26, 2025

Shihao Li, Chenglong Li, Aihua Zheng, Andong Lu, Jin Tang, Jixin Ma

Figure 1 for NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID

Figure 2 for NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID

Figure 3 for NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID

Figure 4 for NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID

Abstract:Multi-modal object re-identification (ReID) aims to extract identity features across heterogeneous spectral modalities to enable accurate recognition and retrieval in complex real-world scenarios. However, most existing methods rely on implicit feature fusion structures, making it difficult to model fine-grained recognition strategies under varying challenging conditions. Benefiting from the powerful semantic understanding capabilities of Multi-modal Large Language Models (MLLMs), the visual appearance of an object can be effectively translated into descriptive text. In this paper, we propose a reliable multi-modal caption generation method based on attribute confidence, which significantly reduces the unknown recognition rate of MLLMs in multi-modal semantic generation and improves the quality of generated text. Additionally, we propose a novel ReID framework NEXT, the Multi-grained Mixture of Experts via Text-Modulation for Multi-modal Object Re-Identification. Specifically, we decouple the recognition problem into semantic and structural expert branches to separately capture modality-specific appearance and intrinsic structure. For semantic recognition, we propose the Text-Modulated Semantic-sampling Experts (TMSE), which leverages randomly sampled high-quality semantic texts to modulate expert-specific sampling of multi-modal features and mining intra-modality fine-grained semantic cues. Then, to recognize coarse-grained structure features, we propose the Context-Shared Structure-aware Experts (CSSE) that focuses on capturing the holistic object structure across modalities and maintains inter-modality structural consistency through a soft routing mechanism. Finally, we propose the Multi-Modal Feature Aggregation (MMFA), which adopts a unified feature fusion strategy to simply and effectively integrate semantic and structural expert outputs into the final identity representations.

Via

Access Paper or Ask Questions

ICPL-ReID: Identity-Conditional Prompt Learning for Multi-Spectral Object Re-Identification

May 23, 2025

Shihao Li, Chenglong Li, Aihua Zheng, Jin Tang, Bin Luo

Figure 1 for ICPL-ReID: Identity-Conditional Prompt Learning for Multi-Spectral Object Re-Identification

Figure 2 for ICPL-ReID: Identity-Conditional Prompt Learning for Multi-Spectral Object Re-Identification

Figure 3 for ICPL-ReID: Identity-Conditional Prompt Learning for Multi-Spectral Object Re-Identification

Figure 4 for ICPL-ReID: Identity-Conditional Prompt Learning for Multi-Spectral Object Re-Identification

Abstract:Multi-spectral object re-identification (ReID) brings a new perception perspective for smart city and intelligent transportation applications, effectively addressing challenges from complex illumination and adverse weather. However, complex modal differences between heterogeneous spectra pose challenges to efficiently utilizing complementary and discrepancy of spectra information. Most existing methods fuse spectral data through intricate modal interaction modules, lacking fine-grained semantic understanding of spectral information (\textit{e.g.}, text descriptions, part masks, and object keypoints). To solve this challenge, we propose a novel Identity-Conditional text Prompt Learning framework (ICPL), which exploits the powerful cross-modal alignment capability of CLIP, to unify different spectral visual features from text semantics. Specifically, we first propose the online prompt learning using learnable text prompt as the identity-level semantic center to bridge the identity semantics of different spectra in online manner. Then, in lack of concrete text descriptions, we propose the multi-spectral identity-condition module to use identity prototype as spectral identity condition to constraint prompt learning. Meanwhile, we construct the alignment loop mutually optimizing the learnable text prompt and spectral visual encoder to avoid online prompt learning disrupting the pre-trained text-image alignment distribution. In addition, to adapt to small-scale multi-spectral data and mitigate style differences between spectra, we propose multi-spectral adapter that employs a low-rank adaption method to learn spectra-specific features. Comprehensive experiments on 5 benchmarks, including RGBNT201, Market-MM, MSVR310, RGBN300, and RGBNT100, demonstrate that the proposed method outperforms the state-of-the-art methods.

* Accepted by IEEE Transactions on Multimedia (TMM)

Via

Access Paper or Ask Questions

SYKI-SVC: Advancing Singing Voice Conversion with Post-Processing Innovations and an Open-Source Professional Testset

Jan 06, 2025

Yiquan Zhou, Wenyu Wang, Hongwu Ding, Jiacheng Xu, Jihua Zhu, Xin Gao, Shihao Li

Abstract:Singing voice conversion aims to transform a source singing voice into that of a target singer while preserving the original lyrics, melody, and various vocal techniques. In this paper, we propose a high-fidelity singing voice conversion system. Our system builds upon the SVCC T02 framework and consists of three key components: a feature extractor, a voice converter, and a post-processor. The feature extractor utilizes the ContentVec and Whisper models to derive F0 contours and extract speaker-independent linguistic features from the input singing voice. The voice converter then integrates the extracted timbre, F0, and linguistic content to synthesize the target speaker's waveform. The post-processor augments high-frequency information directly from the source through simple and effective signal processing to enhance audio quality. Due to the lack of a standardized professional dataset for evaluating expressive singing conversion systems, we have created and made publicly available a specialized test set. Comparative evaluations demonstrate that our system achieves a remarkably high level of naturalness, and further analysis confirms the efficacy of our proposed system design.

* Accepted by ICASSP 2025

Via

Access Paper or Ask Questions

PANDORA: Deep graph learning based COVID-19 infection risk level forecasting

Jun 07, 2024

Shuo Yu, Feng Xia, Yueru Wang, Shihao Li, Falih Febrinanto, Madhu Chetty

Figure 1 for PANDORA: Deep graph learning based COVID-19 infection risk level forecasting

Figure 2 for PANDORA: Deep graph learning based COVID-19 infection risk level forecasting

Figure 3 for PANDORA: Deep graph learning based COVID-19 infection risk level forecasting

Figure 4 for PANDORA: Deep graph learning based COVID-19 infection risk level forecasting

Abstract:COVID-19 as a global pandemic causes a massive disruption to social stability that threatens human life and the economy. Policymakers and all elements of society must deliver measurable actions based on the pandemic's severity to minimize the detrimental impact of COVID-19. A proper forecasting system is arguably important to provide an early signal of the risk of COVID-19 infection so that the authorities are ready to protect the people from the worst. However, making a good forecasting model for infection risks in different cities or regions is not an easy task, because it has a lot of influential factors that are difficult to be identified manually. To address the current limitations, we propose a deep graph learning model, called PANDORA, to predict the infection risks of COVID-19, by considering all essential factors and integrating them into a geographical network. The framework uses geographical position relations and transportation frequency as higher-order structural properties formulated by higher-order network structures (i.e., network motifs). Moreover, four significant node attributes (i.e., multiple features of a particular area, including climate, medical condition, economy, and human mobility) are also considered. We propose three different aggregators to better aggregate node attributes and structural features, namely, Hadamard, Summation, and Connection. Experimental results over real data show that PANDORA outperforms the baseline method with higher accuracy and faster convergence speed, no matter which aggregator is chosen. We believe that PANDORA using deep graph learning provides a promising approach to get superior performance in infection risk level forecasting and help humans battle the COVID-19 crisis.

Via

Access Paper or Ask Questions

State Space Model for New-Generation Network Alternative to Transformers: A Survey

Apr 15, 2024

Xiao Wang, Shiao Wang, Yuhe Ding, Yuehang Li, Wentao Wu, Yao Rong, Weizhe Kong, Ju Huang, Shihao Li, Haoxiang Yang(+6 more)

Abstract:In the post-deep learning era, the Transformer architecture has demonstrated its powerful performance across pre-trained big models and various downstream tasks. However, the enormous computational demands of this architecture have deterred many researchers. To further reduce the complexity of attention models, numerous efforts have been made to design more efficient methods. Among them, the State Space Model (SSM), as a possible replacement for the self-attention based Transformer model, has drawn more and more attention in recent years. In this paper, we give the first comprehensive review of these works and also provide experimental comparisons and analysis to better demonstrate the features and advantages of SSM. Specifically, we first give a detailed description of principles to help the readers quickly capture the key ideas of SSM. After that, we dive into the reviews of existing SSMs and their various applications, including natural language processing, computer vision, graph, multi-modal and multi-media, point cloud/event stream, time series data, and other domains. In addition, we give statistical comparisons and analysis of these models and hope it helps the readers to understand the effectiveness of different structures on various tasks. Then, we propose possible research points in this direction to better promote the development of the theoretical model and application of SSM. More related works will be continuously updated on the following GitHub: https://github.com/Event-AHU/Mamba_State_Space_Model_Paper_List.

* The First review of State Space Model (SSM)/Mamba and their applications in artificial intelligence, 33 pages

Via

Access Paper or Ask Questions

MRIF: Multi-resolution Interest Fusion for Recommendation

Jul 08, 2020

Shihao Li, Dekun Yang, Bufeng Zhang

Figure 1 for MRIF: Multi-resolution Interest Fusion for Recommendation

Figure 2 for MRIF: Multi-resolution Interest Fusion for Recommendation

Figure 3 for MRIF: Multi-resolution Interest Fusion for Recommendation

Figure 4 for MRIF: Multi-resolution Interest Fusion for Recommendation

Abstract:The main task of personalized recommendation is capturing users' interests based on their historical behaviors. Most of recent advances in recommender systems mainly focus on modeling users' preferences accurately using deep learning based approaches. There are two important properties of users' interests, one is that users' interests are dynamic and evolve over time, the other is that users' interests have different resolutions, or temporal-ranges to be precise, such as long-term and short-term preferences. Existing approaches either use Recurrent Neural Networks (RNNs) to address the drifts in users' interests without considering different temporal-ranges, or design two different networks to model long-term and short-term preferences separately. This paper presents a multi-resolution interest fusion model (MRIF) that takes both properties of users' interests into consideration. The proposed model is capable to capture the dynamic changes in users' interests at different temporal-ranges, and provides an effective way to combine a group of multi-resolution user interests to make predictions. Experiments show that our method outperforms state-of-the-art recommendation methods consistently.

* 4 pages

Via

Access Paper or Ask Questions