Shuai Li

Refer to the report for detailed contributions

TEeVTOL: Balancing Energy and Time Efficiency in eVTOL Aircraft Path Planning Across City-Scale Wind Fields

Mar 21, 2024

Songyang Liu, Shuai Li, Haochen Li, Weizi Li, Jindong Tan

Abstract: Electric vertical-takeoff and landing (eVTOL) aircraft, recognized for their maneuverability and flexibility, offer a promising alternative to our transportation system. However, the operational effectiveness of these aircraft faces many challenges, such as the delicate balance between energy and time efficiency, stemming from unpredictable environmental factors, including wind fields. Mathematical modeling-based approaches have been adopted to plan aircraft flight paths in urban wind fields with the goal of saving energy and time costs. While effective, they are limited in adapting to dynamic and complex environments. To optimize energy and time efficiency in eVTOL flight through dynamic wind fields, we introduce a novel path planning method leveraging deep reinforcement learning. We assess our method with extensive experiments, comparing it to Dijkstra's algorithm, the theoretically optimal approach for determining shortest paths in a weighted graph, where weights represent either energy or time cost. The results show that our method achieves a graceful balance between energy and time efficiency, closely resembling the theoretically optimal values for both objectives.
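
The Dijkstra baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's code: the graph layout, the `cost_key` parameter, and the toy edge weights below are assumptions made purely for illustration. Each edge carries both an energy and a time weight, and the search minimizes whichever one is selected.

```python
import heapq


def dijkstra(graph, source, target, cost_key):
    """Return (total_cost, path) minimizing the chosen edge weight.

    graph maps node -> {neighbor: {"energy": float, "time": float}}.
    cost_key selects which weight to minimize (hypothetical parameter).
    """
    best = {source: 0.0}   # cheapest known cost to reach each node
    prev = {}              # predecessor on the cheapest known path
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        for neighbor, weights in graph.get(node, {}).items():
            new_cost = cost + weights[cost_key]
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if target not in best:
        return float("inf"), []
    # Walk predecessors back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return best[target], path


# Toy graph with made-up weights: one route is fast but energy-hungry,
# the other is slow but cheap in energy.
GRAPH = {
    "A": {"B": {"energy": 4.0, "time": 1.0}, "C": {"energy": 1.0, "time": 3.0}},
    "B": {"D": {"energy": 4.0, "time": 1.0}},
    "C": {"D": {"energy": 1.0, "time": 3.0}},
    "D": {},
}

energy_cost, energy_path = dijkstra(GRAPH, "A", "D", "energy")
time_cost, time_path = dijkstra(GRAPH, "A", "D", "time")
```

On this toy graph the energy-optimal and time-optimal routes differ, which is the kind of trade-off the abstract describes; each run of Dijkstra optimizes only one of the two costs.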

Low-Cost and Real-Time Industrial Human Action Recognitions Based on Large-Scale Foundation Models

Mar 13, 2024

Wensheng Liang, Ruiyan Zhuang, Xianwei Shi, Shuai Li, Zhicheng Wang, Xiaoguang Ma

Abstract: Industrial management, including quality control and cost and safety optimization, relies heavily on high-quality industrial human action recognition (IHAR), which has been hard to implement in large-scale industrial scenes due to its high cost and poor real-time performance. In this paper, we propose a large-scale foundation model (LSFM)-based IHAR method, wherein various LSFMs and lightweight methods are jointly used, for the first time, to achieve low-cost dataset establishment and real-time IHAR. Comprehensive tests on in-situ large-scale industrial manufacturing lines show that the proposed method greatly reduces employment costs and achieves superior real-time performance with satisfactory accuracy and generalization capabilities, indicating its great potential as a backbone IHAR method, especially for large-scale industrial applications.

Which LLM to Play? Convergence-Aware Online Model Selection with Time-Increasing Bandits

Mar 11, 2024

Yu Xia, Fang Kong, Tong Yu, Liya Guo, Ryan A. Rossi, Sungchul Kim, Shuai Li

Abstract: Web-based applications such as chatbots, search engines and news recommendations continue to grow in scale and complexity with the recent surge in the adoption of LLMs. Online model selection has thus garnered increasing attention due to the need to choose the best model among a diverse set while balancing task reward and exploration cost. Organizations face decisions like whether to employ a costly API-based LLM or a locally finetuned small LLM, weighing cost against performance. Traditional selection methods often evaluate every candidate model before choosing one, which is becoming impractical given the rising costs of training and finetuning LLMs. Moreover, it is undesirable to allocate excessive resources towards exploring poor-performing models. While some recent works leverage online bandit algorithms to manage such exploration-exploitation trade-offs in model selection, they tend to overlook the increasing-then-converging trend in model performance as the model is iteratively finetuned, leading to less accurate predictions and suboptimal model selections. In this paper, we propose a time-increasing bandit algorithm, TI-UCB, which effectively predicts the increase of model performance due to finetuning and efficiently balances exploration and exploitation in model selection. To further capture the converging points of models, we develop a change detection mechanism by comparing consecutive increase predictions. We theoretically prove that our algorithm achieves a logarithmic regret upper bound in a typical increasing bandit setting, which implies a fast convergence rate. The advantage of our method is also empirically validated through extensive experiments on classification model selection and online selection of LLMs. Our results highlight the importance of utilizing the increasing-then-converging pattern for more efficient and economic model selection in the deployment of LLMs.

* Accepted by WWW'24 (Oral)
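
For readers unfamiliar with the bandit framing, the sketch below shows plain UCB1 applied to model selection. It is explicitly not TI-UCB: it has no increase prediction and no change detection, and the function names and reward callables are assumptions for illustration only. Each "arm" stands for a candidate model, and its reward function receives the number of times that model has already been selected, which is one simple way to mimic performance that changes with finetuning.

```python
import math


def ucb1_select(counts, sums, t):
    """Pick an arm with the standard UCB1 rule; untried arms go first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=lambda a: scores[a])


def run_ucb1(reward_fns, horizon):
    """Run UCB1 for `horizon` rounds and return how often each arm was chosen.

    reward_fns[a](n) returns the reward of arm a after n earlier pulls
    (a hypothetical interface, chosen so rewards may depend on pull count).
    """
    k = len(reward_fns)
    counts = [0] * k    # number of pulls per arm
    sums = [0.0] * k    # cumulative reward per arm
    for t in range(1, horizon + 1):
        arm = ucb1_select(counts, sums, t)
        reward = reward_fns[arm](counts[arm])
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Two hypothetical models with fixed rewards of 0.2 and 0.8.
pulls = run_ucb1([lambda n: 0.2, lambda n: 0.8], horizon=200)
```

Because UCB1 averages all past rewards of an arm, it treats each arm as stationary. A model whose reward rises with finetuning, for example `lambda n: min(0.9, 0.1 + 0.05 * n)`, would be underestimated early on, which is the gap the abstract says TI-UCB is designed to address.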

Dynamic Perturbation-Adaptive Adversarial Training on Medical Image Classification

Mar 11, 2024

Shuai Li, Xiaoguang Ma, Shancheng Jiang, Lu Meng

Abstract: Remarkable successes have been achieved in Medical Image Classification (MIC) recently, mainly due to wide applications of convolutional neural networks (CNNs). However, adversarial examples (AEs) exhibit imperceptible similarity with raw data, raising serious concerns about network robustness. Although adversarial training (AT), in response to malevolent AEs, is recognized as an effective approach to improve robustness, it remains challenging to overcome the generalization decline of networks caused by AT. In this paper, in order to preserve high generalization while improving robustness, we propose a dynamic perturbation-adaptive adversarial training (DPAAT) method, which places AT in a dynamic learning environment to generate adaptive data-level perturbations and provides a dynamically updated criterion by loss information collections to handle the disadvantage of fixed perturbation sizes in conventional AT methods and the dependence on external transference. Comprehensive testing on the dermatology HAM10000 dataset shows that DPAAT not only achieves better robustness improvement and generalization preservation but also significantly enhances mean average precision and interpretability on various CNNs, indicating its great potential as a generic adversarial training method for MIC.

* 9 pages, 4 figures, 2 tables

Decoupling Degradations with Recurrent Network for Video Restoration in Under-Display Camera

Mar 08, 2024

Chengxu Liu, Xuan Wang, Yuanting Fan, Shuai Li, Xueming Qian

Abstract: Under-display camera (UDC) systems are the foundation of full-screen display devices in which the lens mounts under the display. The pixel array of light-emitting diodes used for display diffracts and attenuates incident light, causing various degradations as the light intensity changes. Unlike general video restoration, which recovers video by treating different degradation factors equally, video restoration for UDC systems is more challenging because it requires removing diverse degradations over time while preserving temporal consistency. In this paper, we introduce a novel video restoration network, called D$^2$RNet, specifically designed for UDC systems. It employs a set of Decoupling Attention Modules (DAM) that effectively separate the various video degradation factors. More specifically, a soft mask generation function is proposed to formulate each frame into flare and haze based on the diffraction arising from incident light of different intensities, followed by the proposed flare and haze removal components that leverage long- and short-term feature learning to handle the respective degradations. Such a design offers a targeted and effective solution to eliminating various types of degradation in UDC systems. We further extend our design to multiple scales to overcome the scale changes of degradation that often occur in long-range videos. To demonstrate the superiority of D$^2$RNet, we propose a large-scale UDC video benchmark by gathering HDR videos and generating realistically degraded videos using the point spread function measured by a commercial UDC system. Extensive quantitative and qualitative evaluations demonstrate the superiority of D$^2$RNet compared to other state-of-the-art video restoration and UDC image restoration methods. Code is available at https://github.com/ChengxuLiu/DDRNet.git

* AAAI 2024

UniVS: Unified and Universal Video Segmentation with Prompts as Queries

Feb 28, 2024

Minghan Li, Shuai Li, Xindong Zhang, Lei Zhang

Abstract: Despite the recent advances in unified image segmentation (IS), developing a unified video segmentation (VS) model remains a challenge. This is mainly because generic category-specified VS tasks need to detect all objects and track them across consecutive frames, while prompt-guided VS tasks require re-identifying the target with visual/text prompts throughout the entire video, making it hard to handle the different tasks with the same architecture. We make an attempt to address these issues and present a novel unified VS architecture, namely UniVS, by using prompts as queries. UniVS averages the prompt features of the target from previous frames as its initial query to explicitly decode masks, and introduces a target-wise prompt cross-attention layer in the mask decoder to integrate prompt features in the memory pool. By taking the predicted masks of entities from previous frames as their visual prompts, UniVS converts different VS tasks into prompt-guided target segmentation, eliminating the heuristic inter-frame matching process. Our framework not only unifies the different VS tasks but also naturally achieves universal training and testing, ensuring robust performance across different scenarios. UniVS shows a commendable balance between performance and universality on 10 challenging VS benchmarks, covering video instance, semantic, panoptic, object, and referring segmentation tasks. Code can be found at https://github.com/MinghanLi/UniVS.

* The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024
* 21 pages, 11 figures, 10 tables, CVPR2024

Transcending Adversarial Perturbations: Manifold-Aided Adversarial Examples with Legitimate Semantics

Feb 05, 2024

Shuai Li, Xiaoyu Jiang, Xiaoguang Ma

Abstract: Deep neural networks are highly vulnerable to adversarial examples manipulated by malicious tiny perturbations. Although most conventional adversarial attacks ensure the visual imperceptibility between adversarial examples and corresponding raw images by minimizing their geometric distance, these constraints on geometric distance lead to limited attack transferability, inferior visual quality, and human-imperceptible interpretability. In this paper, we propose a supervised semantic-transformation generative model to generate adversarial examples with real and legitimate semantics, wherein an unrestricted adversarial manifold containing continuous semantic variations is constructed for the first time to realize a legitimate transition from non-adversarial examples to adversarial ones. Comprehensive experiments on MNIST and industrial defect datasets show that our adversarial examples not only exhibit better visual quality but also achieve superior attack transferability and more effective explanations for model vulnerabilities, indicating their great potential as generic adversarial examples. The code and pre-trained models are available at https://github.com/shuaili1027/MAELS.git.

* 12 pages, 6 figures

Unsupervised Spatial-Temporal Feature Enrichment and Fidelity Preservation Network for Skeleton based Action Recognition

Jan 25, 2024

Chuankun Li, Shuai Li, Yanbo Gao, Ping Chen, Jian Li, Wanqing Li

Abstract: Unsupervised skeleton based action recognition has achieved remarkable progress recently. Existing unsupervised learning methods suffer from a severe overfitting problem, and thus small networks are used, significantly reducing the representation capability. To address this problem, the overfitting mechanism behind unsupervised learning for skeleton based action recognition is first investigated. It is observed that the skeleton is already a relatively high-level and low-dimension feature, but not in the same manifold as the features for action recognition. Simply applying an existing unsupervised learning method tends to produce features that discriminate the different samples instead of action classes, resulting in the overfitting problem. To solve this problem, this paper presents an Unsupervised spatial-temporal Feature Enrichment and Fidelity Preservation framework (U-FEFP) to generate rich distributed features that contain all the information of the skeleton sequence. A spatial-temporal feature transformation subnetwork is developed using a spatial-temporal graph convolutional network and a graph convolutional gated recurrent unit network as the basic feature extraction network. Unsupervised Bootstrap Your Own Latent based learning is used to generate rich distributed features, and unsupervised pretext task based learning is used to preserve the information of the skeleton sequence. The two unsupervised learning approaches are combined as U-FEFP to produce robust and discriminative representations. Experimental results on three widely used benchmarks, namely the NTU-RGB+D-60, NTU-RGB+D-120 and PKU-MMD datasets, demonstrate that the proposed U-FEFP achieves the best performance compared with the state-of-the-art unsupervised learning methods. t-SNE illustrations further validate that U-FEFP can learn more discriminative features for unsupervised skeleton based action recognition.

Discovering Low-rank Subspaces for Language-agnostic Multilingual Representations

Jan 11, 2024

Zhihui Xie, Handong Zhao, Tong Yu, Shuai Li

Abstract: Large pretrained multilingual language models (ML-LMs) have shown remarkable capabilities of zero-shot cross-lingual transfer, without direct cross-lingual supervision. While these results are promising, follow-up works found that, within the multilingual embedding spaces, there exists strong language identity information which hinders the expression of linguistic factors shared across languages. For semantic tasks like cross-lingual sentence retrieval, it is desired to remove such language identity signals to fully leverage semantic information. In this work, we provide a novel view of projecting away language-specific factors from a multilingual embedding space. Specifically, we discover that there exists a low-rank subspace that primarily encodes information irrelevant to semantics (e.g., syntactic information). To identify this subspace, we present a simple but effective unsupervised method based on singular value decomposition with multiple monolingual corpora as input. Once the subspace is found, we can directly project the original embeddings into the null space to boost language agnosticism without finetuning. We systematically evaluate our method on various tasks including the challenging language-agnostic QA retrieval task. Empirical results show that applying our method consistently leads to improvements over commonly used ML-LMs.

* 17 pages, 7 figures, EMNLP 2022 (main conference)
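
The "project into the null space" step in the abstract can be sketched in a few lines. This is a simplified stand-in rather than the paper's procedure: it assumes the language-specific subspace is rank 1 and estimates it as the dominant singular direction of centered per-language mean embeddings, using power iteration instead of a full singular value decomposition. All function names and the toy vectors are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(rows):
    """Subtract the column-wise mean from every row."""
    dim = len(rows[0])
    mean = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    return [[r[j] - mean[j] for j in range(dim)] for r in rows]


def top_direction(rows, iters=100):
    """Leading right singular vector of `rows` via power iteration.

    Assumes the uniform start vector is not orthogonal to that direction.
    """
    dim = len(rows[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        proj = [dot(r, v) for r in rows]  # rows @ v
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(dim)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project_out(x, direction):
    """Remove the component of x along a unit-length direction."""
    c = dot(x, direction)
    return [a - c * d for a, d in zip(x, direction)]


# Hypothetical mean embeddings for two languages; they differ only in the
# first coordinate, which therefore plays the role of "language identity".
language_means = [[3.0, 1.0, 0.0], [-1.0, 1.0, 0.0]]
direction = top_direction(center(language_means))
cleaned = project_out([3.0, 1.0, 2.0], direction)
```

After projection the embedding has no component along the estimated language direction, while the remaining coordinates are untouched. A rank-r version would repeat this for r orthogonal directions, which a full SVD provides directly.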

Understanding Representation Learnability of Nonlinear Self-Supervised Learning

Jan 06, 2024

Ruofeng Yang, Xiangyuan Li, Bo Jiang, Shuai Li

Abstract: Self-supervised learning (SSL) has empirically shown its data representation learnability in many downstream tasks. There are only a few theoretical works on data representation learnability, and many of those focus on final data representation, treating the nonlinear neural network as a "black box". However, the accurate learning results of neural networks are crucial for describing the data distribution features learned by SSL models. Our paper is the first to analyze the learning results of the nonlinear SSL model accurately. We consider a toy data distribution that contains two features: the label-related feature and the hidden feature. Unlike previous linear setting work that depends on closed-form solutions, we use the gradient descent algorithm to train a 1-layer nonlinear SSL model with a certain initialization region and prove that the model converges to a local minimum. Furthermore, different from the complex iterative analysis, we propose a new analysis process which uses the exact version of the Inverse Function Theorem to accurately describe the features learned by the local minimum. With this local minimum, we prove that the nonlinear SSL model can capture the label-related feature and hidden feature at the same time. In contrast, the nonlinear supervised learning (SL) model can only learn the label-related feature. We also present the learning processes and results of the nonlinear SSL and SL models via simulation experiments.
