Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sheng Xu

Iterative Optimization Annotation Pipeline and ALSS-YOLO-Seg for Efficient Banana Plantation Segmentation in UAV Imagery

Oct 09, 2024

Ang He, Ximei Wu, Xing Xu, Jing Chen, Xiaobin Guo, Sheng Xu

Abstract:Precise segmentation of Unmanned Aerial Vehicle (UAV)-captured images plays a vital role in tasks such as crop yield estimation and plant health assessment in banana plantations. By identifying and classifying planted areas, crop area can be calculated, which is indispensable for accurate yield predictions. However, segmenting banana plantation scenes requires a substantial amount of annotated data, and manual labeling of these images is both time-consuming and labor-intensive, limiting the development of large-scale datasets. Furthermore, challenges such as changing target sizes, complex ground backgrounds, limited computational resources, and correct identification of crop categories make segmentation even more difficult. To address these issues, we proposed a comprehensive solution. Firstly, we designed an iterative optimization annotation pipeline leveraging SAM2's zero-shot capabilities to generate high-quality segmentation annotations, thereby reducing the cost and time associated with data annotation significantly. Secondly, we developed ALSS-YOLO-Seg, an efficient lightweight segmentation model optimized for UAV imagery. The model's backbone includes an Adaptive Lightweight Channel Splitting and Shuffling (ALSS) module to improve information exchange between channels and optimize feature extraction, aiding accurate crop identification. Additionally, a Multi-Scale Channel Attention (MSCA) module combines multi-scale feature extraction with channel attention to tackle challenges of varying target sizes and complex ground backgrounds.

Via

Access Paper or Ask Questions

A Survey of Inverse Constrained Reinforcement Learning: Definitions, Progress and Challenges

Sep 11, 2024

Guiliang Liu, Sheng Xu, Shicheng Liu, Ashish Gaurav, Sriram Ganapathi Subramanian, Pascal Poupart

Figure 1 for A Survey of Inverse Constrained Reinforcement Learning: Definitions, Progress and Challenges

Figure 2 for A Survey of Inverse Constrained Reinforcement Learning: Definitions, Progress and Challenges

Figure 3 for A Survey of Inverse Constrained Reinforcement Learning: Definitions, Progress and Challenges

Figure 4 for A Survey of Inverse Constrained Reinforcement Learning: Definitions, Progress and Challenges

Abstract:Inverse Constrained Reinforcement Learning (ICRL) is the task of inferring the implicit constraints followed by expert agents from their demonstration data. As an emerging research topic, ICRL has received considerable attention in recent years. This article presents a categorical survey of the latest advances in ICRL. It serves as a comprehensive reference for machine learning researchers and practitioners, as well as starters seeking to comprehend the definitions, advancements, and important challenges in ICRL. We begin by formally defining the problem and outlining the algorithmic framework that facilitates constraint inference across various scenarios. These include deterministic or stochastic environments, environments with limited demonstrations, and multiple agents. For each context, we illustrate the critical challenges and introduce a series of fundamental methods to tackle these issues. This survey encompasses discrete, virtual, and realistic environments for evaluating ICRL agents. We also delve into the most pertinent applications of ICRL, such as autonomous driving, robot control, and sports analytics. To stimulate continuing research, we conclude the survey with a discussion of key unresolved questions in ICRL that can effectively foster a bridge between theoretical understanding and practical industrial applications.

* 28 pages

Via

Access Paper or Ask Questions

ALSS-YOLO: An Adaptive Lightweight Channel Split and Shuffling Network for TIR Wildlife Detection in UAV Imagery

Sep 10, 2024

Ang He, Xiaobo Li, Ximei Wu, Chengyue Su, Jing Chen, Sheng Xu, Xiaobin Guo

Figure 1 for ALSS-YOLO: An Adaptive Lightweight Channel Split and Shuffling Network for TIR Wildlife Detection in UAV Imagery

Figure 2 for ALSS-YOLO: An Adaptive Lightweight Channel Split and Shuffling Network for TIR Wildlife Detection in UAV Imagery

Figure 3 for ALSS-YOLO: An Adaptive Lightweight Channel Split and Shuffling Network for TIR Wildlife Detection in UAV Imagery

Figure 4 for ALSS-YOLO: An Adaptive Lightweight Channel Split and Shuffling Network for TIR Wildlife Detection in UAV Imagery

Abstract:Unmanned aerial vehicles (UAVs) equipped with thermal infrared (TIR) cameras play a crucial role in combating nocturnal wildlife poaching. However, TIR images often face challenges such as jitter, and wildlife overlap, necessitating UAVs to possess the capability to identify blurred and overlapping small targets. Current traditional lightweight networks deployed on UAVs struggle to extract features from blurry small targets. To address this issue, we developed ALSS-YOLO, an efficient and lightweight detector optimized for TIR aerial images. Firstly, we propose a novel Adaptive Lightweight Channel Split and Shuffling (ALSS) module. This module employs an adaptive channel split strategy to optimize feature extraction and integrates a channel shuffling mechanism to enhance information exchange between channels. This improves the extraction of blurry features, crucial for handling jitter-induced blur and overlapping targets. Secondly, we developed a Lightweight Coordinate Attention (LCA) module that employs adaptive pooling and grouped convolution to integrate feature information across dimensions. This module ensures lightweight operation while maintaining high detection precision and robustness against jitter and target overlap. Additionally, we developed a single-channel focus module to aggregate the width and height information of each channel into four-dimensional channel fusion, which improves the feature representation efficiency of infrared images. Finally, we modify the localization loss function to emphasize the loss value associated with small objects to improve localization accuracy. Extensive experiments on the BIRDSAI and ISOD TIR UAV wildlife datasets show that ALSS-YOLO achieves state-of-the-art performance, Our code is openly available at https://github.com/helloworlder8/computer_vision.

Via

Access Paper or Ask Questions

Explicit Second-order LiDAR Bundle Adjustment Algorithm Using Mean Squared Group Metric

Sep 03, 2024

Tingchen Ma, Yongsheng Ou, Sheng Xu

Figure 1 for Explicit Second-order LiDAR Bundle Adjustment Algorithm Using Mean Squared Group Metric

Figure 2 for Explicit Second-order LiDAR Bundle Adjustment Algorithm Using Mean Squared Group Metric

Figure 3 for Explicit Second-order LiDAR Bundle Adjustment Algorithm Using Mean Squared Group Metric

Figure 4 for Explicit Second-order LiDAR Bundle Adjustment Algorithm Using Mean Squared Group Metric

Abstract:The Bundle Adjustment (BA) algorithm is a widely used nonlinear optimization technique in the backend of Simultaneous Localization and Mapping (SLAM) systems. By leveraging the co-view relationships of landmarks from multiple perspectives, it constructs a joint estimation model for both poses and landmarks, enabling the system to generate refined maps and reduce front-end localization errors. However, applying BA to LiDAR data presents unique challenges due to the large volume of 3D points typically present in point clouds, making robust and accurate model solving more complex. In this work, we propose a novel mean square group metric (MSGM). This metric applies mean square transformation to uniformly process the measurement of plane landmarks from a single perspective. The transformed metric ensures scale interpretability while avoiding the time-consuming point-by-point calculations. By integrating a robust kernel function, the metrics involved in the BA model are reweighted, enhancing the robustness of the solution process. On the basis of the proposed robust LiDAR BA model, we derived an explicit second-order estimator (RSO-BA). This estimator employs analytical formulas for Hessian and gradient calculations, ensuring the precision of the BA solution. We evaluated the proposed RSO-BA estimator against existing implicit second-order and explicit approximate second-order estimators using the publicly available datasets. The experimental results demonstrate that the RSO-BA estimator outperforms its counterparts regarding registration accuracy and robustness, particularly in large-scale or complex unstructured environments.

Via

Access Paper or Ask Questions

SPL: A Socratic Playground for Learning Powered by Large Language Mode

Jun 20, 2024

Liang Zhang, Jionghao Lin, Ziyi Kuang, Sheng Xu, Mohammed Yeasin, Xiangen Hu

Figure 1 for SPL: A Socratic Playground for Learning Powered by Large Language Mode

Figure 2 for SPL: A Socratic Playground for Learning Powered by Large Language Mode

Figure 3 for SPL: A Socratic Playground for Learning Powered by Large Language Mode

Figure 4 for SPL: A Socratic Playground for Learning Powered by Large Language Mode

Abstract:Dialogue-based Intelligent Tutoring Systems (ITSs) have significantly advanced adaptive and personalized learning by automating sophisticated human tutoring strategies within interactive dialogues. However, replicating the nuanced patterns of expert human communication remains a challenge in Natural Language Processing (NLP). Recent advancements in NLP, particularly Large Language Models (LLMs) such as OpenAI's GPT-4, offer promising solutions by providing human-like and context-aware responses based on extensive pre-trained knowledge. Motivated by the effectiveness of LLMs in various educational tasks (e.g., content creation and summarization, problem-solving, and automated feedback provision), our study introduces the Socratic Playground for Learning (SPL), a dialogue-based ITS powered by the GPT-4 model, which employs the Socratic teaching method to foster critical thinking among learners. Through extensive prompt engineering, SPL can generate specific learning scenarios and facilitates efficient multi-turn tutoring dialogues. The SPL system aims to enhance personalized and adaptive learning experiences tailored to individual needs, specifically focusing on improving critical thinking skills. Our pilot experimental results from essay writing tasks demonstrate SPL has the potential to improve tutoring interactions and further enhance dialogue-based ITS functionalities. Our study, exemplified by SPL, demonstrates how LLMs enhance dialogue-based ITSs and expand the accessibility and efficacy of educational technologies.

Via

Access Paper or Ask Questions

BEACON: Benchmark for Comprehensive RNA Tasks and Language Models

Jun 14, 2024

Yuchen Ren, Zhiyuan Chen, Lifeng Qiao, Hongtai Jing, Yuchen Cai, Sheng Xu, Peng Ye, Xinzhu Ma, Siqi Sun, Hongliang Yan(+3 more)

Abstract:RNA plays a pivotal role in translating genetic instructions into functional outcomes, underscoring its importance in biological processes and disease mechanisms. Despite the emergence of numerous deep learning approaches for RNA, particularly universal RNA language models, there remains a significant lack of standardized benchmarks to assess the effectiveness of these methods. In this study, we introduce the first comprehensive RNA benchmark BEACON (\textbf{BE}nchm\textbf{A}rk for \textbf{CO}mprehensive R\textbf{N}A Task and Language Models). First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications, enabling a comprehensive assessment of the performance of methods on various RNA understanding tasks. Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models. Third, we investigate the vital RNA language model components from the tokenizer and positional encoding aspects. Notably, our findings emphasize the superiority of single nucleotide tokenization and the effectiveness of Attention with Linear Biases (ALiBi) over traditional positional encoding methods. Based on these insights, a simple yet strong baseline called BEACON-B is proposed, which can achieve outstanding performance with limited data and computational resources. The datasets and source code of our benchmark are available at https://github.com/terry-r123/RNABenchmark.

Via

Access Paper or Ask Questions

pFedLVM: A Large Vision Model -Driven and Latent Feature-Based Personalized Federated Learning Framework in Autonomous Driving

May 07, 2024

Wei-Bin Kou, Qingfeng Lin, Ming Tang, Sheng Xu, Rongguang Ye, Yang Leng, Shuai Wang, Zhenyu Chen, Guangxu Zhu, Yik-Chung Wu

Figure 1 for pFedLVM: A Large Vision Model -Driven and Latent Feature-Based Personalized Federated Learning Framework in Autonomous Driving

Figure 2 for pFedLVM: A Large Vision Model -Driven and Latent Feature-Based Personalized Federated Learning Framework in Autonomous Driving

Figure 3 for pFedLVM: A Large Vision Model -Driven and Latent Feature-Based Personalized Federated Learning Framework in Autonomous Driving

Figure 4 for pFedLVM: A Large Vision Model -Driven and Latent Feature-Based Personalized Federated Learning Framework in Autonomous Driving

Abstract:Deep learning-based Autonomous Driving (AD) models often exhibit poor generalization due to data heterogeneity in an ever domain-shifting environment. While Federated Learning (FL) could improve the generalization of an AD model (known as FedAD system), conventional models often struggle with under-fitting as the amount of accumulated training data progressively increases. To address this issue, instead of conventional small models, employing Large Vision Models (LVMs) in FedAD is a viable option for better learning of representations from a vast volume of data. However, implementing LVMs in FedAD introduces three challenges: (I) the extremely high communication overheads associated with transmitting LVMs between participating vehicles and a central server; (II) lack of computing resource to deploy LVMs on each vehicle; (III) the performance drop due to LVM focusing on shared features but overlooking local vehicle characteristics. To overcome these challenges, we propose pFedLVM, a LVM-Driven, Latent Feature-Based Personalized Federated Learning framework. In this approach, the LVM is deployed only on central server, which effectively alleviates the computational burden on individual vehicles. Furthermore, the exchange between central server and vehicles are the learned features rather than the LVM parameters, which significantly reduces communication overhead. In addition, we utilize both shared features from all participating vehicles and individual characteristics from each vehicle to establish a personalized learning mechanism. This enables each vehicle's model to learn features from others while preserving its personalized characteristics, thereby outperforming globally shared models trained in general FL. Extensive experiments demonstrate that pFedLVM outperforms the existing state-of-the-art approaches.

* This paper was submitted to IEEE Transactions on Mobile Computing (TMC) on Apr. 6th, 2024

Via

Access Paper or Ask Questions

RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model

Mar 12, 2024

Mingze Wang, Keyan Chen, Lili Su, Cilin Yan, Sheng Xu, Haotian Zhang, Pengcheng Yuan, Xiaolong Jiang, Baochang Zhang

Figure 1 for RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model

Figure 2 for RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model

Figure 3 for RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model

Figure 4 for RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model

Abstract:The intelligent interpretation of buildings plays a significant role in urban planning and management, macroeconomic analysis, population dynamics, etc. Remote sensing image building interpretation primarily encompasses building extraction and change detection. However, current methodologies often treat these two tasks as separate entities, thereby failing to leverage shared knowledge. Moreover, the complexity and diversity of remote sensing image scenes pose additional challenges, as most algorithms are designed to model individual small datasets, thus lacking cross-scene generalization. In this paper, we propose a comprehensive remote sensing image building understanding model, termed RSBuilding, developed from the perspective of the foundation model. RSBuilding is designed to enhance cross-scene generalization and task universality. Specifically, we extract image features based on the prior knowledge of the foundation model and devise a multi-level feature sampler to augment scale information. To unify task representation and integrate image spatiotemporal clues, we introduce a cross-attention decoder with task prompts. Addressing the current shortage of datasets that incorporate annotations for both tasks, we have developed a federated training strategy to facilitate smooth model convergence even when supervision for some tasks is missing, thereby bolstering the complementarity of different tasks. Our model was trained on a dataset comprising up to 245,000 images and validated on multiple building extraction and change detection datasets. The experimental results substantiate that RSBuilding can concurrently handle two structurally distinct tasks and exhibits robust zero-shot generalization capabilities.

Via

Access Paper or Ask Questions

ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing

Dec 18, 2023

Zhi Jin, Sheng Xu, Xiang Zhang, Tianze Ling, Nanqing Dong, Wanli Ouyang, Zhiqiang Gao, Cheng Chang, Siqi Sun

Figure 1 for ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing

Figure 2 for ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing

Figure 3 for ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing

Figure 4 for ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing

Abstract:De novo peptide sequencing from mass spectrometry (MS) data is a critical task in proteomics research. Traditional de novo algorithms have encountered a bottleneck in accuracy due to the inherent complexity of proteomics data. While deep learning-based methods have shown progress, they reduce the problem to a translation task, potentially overlooking critical nuances between spectra and peptides. In our research, we present ContraNovo, a pioneering algorithm that leverages contrastive learning to extract the relationship between spectra and peptides and incorporates the mass information into peptide decoding, aiming to address these intricacies more efficiently. Through rigorous evaluations on two benchmark datasets, ContraNovo consistently outshines contemporary state-of-the-art solutions, underscoring its promising potential in enhancing de novo peptide sequencing. The source code is available at https://github.com/BEAM-Labs/ContraNovo.

* This paper has been accepted by AAAI 2024

Via

Access Paper or Ask Questions

CorefPrompt: Prompt-based Event Coreference Resolution by Measuring Event Type and Argument Compatibilities

Oct 24, 2023

Sheng Xu, Peifeng Li, Qiaoming Zhu

Figure 1 for CorefPrompt: Prompt-based Event Coreference Resolution by Measuring Event Type and Argument Compatibilities

Figure 2 for CorefPrompt: Prompt-based Event Coreference Resolution by Measuring Event Type and Argument Compatibilities

Figure 3 for CorefPrompt: Prompt-based Event Coreference Resolution by Measuring Event Type and Argument Compatibilities

Figure 4 for CorefPrompt: Prompt-based Event Coreference Resolution by Measuring Event Type and Argument Compatibilities

Abstract:Event coreference resolution (ECR) aims to group event mentions referring to the same real-world event into clusters. Most previous studies adopt the "encoding first, then scoring" framework, making the coreference judgment rely on event encoding. Furthermore, current methods struggle to leverage human-summarized ECR rules, e.g., coreferential events should have the same event type, to guide the model. To address these two issues, we propose a prompt-based approach, CorefPrompt, to transform ECR into a cloze-style MLM (masked language model) task. This allows for simultaneous event modeling and coreference discrimination within a single template, with a fully shared context. In addition, we introduce two auxiliary prompt tasks, event-type compatibility and argument compatibility, to explicitly demonstrate the reasoning process of ECR, which helps the model make final predictions. Experimental results show that our method CorefPrompt performs well in a state-of-the-art (SOTA) benchmark.

* Accepted by EMNLP2023

Via

Access Paper or Ask Questions