Chang Liu

Department of Mechanical and Aerospace Engineering, Case Western Reserve University, Cleveland, OH 44106, USA

SwarmCVT: Centroidal Voronoi Tessellation-Based Path Planning for Very-Large-Scale Robotics

Oct 03, 2024

James Gao, Jacob Lee, Yuting Zhou, Yunze Hu, Chang Liu, Pingping Zhu

Abstract: Swarm robotics, or very-large-scale robotics (VLSR), has many meaningful applications for complicated tasks. However, the complexity of motion control and the energy costs grow quickly as the number of robots increases. To address this problem, our previous studies formulated various methods employing macroscopic and microscopic approaches. These methods enable microscopic robots to adhere to a reference Gaussian mixture model (GMM) distribution observed at the macroscopic scale. As a result, optimizing at the macroscopic level yields an optimal overall result. However, all these methods require systematic and global generation of Gaussian components (GCs) within obstacle-free areas to construct the GMM trajectories. This work uses centroidal Voronoi tessellation to generate GCs methodically. Consequently, it demonstrates improved performance while also ensuring consistency and reliability.

* Submitted to American Control Conference (ACC) 2025
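
A centroidal Voronoi tessellation is commonly computed with Lloyd's algorithm: assign every point of the free space to its nearest generator, then move each generator to the centroid of its cell, and repeat. The sketch below is a minimal, discretized version of that idea, assuming a unit-square workspace sampled on a grid with one hypothetical rectangular obstacle removed. The function names and workspace are illustrative assumptions; the abstract does not describe how SwarmCVT maps cells to Gaussian components, so treating each generator as a candidate GC mean is only one plausible reading.

```python
# Minimal discrete Lloyd iteration for a centroidal Voronoi tessellation (CVT).
# The unit-square workspace, the single rectangular obstacle, and all function
# names are hypothetical illustrations, not taken from the SwarmCVT paper.

def free_space_samples(n, obstacle):
    """Return an n-by-n grid of sample points in the unit square, skipping points
    that fall inside the axis-aligned obstacle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = obstacle
    points = []
    for i in range(n):
        for j in range(n):
            x, y = (i + 0.5) / n, (j + 0.5) / n
            if not (xmin <= x <= xmax and ymin <= y <= ymax):
                points.append((x, y))
    return points


def lloyd_cvt(generators, samples, iterations=50):
    """Move each generator to the centroid of the samples nearest to it
    (its discretized Voronoi cell), repeating for a fixed number of iterations."""
    gens = list(generators)
    for _ in range(iterations):
        acc = [[0.0, 0.0, 0] for _ in gens]  # running x-sum, y-sum, count per cell
        for x, y in samples:
            k = min(range(len(gens)),
                    key=lambda g: (gens[g][0] - x) ** 2 + (gens[g][1] - y) ** 2)
            acc[k][0] += x
            acc[k][1] += y
            acc[k][2] += 1
        # Empty cells keep their previous generator position.
        gens = [(sx / c, sy / c) if c else g for (sx, sy, c), g in zip(acc, gens)]
    return gens
```

With two generators starting at (0.1, 0.5) and (0.9, 0.5) in an obstacle-free square, the iteration settles at (0.25, 0.5) and (0.75, 0.5), the centroids of the two half-squares. In a nonconvex free space, a cell centroid can land inside an obstacle, so a practical planner would need an extra correction step that this sketch omits.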

Resource Allocation for Stable LLM Training in Mobile Edge Computing

Sep 30, 2024

Chang Liu, Jun Zhao

Abstract: As mobile devices increasingly become focal points for advanced applications, edge computing presents a viable solution to their inherent computational limitations, particularly in deploying large language models (LLMs). However, despite advances in edge computing, significant challenges remain in efficiently training and deploying LLMs due to the computational demands and data privacy concerns associated with these models. This paper explores a collaborative training framework that integrates mobile users with edge servers to optimize resource allocation, thereby enhancing both performance and efficiency. Our approach leverages parameter-efficient fine-tuning (PEFT) methods, allowing mobile users to adjust the initial layers of the LLM while edge servers handle the more demanding later layers. Specifically, we formulate a multi-objective optimization problem to minimize the total energy consumption and delay during training. We also address the common issue of instability in model performance by incorporating stability enhancements into our objective function. Through a novel fractional programming technique, we obtain a stationary point of the formulated problem. Simulations demonstrate that our method reduces both energy consumption and latency, and increases the reliability of LLMs across various mobile settings.

* This paper appears in the 2024 International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing (MobiHoc)
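
Fractional programming deals with objectives that are ratios of two functions. As a point of reference only, the sketch below shows Dinkelbach's method, a classic fractional-programming algorithm that repeatedly solves the parametric problem max N(x) - λD(x) and updates λ to the achieved ratio. It is not the paper's novel technique, and the "rate over power" objective with a finite grid of transmit powers is a hypothetical stand-in for the paper's energy and delay formulation.

```python
import math

# Dinkelbach's method for maximizing num(x) / den(x) with den(x) > 0.
# Classic technique shown for illustration only; it is NOT the paper's
# method, and the objective below is a hypothetical example.

def dinkelbach(candidates, num, den, tol=1e-9, max_iter=100):
    lam = 0.0                      # current estimate of the optimal ratio
    x = candidates[0]
    for _ in range(max_iter):
        # Parametric subproblem: maximize num(x) - lam * den(x).
        # Solved by enumeration here because the candidate set is finite.
        x = max(candidates, key=lambda c: num(c) - lam * den(c))
        gap = num(x) - lam * den(x)
        lam = num(x) / den(x)      # update the ratio estimate
        if gap < tol:              # F(lam) close to 0 means lam is (near-)optimal
            break
    return x, lam


def rate(p):
    """Hypothetical achievable rate for transmit power p."""
    return math.log2(1 + 2 * p)


def power_cost(p):
    """Hypothetical total power: transmit power plus a fixed circuit cost."""
    return p + 0.5


powers = [0.1 * i for i in range(1, 101)]
best_p, best_ratio = dinkelbach(powers, rate, power_cost)
```

Because each update strictly increases λ until the parametric optimum reaches zero, the method converges in a handful of iterations on this grid and returns the same point a brute-force search over the ratio would.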

Haptic-ACT: Bridging Human Intuition with Compliant Robotic Manipulation via Immersive VR

Sep 18, 2024

Kelin Li, Shubham M Wagh, Nitish Sharma, Saksham Bhadani, Wei Chen, Chang Liu, Petar Kormushev

Abstract: Robotic manipulation is essential for the widespread adoption of robots in industrial and home settings and has long been a focus within the robotics community. Advances in artificial intelligence have introduced promising learning-based methods to address this challenge, with imitation learning emerging as particularly effective. However, efficiently acquiring high-quality demonstrations remains difficult. In this work, we introduce an immersive VR-based teleoperation setup designed to collect demonstrations from a remote human user. We also propose an imitation learning framework called Haptic Action Chunking with Transformers (Haptic-ACT). To evaluate the platform, we performed a pick-and-place task and collected 50 demonstration episodes. Results indicate that the immersive VR platform significantly reduces demonstrator fingertip forces compared to systems without haptic feedback, enabling more delicate manipulation. Additionally, evaluations of the Haptic-ACT framework in both the MuJoCo simulator and on a real robot demonstrate its effectiveness in teaching robots more compliant manipulation than the original ACT. Additional materials are available at https://sites.google.com/view/hapticact.

* This work is under review at ICRA 2025
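
Action chunking means the policy predicts a short sequence of future actions at every step instead of a single action, so several overlapping predictions exist for each timestep. The sketch below combines them with the exponentially weighted temporal ensembling described for the original ACT policy, where the weight is exp(-m * i) and i = 0 is the oldest prediction. Whether Haptic-ACT keeps this exact scheme is an assumption, and actions are scalars here for brevity (real policies emit joint or gripper vectors).

```python
import math
from collections import defaultdict

# Action chunking with exponentially weighted temporal ensembling, in the style
# described for the original ACT policy. Applying it to Haptic-ACT is an
# assumption; actions are scalars here for brevity.

def ensemble_actions(chunks, m=0.1):
    """chunks[t] holds the actions predicted at time t for steps t, t+1, ...
    (each chunk assumed non-empty). Returns the action executed at each step
    as a weighted mean of every prediction covering that step, with weight
    exp(-m * i) where i = 0 is the oldest prediction."""
    predictions = defaultdict(list)  # step -> predictions, oldest first
    for t, chunk in enumerate(chunks):
        for offset, action in enumerate(chunk):
            predictions[t + offset].append(action)
    executed = []
    for step in range(len(chunks)):
        preds = predictions[step]
        weights = [math.exp(-m * i) for i in range(len(preds))]
        executed.append(sum(w * a for w, a in zip(weights, preds)) / sum(weights))
    return executed
```

For example, with chunks [[0.0, 1.0], [2.0, 3.0]] and m = 0, step 1 averages the old prediction 1.0 and the new prediction 2.0 to give 1.5. Smoothing over overlapping chunks is what lets chunked policies produce less jerky motion than executing each chunk open-loop.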

Self-Supervised Elimination of Non-Independent Noise in Hyperspectral Imaging

Sep 16, 2024

Guangrui Ding, Chang Liu, Jiaze Yin, Xinyan Teng, Yuying Tan, Hongjian He, Haonan Lin, Lei Tian, Ji-Xin Cheng

Abstract: Hyperspectral imaging has been widely used for spectral and spatial identification of target molecules, yet it is often contaminated by complex noise. Current denoising methods generally rely on independent and identically distributed noise statistics, and their performance degrades when removing non-independent noise. Here, we demonstrate Self-supervised PErmutation Noise2noise Denoising (SPEND), a deep learning denoising architecture tailored to removing non-independent noise from a single hyperspectral image stack. We use hyperspectral stimulated Raman scattering and mid-infrared photothermal microscopy as testbeds, where the noise is spatially correlated and spectrally varied. Starting from a single hyperspectral image, SPEND permutes odd and even spectral frames to generate two stacks with identical noise properties, and uses the pairs for efficient self-supervised noise-to-noise training. SPEND achieved an 8-fold signal-to-noise improvement without access to ground-truth data. SPEND enabled accurate mapping of low-concentration biomolecules in both the fingerprint and silent regions, demonstrating its robustness in complex cellular environments.
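
The key data trick is that neighboring spectral frames carry nearly the same signal but independent noise realizations, so one frame can serve as a noisy target for the other. The sketch below is a hypothetical illustration of splitting a stack into even- and odd-indexed sub-stacks and scoring a model with a Noise2Noise-style mean squared error; SPEND's actual permutation scheme, network, and loss are not specified in the abstract, and the function names are invented here.

```python
# Hypothetical sketch of the odd/even frame split that Noise2Noise-style
# self-supervision relies on; SPEND's actual permutation scheme, network,
# and loss are not specified in the abstract. Frames are 2-D lists.

def odd_even_pairs(stack):
    """Split a spectral stack into even- and odd-indexed sub-stacks and return
    both (input, target) orderings. A trailing unpaired frame is dropped."""
    n = len(stack) - len(stack) % 2
    even, odd = stack[0:n:2], stack[1:n:2]
    return [(even, odd), (odd, even)]


def noise2noise_mse(model, pairs):
    """Mean squared error between model(input frame) and the paired noisy target
    frame; assumes neighboring frames share nearly the same signal."""
    total, count = 0.0, 0
    for inputs, targets in pairs:
        for frame_in, frame_tgt in zip(inputs, targets):
            prediction = model(frame_in)
            for row_p, row_t in zip(prediction, frame_tgt):
                for p, t in zip(row_p, row_t):
                    total += (p - t) ** 2
                    count += 1
    return total / count
```

Using both orderings doubles the training pairs and keeps the objective symmetric, so neither sub-stack is privileged as the "clean" one.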

LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation

Sep 09, 2024

Henghui Ding, Lingyi Hong, Chang Liu, Ning Xu, Linjie Yang, Yuchen Fan, Deshui Miao, Yameng Gu, Xin Li, Zhenyu He(+23 more)

Abstract: Despite the promising performance of current video segmentation models on existing benchmarks, these models still struggle with complex scenes. In this paper, we introduce the 6th Large-scale Video Object Segmentation (LSVOS) challenge, held in conjunction with the ECCV 2024 workshop. This year's challenge includes two tasks: Video Object Segmentation (VOS) and Referring Video Object Segmentation (RVOS). This year, we replace the classic YouTube-VOS and YouTube-RVOS benchmarks with the latest datasets, MOSE, LVOS, and MeViS, to assess VOS in more challenging, complex environments. This year's challenge attracted 129 registered teams from more than 20 institutes across over 8 countries. This report includes an introduction to the challenge and datasets, and the methods used by the top 7 teams in the two tracks. More details can be found on our homepage: https://lsvos.github.io/.

* ECCV 2024 LSVOS Challenge Report: https://lsvos.github.io/

Decoupled and Interactive Regression Modeling for High-performance One-stage 3D Object Detection

Sep 01, 2024

Weiping Xiao, Yiqiang Wu, Chang Liu, Yu Qin, Xiaomao Li, Liming Xin

Abstract: Inadequate bounding box modeling in regression tasks constrains the performance of one-stage 3D object detection. Our study reveals that the primary reasons lie in two aspects: (1) Limited center-offset prediction seriously impairs bounding box localization, since many of the highest-response positions deviate significantly from object centers. (2) Ignoring low-quality samples in regression tasks significantly impacts bounding box prediction, since it produces unreliable quality (IoU) rectification. To tackle these problems, we propose Decoupled and Interactive Regression Modeling (DIRM) for one-stage detection. Specifically, Decoupled Attribute Regression (DAR) facilitates modeling a long regression range for the center attribute through an adaptive multi-sample assignment strategy that deeply decouples bounding box attributes. In addition, to enhance the reliability of IoU predictions for low-quality results, Interactive Quality Prediction (IQP) integrates the classification task, which is proficient at modeling negative samples, with quality prediction for joint optimization. Extensive experiments on the Waymo and ONCE datasets demonstrate that DIRM significantly improves the performance of several state-of-the-art methods with minimal additional inference latency. Notably, DIRM achieves state-of-the-art detection performance on both the Waymo and ONCE datasets.
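
The quality score that IQP learns to predict is the IoU between a predicted box and its ground-truth box. To make that target concrete, the sketch below computes IoU for axis-aligned 3D boxes given as (center x, y, z, size x, y, z). This is a simplifying assumption: Waymo and ONCE boxes also carry a heading angle, and rotated 3D IoU requires polygon clipping in the ground plane, which this sketch does not do.

```python
# Axis-aligned 3D IoU, shown only to illustrate the IoU quality target.
# Boxes are (cx, cy, cz, dx, dy, dz); heading/yaw is ignored, which is a
# simplification relative to the rotated boxes used by Waymo and ONCE.

def iou_3d_axis_aligned(a, b):
    inter = 1.0
    for i in range(3):
        # Overlap of the two intervals along axis i.
        lo = max(a[i] - a[i + 3] / 2, b[i] - b[i + 3] / 2)
        hi = min(a[i] + a[i + 3] / 2, b[i] + b[i + 3] / 2)
        if hi <= lo:
            return 0.0  # no overlap along this axis means no 3D overlap
        inter *= hi - lo
    vol_a = a[3] * a[4] * a[5]
    vol_b = b[3] * b[4] * b[5]
    return inter / (vol_a + vol_b - inter)
```

Two 2 x 2 x 2 boxes whose centers are 1 unit apart along x share a 1 x 2 x 2 region, so their IoU is 4 / (8 + 8 - 4) = 1/3. A poorly localized, "low-quality" prediction of this kind is exactly the case where an unreliable IoU estimate would mislead score rectification.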

Toward Robust Early Detection of Alzheimer's Disease via an Integrated Multimodal Learning Approach

Aug 29, 2024

Yifei Chen, Shenghao Zhu, Zhaojie Fang, Chang Liu, Binfeng Zou, Yuhe Wang, Shuo Chang, Fan Jia, Feiwei Qin, Jin Fan(+2 more)

Abstract: Alzheimer's Disease (AD) is a complex neurodegenerative disorder marked by memory loss, executive dysfunction, and personality changes. Early diagnosis is challenging due to subtle symptoms and varied presentations, and traditional unimodal diagnostic methods, with their limited scope, often lead to misdiagnosis. This study introduces an advanced multimodal classification model that integrates clinical, cognitive, neuroimaging, and EEG data to enhance diagnostic accuracy. The model incorporates a feature tagger with a tabular data coding architecture and uses the TimesBlock module to capture intricate temporal patterns in electroencephalogram (EEG) data. By employing a Cross-modal Attention Aggregation module, the model effectively fuses Magnetic Resonance Imaging (MRI) spatial information with EEG temporal data, significantly improving the distinction between AD, Mild Cognitive Impairment, and Normal Cognition. In addition, we constructed the first AD classification dataset that includes three modalities: EEG, MRI, and tabular data. Our innovative approach aims to facilitate early diagnosis and intervention, potentially slowing the progression of AD. The source code and our private ADMC dataset are available at https://github.com/JustlfC03/MSTNet.

* 5 pages, 2 figures
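
Cross-modal attention lets features from one modality query another: each query token produces a softmax-weighted mix of the other modality's value vectors. The sketch below is a bare scaled dot-product cross-attention, assuming (hypothetically) that MRI tokens act as queries and EEG tokens as keys and values, with no learned projection matrices, multiple heads, or residual paths. The paper's Cross-modal Attention Aggregation module is not described in enough detail to reproduce, so this shows only the generic mechanism.

```python
import math

# Generic scaled dot-product cross-attention without learned projections.
# Treating MRI tokens as queries and EEG tokens as keys/values is an
# assumption for illustration; this is not the paper's module.

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """queries: list of d-dim vectors (e.g. MRI tokens); keys/values: paired
    lists of vectors (e.g. EEG tokens). Returns one output vector per query."""
    d = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
    return outputs
```

When a query scores all keys equally, the output is simply the mean of the values; as one key's score grows, the output moves toward that key's value, which is how temporal EEG features can be routed to the MRI tokens they match best.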

TDNetGen: Empowering Complex Network Resilience Prediction with Generative Augmentation of Topology and Dynamics

Aug 19, 2024

Chang Liu, Jingtao Ding, Yiwen Song, Yong Li

Abstract: Predicting the resilience of complex networks, i.e., their ability to retain fundamental functionality amid external perturbations or internal failures, plays a critical role in understanding and improving real-world complex systems. Traditional theoretical approaches grounded in nonlinear dynamical systems rely on prior knowledge of network dynamics. On the other hand, data-driven approaches frequently suffer from insufficient labeled data, a predicament commonly observed in real-world scenarios. In this paper, we introduce a novel resilience prediction framework for complex networks, designed to tackle this issue through generative data augmentation of network topology and dynamics. The core idea is to exploit the inherent joint distribution present in unlabeled network data, facilitating the learning process of the resilience predictor by illuminating the relationship between network topology and dynamics. Experimental results on three network datasets demonstrate that our proposed framework, TDNetGen, achieves high prediction accuracy, reaching 85%-95%. Furthermore, the framework still shows a pronounced augmentation capability in extremely low-data regimes, underscoring its utility and robustness in enhancing the prediction of network resilience. We have open-sourced our code at https://github.com/tsinghua-fib-lab/TDNetGen.

MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking

Aug 15, 2024

Simiao Lai, Chang Liu, Jiawen Zhu, Ben Kang, Yang Liu, Dong Wang, Huchuan Lu

Abstract: Existing RGB-T tracking algorithms have made remarkable progress by leveraging the global interaction capability and extensive pre-trained models of the Transformer architecture. Nonetheless, these methods mainly adopt image-pair appearance matching and face the intrinsic quadratic complexity of the attention mechanism, resulting in limited exploitation of temporal information. Inspired by the recently emerged State Space Model Mamba, known for its impressive long-sequence modeling capabilities and linear computational complexity, this work proposes a pure Mamba-based framework (MambaVT) to fully exploit spatio-temporal contextual modeling for robust visible-thermal tracking. Specifically, we devise a long-range cross-frame integration component to globally adapt to target appearance variations, and introduce short-term historical trajectory prompts to predict subsequent target states based on local temporal location cues. Extensive experiments show the significant potential of vision Mamba for RGB-T tracking, with MambaVT achieving state-of-the-art performance on four mainstream benchmarks while requiring lower computational costs. We aim for this work to serve as a simple yet strong baseline, stimulating future research in this field. The code and pre-trained models will be made available.
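
The linear complexity attributed to state space models comes from processing the sequence with a recurrence rather than pairwise attention: h_t = A h_{t-1} + B x_t and y_t = C h_t, one update per token. The sketch below is a minimal diagonal, time-invariant SSM scan over a scalar input sequence. Mamba itself makes its parameters input-dependent ("selective") and uses a hardware-aware parallel scan, neither of which is shown here; the parameter values in the usage note are arbitrary.

```python
# Minimal diagonal, time-invariant linear state space model scan.
# Illustrates the O(L * N) recurrence behind SSM layers; Mamba's selective
# (input-dependent) parameters and parallel scan are intentionally omitted.

def ssm_scan(xs, a, b, c):
    """xs: scalar input sequence of length L; a, b, c: per-state coefficients
    of a diagonal SSM with N states. Returns the output sequence y."""
    h = [0.0] * len(a)  # hidden state starts at zero
    ys = []
    for x in xs:
        # h_t = a * h_{t-1} + b * x_t   (elementwise, since A is diagonal)
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        # y_t = c . h_t
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys
```

For a single state with a = 0.5, b = 1, c = 1, an impulse input [1, 0, 0] yields [1.0, 0.5, 0.25]: the state decays geometrically, so information from earlier frames persists without ever comparing every pair of frames as attention does.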

Toward Pedestrian Head Tracking: A Benchmark Dataset and an Information Fusion Network

Aug 12, 2024

Kailai Sun, Xinwei Wang, Shaobo Liu, Qianchuan Zhao, Gao Huang, Chang Liu

Abstract: Pedestrian detection and tracking in crowded video sequences have a wide range of applications, including autonomous driving, robot navigation, and pedestrian flow surveillance. However, detecting and tracking pedestrians in high-density crowds faces many challenges, including intra-class occlusions, complex motions, and diverse poses. Although deep learning models have achieved remarkable progress in head detection, head tracking datasets and methods are severely lacking. Existing head datasets have limited coverage of complex pedestrian flows and scenes (e.g., pedestrian interactions, occlusions, and object interference). It is therefore important to develop new head tracking datasets and methods. To address these challenges, we present a Chinese Large-scale Cross-scene Pedestrian Head Tracking dataset (Cchead) and a Multi-Source Information Fusion Network (MIFN). Our dataset has several notable features, including 10 diverse scenes of 50,528 frames with over 2,366,249 heads and 2,358 tracks annotated. It contains diverse human moving speeds and directions, as well as complex crowd pedestrian flows with collision avoidance behaviors. We provide a comprehensive analysis and comparison with existing state-of-the-art (SOTA) algorithms. Moreover, our MIFN is the first end-to-end CNN-based head detection and tracking network that jointly trains on RGB frames, pixel-level motion information (optical flow and frame difference maps), depth maps, and density maps in videos. Compared with SOTA pedestrian detection and tracking methods, MIFN achieves superior performance on our Cchead dataset. We believe our dataset and baseline will become valuable resources for developing pedestrian tracking in dense crowds.
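
Frame difference maps are one of the motion cues MIFN consumes. A common way to build them is the per-pixel absolute difference between consecutive grayscale frames, which highlights moving regions such as walking heads while static background cancels out. The sketch below shows that basic computation on frames stored as 2-D lists; MIFN's exact preprocessing (normalization, thresholding, frame spacing) is not given in the abstract, so this is an assumed, minimal version.

```python
# Per-pixel absolute difference between consecutive grayscale frames.
# An assumed, minimal version of a "frame difference map"; MIFN's exact
# preprocessing is not specified in the abstract.

def frame_difference_maps(frames):
    """frames: list of equally sized 2-D grayscale frames (lists of rows).
    Returns |frame_t - frame_{t-1}| for every t >= 1."""
    return [
        [[abs(p - q) for p, q in zip(row_cur, row_prev)]
         for row_cur, row_prev in zip(cur, prev)]
        for prev, cur in zip(frames, frames[1:])
    ]
```

A sequence of N frames yields N - 1 difference maps, so a pipeline that stacks them with RGB input has to either drop the first frame or pad it with a zero map.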
