Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yuchen Liu

Collection: Datasets from AFAR Challenge

May 11, 2025

Saad Masrur, Ozgur Ozdemir, Anil Gurses, Ismail Guvenc, Mihail L. Sichitiu, Rudra Dutta, Magreth Mushi, homas Zajkowski, Cole Dickerson, Gautham Reddy(+12 more)

Figure 1 for Collection: Datasets from AFAR Challenge

Figure 2 for Collection: Datasets from AFAR Challenge

Figure 3 for Collection: Datasets from AFAR Challenge

Figure 4 for Collection: Datasets from AFAR Challenge

Abstract:This paper presents a comprehensive real-world and Digital Twin (DT) dataset collected as part of the Find A Rover (AFAR) Challenge, organized by the NSF Aerial Experimentation and Research Platform for Advanced Wireless (AERPAW) testbed and hosted at the Lake Wheeler Field in Raleigh, North Carolina. The AFAR Challenge was a competition involving five finalist university teams, focused on promoting innovation in UAV-assisted radio frequency (RF) source localization. Participating teams were tasked with designing UAV flight trajectories and localization algorithms to detect the position of a hidden unmanned ground vehicle (UGV), also referred to as a rover, emitting wireless probe signals generated by GNU Radio. The competition was structured to evaluate solutions in a DT environment first, followed by deployment and testing in AERPAW's outdoor wireless testbed. For each team, the UGV was placed at three different positions, resulting in a total of 30 datasets, 15 collected in a DT simulation environment and 15 in a physical outdoor testbed. Each dataset contains time-synchronized measurements of received signal strength (RSS), received signal quality (RSQ), GPS coordinates, UAV velocity, and UAV orientation (roll, pitch, and yaw). Data is organized into structured folders by team, environment (DT and real-world), and UGV location. The dataset supports research in UAV-assisted RF source localization, air-to-ground (A2G) wireless propagation modeling, trajectory optimization, signal prediction, autonomous navigation, and DT validation. With approximately 300k time-synchronized samples collected from real-world experiments, the dataset provides a substantial foundation for training and evaluating deep learning (DL) models. Overall, the AFAR dataset serves as a valuable resource for advancing robust, real-world solutions in UAV-enabled wireless communications and sensing systems.

* Submitted to IEEE Data Descriptions

Via

Access Paper or Ask Questions

X-Fusion: Introducing New Modality to Frozen Large Language Models

Apr 29, 2025

Sicheng Mo, Thao Nguyen, Xun Huang, Siddharth Srinivasan Iyer, Yijun Li, Yuchen Liu, Abhishek Tandon, Eli Shechtman, Krishna Kumar Singh, Yong Jae Lee(+2 more)

Abstract:We propose X-Fusion, a framework that extends pretrained Large Language Models (LLMs) for multimodal tasks while preserving their language capabilities. X-Fusion employs a dual-tower design with modality-specific weights, keeping the LLM's parameters frozen while integrating vision-specific information for both understanding and generation. Our experiments demonstrate that X-Fusion consistently outperforms alternative architectures on both image-to-text and text-to-image tasks. We find that incorporating understanding-focused data improves generation quality, reducing image data noise enhances overall performance, and feature alignment accelerates convergence for smaller models but has minimal impact on larger ones. Our findings provide valuable insights into building efficient unified multimodal models.

* Project Page: https://sichengmo.github.io/XFusion/

Via

Access Paper or Ask Questions

ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration

Apr 11, 2025

Yongsheng Yu, Haitian Zheng, Zhifei Zhang, Jianming Zhang, Yuqian Zhou, Connelly Barnes, Yuchen Liu, Wei Xiong, Zhe Lin, Jiebo Luo

Abstract:Recent progress in generative models has significantly improved image restoration capabilities, particularly through powerful diffusion models that offer remarkable recovery of semantic details and local fidelity. However, deploying these models at ultra-high resolutions faces a critical trade-off between quality and efficiency due to the computational demands of long-range attention mechanisms. To address this, we introduce ZipIR, a novel framework that enhances efficiency, scalability, and long-range modeling for high-res image restoration. ZipIR employs a highly compressed latent representation that compresses image 32x, effectively reducing the number of spatial tokens, and enabling the use of high-capacity models like the Diffusion Transformer (DiT). Toward this goal, we propose a Latent Pyramid VAE (LP-VAE) design that structures the latent space into sub-bands to ease diffusion training. Trained on full images up to 2K resolution, ZipIR surpasses existing diffusion-based methods, offering unmatched speed and quality in restoring high-resolution images from severely degraded inputs.

Via

Access Paper or Ask Questions

Context-Aware Human Behavior Prediction Using Multimodal Large Language Models: Challenges and Insights

Apr 01, 2025

Yuchen Liu, Lino Lerch, Luigi Palmieri, Andrey Rudenko, Sebastian Koch, Timo Ropinski, Marco Aiello

Abstract:Predicting human behavior in shared environments is crucial for safe and efficient human-robot interaction. Traditional data-driven methods to that end are pre-trained on domain-specific datasets, activity types, and prediction horizons. In contrast, the recent breakthroughs in Large Language Models (LLMs) promise open-ended cross-domain generalization to describe various human activities and make predictions in any context. In particular, Multimodal LLMs (MLLMs) are able to integrate information from various sources, achieving more contextual awareness and improved scene understanding. The difficulty in applying general-purpose MLLMs directly for prediction stems from their limited capacity for processing large input sequences, sensitivity to prompt design, and expensive fine-tuning. In this paper, we present a systematic analysis of applying pre-trained MLLMs for context-aware human behavior prediction. To this end, we introduce a modular multimodal human activity prediction framework that allows us to benchmark various MLLMs, input variations, In-Context Learning (ICL), and autoregressive techniques. Our evaluation indicates that the best-performing framework configuration is able to reach 92.8% semantic similarity and 66.1% exact label accuracy in predicting human behaviors in the target frame.

Via

Access Paper or Ask Questions

Rank-Based Modeling for Universal Packets Compression in Multi-Modal Communications

Mar 24, 2025

Xuanhao Luo, Zhiyuan Peng, Zhouyu Li, Ruozhou Yu, Yuchen Liu

Figure 1 for Rank-Based Modeling for Universal Packets Compression in Multi-Modal Communications

Figure 2 for Rank-Based Modeling for Universal Packets Compression in Multi-Modal Communications

Figure 3 for Rank-Based Modeling for Universal Packets Compression in Multi-Modal Communications

Figure 4 for Rank-Based Modeling for Universal Packets Compression in Multi-Modal Communications

Abstract:The rapid increase in networked systems and data transmission requires advanced data compression solutions to optimize bandwidth utilization and enhance network performance. This study introduces a novel byte-level predictive model using Transformer architecture, capable of handling the redundancy and diversity of data types in network traffic as byte sequences. Unlike traditional methods that require separate compressors for different data types, this unified approach sets new benchmarks and simplifies predictive modeling across various data modalities such as video, audio, images, and text, by processing them at the byte level. This is achieved by predicting subsequent byte probability distributions, encoding them into a sparse rank sequence using lossless entropy coding, and significantly reducing both data size and entropy. Experimental results show that our model achieves compression ratios below 50%, while offering models of various sizes tailored for different communication devices. Additionally, we successfully deploy these models on a range of edge devices and servers, demonstrating their practical applicability and effectiveness in real-world network scenarios. This approach significantly enhances data throughput and reduces bandwidth demands, making it particularly valuable in resource-constrained environments like the Internet of Things sensor networks.

* Accepted for publication in 26th IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM)

Via

Access Paper or Ask Questions

Movable Cell-Free Massive MIMO For High-Speed Train Communications: A PPO-Based Antenna Position Optimization

Mar 16, 2025

Jie Dai, Yuchen Liu, Jiakang Zheng, Ruichen Zhang, Jiayi Zhang, Bo Ai

Figure 1 for Movable Cell-Free Massive MIMO For High-Speed Train Communications: A PPO-Based Antenna Position Optimization

Figure 2 for Movable Cell-Free Massive MIMO For High-Speed Train Communications: A PPO-Based Antenna Position Optimization

Figure 3 for Movable Cell-Free Massive MIMO For High-Speed Train Communications: A PPO-Based Antenna Position Optimization

Figure 4 for Movable Cell-Free Massive MIMO For High-Speed Train Communications: A PPO-Based Antenna Position Optimization

Abstract:In recent years, high-speed trains (HSTs) communications have developed rapidly to enhance the stability of train operations and improve passenger connectivity experiences. However, as the train continues to accelerate, urgent technological innovations are needed to overcome challenges such as frequency handover and significant Doppler effects. In this paper, we present a novel architecture featuring movable antennas (MAs) to fully exploit macro spatial diversity, enabling a cell-free (CF) massive multiple-input multiple-output (MIMO) system that supports high-speed train communications. Considering the high likelihood of line-of-sight (LoS) transmission in HST scenario, we derive the uplink spectral efficiency (SE) expression for the movable CF massive MIMO system. Moreover, an optimization problem is formulated to maximize the sum SE of the considered system by optimizing the positions of the antennas. Since the formulated problem is non-convex and highly non-linear, we improve a deep reinforcement learning algorithm to address it by using proximal policy optimization (PPO). Different from traditional optimization approaches, which optimize variables separately and alternately, our improved PPO-based approach optimizes all the variables in unison. Simulation results demonstrate that movable CF massive MIMO effectively suppresses the negative impact of the Doppler effect in HST communications.

Via

Access Paper or Ask Questions

Synergizing AI and Digital Twins for Next-Generation Network Optimization, Forecasting, and Security

Mar 08, 2025

Zifan Zhang, Minghong Fang, Dianwei Chen, Xianfeng Yang, Yuchen Liu

Figure 1 for Synergizing AI and Digital Twins for Next-Generation Network Optimization, Forecasting, and Security

Figure 2 for Synergizing AI and Digital Twins for Next-Generation Network Optimization, Forecasting, and Security

Figure 3 for Synergizing AI and Digital Twins for Next-Generation Network Optimization, Forecasting, and Security

Figure 4 for Synergizing AI and Digital Twins for Next-Generation Network Optimization, Forecasting, and Security

Abstract:Digital network twins (DNTs) are virtual representations of physical networks, designed to enable real-time monitoring, simulation, and optimization of network performance. When integrated with machine learning (ML) techniques, particularly federated learning (FL) and reinforcement learning (RL), DNTs emerge as powerful solutions for managing the complexities of network operations. This article presents a comprehensive analysis of the synergy of DNTs, FL, and RL techniques, showcasing their collective potential to address critical challenges in 6G networks. We highlight key technical challenges that need to be addressed, such as ensuring network reliability, achieving joint data-scenario forecasting, and maintaining security in high-risk environments. Additionally, we propose several pipelines that integrate DNT and ML within coherent frameworks to enhance network optimization and security. Case studies demonstrate the practical applications of our proposed pipelines in edge caching and vehicular networks. In edge caching, the pipeline achieves over 80% cache hit rates while balancing base station loads. In autonomous vehicular system, it ensure a 100% no-collision rate, showcasing its reliability in safety-critical scenarios. By exploring these synergies, we offer insights into the future of intelligent and adaptive network systems that automate decision-making and problem-solving.

* Accepted by IEEE Wireless Communications

Via

Access Paper or Ask Questions

SemiSAM+: Rethinking Semi-Supervised Medical Image Segmentation in the Era of Foundation Models

Feb 28, 2025

Yichi Zhang, Bohao Lv, Le Xue, Wenbo Zhang, Yuchen Liu, Yu Fu, Yuan Cheng, Yuan Qi

Figure 1 for SemiSAM+: Rethinking Semi-Supervised Medical Image Segmentation in the Era of Foundation Models

Figure 2 for SemiSAM+: Rethinking Semi-Supervised Medical Image Segmentation in the Era of Foundation Models

Figure 3 for SemiSAM+: Rethinking Semi-Supervised Medical Image Segmentation in the Era of Foundation Models

Figure 4 for SemiSAM+: Rethinking Semi-Supervised Medical Image Segmentation in the Era of Foundation Models

Abstract:Deep learning-based medical image segmentation typically requires large amount of labeled data for training, making it less applicable in clinical settings due to high annotation cost. Semi-supervised learning (SSL) has emerged as an appealing strategy due to its less dependence on acquiring abundant annotations from experts compared to fully supervised methods. Beyond existing model-centric advancements of SSL by designing novel regularization strategies, we anticipate a paradigmatic shift due to the emergence of promptable segmentation foundation models with universal segmentation capabilities using positional prompts represented by Segment Anything Model (SAM). In this paper, we present SemiSAM+, a foundation model-driven SSL framework to efficiently learn from limited labeled data for medical image segmentation. SemiSAM+ consists of one or multiple promptable foundation models as generalist models, and a trainable task-specific segmentation model as specialist model. For a given new segmentation task, the training is based on the specialist-generalist collaborative learning procedure, where the trainable specialist model delivers positional prompts to interact with the frozen generalist models to acquire pseudo-labels, and then the generalist model output provides the specialist model with informative and efficient supervision which benefits the automatic segmentation and prompt generation in turn. Extensive experiments on two public datasets and one in-house clinical dataset demonstrate that SemiSAM+ achieves significant performance improvement, especially under extremely limited annotation scenarios, and shows strong efficiency as a plug-and-play strategy that can be easily adapted to different specialist and generalist models.

Via

Access Paper or Ask Questions

SegAnyPET: Universal Promptable Segmentation from Positron Emission Tomography Images

Feb 20, 2025

Yichi Zhang, Le Xue, Wenbo Zhang, Lanlan Li, Yuchen Liu, Chen Jiang, Yuan Cheng, Yuan Qi

Figure 1 for SegAnyPET: Universal Promptable Segmentation from Positron Emission Tomography Images

Figure 2 for SegAnyPET: Universal Promptable Segmentation from Positron Emission Tomography Images

Figure 3 for SegAnyPET: Universal Promptable Segmentation from Positron Emission Tomography Images

Figure 4 for SegAnyPET: Universal Promptable Segmentation from Positron Emission Tomography Images

Abstract:Positron Emission Tomography (PET) imaging plays a crucial role in modern medical diagnostics by revealing the metabolic processes within a patient's body, which is essential for quantification of therapy response and monitoring treatment progress. However, the segmentation of PET images presents unique challenges due to their lower contrast and less distinct boundaries compared to other structural medical modalities. Recent developments in segmentation foundation models have shown superior versatility across diverse natural image segmentation tasks. Despite the efforts of medical adaptations, these works primarily focus on structural medical images with detailed physiological structural information and exhibit poor generalization ability when adapted to molecular PET imaging. In this paper, we collect and construct PETS-5k, the largest PET segmentation dataset to date, comprising 5,731 three-dimensional whole-body PET images and encompassing over 1.3M 2D images. Based on the established dataset, we develop SegAnyPET, a modality-specific 3D foundation model for universal promptable segmentation from PET images. To issue the challenge of discrepant annotation quality of PET images, we adopt a cross prompting confident learning (CPCL) strategy with an uncertainty-guided self-rectification process to robustly learn segmentation from high-quality labeled data and low-quality noisy labeled data. Experimental results demonstrate that SegAnyPET can correctly segment seen and unseen targets using only one or a few prompt points, outperforming state-of-the-art foundation models and task-specific fully supervised models with higher accuracy and strong generalization ability for universal segmentation. As the first foundation model for PET images, we believe that SegAnyPET will advance the applications to various downstream tasks for molecular imaging.

Via

Access Paper or Ask Questions

Optimizing Wireless Resource Management and Synchronization in Digital Twin Networks

Feb 07, 2025

Hanzhi Yu, Yuchen Liu, Zhaohui Yang, Haijian Sun, Mingzhe Chen

Abstract:In this paper, we investigate an accurate synchronization between a physical network and its digital network twin (DNT), which serves as a virtual representation of the physical network. The considered network includes a set of base stations (BSs) that must allocate its limited spectrum resources to serve a set of users while also transmitting its partially observed physical network information to a cloud server to generate the DNT. Since the DNT can predict the physical network status based on its historical status, the BSs may not need to send their physical network information at each time slot, allowing them to conserve spectrum resources to serve the users. However, if the DNT does not receive the physical network information of the BSs over a large time period, the DNT's accuracy in representing the physical network may degrade. To this end, each BS must decide when to send the physical network information to the cloud server to update the DNT, while also determining the spectrum resource allocation policy for both DNT synchronization and serving the users. We formulate this resource allocation task as an optimization problem, aiming to maximize the total data rate of all users while minimizing the asynchronization between the physical network and the DNT. To address this problem, we propose a method based on the GRUs and the value decomposition network (VDN). Simulation results show that our GRU and VDN based algorithm improves the weighted sum of data rates and the similarity between the status of the DNT and the physical network by up to 28.96%, compared to a baseline method combining GRU with the independent Q learning.

* 12 pages, 6 figures

Via

Access Paper or Ask Questions