Xiang-Yang Li

RINN: One Sample Radio Frequency Imaging based on Physics Informed Neural Network

Apr 19, 2025

Fei Shang, Haohua Du, Dawei Yan, Panlong Yang, Xiang-Yang Li

Abstract:Due to its ability to work in non-line-of-sight and low-light environments, radio frequency (RF) imaging technology is expected to bring new possibilities for embodied intelligence and multimodal sensing. However, widely used RF devices (such as Wi-Fi) often struggle to provide high-precision electromagnetic measurements and large-scale datasets, hindering the application of RF imaging technology. In this paper, we combine the ideas of PINN to design the RINN network, using physical constraints instead of true value comparison constraints and adapting it with the characteristics of ubiquitous RF signals, allowing the RINN network to achieve RF imaging using only one sample without phase and with amplitude noise. Our numerical evaluation results show that compared with 5 classic algorithms based on phase data for imaging results, RINN's imaging results based on phaseless data are good, with indicators such as RRMSE (0.11) performing similarly well. RINN provides new possibilities for the universal development of radio frequency imaging technology.

Beyond Local Selection: Global Cut Selection for Enhanced Mixed-Integer Programming

Mar 20, 2025

Shuli Zeng, Sijia Zhang, Shaoang Li, Feng Wu, Xiang-Yang Li

Abstract:In mixed-integer programming (MIP) solvers, cutting planes are essential for Branch-and-Cut (B&C) algorithms as they reduce the search space and accelerate the solving process. Traditional methods rely on hard-coded heuristics for cutting plane selection but fail to leverage problem-specific structural features. Recent machine learning approaches use neural networks for cut selection but focus narrowly on the efficiency of a single node within the B&C algorithm, without considering broader contextual information. To address this, we propose Global Cut Selection (GCS), which uses a bipartite graph to represent the search tree and combines graph neural networks with reinforcement learning to develop cut selection strategies. Unlike prior methods, GCS applies cutting planes across all nodes, incorporating richer contextual information. Experiments show GCS significantly improves solving efficiency for synthetic and large-scale real-world MIPs compared to traditional and learning-based methods.

Real-Time Neural-Enhancement for Online Cloud Gaming

Jan 12, 2025

Shan Jiang, Zhenhua Han, Haisheng Tan, Xinyang Jiang, Yifan Yang, Xiaoxi Zhang, Hongqiu Ni, Yuqing Yang, Xiang-Yang Li

Abstract:Online cloud gaming demands real-time, high-quality video transmission across variable wide-area networks (WANs). Neural-enhanced video transmission algorithms employing super-resolution (SR) for video quality enhancement have proven effective in challenging WAN environments. However, these SR-based methods require intensive fine-tuning for the whole video, making them infeasible for diverse online cloud gaming. To address this, we introduce River, a cloud gaming delivery framework designed based on the observation that video segment features in cloud gaming are typically repetitive and redundant. This offers a significant opportunity to reuse fine-tuned SR models, reducing latency from minutes of fine-tuning to milliseconds of querying. To enable the idea, we design a practical system that addresses several challenges, such as model organization, online model scheduling, and transfer strategy. River first builds a content-aware encoder that fine-tunes SR models for diverse video segments and stores them in a lookup table. When delivering cloud gaming video streams online, River checks the video features and retrieves the most relevant SR models to enhance the frame quality. Meanwhile, if no existing SR model performs well enough for some video segments, River further fine-tunes new models and updates the lookup table. Finally, to avoid the overhead of streaming model weights to the clients, River designs a prefetching strategy that predicts the models most likely to be retrieved. Our evaluation based on real video game streaming demonstrates that River can reduce redundant training overhead by 44% and improve the peak signal-to-noise ratio (PSNR) by 1.81 dB compared to the SOTA solutions. Practical deployment shows River meets real-time requirements, achieving approximately 720p at 20 fps on mobile devices.
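
For readers unfamiliar with the quality metric quoted in this abstract, the peak signal-to-noise ratio follows the standard definition 10 * log10(MAX^2 / MSE). The sketch below is only an illustration of that textbook formula, not code from the paper; the function name `psnr`, the nested-list frame representation, and the 8-bit peak value of 255 are assumptions made here.

```python
import math


def psnr(reference, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    Frames are nested lists of pixel intensities (an assumption for this
    sketch); max_value is the largest possible pixel value, 255 for 8-bit.
    """
    flat_ref = [p for row in reference for p in row]
    flat_enh = [p for row in enhanced for p in row]
    if len(flat_ref) != len(flat_enh) or not flat_ref:
        raise ValueError("frames must be non-empty and have the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_enh)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [[10, 10], [200, 200]]
    decoded = [[11, 9], [201, 199]]  # every pixel is off by one
    print(f"PSNR: {psnr(original, decoded):.2f} dB")
```

Because the scale is logarithmic, a fixed gain in dB corresponds to a fixed multiplicative reduction in mean squared error, regardless of the starting quality.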

FFCG: Effective and Fast Family Column Generation for Solving Large-Scale Linear Program

Dec 26, 2024

Yi-Xiang Hu, Feng Wu, Shaoang Li, Yifang Zhao, Xiang-Yang Li

Abstract:Column Generation (CG) is an effective iterative algorithm for solving large-scale linear programs (LPs). During each CG iteration, new columns are added to improve the solution of the LP. Typically, CG greedily selects the one column with the most negative reduced cost, which can be improved by adding more columns at once. However, selecting all columns with negative reduced costs would add redundant columns that do not improve the objective value. Therefore, selecting the appropriate columns to add is still an open problem, and previous machine-learning-based approaches for CG add only a constant number of columns per iteration due to the state-space explosion problem. To address this, we propose Fast Family Column Generation (FFCG) -- a novel reinforcement-learning-based CG that selects a variable number of columns as needed in an iteration. Specifically, we formulate the column selection problem in CG as an MDP and design a reward metric that balances both the convergence speed and the number of redundant columns. In our experiments, FFCG converges faster on the common benchmarks: compared to several state-of-the-art baselines, it reduces the number of CG iterations by 77.1% for the Cutting Stock Problem (CSP) and 84.8% for the Vehicle Routing Problem with Time Windows (VRPTW), and reduces computing time by 71.4% for CSP and 84.0% for VRPTW on average.
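
The classical selection rule that this abstract contrasts itself with can be sketched in a few lines. The code below shows only the textbook greedy rule (and its fixed-size multi-column variant) for a minimization LP, where the reduced cost of a column is its cost minus the dual prices weighted by its coefficients. It is not FFCG's reinforcement-learning policy, and the names `reduced_cost` and `select_columns` as well as the sample data are hypothetical.

```python
def reduced_cost(cost, column, duals):
    """Reduced cost c_j - y^T a_j of a candidate column (minimization LP)."""
    return cost - sum(y * a for y, a in zip(duals, column))


def select_columns(candidates, duals, k=1, tol=1e-9):
    """Return indices of up to k candidates with the most negative reduced cost.

    candidates is a list of (cost, column) pairs. k=1 is the classical greedy
    rule; a larger k adds several columns per iteration. Columns whose reduced
    cost is not below -tol cannot improve the LP and are never returned.
    """
    scored = [
        (reduced_cost(cost, column, duals), index)
        for index, (cost, column) in enumerate(candidates)
    ]
    improving = sorted((rc, index) for rc, index in scored if rc < -tol)
    return [index for _, index in improving[:k]]


if __name__ == "__main__":
    duals = [1.0, 2.0]  # dual prices from the restricted master problem
    candidates = [
        (1.0, [1, 0]),  # reduced cost  0.0, not improving
        (1.0, [0, 1]),  # reduced cost -1.0
        (1.0, [1, 1]),  # reduced cost -2.0
        (5.0, [1, 1]),  # reduced cost  2.0, not improving
    ]
    print(select_columns(candidates, duals, k=1))  # greedy: single best column
    print(select_columns(candidates, duals, k=3))  # multi-column variant
```

With a fixed k the number of added columns is constant per iteration, which is the limitation the abstract attributes to earlier learning-based approaches.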

The Field-based Model: A New Perspective on RF-based Material Sensing

Dec 07, 2024

Fei Shang, Haocheng Jiang, Panlong Yang, Dawei Yan, Haohua Du, Xiang-Yang Li

Abstract:This paper introduces the design and implementation of WiField, a WiFi sensing system deployed on COTS devices that can simultaneously identify multiple wavelength-level targets placed flexibly. Unlike traditional RF sensing schemes that focus on specific targets and RF links, WiField focuses on all media in the sensing area for the entire electric field. From this perspective, WiField provides a unified framework to finely characterize the diffraction, scattering, and other effects on signals of targets that differ in position, material, and number. The combination of targets in different positions, numbers, and sizes is just a special case. WiField proposes a scheme that uses phaseless data to complete the inverse mapping from electric field to material distribution, thereby achieving the simultaneous identification of multiple wavelength-level targets at any position, with the potential for deployment on a wide range of low-cost COTS devices. Our evaluation results show an average identification accuracy of over 97% for 1-3 targets (5 cm * 10 cm in size) of different materials randomly placed within a 1.05 m * 1.05 m area.

DeepCore: Simple Fingerprint Construction for Differentiating Homologous and Piracy Models

Nov 01, 2024

Haifeng Sun, Lan Zhang, Xiang-Yang Li

Abstract:As intellectual property rights, the copyright protection of deep models is becoming increasingly important. Existing work has made many attempts at model watermarking and fingerprinting, but they have ignored homologous models trained with similar structures or training datasets. We highlight challenges in efficiently querying black-box piracy models to protect model copyrights without misidentifying homologous models. To address these challenges, we propose a novel method called DeepCore, which discovers that the classification confidence of the model is positively correlated with the distance of the predicted sample from the model decision boundary and piracy models behave more similarly at high-confidence classified sample points. Then DeepCore constructs core points far away from the decision boundary by optimizing the predicted confidence of a few sample points and leverages behavioral discrepancies between piracy and homologous models to identify piracy models. Finally, we design different model identification methods, including two similarity-based methods and a clustering-based method to identify piracy models using models' predictions of core points. Extensive experiments show the effectiveness of DeepCore in identifying various piracy models, achieving lower missed and false identification rates, and outperforming state-of-the-art methods.

* 9 pages

ViRED: Prediction of Visual Relations in Engineering Drawings

Sep 02, 2024

Chao Gu, Ke Lin, Yiyang Luo, Jiahui Hou, Xiang-Yang Li

Abstract:To accurately understand engineering drawings, it is essential to establish the correspondence between images and their description tables within the drawings. Existing document understanding methods predominantly focus on text as the main modality, which is not suitable for documents containing substantial image information. In the field of visual relation detection, the structure of the task inherently limits its capacity to assess relationships among all entity pairs in the drawings. To address this issue, we propose a vision-based relation detection model, named ViRED, to identify the associations between tables and circuits in electrical engineering drawings. Our model mainly consists of three parts: a vision encoder, an object encoder, and a relation decoder. We implement ViRED using PyTorch and conduct a series of experiments to validate its efficacy. The experimental results indicate that, on the engineering drawing dataset, our approach attains an accuracy of 96% in the task of relation prediction, a substantial improvement over existing methodologies. The results also show that ViRED performs inference quickly even when a single engineering drawing contains numerous objects.

* 8 pages, 5 figures

SGSM: A Foundation-model-like Semi-generalist Sensing Model

Jun 15, 2024

Tianjian Yang, Hao Zhou, Shuo Liu, Kaiwen Guo, Yiwen Hou, Haohua Du, Zhi Liu, Xiang-Yang Li

Abstract:The significance of intelligent sensing systems is growing in the realm of smart services. These systems extract relevant signal features and generate informative representations for particular tasks. However, building the feature extraction component for such systems requires extensive domain-specific expertise or data. The exceptionally rapid development of foundation models is likely to usher in newfound abilities in such intelligent sensing. We propose a new scheme for sensing models, which we refer to as the semi-generalist sensing model (SGSM). SGSM is able to semiautomatically solve various tasks using less task-specific labeled data than traditional systems. Built through the analysis of the common theoretical model, SGSM can depict different modalities, such as acoustic and Wi-Fi signals. Experimental results on these two heterogeneous sensors illustrate that SGSM functions across a wide range of scenarios, thereby establishing its broad applicability. In some cases, SGSM even achieves better performance than sensor-specific specialized solutions. Wi-Fi evaluations indicate a 20% accuracy improvement when applying SGSM to an existing sensing model.

Towards the limits: Sensing Capability Measurement for ISAC Through Channel Encoder

May 15, 2024

Fei Shang, Haohua Du, Panlong Yang, Xin He, Wen Ma, Xiang-Yang Li

Abstract:Integrated Sensing and Communication (ISAC) is gradually becoming a reality due to the significant increase in frequency and bandwidth of next-generation wireless communication technologies. It therefore becomes crucial to evaluate communication and sensing performance using appropriate channel models in order to address the competition for resources between the two. Existing work models the sensing capability only through the mutual information between the channel response and the received signal; its theoretical resolution can hardly support the high-precision requirements of ISAC sensing tasks, and it may even affect communication optimality. In this paper, we propose a sensing channel encoder model that measures sensing capacity at higher resolution using discrete task mutual information. For the first time, we derive upper and lower bounds on the sensing accuracy for a given channel. This model not only makes it possible to optimize ISAC systems at a finer granularity and to balance communication and sensing resources, but also provides theoretical explanations for classical intuitions in wireless sensing (such as more modalities yielding higher accuracy). Furthermore, we validate the effectiveness of the proposed channel model through real-case studies, including person identification, displacement detection, direction estimation, and device recognition. The evaluation results indicate a Pearson correlation coefficient exceeding 0.9 between our task mutual information and conventional experimental metrics (e.g., accuracy).
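
The Pearson correlation coefficient cited in this abstract is the covariance of two series divided by the product of their standard deviations. The sketch below implements that standard formula only; the function name `pearson` and the sample values are made up for illustration and are not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalized covariance and spreads; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical per-task values, for illustration only.
    task_mutual_information = [0.5, 1.1, 1.6, 2.0, 2.4]
    measured_accuracy = [0.52, 0.68, 0.80, 0.86, 0.95]
    r = pearson(task_mutual_information, measured_accuracy)
    print(f"Pearson r = {r:.3f}")
```

A value near 1 means the two quantities rise together almost linearly, which is the kind of agreement the abstract reports between task mutual information and measured accuracy.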

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Mar 05, 2024

Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang (+9 more)

Abstract:While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering speech intricately encompasses various attributes (e.g., content, prosody, timbre, and acoustic details) that pose significant challenges for generation, a natural idea is to factorize speech into individual subspaces representing different attributes and generate them individually. Motivated by it, we propose NaturalSpeech 3, a TTS system with novel factorized diffusion models to generate natural speech in a zero-shot way. Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate attributes in each subspace following its corresponding prompt. With this factorization design, NaturalSpeech 3 can effectively and efficiently model the intricate speech with disentangled subspaces in a divide-and-conquer way. Experiments show that NaturalSpeech 3 outperforms the state-of-the-art TTS systems on quality, similarity, prosody, and intelligibility. Furthermore, we achieve better performance by scaling to 1B parameters and 200K hours of training data.

* 22 pages, 15 tables, 3 figures
