Tingjun Chen

Real-Time and Scalable Zak-OTFS Receiver Processing on GPUs

Apr 02, 2026

Junyao Zheng, Chung-Hsuan Tung, Yuncheng Yao, Nishant Mehrotra, Sandesh Mattu, Zhenzhou Qi, Danyang Zhuo, Robert Calderbank, Tingjun Chen

Abstract:Orthogonal time frequency space (OTFS) modulation offers superior robustness to high-mobility channels compared to conventional orthogonal frequency-division multiplexing (OFDM) waveforms. However, its explicit delay-Doppler (DD) domain representation incurs substantial signal processing complexity, especially with increased DD-domain grid sizes. To address this challenge, we present a scalable, real-time Zak-OTFS receiver architecture on GPUs through hardware-algorithm co-design that exploits DD-domain channel sparsity. Our design leverages compact matrix operations for key processing stages, a branchless iterative equalizer, and a structured sparse representation of the DD-domain channel matrix to significantly reduce computational and memory overhead. These optimizations enable low-latency processing that consistently meets the 99.9th percentile real-time processing deadline. The proposed system achieves up to 906.52 Mbps throughput with a DD grid size of (16384, 32) using 16QAM modulation over 245.76 MHz bandwidth. Extensive evaluations under a Vehicular-A channel model demonstrate strong scalability and robust performance across CPU (Intel Xeon) and multiple GPU platforms (NVIDIA Jetson Orin, RTX 6000 Ada, A100, and H200), highlighting the effectiveness of compute-aware Zak-OTFS receiver design for next-generation (NextG) high-mobility communication systems.

* This work has been submitted to the IEEE for possible publication
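
As a rough sanity check on the throughput figure in this abstract, the sketch below computes an uncoded upper bound on the bit rate. It assumes one 16QAM symbol per DD grid point and a frame whose duration equals M*N/B (time-bandwidth product equal to the grid size). Both are assumptions made here for illustration, not details stated in the abstract, and pilot, guard, and coding overhead are not modeled.

```python
import math

def raw_bit_rate_bps(m, n, bits_per_symbol, bandwidth_hz):
    """Uncoded bit-rate ceiling for an M x N DD grid (illustrative assumption)."""
    frame_bits = m * n * bits_per_symbol       # one symbol per grid point
    frame_duration_s = (m * n) / bandwidth_hz  # assumes time-bandwidth product = M*N
    return frame_bits / frame_duration_s

M, N = 16384, 32                  # DD grid size from the abstract
BITS = int(math.log2(16))         # 16QAM carries 4 bits per symbol
BANDWIDTH_HZ = 245.76e6           # bandwidth from the abstract
REPORTED_BPS = 906.52e6           # peak throughput from the abstract

ceiling_bps = raw_bit_rate_bps(M, N, BITS, BANDWIDTH_HZ)
ratio = REPORTED_BPS / ceiling_bps

print(f"ceiling: {ceiling_bps / 1e6:.2f} Mbps")
print(f"reported / ceiling: {ratio:.3f}")
```

Under these assumptions the reported peak sits somewhat below the ceiling, which would be consistent with some grid resources being reserved for non-data use, although the abstract does not say how the gap arises.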

Skilled AI Agents for Embedded and IoT Systems Development

Mar 20, 2026

Yiming Li, Yuhan Cheng, Mingchen Ma, Yihang Zou, Ningyuan Yang, Wei Cheng, Hai "Helen" Li, Yiran Chen, Tingjun Chen

Abstract:Large language models (LLMs) and agentic systems have shown promise for automated software development, but applying them to hardware-in-the-loop (HIL) embedded and Internet-of-Things (IoT) systems remains challenging due to the tight coupling between software logic and physical hardware behavior. Code that compiles successfully may still fail when deployed on real devices because of timing constraints, peripheral initialization requirements, or hardware-specific behaviors. To address this challenge, we introduce a skills-based agentic framework for HIL embedded development together with IoT-SkillsBench, a benchmark designed to systematically evaluate AI agents in real embedded programming environments. IoT-SkillsBench spans three representative embedded platforms, 23 peripherals, and 42 tasks across three difficulty levels, where each task is evaluated under three agent configurations (no-skills, LLM-generated skills, and human-expert skills) and validated through real hardware execution. Across 378 hardware-validated experiments, we show that concise human-expert skills with structured expert knowledge enable near-perfect success rates across platforms.
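
The experiment count in this abstract can be reproduced by enumerating the evaluation grid. The sketch below assumes every task is run on every platform under every agent configuration, which the abstract does not state explicitly; the platform and task names are placeholders, not the benchmark's real identifiers.

```python
from itertools import product

# Placeholder identifiers; the benchmark's actual platform and task names are not given here.
platforms = ["platform_a", "platform_b", "platform_c"]
tasks = [f"task_{i:02d}" for i in range(1, 43)]  # 42 tasks
configs = ["no-skills", "LLM-generated skills", "human-expert skills"]

# Full cross product: one hardware-validated run per (platform, task, config) triple.
runs = list(product(platforms, tasks, configs))

print(len(runs))  # 3 * 42 * 3 = 378
```

The product matches the 378 experiments reported, which suggests, but does not confirm, a full cross of platforms, tasks, and configurations.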

Agentic AI for Scalable and Robust Optical Systems Control

Feb 23, 2026

Zehao Wang, Mingzhe Han, Wei Cheng, Yue-Kai Huang, Philip Ji, Denton Wu, Mahdi Safari, Flemming Holtorf, Kenaish AlQubaisi, Norbert M. Linke (+5 more)

Abstract:We present AgentOptics, an agentic AI framework for high-fidelity, autonomous optical system control built on the Model Context Protocol (MCP). AgentOptics interprets natural language tasks and executes protocol-compliant actions on heterogeneous optical devices through a structured tool abstraction layer. We implement 64 standardized MCP tools across 8 representative optical devices and construct a 410-task benchmark to evaluate request understanding, role-aware responses, multi-step coordination, robustness to linguistic variation, and error handling. We assess two deployment configurations (commercial online LLMs and locally hosted open-source LLMs) and compare them with LLM-based code generation baselines. AgentOptics achieves 87.7%-99.0% average task success rates, significantly outperforming code-generation approaches, which reach up to 50% success. We further demonstrate broader applicability through five case studies extending beyond device-level control to system orchestration, monitoring, and closed-loop optimization. These include DWDM link provisioning and coordinated monitoring of coherent 400 GbE and analog radio-over-fiber (ARoF) channels; autonomous characterization and bias optimization of a wideband ARoF link carrying 5G fronthaul traffic; multi-span channel provisioning with launch power optimization; closed-loop fiber polarization stabilization; and distributed acoustic sensing (DAS)-based fiber monitoring with LLM-assisted event detection. These results establish AgentOptics as a scalable, robust paradigm for autonomous control and orchestration of heterogeneous optical systems.

Optical Link Tomography: First Field Trial and 4D Extension

Oct 10, 2025

Takeo Sasai, Giacomo Borraccini, Yue-Kai Huang, Hideki Nishizawa, Zehao Wang, Tingjun Chen, Yoshiaki Sone, Minami Takahashi, Tatsuya Matsumura, Masanori Nakamura (+4 more)

Abstract:Optical link tomography (OLT) is a rapidly evolving field that allows the multi-span, end-to-end visualization of optical power along fiber links in multiple dimensions from network endpoints, solely by processing signals received at coherent receivers. This paper has two objectives: (1) to report the first field trial of OLT, using a commercial transponder under standard DWDM transmission, and (2) to extend its capability to visualize across 4D (distance, time, frequency, and polarization), allowing for locating and measuring multiple QoT degradation causes, including time-varying power anomalies, spectral anomalies, and excessive polarization dependent loss. We also address a critical aspect of OLT, i.e., its need for high fiber launch power, by improving power profile signal-to-noise ratio through averaging across all available dimensions. Consequently, multiple loss anomalies in a field-deployed link are observed even at launch power lower than the system-optimal level. The applications and use cases of OLT from network commissioning to provisioning and operation for current and near-term network scenarios are also discussed.

* Journal of Lightwave Technology, 2025
* 12 pages, 7 figures, accepted version for Journal of Lightwave Technology

Chameleon: Integrated Sensing and Communication with Sub-Symbol Beam Switching in mmWave Networks

Sep 18, 2025

Zhihui Gao, Zhecun Liu, Tingjun Chen

Abstract:Next-generation cellular networks are envisioned to integrate sensing capabilities with communication, particularly in the millimeter-wave (mmWave) spectrum, where beamforming using large-scale antenna arrays enables directional signal transmissions for improved spatial multiplexing. In current 5G networks, however, beamforming is typically designed either for communication or sensing (e.g., beam training during link establishment). In this paper, we present Chameleon, a novel framework that augments and rapidly switches beamformers during each demodulation reference signal (DMRS) symbol to achieve integrated sensing and communication (ISAC) in 5G mmWave networks. Each beamformer introduces an additional sensing beam toward target angles while maintaining the communication beams toward multiple users. We implement Chameleon on a 28 GHz software-defined radio testbed supporting over-the-air 5G physical downlink shared channel (PDSCH) transmissions. Extensive experiments in open environments show that Chameleon achieves multi-user communication with a sum data rate of up to 0.80 Gbps across two users. Simultaneously, Chameleon employs a beamformer switching interval of only 0.24 μs, thereby producing a 31x31-point 2D image within just 0.875 ms. Leveraging machine learning, Chameleon further enables object localization with median errors of 0.14 m (distance) and 0.24° (angle), and material classification with 99.0% accuracy.

* 14 pages, 17 figures

BatStation: Toward In-Situ Radar Sensing on 5G Base Stations with Zero-Shot Template Generation

Sep 08, 2025

Zhihui Gao, Zhecun Liu, Tingjun Chen

Abstract:The coexistence between incumbent radar signals and commercial 5G signals necessitates versatile and ubiquitous radar sensing for efficient and adaptive spectrum sharing. In this context, leveraging the densely deployed 5G base stations (BS) for radar sensing is particularly promising, offering both wide coverage and immediate feedback to 5G scheduling. However, the target radar signals are superimposed with concurrent 5G uplink transmissions received by the BS, and practical deployment also demands a lightweight, portable radar sensing model. This paper presents BatStation, a lightweight, in-situ radar sensing framework seamlessly integrated into 5G BSs. BatStation leverages uplink resource grids to extract radar signals through three key components: (i) radar signal separation to cancel concurrent 5G transmissions and reveal the radar signals, (ii) resource grid reshaping to align time-frequency resolution with radar pulse characteristics, and (iii) zero-shot template correlation based on a portable model trained purely on synthetic data that supports detection, classification, and localization of radar pulses without fine-tuning using experimental data. We implement BatStation on a software-defined radio (SDR) testbed and evaluate its performance with real 5G traffic in the CBRS band. Results show robust performance across diverse radar types, achieving detection probabilities of 97.02% (PUCCH) and 79.23% (PUSCH), classification accuracy up to 97.00%, and median localization errors of 2.68-6.20 MHz (frequency) and 24.6-32.4 microseconds (time). Notably, BatStation achieves this performance with a runtime latency of only 0.11/0.94 ms on GPU/CPU, meeting the real-time requirement of 5G networks.

* 14 pages, 17 figures

RaGNNarok: A Light-Weight Graph Neural Network for Enhancing Radar Point Clouds on Unmanned Ground Vehicles

Jul 01, 2025

David Hunt, Shaocheng Luo, Spencer Hallyburton, Shafii Nillongo, Yi Li, Tingjun Chen, Miroslav Pajic

Abstract:Low-cost indoor mobile robots have gained popularity with the increasing adoption of automation in homes and commercial spaces. However, existing lidar and camera-based solutions have limitations such as poor performance in visually obscured environments, high computational overhead for data processing, and high costs for lidars. In contrast, mmWave radar sensors offer a cost-effective and lightweight alternative, providing accurate ranging regardless of visibility. However, existing radar-based localization suffers from sparse point cloud generation, noise, and false detections. Thus, in this work, we introduce RaGNNarok, a real-time, lightweight, and generalizable graph neural network (GNN)-based framework to enhance radar point clouds, even in complex and dynamic environments. With an inference time of just 7.3 ms on the low-cost Raspberry Pi 5, RaGNNarok runs efficiently even on such resource-constrained devices, requiring no additional computational resources. We evaluate its performance across key tasks, including localization, SLAM, and autonomous navigation, in three different environments. Our results demonstrate strong reliability and generalizability, making RaGNNarok a robust solution for low-cost indoor mobile robots.

* 8 pages, accepted by IROS 2025

Machine Intelligence on Wireless Edge Networks

Jun 13, 2025

Sri Krishna Vadlamani, Kfir Sulimany, Zhihui Gao, Tingjun Chen, Dirk Englund

Abstract:Deep neural network (DNN) inference on power-constrained edge devices is bottlenecked by costly weight storage and data movement. We introduce MIWEN, a radio-frequency (RF) analog architecture that "disaggregates" memory by streaming weights wirelessly and performing classification in the analog front end of standard transceivers. By encoding weights and activations onto RF carriers and using native mixers as computation units, MIWEN eliminates local weight memory and the overhead of analog-to-digital and digital-to-analog conversion. We derive the effective number of bits of radio-frequency analog computation under thermal noise, quantify the energy-precision trade-off, and demonstrate digital-comparable MNIST accuracy at orders-of-magnitude lower energy, unlocking real-time inference on low-power, memory-free edge devices.

* 13 pages, 6 figures

Phantora: Live GPU Cluster Simulation for Machine Learning System Performance Estimation

May 02, 2025

Jianxing Qin, Jingrong Chen, Xinhao Kong, Yongji Wu, Liang Luo, Zhaodong Wang, Ying Zhang, Tingjun Chen, Alvin R. Lebeck, Danyang Zhuo

Abstract:To accommodate ever-increasing model complexity, modern machine learning (ML) systems have to scale to large GPU clusters. Changes in ML model architecture, ML system implementation, and cluster configuration can significantly affect overall ML system performance. However, quantifying the performance impact before deployment is challenging. Existing performance estimation methods use performance modeling or static workload simulation. These techniques are not general: they require significant human effort and computation capacity to generate training data or a workload. It is also difficult to adapt ML systems to use these techniques. This paper introduces Phantora, a live GPU cluster simulator for performance estimation. Phantora runs minimally modified ML models and frameworks, intercepting and simulating GPU-related operations to enable high-fidelity performance estimation. Phantora overcomes several research challenges in integrating an event-driven network simulator with live system execution, and introduces a set of techniques to improve simulation speed, scalability, and accuracy. Our evaluation results show that Phantora can deliver similar estimation accuracy to the state-of-the-art workload simulation approach with only one GPU, while reducing human effort and increasing generalizability.

Disaggregated Deep Learning via In-Physics Computing at Radio Frequency

Apr 24, 2025

Zhihui Gao, Sri Krishna Vadlamani, Kfir Sulimany, Dirk Englund, Tingjun Chen

Abstract:Modern edge devices, such as cameras, drones, and Internet-of-Things nodes, rely on deep learning to enable a wide range of intelligent applications, including object recognition, environment perception, and autonomous navigation. However, deploying deep learning models directly on the often resource-constrained edge devices demands significant memory footprints and computational power for real-time inference using traditional digital computing architectures. In this paper, we present WISE, a novel computing architecture for wireless edge networks designed to overcome energy constraints in deep learning inference. WISE achieves this goal through two key innovations: disaggregated model access via wireless broadcasting and in-physics computation of general complex-valued matrix-vector multiplications directly at radio frequency. Using a software-defined radio platform with wirelessly broadcast model weights over the air, we demonstrate that WISE achieves 95.7% image classification accuracy with ultra-low operation power of 6.0 fJ/MAC per client, corresponding to a computation efficiency of 165.8 TOPS/W. This approach enables energy-efficient deep learning inference on wirelessly connected edge devices, achieving more than two orders of magnitude improvement in efficiency compared to traditional digital computing.

* 11 pages, 4 figures. Supplementary Information: 54 pages, 20 figures, 1 table
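
The two efficiency figures quoted in this abstract (energy per MAC and TOPS/W) are reciprocal views of the same quantity. The sketch below converts between them, assuming one MAC is counted as one operation; that counting convention is an assumption made here, since the abstract does not state it.

```python
def tops_per_watt(energy_per_mac_joules, ops_per_mac=1):
    """Convert energy per MAC (J) to tera-operations per second per watt."""
    return ops_per_mac / energy_per_mac_joules / 1e12

def femtojoules_per_mac(tops_w, ops_per_mac=1):
    """Convert TOPS/W back to energy per MAC in femtojoules."""
    return ops_per_mac / (tops_w * 1e12) * 1e15

# Values quoted in the abstract.
from_energy = tops_per_watt(6.0e-15)       # about 166.7 TOPS/W
from_tops = femtojoules_per_mac(165.8)     # about 6.03 fJ/MAC

print(f"{from_energy:.1f} TOPS/W from 6.0 fJ/MAC")
print(f"{from_tops:.2f} fJ/MAC from 165.8 TOPS/W")
```

With one operation per MAC, the two quoted values agree to within rounding of the energy figure.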
