Abstract: Accurate observation of dynamic environments traditionally relies on synthesizing raw, signal-level information from multiple distributed sensors. This work investigates an alternative approach: performing geospatial inference using only encrypted packet-level information, without access to the raw sensory data. We further explore how this indirect information can be fused with directly available sensory data to extend overall inference capabilities. We introduce GraySense, a learning-based framework that performs geospatial object tracking by analyzing encrypted wireless video transmission traffic, such as packet sizes, from cameras with inaccessible streams. GraySense leverages the inherent relationship between scene dynamics and transmitted packet sizes to infer object motion. The framework consists of two stages: (1) a Packet Grouping module that identifies frame boundaries and estimates frame sizes from encrypted network traffic, and (2) a Tracker module, based on a Transformer encoder with a recurrent state, which fuses indirect packet-based inputs with optional direct camera-based inputs to estimate the object's position. Extensive experiments with realistic videos from the CARLA simulator and emulated networks under varying conditions show that GraySense achieves a tracking error of 2.33 m (Euclidean distance) without raw signal access, within the dimensions of the tracked objects (4.61 m × 1.93 m). To our knowledge, this capability has not been previously demonstrated, expanding the use of latent signals for sensing.
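The Packet Grouping stage described above can be illustrated with a minimal sketch: split a stream of encrypted packets into frames using inter-arrival-time gaps, then estimate each frame's size as the sum of its packets' sizes. The function name, gap threshold, and data layout here are illustrative assumptions, not the paper's actual implementation.

```python
def group_packets_into_frames(packets, gap_threshold=0.005):
    """Hypothetical frame-boundary heuristic, not GraySense's learned module.

    packets: list of (timestamp_seconds, size_bytes), sorted by timestamp.
    Returns a list of estimated per-frame sizes in bytes, assuming a
    silence gap longer than gap_threshold marks a frame boundary.
    """
    if not packets:
        return []
    frame_sizes = []
    current_size = packets[0][1]
    prev_time = packets[0][0]
    for t, size in packets[1:]:
        if t - prev_time > gap_threshold:
            # Long gap: assume the previous frame ended here.
            frame_sizes.append(current_size)
            current_size = 0
        current_size += size
        prev_time = t
    frame_sizes.append(current_size)
    return frame_sizes
```

For example, two packets arriving within 1 ms, followed by a 39 ms gap and two more packets, would be grouped into two frames whose sizes are the respective packet-size sums.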




Abstract: Semantic signal processing and communications are poised to play a central role in developing the next generation of sensor devices and networks. A crucial component of a semantic system is the extraction of semantic signals from the raw input signals, which has become increasingly tractable with recent advances in machine learning (ML) and artificial intelligence (AI) techniques. The accurate extraction of semantic signals using these ML and AI methods, and the detection of semantic innovation for scheduling transmission and/or storage events, are critical tasks for reliable semantic signal processing and communications. In this work, we propose a reliable semantic information extraction framework based on our previous work on semantic signal representations in a hierarchical graph-based structure. The proposed framework includes a time integration method that increases the fidelity of ML outputs in a class-aware manner, a graph-edit-distance-based metric that detects innovation events at the graph level and filters out sporadic errors, and a Hidden Markov Model (HMM) that produces smooth and reliable graph signals. The proposed methods within the framework are demonstrated individually and collectively through simulations and case studies based on real-world computer vision examples.
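The graph-level innovation detection described above can be sketched as follows. This is an illustrative simplification, not the paper's metric: it approximates a graph edit distance between consecutive graph signals as the size of the symmetric difference of their node and edge sets, and declares an innovation event only when that distance exceeds a threshold, so sporadic single-element errors are filtered out. The function names and threshold are assumptions for illustration.

```python
def simple_graph_edit_distance(g1, g2):
    """Approximate edit distance between two labeled graphs.

    g1, g2: (nodes, edges) pairs, where nodes is a set of labels and
    edges is a set of (u, v) tuples. Counts the node and edge
    insertions/deletions needed to turn g1 into g2 (no substitutions).
    """
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    return len(nodes1 ^ nodes2) + len(edges1 ^ edges2)

def detect_innovation(prev_graph, new_graph, threshold=2):
    """Flag an innovation event only when the change is large enough,
    suppressing sporadic one-off detection errors."""
    return simple_graph_edit_distance(prev_graph, new_graph) >= threshold
```

For instance, a new "person" node appearing together with an edge linking it to the scene yields distance 2 and triggers an event, while a single spurious node (distance 1) is filtered out.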