People with blindness and low vision (pBLV) encounter substantial challenges when it comes to comprehensive scene recognition and precise object identification in unfamiliar environments. Additionally, due to vision loss, pBLV have difficulty identifying potential tripping hazards on their own. In this paper, we present a pioneering approach that leverages a large vision-language model to enhance visual perception for pBLV, offering detailed and comprehensive descriptions of the surrounding environment and providing warnings about potential risks. Our method begins by leveraging a large image tagging model (i.e., Recognize Anything (RAM)) to identify all common objects present in the captured images. The recognition results and the user query are then integrated into a prompt tailored specifically for pBLV through prompt engineering. Combining the prompt and the input image, a large vision-language model (i.e., InstructBLIP) generates detailed and comprehensive descriptions of the environment and identifies potential risks by analyzing the objects and scenes relevant to the prompt. We evaluate our approach through experiments conducted on both indoor and outdoor datasets. Our results demonstrate that our method recognizes objects accurately and provides insightful descriptions and analysis of the environment for pBLV.
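A minimal sketch of such a tag-then-describe pipeline, assuming the RAM inference step is hidden behind a hypothetical get_image_tags helper and using the InstructBLIP checkpoints distributed through Hugging Face transformers; the prompt template below is illustrative, not the paper's exact wording:

```python
# Sketch: object tags -> tailored prompt -> vision-language description.
from PIL import Image
import torch
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

def get_image_tags(image: Image.Image) -> list[str]:
    """Hypothetical placeholder for a RAM-style image tagging model."""
    raise NotImplementedError

def describe_scene(image_path: str, user_query: str) -> str:
    image = Image.open(image_path).convert("RGB")
    tags = get_image_tags(image)  # e.g. ["sidewalk", "bicycle", "curb"]

    # Prompt engineering: fold the recognized objects and the user query into
    # an instruction tailored to blind and low-vision users (illustrative).
    prompt = (
        f"The image contains: {', '.join(tags)}. "
        f"As an assistant for a blind or low-vision user, {user_query} "
        "Describe the surroundings in detail and warn about tripping hazards."
    )

    processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-7b")
    model = InstructBlipForConditionalGeneration.from_pretrained(
        "Salesforce/instructblip-vicuna-7b", torch_dtype=torch.float16, device_map="auto"
    )
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
```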
This article puts the spotlight on the receiver front-end (RFE), an integral part of any wireless device that information theory typically idealizes as a mere addition of noise. While this idealization was sound in the past, as operating frequencies, bandwidths, and antenna counts rise, a soaring amount of power is required for the RFE to keep behaving accordingly. Containing this surge in power expenditure exposes a harsher behavior on the part of the RFE (more noise, nonlinearities, and coarse quantization), setting up a tradeoff between the spectral efficiency under such nonidealities and the RFE's energy efficiency. With 6G demanding radically lower power consumption and higher energy efficiency, this emerges as an issue on which information theory can cast light at a fundamental level. More broadly, this article advocates having information theory embrace device power consumption in its analyses. In turn, this calls for new models and abstractions, such as the ones herein put together for the RFE, and for a more holistic perspective.
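As one concrete instance of this tradeoff, here is a brief sketch under the additive quantization noise model (AQNM) and a Walden-style ADC power model; the SNR, sampling rate, and figure of merit are illustrative assumptions, not values from the article:

```python
# Sketch: spectral efficiency vs. ADC power as quantization resolution varies.
import numpy as np

snr_db = 10.0                    # pre-quantization SNR (assumption)
snr = 10 ** (snr_db / 10)
fs = 2e9                         # ADC sampling rate, 2 GS/s (assumption)
fom = 1e-12                      # energy per conversion step, J (Walden-style assumption)

for bits in range(1, 13):
    # High-resolution approximation of the AQNM distortion factor.
    rho = (np.pi * np.sqrt(3) / 2) * 2 ** (-2 * bits)
    alpha = 1 - rho
    sinr = alpha * snr / (1 + rho * snr)   # effective SINR after quantization
    se = np.log2(1 + sinr)                 # spectral efficiency, bit/s/Hz
    p_adc = fom * fs * 2 ** bits           # ADC power grows exponentially in bits
    print(f"{bits:2d} bits: SE = {se:5.2f} bit/s/Hz, ADC power = {p_adc * 1e3:8.3f} mW")
```

Sweeping the resolution makes the tension visible: spectral efficiency saturates after a few bits, while ADC power keeps doubling with each added bit.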
We prototype and validate a multistatic mmWave integrated sensing and communication (ISAC) system based on IEEE 802.11ay. Compensation of the clock asynchrony between each transmitter (TX) and receiver (RX) pair is performed using only the line-of-sight (LoS) wireless signal propagation. As a result, our system provides concurrent target tracking and micro-Doppler estimation from multiple points of view, paving the way for practical multistatic data fusion. Our results on human movement sensing, complemented with precise, quantitative ground-truth (GT) data, demonstrate the enhanced sensing capabilities of multistatic ISAC, owing to the spatial diversity of the receiver nodes.
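A minimal sketch of micro-Doppler estimation of the kind described, assuming per-packet complex gains of a tracked target path are already available after asynchrony compensation; the synthetic signal and packet rate are illustrative assumptions:

```python
# Sketch: micro-Doppler spectrogram from the complex gain of a tracked path.
import numpy as np
from scipy.signal import stft

prf = 500.0                        # packet rate in Hz (assumption)
t = np.arange(0, 4, 1 / prf)
# Synthetic target: 40 Hz bulk Doppler plus a 2 Hz limb micro-motion.
path_gain = np.exp(1j * 2 * np.pi * (40 * t + 5 * np.sin(2 * np.pi * 2 * t)))

# Short-time Fourier transform of the slow-time phase history; two-sided
# spectrum because the input is complex.
f, tau, Z = stft(path_gain, fs=prf, nperseg=128, noverlap=112,
                 return_onesided=False)
spectrogram_db = 20 * np.log10(np.abs(np.fft.fftshift(Z, axes=0)) + 1e-12)
doppler_axis = np.fft.fftshift(f)  # Hz; maps to velocity via v = f * wavelength / 2
```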
Surface electromyography (sEMG) and high-density sEMG (HD-sEMG) biosignals have been extensively investigated for myoelectric control of prosthetic devices, neurorobotics, and more recently human-computer interfaces because of their capability for hand gesture recognition/prediction in a wearable and non-invasive manner. High intraday (same-day) performance has been reported. However, interday performance (with training and testing on separate days) degrades substantially due to the poor generalizability of conventional approaches over time, hindering the application of such techniques in real-life practice. Few recent studies have examined the feasibility of multi-day hand gesture recognition, and those that exist face a major challenge: the need for long sEMG epochs makes the corresponding neural interfaces impractical due to the delay induced in myoelectric control. This paper proposes a compact ViT-based network for multi-day dynamic hand gesture prediction. We tackle this challenge directly: the proposed model relies only on very short HD-sEMG signal windows (i.e., 50 ms, just one-sixth of the conventional window length for real-time myoelectric implementation), boosting agility and responsiveness. Our proposed model can predict 11 dynamic gestures for 20 subjects with an average accuracy of over 71% on the testing day, 3-25 days after training. Moreover, when calibrated on just a small portion of data from the testing day, the proposed model achieves over 92% accuracy while retraining less than 10% of its parameters, keeping calibration computationally efficient.
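A minimal sketch of a compact ViT-style classifier over short HD-sEMG windows, together with a calibration step that retrains only a small parameter fraction; the 64-channel grid, 2 kHz sampling (100 samples per 50 ms window), and hyperparameters are illustrative assumptions, not the paper's architecture:

```python
# Sketch: compact transformer over 50 ms HD-sEMG windows + cheap calibration.
import torch
import torch.nn as nn

class CompactEMGViT(nn.Module):
    def __init__(self, channels=64, window=100, patch=10, dim=64,
                 depth=2, heads=4, n_classes=11):
        super().__init__()
        self.patch = patch
        # Each `patch`-sample slice across all channels becomes one token.
        self.embed = nn.Linear(channels * patch, dim)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, window // patch + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):                              # x: (batch, channels, window)
        b, c, w = x.shape
        tokens = x.unfold(2, self.patch, self.patch)   # (b, c, w//patch, patch)
        tokens = tokens.permute(0, 2, 1, 3).reshape(b, w // self.patch, -1)
        z = self.embed(tokens)
        z = torch.cat([self.cls.expand(b, -1, -1), z], dim=1) + self.pos
        z = self.encoder(z)
        return self.head(z[:, 0])                      # classify via CLS token

model = CompactEMGViT()
# Interday calibration: freeze the backbone, retrain only the classification
# head, which accounts for well under 10% of the parameters in this sketch.
for p in model.parameters():
    p.requires_grad = False
for p in model.head.parameters():
    p.requires_grad = True
```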
The upper mid-band -- roughly from 7 to 24 GHz -- has attracted considerable recent interest for new cellular services. This frequency range offers vastly more spectrum than the highly congested bands below 7 GHz while providing more favorable propagation and coverage than the millimeter wave (mmWave) frequencies. Realizing the full potential of these bands, however, will require fundamental changes to the design of cellular systems. Most importantly, spectrum will likely need to be shared with incumbents including communication satellites, military radar, and radio astronomy. Also, due to the wide bandwidth, the directional nature of transmission, and the intermittent occupancy of incumbents, cellular systems will need to be agile in sensing and intelligently using large spatial and bandwidth degrees of freedom. This paper attempts to provide an initial assessment of the feasibility and potential gains of wideband cellular systems operating in the upper mid-band. The study includes: (1) a system study to assess potential gains of multi-band systems in a representative dense urban environment; (2) propagation calculations to assess potential cross interference between satellites and terrestrial cellular services; and (3) the design and evaluation of a compact multi-band antenna array structure. Leveraging these preliminary results, we identify potential future research directions toward next-generation systems in these frequencies.
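To make the propagation argument concrete, here is a small sketch comparing free-space path loss across candidate bands; the link distance and frequency set are illustrative assumptions:

```python
# Sketch: free-space path loss across bands, showing how the upper mid-band
# sits between sub-7 GHz and mmWave, and how array gain can offset the gap.
import numpy as np

C = 3e8  # speed of light, m/s

def fspl_db(freq_hz: float, dist_m: float) -> float:
    """Free-space path loss: 20 log10(4 * pi * d * f / c)."""
    return 20 * np.log10(4 * np.pi * dist_m * freq_hz / C)

d = 200.0  # link distance in meters (assumption)
for f_ghz in (3.5, 7.0, 15.0, 24.0, 60.0):
    loss = fspl_db(f_ghz * 1e9, d)
    delta = loss - fspl_db(3.5e9, d)
    # The extra loss vs. 3.5 GHz can be recovered with array gain: for a fixed
    # aperture, the element count (and hence gain) grows as f^2.
    print(f"{f_ghz:5.1f} GHz: FSPL = {loss:6.1f} dB  (+{delta:4.1f} dB vs 3.5 GHz)")
```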
The growing focus on indoor robot navigation utilizing wireless signals stems from the capability of these signals to capture high-resolution angular and temporal measurements. However, employing end-to-end generic reinforcement learning (RL) for wireless indoor navigation (WIN) in initially unknown environments remains a significant challenge, due to its limited generalization ability and poor sample efficiency. At the same time, purely model-based solutions, based on radio frequency propagation, are simple and generalizable, but unable to find optimal decisions in complex environments. This work proposes a novel physics-informed RL (PIRL) approach, in which a standard distance-to-target-based cost is combined with physics-informed terms on the optimal trajectory. The proposed PIRL is evaluated using a wireless digital twin (WDT) built upon simulations of a large class of indoor environments from the AI Habitat dataset, augmented with electromagnetic (EM) simulation of wireless signals. It is shown that the PIRL significantly outperforms both standard RL and purely physics-based solutions in terms of generalizability and performance. Furthermore, the resulting PIRL policy is explainable, in that it is empirically consistent with the physics heuristic.
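A minimal sketch of the kind of reward shaping such a PIRL formulation suggests; the angle-of-arrival alignment heuristic and its weighting below are illustrative assumptions, not the paper's exact physics terms:

```python
# Sketch: distance-to-target cost plus a physics-informed shaping term.
import numpy as np

def pirl_reward(pos: np.ndarray, goal: np.ndarray, heading: float,
                aoa_strongest: float, lam: float = 0.5) -> float:
    """Combine a standard distance-to-target cost with a propagation heuristic."""
    distance_cost = -np.linalg.norm(goal - pos)
    # Physics term (assumed heuristic): reward headings aligned with the
    # angle of arrival (AoA) of the strongest received path, which in many
    # indoor layouts points along a viable route toward the transmitter.
    physics_term = np.cos(heading - aoa_strongest)
    return distance_cost + lam * physics_term
```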
Generative Adversarial Networks (GANs) are a popular formulation for training generative models of complex high-dimensional data. The standard method for training GANs involves a gradient descent-ascent (GDA) procedure on a minimax optimization problem. This procedure is hard to analyze in general due to the nonlinear nature of the dynamics. We study the local dynamics of GDA for training a GAN with a kernel-based discriminator. Our convergence analysis is based on a linearization of the nonlinear dynamical system that describes the GDA iterations, under an "isolated points model" assumption from [Becker et al. 2022]. Our analysis brings out the effect of the learning rates, regularization, and the bandwidth of the kernel discriminator on the local convergence rate of GDA. Importantly, we show phase transitions that indicate when the system converges, oscillates, or diverges. We also provide numerical simulations that verify our claims.
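A toy sketch of the linearization argument on a bilinear-plus-regularization problem (not the paper's kernel-discriminator system): the spectral radius of the linearized GDA update determines local convergence, and sweeping the learning rate exposes a phase transition:

```python
# Sketch: local GDA dynamics via the eigenvalues of the linearized update.
import numpy as np

def gda_spectral_radius(eta_x: float, eta_y: float,
                        reg: float = 0.1, coupling: float = 1.0) -> float:
    """Linearized GDA for f(x, y) = (reg/2) x^2 + coupling * x * y - (reg/2) y^2.

    Updates: x <- x - eta_x * df/dx,  y <- y + eta_y * df/dy.
    """
    J = np.array([[1 - eta_x * reg, -eta_x * coupling],
                  [eta_y * coupling, 1 - eta_y * reg]])
    return max(abs(np.linalg.eigvals(J)))

for eta in (0.01, 0.1, 0.5, 2.5):
    rho = gda_spectral_radius(eta, eta)
    regime = "locally converges" if rho < 1 else "oscillates/diverges"
    print(f"eta = {eta:4.2f}: spectral radius = {rho:.3f} -> {regime}")
```

With equal learning rates the eigenvalues are (1 - eta * reg) +/- i * eta * coupling, so small steps spiral inward while large steps cross |lambda| = 1 and diverge, mirroring the convergence/oscillation/divergence phases.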
Wideband millimeter-wave communication systems can be extended to provide radar-like sensing capabilities on top of data communication in a cost-effective manner. However, the development of joint communication and sensing technology is hindered by practical challenges, such as occlusions to the line-of-sight path and clock asynchrony between devices. The latter introduces time-varying timing and frequency offsets that prevent the estimation of sensing parameters and, in turn, the use of standard signal processing solutions. Existing approaches cannot be applied to commonly used phased-array receivers, as they build on stringent assumptions about the multipath environment and are computationally complex. We present JUMP, the first system enabling practical bistatic and asynchronous joint communication and sensing, achieving accurate target tracking and micro-Doppler extraction in realistic conditions. Our system compensates for the timing offset by exploiting the channel correlation across subsequent packets. Further, it tracks multipath reflections and eliminates frequency offsets by observing the phase of a dynamically selected static reference path. JUMP has been implemented on a 60 GHz experimental platform and extensively evaluated on human motion sensing, including non-line-of-sight scenarios. In our results, JUMP attains tracking performance comparable to a full-duplex monostatic system and micro-Doppler quality similar to that of a phase-locked bistatic receiver.
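A minimal sketch of the two compensation mechanisms described, assuming a matrix of per-packet channel impulse response (CIR) estimates; the tap indices and array shapes are illustrative assumptions:

```python
# Sketch: timing-offset estimation across packets and reference-path-based
# cancellation of the shared frequency/phase offset.
import numpy as np

def estimate_timing_offset(cir_prev: np.ndarray, cir_curr: np.ndarray) -> int:
    """Estimate inter-packet timing drift (in taps) by cross-correlating
    consecutive CIR magnitude profiles."""
    xcorr = np.correlate(np.abs(cir_curr), np.abs(cir_prev), mode="full")
    return int(np.argmax(xcorr)) - (len(cir_prev) - 1)

def compensate_cfo(cir: np.ndarray, ref_tap: int, target_tap: int) -> np.ndarray:
    """Cancel the common time-varying phase using a static reference path.

    `cir` has shape (packets, taps). The reference tap's phase contains the
    clock-induced offsets plus a constant geometric term; subtracting it
    leaves the target path's own (micro-)Doppler phase history.
    """
    ref_phase = np.angle(cir[:, ref_tap])
    return cir[:, target_tap] * np.exp(-1j * ref_phase)
```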