In sixth-generation (6G) networks, massive numbers of low-power devices are expected to sense the environment and deliver tremendous amounts of data. To enhance radio resource efficiency, the integrated sensing and communication (ISAC) technique exploits both the sensing and communication functionalities of signals, while the simultaneous wireless information and power transfer (SWIPT) technique uses the same signals as carriers for both information and power delivery. Combining ISAC and SWIPT leads to an advanced technology, namely integrated sensing, communication, and power transfer (ISCPT). In this paper, a multi-user multiple-input multiple-output (MIMO) ISCPT system is considered, where a base station equipped with multiple antennas transmits messages to multiple information receivers (IRs), transfers power to multiple energy receivers (ERs), and senses a target simultaneously. The sensing target can be regarded as a point or an extended surface. When the IRs and ERs are at separate locations, the MIMO beamforming designs are optimized to improve the sensing performance while meeting the communication and power transfer requirements. The resultant non-convex optimization problems are solved using a series of techniques including Schur complement transformation and rank reduction. Moreover, when the IRs and ERs are co-located, the power splitting factors are jointly optimized with the beamformers to balance the performance of communication and power transfer. To better understand the performance of ISCPT, the target positioning problem is further investigated. Simulations are conducted to verify the effectiveness of the proposed designs, and they also reveal a performance tradeoff among sensing, communication, and power transfer.
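As a rough illustration of the co-located case, the sketch below evaluates a commonly used power-splitting receiver model: a fraction `rho` of the received signal goes to information decoding and the remaining `1 - rho` to energy harvesting. The function name, the two noise terms, and the conversion efficiency `eta` are assumptions made for illustration; the abstract does not specify the receiver model, so this should not be read as the paper's formulation.

```python
import math


def power_splitting_tradeoff(rho, signal, interference, antenna_noise,
                             processing_noise, eta):
    """Evaluate a textbook power-splitting receiver (illustrative assumption).

    rho              : fraction of received power sent to the information decoder
    signal           : received power of the desired stream
    interference     : received power of all other streams
    antenna_noise    : noise power added before the power splitter
    processing_noise : noise power added after the splitter, in the decoder
    eta              : energy conversion efficiency of the harvester
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    # Only the decoder branch sees the processing noise.
    sinr = rho * signal / (rho * (interference + antenna_noise) + processing_noise)
    rate = math.log2(1.0 + sinr)
    # The harvester collects everything that reaches the antenna.
    harvested = eta * (1.0 - rho) * (signal + interference + antenna_noise)
    return sinr, rate, harvested


if __name__ == "__main__":
    for rho in (0.25, 0.5, 0.75):
        sinr, rate, harvested = power_splitting_tradeoff(rho, 8.0, 1.0, 1.0, 1.0, 0.5)
        print(f"rho={rho:.2f} rate={rate:.3f} bit/s/Hz harvested={harvested:.3f}")
```

Under this model, increasing `rho` raises the rate and lowers the harvested power, which is the kind of balance a joint optimization of splitting factors and beamformers would have to strike.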
This paper investigates the sensing performance of two intelligent reflecting surface (IRS)-enabled non-line-of-sight (NLoS) sensing systems with fully-passive and semi-passive IRSs, respectively. In particular, we consider a fundamental setup with one base station (BS), one uniform linear array (ULA) IRS, and one point target in the NLoS region of the BS. Accordingly, we analyze the sensing signal-to-noise ratio (SNR) performance for a target detection scenario and the estimation Cram\'er-Rao bound (CRB) performance for a target's direction-of-arrival (DoA) estimation scenario, in cases where the transmit beamforming at the BS and the reflective beamforming at the IRS are jointly optimized. First, for the target detection scenario, we characterize the maximum sensing SNR when the BS-IRS channels are line-of-sight (LoS) and Rayleigh fading, respectively. It is revealed that when the number of reflecting elements $N$ equipped at the IRS becomes sufficiently large, the maximum sensing SNR increases proportionally to $N^2$ for the semi-passive-IRS sensing system, but proportionally to $N^4$ for the fully-passive-IRS counterpart. Then, for the target's DoA estimation scenario, we analyze the minimum CRB performance when the BS-IRS channel follows Rayleigh fading. Specifically, when $N$ grows, the minimum CRB decreases inversely proportionally to $N^4$ and $N^6$ for the semi-passive and fully-passive-IRS sensing systems, respectively. Finally, numerical results are presented to corroborate our analysis across various transmit and reflective beamforming design schemes under general channel setups. It is shown that the fully-passive-IRS sensing system outperforms the semi-passive counterpart when $N$ exceeds a certain threshold. This advantage is attributed to the additional reflective beamforming gain in the IRS-BS path, which efficiently compensates for the path loss for a large $N$.
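The threshold behavior described above can be sketched with a toy scaling model in which the semi-passive SNR grows as $N^2$ and the fully-passive SNR grows as $N^4$ but starts from a much smaller constant because of the extra IRS-BS hop. The constants `c_semi` and `c_full` and the function names are hypothetical placeholders; only the exponents come from the abstract.

```python
def snr_semi_passive(n, c_semi):
    """Toy model: sensing SNR proportional to N^2 (exponent from the abstract)."""
    return c_semi * n ** 2


def snr_fully_passive(n, c_full):
    """Toy model: sensing SNR proportional to N^4 (exponent from the abstract)."""
    return c_full * n ** 4


def crossover_elements(c_semi, c_full, n_max=10**6):
    """Return the smallest N at which the fully-passive toy SNR exceeds the
    semi-passive one, or None if that does not happen up to n_max."""
    for n in range(1, n_max + 1):
        if snr_fully_passive(n, c_full) > snr_semi_passive(n, c_semi):
            return n
    return None


if __name__ == "__main__":
    # Hypothetical constants: the fully-passive link is assumed to be
    # 10^4 times weaker per element because of the additional path loss.
    print(crossover_elements(c_semi=10_000, c_full=1))
```

In this toy model the crossover sits just above $\sqrt{c_\text{semi}/c_\text{full}}$, so a larger path-loss penalty on the fully-passive link pushes the threshold to a larger $N$.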
This correspondence presents a novel sensing-assisted sparse channel recovery approach for massive-antenna wireless communication systems. We focus on a fundamental configuration with one massive-antenna base station (BS) and one single-antenna communication user (CU). The wireless channel exhibits sparsity and consists of multiple paths associated with scatterers that are detectable via radar sensing. Under this setup, the BS first sends downlink pilots to the CU and concurrently receives the echo pilot signals to sense the surrounding scatterers. Subsequently, the CU feeds back information on its received pilot signal to the BS. The BS then determines the sparse basis from the sensed scatterers and recovers the wireless channel from the feedback information using advanced compressive sensing (CS) algorithms. Numerical results show that the proposed sensing-assisted approach significantly increases the overall achievable rate over the conventional design relying on a discrete Fourier transform (DFT)-based sparse basis without sensing, thanks to the reduced training overhead and enhanced recovery accuracy with limited feedback.
Training large-scale artificial intelligence (AI) models demands significant computational power and energy, leading to an increased carbon footprint with potential environmental repercussions. This paper examines the challenges of training AI models across geographically distributed (geo-distributed) data centers, emphasizing the balance between learning performance and carbon footprint. We consider Federated Learning (FL) as a solution, which exchanges model parameters rather than raw data, preserving data privacy and compliance with local regulations. Given the variability in carbon intensity across regions, we propose a new framework called CAFE (short for Carbon-Aware Federated Learning) to optimize training within a fixed carbon footprint budget. Our approach incorporates coreset selection to assess learning performance, employs the Lyapunov drift-plus-penalty framework to address the unpredictability of future carbon intensity, and devises an efficient algorithm to address the combinatorial complexity of data center selection. Through extensive simulations using real-world carbon intensity data, we demonstrate the efficacy of our algorithm, highlighting its superiority over existing methods in optimizing learning performance while minimizing environmental impact.
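To make the drift-plus-penalty idea concrete, the following is a minimal sketch of the generic textbook rule, not the CAFE algorithm itself: a virtual queue tracks how far the cumulative carbon cost has run ahead of a per-round budget, and each round picks the option that maximizes `v * utility - queue * carbon_cost`. The option tuples, the weight `v`, and the function name are hypothetical.

```python
def drift_plus_penalty_step(queue, options, per_round_budget, v):
    """One round of a generic drift-plus-penalty decision (illustrative only).

    queue            : current virtual queue length (carbon budget backlog)
    options          : list of (utility, carbon_cost) tuples to choose from
    per_round_budget : carbon allowance credited to the queue every round
    v                : weight trading utility against budget violation
    Returns the chosen option and the updated queue.
    """
    # A larger queue makes carbon-heavy options less attractive.
    best = max(options, key=lambda o: v * o[0] - queue * o[1])
    # Queue grows by the carbon spent and drains by the per-round allowance.
    new_queue = max(queue + best[1] - per_round_budget, 0.0)
    return best, new_queue


if __name__ == "__main__":
    # Hypothetical options: (utility, carbon cost); the last one means "skip".
    options = [(1.0, 5.0), (0.5, 1.0), (0.0, 0.0)]
    queue = 0.0
    for round_index in range(4):
        choice, queue = drift_plus_penalty_step(queue, options, 2.0, 1.0)
        print(round_index, choice, queue)
```

The rule needs only the current carbon cost of each option, which is why this style of online control is attractive when future carbon intensity cannot be predicted.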
Recent progress in vision-language foundation models has shown their ability to understand multimodal data and solve complicated vision-language tasks, including robotics manipulation. We seek a straightforward way of making use of existing vision-language models (VLMs) with simple fine-tuning on robotics data. To this end, we derive a simple and novel vision-language manipulation framework, dubbed RoboFlamingo, built upon the open-source VLM OpenFlamingo. Unlike prior works, RoboFlamingo utilizes pre-trained VLMs for single-step vision-language comprehension, models sequential history information with an explicit policy head, and is lightly fine-tuned by imitation learning only on language-conditioned manipulation datasets. Such a decomposition gives RoboFlamingo the flexibility for open-loop control and deployment on low-performance platforms. By exceeding the state-of-the-art performance by a large margin on the tested benchmark, we show that RoboFlamingo can be an effective and competitive alternative for adapting VLMs to robot control. Our extensive experimental results also reveal several interesting conclusions regarding the behavior of different pre-trained VLMs on manipulation tasks. We believe RoboFlamingo has the potential to be a cost-effective and easy-to-use solution for robotics manipulation, empowering everyone with the ability to fine-tune their own robotics policy.
Radar systems typically employ well-designed deterministic signals for target sensing, while integrated sensing and communications (ISAC) systems have to adopt random signals to convey useful information. This paper analyzes the sensing and ISAC performance relying on random signaling in a multiantenna system. Towards this end, we define a new sensing performance metric, namely, ergodic linear minimum mean square error (ELMMSE), which characterizes the estimation error averaged over random ISAC signals. Then, we investigate a data-dependent precoding (DDP) scheme to minimize the ELMMSE in sensing-only scenarios, which attains the optimized performance at the cost of high implementation overhead. To reduce the cost, we present an alternative data-independent precoding (DIP) scheme by stochastic gradient projection (SGP). Moreover, we shed light on the optimal structures of both sensing-only DDP and DIP precoders. As a further step, we extend the proposed DDP and DIP approaches to ISAC scenarios, which are solved via a tailored penalty-based alternating optimization algorithm. Our numerical results demonstrate that the proposed DDP and DIP methods achieve substantial performance gains over conventional ISAC signaling schemes that treat the signal sample covariance matrix as deterministic, which proves that random ISAC signals deserve dedicated precoding designs.
With recent advancements, wireless local area network (WLAN) or wireless fidelity (Wi-Fi) technology has been successfully utilized to realize sensing functionalities such as detection, localization, and recognition. However, the WLAN standards are developed mainly for the purpose of communication, and thus may not be able to meet the stringent requirements of emerging sensing applications. To resolve this issue, a new Task Group (TG), namely IEEE 802.11bf, has been established by the IEEE 802.11 working group, with the objective of creating a new amendment to the WLAN standard to meet advanced sensing requirements while minimizing the effect on communications. This paper provides a comprehensive overview of the up-to-date efforts in the IEEE 802.11bf TG. First, we introduce the definition of the 802.11bf amendment and its formation and standardization timeline. Next, we discuss the WLAN sensing use cases with the corresponding key performance indicator (KPI) requirements. After reviewing previous WLAN sensing research based on communication-oriented WLAN standards, we identify their limitations and underscore the practical need for the new sensing-oriented amendment in 802.11bf. Furthermore, we discuss the WLAN sensing framework and procedure used for measurement acquisition, considering both sensing at sub-7 GHz and directional multi-gigabit (DMG) sensing at 60 GHz, and address their shared features, similarities, and differences. In addition, we present various candidate technical features for IEEE 802.11bf, including waveform/sequence design, feedback types, as well as quantization and compression techniques. We also describe the methodologies and the channel modeling used by the IEEE 802.11bf TG for evaluation. Finally, we discuss in detail the challenges and future research directions to motivate more research endeavors in this field.
Ambiguity is ubiquitous in human communication. Previous approaches in Human-Robot Interaction (HRI) have often relied on predefined interaction templates, leading to reduced performance in realistic and open-ended scenarios. To address these issues, we present a large-scale dataset, \invig, for interactive visual grounding under language ambiguity. Our dataset comprises over 520K images accompanied by open-ended goal-oriented disambiguation dialogues, encompassing millions of object instances and corresponding question-answer pairs. Leveraging the \invig dataset, we conduct extensive studies and propose a set of baseline solutions for end-to-end interactive visual disambiguation and grounding, achieving a 45.6\% success rate during validation. To the best of our knowledge, the \invig dataset is the first large-scale dataset for resolving open-ended interactive visual grounding, presenting a practical yet highly challenging benchmark for ambiguity-aware HRI. Code and datasets are available at: \href{https://openivg.github.io}{https://openivg.github.io}.
Federated multi-view clustering has the potential to learn a global clustering model from data distributed across multiple devices. In this setting, label information is unknown and data privacy must be preserved, leading to two major challenges. First, views on different clients often have feature heterogeneity, and mining their complementary cluster information is not trivial. Second, the storage and usage of data from multiple clients in a distributed environment can lead to incompleteness of multi-view data. To address these challenges, we propose a novel federated deep multi-view clustering method that can mine complementary cluster structures from multiple clients, while dealing with data incompleteness and privacy concerns. Specifically, in the server environment, we propose sample alignment and data extension techniques to explore the complementary cluster structures of multiple views. The server then distributes global prototypes and global pseudo-labels to each client as global self-supervised information. In the client environment, multiple clients use the global self-supervised information and deep autoencoders to learn view-specific cluster assignments and embedded features, which are then uploaded to the server for refining the global self-supervised information. Finally, the results of our extensive experiments demonstrate that our proposed method exhibits superior performance in addressing the challenges of incomplete multi-view data in distributed environments.