This paper studies the over-the-air computation (AirComp) in an orthogonal frequency division multiplexing (OFDM) system with imperfect channel state information (CSI), in which multiple single-antenna wireless devices (WDs) simultaneously send uncoded signals to a multi-antenna access point (AP) for distributed functional computation over multiple subcarriers. In particular, we consider two scenarios with best-effort and error-constrained computation tasks, with the objectives of minimizing the average computation mean squared error (MSE) and the computation outage probability over the multiple subcarriers, respectively. Towards this end, we jointly optimize the transmit coefficients at the WDs and the receive beamforming vectors at the AP over subcarriers, subject to the maximum transmit power constraints at individual WDs. First, for the special case with a single receive antenna at the AP, we propose the semi-closed-form globally optimal solutions to the two problems using the Lagrange-duality method. It is shown that at each subcarrier, the WDs' optimized power control policy for average MSE minimization follows a regularized channel inversion structure, while that for computation outage probability minimization follows an on-off regularized channel inversion, with the regularization dependent on the transmit power budget and channel estimation error. Next, for the general case with multiple receive antennas at the AP, we present efficient algorithms based on alternating optimization and convex optimization to find converged solutions to both problems.
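As a rough illustration of the regularized channel inversion structure mentioned above, the following sketch handles one WD on one subcarrier with a single-antenna AP. It is not the paper's semi-closed-form solution, which the abstract does not state. It assumes a unit receive scaling factor, an estimated channel `h_hat` with independent zero-mean estimation error of variance `err_var`, and a per-subcarrier power cap `p_max`; all of these names are hypothetical. Under these assumptions the expected per-device error is |h_hat b - 1|^2 + err_var |b|^2, and the Lagrangian minimizer is b = conj(h_hat) / (|h_hat|^2 + err_var + mu), where the dual variable mu is found by bisection.

```python
import numpy as np

def regularized_inversion(h_hat, err_var, p_max, tol=1e-12):
    """Minimize |h_hat*b - 1|^2 + err_var*|b|^2 subject to |b|^2 <= p_max.

    Illustrative assumptions (not from the source): unit receive scaling,
    nonzero h_hat, and a power cap applied to this single subcarrier.
    Returns the transmit coefficient b and the dual variable mu.
    """
    def coeff(mu):
        # Regularized channel inversion: the regularizer grows with the
        # estimation error variance and with the power-constraint dual mu.
        return np.conj(h_hat) / (abs(h_hat) ** 2 + err_var + mu)

    b = coeff(0.0)
    if abs(b) ** 2 <= p_max:
        return b, 0.0  # power constraint inactive

    # Bracket mu, then bisect until the power constraint is met tightly.
    lo, hi = 0.0, 1.0
    while abs(coeff(hi)) ** 2 > p_max:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if abs(coeff(mid)) ** 2 > p_max:
            lo = mid
        else:
            hi = mid
    return coeff(hi), hi
```

With a generous power budget the dual variable is zero and only the estimation error regularizes the inversion; with a tight budget, mu becomes positive and shrinks the coefficient further.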
This paper studies a multi-intelligent-reflecting-surface-(IRS)-enabled integrated sensing and communications (ISAC) system, in which multiple IRSs are installed to help the base station (BS) provide ISAC services in separate line-of-sight (LoS) blocked areas. We focus on the scenario with semi-passive uniform linear array (ULA) IRSs for sensing, in which each IRS is integrated with dedicated sensors for processing echo signals, and each IRS simultaneously serves one sensing target and one communication user (CU) in its coverage area. In particular, we suppose that the BS sends combined information and dedicated sensing signals for ISAC, and we consider two cases with point and extended targets, in which each IRS aims to estimate the direction-of-arrival (DoA) of the corresponding target and the complete target response matrix, respectively. Under this setup, we first derive the closed-form Cram{\'e}r-Rao bounds (CRBs) for parameter estimation under the two target models. For the point target case, the CRB for DoA estimation is shown to be inversely proportional to the cube of the number of sensors at each IRS, while for the extended target case, the CRB for target response matrix estimation is proportional to the number of IRS sensors. Next, we consider two different types of CU receivers that can and cannot cancel the interference from dedicated sensing signals prior to information decoding. To achieve fair and optimized sensing performance, we minimize the maximum CRB at all IRSs for the two target cases, via jointly optimizing the transmit beamformers at the BS and the reflective beamformers at the multiple IRSs, subject to the minimum signal-to-interference-plus-noise ratio (SINR) constraints at individual CUs, the maximum transmit power constraint at the BS, and the unit-modulus constraints at the multiple IRSs.
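The inverse-cubic dependence on the number of sensors stated for the point target case can be checked numerically on a much simpler textbook model. The sketch below is not the paper's IRS-based CRB. It assumes a single snapshot, one point source with unknown complex amplitude, a half-wavelength ULA, and white Gaussian noise; the function name and the `snr_linear` parameter (per-sensor SNR) are hypothetical. In this model the bound reduces to 6 / (SNR pi^2 cos^2(theta) N (N^2 - 1)), which scales as 1/N^3 for large N.

```python
import numpy as np

def ula_doa_crb(num_sensors, theta_rad, snr_linear):
    """CRB on the DoA (in rad^2) for one source at a half-wavelength ULA.

    Textbook single-snapshot model with unknown complex amplitude; this is
    an illustrative assumption, not the system model of the paper.
    """
    n = np.arange(num_sensors)
    # Steering vector and its derivative with respect to the angle.
    a = np.exp(1j * np.pi * n * np.sin(theta_rad))
    da = 1j * np.pi * n * np.cos(theta_rad) * a
    # Remove the part of da along a, which is absorbed by the unknown amplitude.
    proj = da - a * (np.vdot(a, da) / np.vdot(a, a))
    return 1.0 / (2.0 * snr_linear * np.vdot(proj, proj).real)

# Doubling the array size should reduce the bound by roughly a factor of 8.
ratio = ula_doa_crb(64, 0.3, 10.0) / ula_doa_crb(128, 0.3, 10.0)
```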
This paper investigates an intelligent reflecting surface (IRS) enabled multiuser integrated sensing and communications (ISAC) system, which consists of one multi-antenna base station (BS), one IRS, multiple single-antenna communication users (CUs), and one target at the non-line-of-sight (NLoS) region of the BS. The IRS is deployed to not only assist the communication from the BS to the CUs, but also enable the BS's NLoS target sensing based on the echo signals from the BS-IRS-target-IRS-BS link. We consider two types of targets, namely the extended and point targets, for which the BS aims to estimate the complete target response matrix and the target direction-of-arrival (DoA) with respect to the IRS, respectively. To provide full degrees of freedom for sensing, we consider that the BS sends dedicated sensing signals in addition to the communication signals. Accordingly, we model two types of CU receivers, namely Type-I and Type-II CU receivers, which do not have and have the capability of canceling the interference from the sensing signals, respectively. Under each setup, we jointly optimize the transmit beamforming at the BS and the reflective beamforming at the IRS to minimize the Cram\'er-Rao bound (CRB) for target estimation, subject to the minimum signal-to-interference-plus-noise ratio (SINR) constraints at the CUs and the maximum transmit power constraint at the BS. We present efficient algorithms to solve the highly non-convex SINR-constrained CRB minimization problems, by using the techniques of alternating optimization, semi-definite relaxation, and successive convex approximation. Numerical results show that the proposed design achieves lower estimation CRB than other benchmark schemes, and the sensing signal interference cancellation at Type-II CU receivers is beneficial when the number of CUs is greater than one.
This paper investigates the energy efficiency of a multiple-input multiple-output (MIMO) integrated sensing and communications (ISAC) system, in which one multi-antenna base station (BS) transmits unified ISAC signals to a multi-antenna communication user (CU) and at the same time uses the echo signals to estimate an extended target. We focus on one particular ISAC transmission block and take into account the practical on-off non-transmission power at the BS. Under this setup, we minimize the energy consumption at the BS while ensuring a minimum average data rate requirement for communication and a maximum Cram\'er-Rao bound (CRB) requirement for target estimation, by jointly optimizing the transmit covariance matrix and the ``on'' duration for active transmission. We obtain the optimal solution to the rate-and-CRB-constrained energy minimization problem in a semi-closed form. Interestingly, the obtained optimal solution is shown to unify the spectrum-efficient and energy-efficient communications and sensing designs. In particular, for the special MIMO sensing case with the rate constraint inactive, the optimal solution follows the isotropic transmission with the shortest ``on'' duration, in which the BS radiates the required sensing energy by using sufficiently high power over the shortest duration. For the general ISAC case, the optimal transmit covariance solution is of full rank and follows the eigenmode transmission based on the communication channel, while the optimal ``on'' duration is determined based on both the rate and CRB constraints. Numerical results show that the proposed ISAC design achieves significantly reduced energy consumption as compared to the benchmark schemes based on isotropic transmission, always-on transmission, and sensing or communications only designs, especially when the rate and CRB constraints become stringent.
For 3D object manipulation, methods that build an explicit 3D representation perform better than those relying only on camera images. But using explicit 3D representations like voxels comes at large computing cost, adversely affecting scalability. In this work, we propose RVT, a multi-view transformer for 3D manipulation that is both scalable and accurate. Some key features of RVT are an attention mechanism to aggregate information across views and re-rendering of the camera input from virtual views around the robot workspace. In simulations, we find that a single RVT model works well across 18 RLBench tasks with 249 task variations, achieving 26% higher relative success than the existing state-of-the-art method (PerAct). It also trains 36X faster than PerAct for achieving the same performance and achieves 2.3X the inference speed of PerAct. Further, RVT can perform a variety of manipulation tasks in the real world with just a few ($\sim$10) demonstrations per task. Visual results, code, and trained model are provided at https://robotic-view-transformer.github.io/.
Future sixth-generation (6G) networks are envisioned to provide both sensing and communications functionalities by using densely deployed base stations (BSs) with massive antennas operating in millimeter wave (mmWave) and terahertz (THz). Due to the large number of antennas and the high frequency band, the sensing and communications will operate within the near-field region, thus making the conventional designs based on the far-field channel models inapplicable. This paper studies a near-field multiple-input-multiple-output (MIMO) radar sensing system, in which the transceivers with massive antennas aim to localize multiple near-field targets in the three-dimensional (3D) space. In particular, we adopt a general wavefront propagation model by considering the exact spherical wavefront with both channel phase and amplitude variations over different antennas. Besides, we consider the general transmit signal waveforms and also consider the unknown cluttered environments. Under this setup, the unknown parameters to estimate include the 3D coordinates and the complex reflection coefficients of the multiple targets, as well as the noise and interference covariance matrix. Accordingly, we derive the Cram\'er-Rao bound (CRB) for estimating the target coordinates and reflection coefficients. Next, to facilitate practical localization, we propose an efficient estimator based on the 3D approximate cyclic optimization (3D-ACO), which is obtained following the maximum likelihood (ML) criterion. Finally, numerical results show that considering the exact antenna-varying channel amplitudes achieves more accurate CRB as compared to prior works based on constant channel amplitudes across antennas, especially when the targets are close to the transceivers. It is also shown that the proposed estimator achieves localization performance close to the derived CRB, thus validating its superior performance.
This correspondence studies the wireless powered over-the-air computation (AirComp) for achieving sustainable wireless data aggregation (WDA) by integrating AirComp and wireless power transfer (WPT) into a joint design. In particular, we consider that a multi-antenna hybrid access point (HAP) employs the transmit energy beamforming to charge multiple single-antenna low-power wireless devices (WDs) in the downlink, and the WDs use the harvested energy to simultaneously send their messages to the HAP for AirComp in the uplink. Under this setup, we minimize the computation mean square error (MSE), by jointly optimizing the transmit energy beamforming and the receive AirComp beamforming at the HAP, as well as the transmit power at the WDs, subject to the maximum transmit power constraint at the HAP and the wireless energy harvesting constraints at individual WDs. To tackle the non-convex computation MSE minimization problem, we present an efficient algorithm to find a converged high-quality solution by using the alternating optimization technique. Numerical results show that the proposed joint WPT-AirComp approach significantly reduces the computation MSE, as compared to other benchmark schemes.
Few-shot text classification has recently been promoted by the meta-learning paradigm, which aims to identify target classes with knowledge transferred from source classes with sets of small tasks named episodes. Despite their success, existing works building their meta-learner based on Prototypical Networks are unsatisfactory in learning discriminative text representations between similar classes, which may lead to contradictions during label prediction. In addition, the task-level and instance-level overfitting problems in few-shot text classification caused by the small number of training examples are not sufficiently tackled. In this work, we propose a contrastive learning framework named ContrastNet to tackle both discriminative representation and overfitting problems in few-shot text classification. ContrastNet learns to pull closer text representations belonging to the same class and push away text representations belonging to different classes, while simultaneously introducing unsupervised contrastive regularization at both task-level and instance-level to prevent overfitting. Experiments on 8 few-shot text classification datasets show that ContrastNet outperforms the current state-of-the-art models.
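The pull-together and push-apart behavior described above can be illustrated with a generic supervised contrastive loss. This is a minimal sketch of that general technique, not the ContrastNet objective itself, whose exact form the abstract does not give. The function name, the `temperature` value, and the use of cosine similarity are assumptions made for illustration, and the task-level and instance-level regularizers are not modeled.

```python
import numpy as np

def supervised_contrastive_loss(z, labels, temperature=0.5):
    """Generic supervised contrastive loss over a batch of embeddings.

    z: (n, d) array of text representations; labels: (n,) integer array.
    Assumes at least one class appears twice in the batch (illustrative
    sketch, not the exact loss used by ContrastNet).
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    sim = np.where(not_self, sim, -np.inf)  # exclude each anchor itself

    # Row-wise log-softmax over all other samples in the batch.
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - m - np.log(np.exp(sim - m).sum(axis=1, keepdims=True))

    # Positives are the other samples sharing the anchor's label.
    positives = (labels[:, None] == labels[None, :]) & not_self
    counts = positives.sum(axis=1)
    valid = counts > 0
    summed = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float((-summed[valid] / counts[valid]).mean())
```

The loss is small when same-class representations are close and different-class representations are far apart, and it grows when the labels disagree with the geometry of the embeddings.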
METHODS: First, a set of evaluation criteria is designed based on a comprehensive literature review. Second, the candidate criteria are optimized using a Delphi method by five experts in medicine and engineering. Third, three clinical experts design a set of medical datasets to interact with LLMs. Finally, benchmarking experiments are conducted on the datasets. The responses generated by chatbots based on LLMs are recorded for blind evaluations by five licensed medical experts. RESULTS: The obtained evaluation criteria cover medical professional capabilities, social comprehensive capabilities, contextual capabilities, and computational robustness, with sixteen detailed indicators. The medical datasets include twenty-seven medical dialogues and seven case reports in Chinese. Three chatbots are evaluated: ChatGPT by OpenAI, ERNIE Bot by Baidu Inc., and Doctor PuJiang (Dr. PJ) by Shanghai Artificial Intelligence Laboratory. Experimental results show that Dr. PJ outperforms ChatGPT and ERNIE Bot in both multiple-turn medical dialogue and case report scenarios.
Artificial intelligence (AI) has demonstrated the ability to extract insights from data, but the issue of fairness remains a concern in high-stakes fields such as healthcare. Despite extensive discussion and efforts in algorithm development, AI fairness and clinical concerns have not been adequately addressed. In this paper, we discuss the misalignment between technical and clinical perspectives of AI fairness, highlight the barriers to AI fairness' translation to healthcare, advocate multidisciplinary collaboration to bridge the knowledge gap, and provide possible solutions to address the clinical concerns pertaining to AI fairness.