Abstract:This paper investigates the transmission of three-dimensional (3D) human face content for immersive communication over a rate-constrained transmitter-receiver link. We propose a new framework named NeRF-SeCom, which leverages neural radiance fields (NeRF) and semantic communications to improve the quality of 3D visualizations while minimizing the communication overhead. In the NeRF-SeCom framework, we first train a NeRF face model based on the NeRFBlendShape method, which is pre-shared between the transmitter and receiver as the semantic knowledge base to facilitate the real-time transmission. Next, with knowledge base, the transmitter extracts and sends only the essential semantic features for the receiver to reconstruct 3D face in real time. To optimize the transmission efficiency, we classify the expression features into static and dynamic types. Over each video chunk, static features are transmitted once for all frames, whereas dynamic features are transmitted over a portion of frames to adhere to rate constraints. Additionally, we propose a feature prediction mechanism, which allows the receiver to predict the dynamic features for frames that are not transmitted. Experiments show that our proposed NeRF-SeCom framework significantly outperforms benchmark methods in delivering high-quality 3D visualizations of human faces.
Abstract:Robotic assembly for high-mixture settings requires adaptivity to diverse parts and poses, which is an open challenge. Meanwhile, in other areas of robotics, large models and sim-to-real have led to tremendous progress. Inspired by such work, we present AutoMate, a learning framework and system that consists of 4 parts: 1) a dataset of 100 assemblies compatible with simulation and the real world, along with parallelized simulation environments for policy learning, 2) a novel simulation-based approach for learning specialist (i.e., part-specific) policies and generalist (i.e., unified) assembly policies, 3) demonstrations of specialist policies that individually solve 80 assemblies with 80% or higher success rates in simulation, as well as a generalist policy that jointly solves 20 assemblies with an 80%+ success rate, and 4) zero-shot sim-to-real transfer that achieves similar (or better) performance than simulation, including on perception-initialized assembly. The key methodological takeaway is that a union of diverse algorithms from manufacturing engineering, character animation, and time-series analysis provides a generic and robust solution for a diverse range of robotic assembly problems.To our knowledge, AutoMate provides the first simulation-based framework for learning specialist and generalist policies over a wide range of assemblies, as well as the first system demonstrating zero-shot sim-to-real transfer over such a range.
Abstract:LiDAR odometry is a pivotal technology in the fields of autonomous driving and autonomous mobile robotics. However, most of the current works focus on nonlinear optimization methods, and still existing many challenges in using the traditional Iterative Extended Kalman Filter (IEKF) framework to tackle the problem: IEKF only iterates over the observation equation, relying on a rough estimate of the initial state, which is insufficient to fully eliminate motion distortion in the input point cloud; the system process noise is difficult to be determined during state estimation of the complex motions; and the varying motion models across different sensor carriers. To address these issues, we propose the Dual-Iteration Extended Kalman Filter (I2EKF) and the LiDAR odometry based on I2EKF (I2EKF-LO). This approach not only iterates over the observation equation but also leverages state updates to iteratively mitigate motion distortion in LiDAR point clouds. Moreover, it dynamically adjusts process noise based on the confidence level of prior predictions during state estimation and establishes motion models for different sensor carriers to achieve accurate and efficient state estimation. Comprehensive experiments demonstrate that I2EKF-LO achieves outstanding levels of accuracy and computational efficiency in the realm of LiDAR odometry. Additionally, to foster community development, our code is open-sourced.https://github.com/YWL0720/I2EKF-LO.
Abstract:This paper studies the multi-intelligent reflecting surface (IRS)-assisted cooperative sensing, in which multiple active IRSs are deployed in a distributed manner to facilitate multi-view target sensing at the non-line-of-sight (NLoS) area of the base station (BS). Different from prior works employing passive IRSs, we leverage active IRSs with the capability of amplifying the reflected signals to overcome the severe multi-hop-reflection path loss in NLoS sensing. In particular, we consider two sensing setups without and with dedicated sensors equipped at active IRSs. In the first case without dedicated sensors at IRSs, we investigate the cooperative sensing at the BS, where the target's direction-of-arrival (DoA) with respect to each IRS is estimated based on the echo signals received at the BS. In the other case with dedicated sensors at IRSs, we consider that each IRS is able to receive echo signals and estimate the target's DoA with respect to itself. For both sensing setups, we first derive the closed-form Cram\'{e}r-Rao bound (CRB) for estimating target DoA. Then, the (maximum) CRB is minimized by jointly optimizing the transmit beamforming at the BS and the reflective beamforming at the multiple IRSs, subject to the constraints on the maximum transmit power at the BS, as well as the maximum amplification power and the maximum power amplification gain constraints at individual active IRSs. To tackle the resulting highly non-convex (max-)CRB minimization problems, we propose two efficient algorithms to obtain high-quality solutions for the two cases with sensing at the BS and at the IRSs, respectively, based on alternating optimization, successive convex approximation, and semi-definite relaxation.
Abstract:In this work, we study how to build a robotic system that can solve multiple 3D manipulation tasks given language instructions. To be useful in industrial and household domains, such a system should be capable of learning new tasks with few demonstrations and solving them precisely. Prior works, like PerAct and RVT, have studied this problem, however, they often struggle with tasks requiring high precision. We study how to make them more effective, precise, and fast. Using a combination of architectural and system-level improvements, we propose RVT-2, a multitask 3D manipulation model that is 6X faster in training and 2X faster in inference than its predecessor RVT. RVT-2 achieves a new state-of-the-art on RLBench, improving the success rate from 65% to 82%. RVT-2 is also effective in the real world, where it can learn tasks requiring high precision, like picking up and inserting plugs, with just 10 demonstrations. Visual results, code, and trained model are provided at: https://robotic-view-transformer-2.github.io/.
Abstract:Learning temporal dependencies among targets (TDT) benefits better time series forecasting, where targets refer to the predicted sequence. Although autoregressive methods model TDT recursively, they suffer from inefficient inference and error accumulation. We argue that integrating TDT learning into non-autoregressive methods is essential for pursuing effective and efficient time series forecasting. In this study, we introduce the differencing approach to represent TDT and propose a parameter-free and plug-and-play solution through an optimization objective, namely TDT Loss. It leverages the proportion of inconsistent signs between predicted and ground truth TDT as an adaptive weight, dynamically balancing target prediction and fine-grained TDT fitting. Importantly, TDT Loss incurs negligible additional cost, with only $\mathcal{O}(n)$ increased computation and $\mathcal{O}(1)$ memory requirements, while significantly enhancing the predictive performance of non-autoregressive models. To assess the effectiveness of TDT loss, we conduct extensive experiments on 7 widely used datasets. The experimental results of plugging TDT loss into 6 state-of-the-art methods show that out of the 168 experiments, 75.00\% and 94.05\% exhibit improvements in terms of MSE and MAE with the maximum 24.56\% and 16.31\%, respectively.
Abstract:The ethical integration of Artificial Intelligence (AI) in healthcare necessitates addressing fairness-a concept that is highly context-specific across medical fields. Extensive studies have been conducted to expand the technical components of AI fairness, while tremendous calls for AI fairness have been raised from healthcare. Despite this, a significant disconnect persists between technical advancements and their practical clinical applications, resulting in a lack of contextualized discussion of AI fairness in clinical settings. Through a detailed evidence gap analysis, our review systematically pinpoints several deficiencies concerning both healthcare data and the provided AI fairness solutions. We highlight the scarcity of research on AI fairness in many medical domains where AI technology is increasingly utilized. Additionally, our analysis highlights a substantial reliance on group fairness, aiming to ensure equality among demographic groups from a macro healthcare system perspective; in contrast, individual fairness, focusing on equity at a more granular level, is frequently overlooked. To bridge these gaps, our review advances actionable strategies for both the healthcare and AI research communities. Beyond applying existing AI fairness methods in healthcare, we further emphasize the importance of involving healthcare professionals to refine AI fairness concepts and methods to ensure contextually relevant and ethically sound AI applications in healthcare.
Abstract:Model-Free Reinforcement Learning~(MFRL), leveraging the policy gradient theorem, has demonstrated considerable success in continuous control tasks. However, these approaches are plagued by high gradient variance due to zeroth-order gradient estimation, resulting in suboptimal policies. Conversely, First-Order Model-Based Reinforcement Learning~(FO-MBRL) methods, employing differentiable simulation, provide gradients with reduced variance but are susceptible to sampling error in scenarios involving stiff dynamics, such as physical contact. This paper investigates the source of this error and introduces Adaptive Horizon Actor-Critic (AHAC), an FO-MBRL algorithm that reduces gradient error by adapting the model-based horizon to avoid stiff dynamics. Empirical findings reveal that AHAC outperforms MFRL baselines, attaining 40\% more reward across a set of locomotion tasks, and efficiently scaling to high-dimensional control environments with improved wall-clock-time efficiency.
Abstract:In recent years, various intelligent autonomous robots have begun to appear in daily life and production. Desktop-level robots are characterized by their flexible deployment, rapid response, and suitability for light workload environments. In order to meet the current societal demand for service robot technology, this study proposes using a miniaturized desktop-level robot (by ROS) as a carrier, locally deploying a natural language model (NLP-BERT), and integrating visual recognition (CV-YOLO) and speech recognition technology (ASR-Whisper) as inputs to achieve autonomous decision-making and rational action by the desktop robot. Three comprehensive experiments were designed to validate the robotic arm, and the results demonstrate excellent performance using this approach across all three experiments. In Task 1, the execution rates for speech recognition and action performance were 92.6% and 84.3%, respectively. In Task 2, the highest execution rates under the given conditions reached 92.1% and 84.6%, while in Task 3, the highest execution rates were 95.2% and 80.8%, respectively. Therefore, it can be concluded that the proposed solution integrating ASR, NLP, and other technologies on edge devices is feasible and provides a technical and engineering foundation for realizing multimodal desktop-level robots.
Abstract:This paper investigates secure communications in a near-field multi-functional integrated sensing, communication, and powering (ISCAP) system with an extremely large-scale antenna arrays (ELAA) equipped at the base station (BS). In this system, the BS sends confidential messages to a single communication user (CU), and at the same time wirelessly senses a point target and charges multiple energy receivers (ERs). It is assumed that the ERs and the sensing target are potential eavesdroppers that may attempt to intercept the confidential messages intended for the CU. We consider the joint transmit beamforming design to support secure communications while ensuring the sensing and powering requirements. In particular, the BS transmits dedicated sensing/energy beams in addition to the information beam, which also play the role of artificial noise (AN) for effectively jamming potential eavesdroppers. Building upon this, we maximize the secrecy rate at the CU, subject to the maximum \ac{crb} constraints for target sensing and the minimum harvested energy constraints for the ERs. Although the formulated joint beamforming problem is non-convex and challenging to solve, we acquire the optimal solution via the semi-definite relaxation (SDR) and fractional programming techniques together with a one-dimensional (1D) search. Subsequently, we present two alternative designs based on zero-forcing (ZF) beamforming and maximum ratio transmission (MRT), respectively. Finally, our numerical results show that our proposed approaches exploit both the distance-domain resolution of near-field ELAA and the joint beamforming design for enhancing secure communication performance while ensuring the sensing and powering requirements in ISCAP, especially when the CU and the target and ER eavesdroppers are located at the same angle (but different distances) with respect to the BS.