In healthcare, thanks to many model agnostic methods, explainability of the prediction scores made by deep learning applications has improved. However, we note that for daily or hourly risk of deterioration prediction of in-hospital patients, not only the predicted risk probability score matters, but also the variance of the risk scores play key roles in aiding clinical decision making. In this paper, we propose to use delta's method to approximate variance of prediction deterministically, such that the SHAP method can be adopted to attribute contribution of variance. The prediction variance is estimated by sampling the conditional hidden space in variational models and is propagated to input clinical variables based on Shapley values of the variance game. This approach works with variational time series models such as variational recurrent neural networks and variational transformers. We further argue that variational time series models are perfect fits for achieving a balance between predictive power and explainability through a series of experiments on a public clinical ICU datasets. Since SHAP values are additive, we also postulate that the SHAP importance of clinical variables with respect to prediction variations can guide their frequency of measurements.
The fifth generation (5G) cellular network technology is mature and increasingly utilized in many industrial and robotics applications, while an important functionality is the advanced Quality of Service (QoS) features. Despite the prevalence of 5G QoS discussions in the related literature, there is a notable absence of real-life implementations and studies concerning their application in time-critical robotics scenarios. This article considers the operation of time-critical applications for 5G-enabled unmanned aerial vehicles (UAVs) and how their operation can be improved by the possibility to dynamically switch between QoS data flows with different priorities. As such, we introduce a robotics oriented analysis on the impact of the 5G QoS functionality on the performance of 5G-enabled UAVs. Furthermore, we introduce a novel framework for the dynamic selection of distinct 5G QoS data flows that is autonomously managed by the 5G-enabled UAV. This problem is addressed in a novel feedback loop fashion utilizing a probabilistic finite state machine (PFSM). Finally, the efficacy of the proposed scheme is experimentally validated with a 5G-enabled UAV in a real-world 5G stand-alone (SA) network.
A simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) aided integrated sensing and communication (ISAC) dual-secure communication system is studied in this paper. The sensed target and legitimate users (LUs) are situated on the opposite sides of the STAR-RIS, and the energy splitting and time switching protocols are applied in the STAR-RIS, respectively. The long-term average security rate for LUs is maximized by the joint design of the base station (BS) transmit beamforming and receive filter, along with the STAR-RIS transmitting and reflecting coefficients, under guarantying the echo signal-to-noise ratio thresholds and rate constraints for the LUs. Since the channel information changes over time, conventional convex optimization techniques cannot provide the optimal performance for the system, and result in excessively high computational complexity in the exploration of the long-term gains for the system. Taking continuity control decisions into account, the deep deterministic policy gradient and soft actor-critic algorithms based on off-policy are applied to address the complex non-convex problem. Simulation results comprehensively evaluate the performance of the proposed two reinforcement learning algorithms and demonstrate that STAR-RIS is remarkably better than the two benchmarks in the ISAC system.
Math word problems are critical K-8 educational tools, but writing them is time-consuming and requires domain expertise. We suggest that language models can support K-8 math education by automatically generating problems at scale. To be educational, generated problems must be 1) solvable, 2) accurate, and 3) appropriate. Existing datasets are unlabeled for these criteria, making them ill-suited for training problem generators. We introduce MATHWELL, a Llama-2 (70B) model iteratively finetuned to generate K-8 math word problems using data from expert annotation. Using MATHWELL, we generate the largest English word problem dataset with Program of Thought (PoT) rationales to date, containing 20,490 problems. 3,484 are scored by domain experts who find MATHWELL has a 40% higher share of problems that have executable solutions and meet all criteria than alternatives, with 74% of its problems with executable solutions being solvable, accurate, and appropriate. We release our model, data, and annotations.
Public LLMs such as the Llama 2-Chat have driven huge activity in LLM research. These models underwent alignment training and were considered safe. Recently Qi et al. (2023) reported that even benign fine-tuning (e.g., on seemingly safe datasets) can give rise to unsafe behaviors in the models. The current paper is about methods and best practices to mitigate such loss of alignment. Through extensive experiments on several chat models (Meta's Llama 2-Chat, Mistral AI's Mistral 7B Instruct v0.2, and OpenAI's GPT-3.5 Turbo), this paper uncovers that the prompt templates used during fine-tuning and inference play a crucial role in preserving safety alignment, and proposes the "Pure Tuning, Safe Testing" (PTST) principle -- fine-tune models without a safety prompt, but include it at test time. Fine-tuning experiments on GSM8K, ChatDoctor, and OpenOrca show that PTST significantly reduces the rise of unsafe behaviors, and even almost eliminates them in some cases.
In the rapidly evolving area of image synthesis, a serious challenge is the presence of complex artifacts that compromise perceptual realism of synthetic images. To alleviate artifacts and improve quality of synthetic images, we fine-tune Vision-Language Model (VLM) as artifact classifier to automatically identify and classify a wide range of artifacts and provide supervision for further optimizing generative models. Specifically, we develop a comprehensive artifact taxonomy and construct a dataset of synthetic images with artifact annotations for fine-tuning VLM, named SynArtifact-1K. The fine-tuned VLM exhibits superior ability of identifying artifacts and outperforms the baseline by 25.66%. To our knowledge, this is the first time such end-to-end artifact classification task and solution have been proposed. Finally, we leverage the output of VLM as feedback to refine the generative model for alleviating artifacts. Visualization results and user study demonstrate that the quality of images synthesized by the refined diffusion model has been obviously improved.
This paper considers the problem of generative novel view synthesis (GNVS), generating novel, plausible views of a scene given a limited number of known views. Here, we propose a set-based generative model that can simultaneously generate multiple, self-consistent new views, conditioned on any number of known views. Our approach is not limited to generating a single image at a time and can condition on zero, one, or more views. As a result, when generating a large number of views, our method is not restricted to a low-order autoregressive generation approach and is better able to maintain generated image quality over large sets of images. We evaluate the proposed model on standard NVS datasets and show that it outperforms the state-of-the-art image-based GNVS baselines. Further, we show that the model is capable of generating sets of camera views that have no natural sequential ordering, like loops and binocular trajectories, and significantly outperforms other methods on such tasks.
In recent years, exploiting the domain-specific underlying structure of data and its generative factors for representation learning has shown success in various use-case agnostic applications. However, the diversity and complexity of tabular data have made it challenging to represent these structures in a latent space through multi-dimensional vectors. We design an autoencoder-based framework for building general purpose embeddings, we assess the performance of different autoencoder architectures, and show simpler models outperform complex ones in embedding highly complex tabular data. We apply our framework to produce plug-and-play, rich, and anonymized embeddings representing AWS customers for usage in any model, saving up to 45% of development time, and observe significant improvements in downstream models. Moreover, we propose a significant improvement to the calculation of reconstruction loss for multi-layer contractive autoencoders (CAE) by calculating the Jacobian of the entire encoder leading to a 15% improvement in reconstruction quality when compared to a stacked CAE.
Recently, humanoid robots have made significant advances in their ability to perform complex tasks due to the deployment of Reinforcement Learning (RL), however, the inherent complexity of humanoid robots, including the difficulty of planning complex reward functions and training entire complex systems, still poses a notable challenge. To conquer these challenges, after many iterations and in-depth investigations, we have meticulously developed a full-size humanoid robot, ''Adam'', whose innovative structural design greatly improves the efficiency and effectiveness of the imitation learning process. In addition, we have developed a novel imitation learning framework based on an adversarial motion prior, which applies not only to Adam but also to humanoid robots in general. Using the framework, Adam can exhibit unprecedented human-like characteristics in locomotion tasks. Our experimental results demonstrate that the proposed framework enables Adam to achieve human-comparable performance in complex locomotion tasks, marking the first time that human locomotion data has been used for imitation learning in a full-size humanoid robot.
This paper presents a multimodal deep learning framework that utilizes advanced image techniques to improve the performance of clinical analysis heavily dependent on routinely acquired standard images. More specifically, we develop a joint learning network that for the first time leverages the accuracy and reproducibility of myocardial strains obtained from Displacement Encoding with Stimulated Echo (DENSE) to guide the analysis of cine cardiac magnetic resonance (CMR) imaging in late mechanical activation (LMA) detection. An image registration network is utilized to acquire the knowledge of cardiac motions, an important feature estimator of strain values, from standard cine CMRs. Our framework consists of two major components: (i) a DENSE-supervised strain network leveraging latent motion features learned from a registration network to predict myocardial strains; and (ii) a LMA network taking advantage of the predicted strain for effective LMA detection. Experimental results show that our proposed work substantially improves the performance of strain analysis and LMA detection from cine CMR images, aligning more closely with the achievements of DENSE.