Contextual bandits constitute a classical framework for decision-making under uncertainty. In this setting, the goal is to learn the arms of highest reward subject to contextual information, while the unknown reward parameters of each arm need to be learned by experimenting that specific arm. Accordingly, a fundamental problem is that of balancing exploration (i.e., pulling different arms to learn their parameters), versus exploitation (i.e., pulling the best arms to gain reward). To study this problem, the existing literature mostly considers perfectly observed contexts. However, the setting of partial context observations remains unexplored to date, despite being theoretically more general and practically more versatile. We study bandit policies for learning to select optimal arms based on the data of observations, which are noisy linear functions of the unobserved context vectors. Our theoretical analysis shows that the Thompson sampling policy successfully balances exploration and exploitation. Specifically, we establish the followings: (i) regret bounds that grow poly-logarithmically with time, (ii) square-root consistency of parameter estimation, and (iii) scaling of the regret with other quantities including dimensions and number of arms. Extensive numerical experiments with both real and synthetic data are presented as well, corroborating the efficacy of Thompson sampling. To establish the results, we introduce novel martingale techniques and concentration inequalities to address partially observed dependent random variables generated from unspecified distributions, and also leverage problem-dependent information to sharpen probabilistic bounds for time-varying suboptimality gaps. These techniques pave the road towards studying other decision-making problems with contextual information as well as partial observations.
Spiking Neural Networks (SNNs) offer a promising solution to achieve ultra low-power/energy computation for solving machine learning tasks. Currently, most of the SNN architectures are derived from Artificial Neural Networks whose neurons' architectures and operations are different from SNNs, or developed without considering memory budgets from the underlying processing hardware. These limitations hinder the SNNs from reaching their full potential in accuracy and efficiency. Towards this, we propose SpikeNAS, a novel memory-aware neural architecture search (NAS) framework for SNNs that can quickly find an appropriate SNN architecture with high accuracy under the given memory budgets. To do this, our SpikeNAS employs several key steps: analyzing the impacts of network operations on the accuracy, enhancing the network architecture to improve the learning quality, and developing a fast memory-aware search algorithm. The experimental results show that our SpikeNAS improves the searching time and maintains high accuracy as compared to state-of-the-art while meeting the given memory budgets (e.g., 4.4x faster search with 1.3% accuracy improvement for CIFAR100, using an Nvidia RTX 6000 Ada GPU machine), thereby quickly providing the appropriate SNN architecture for memory-constrained SNN-based systems.
Temporal action abstractions, along with belief state representations, are a powerful knowledge sharing mechanism for sequential decision making. In this work, we propose a novel view that treats inducing temporal action abstractions as a sequence compression problem. To do so, we bring a subtle but critical component of LLM training pipelines -- input tokenization via byte pair encoding (BPE) -- to the seemingly distant task of learning skills of variable time span in continuous control domains. We introduce an approach called Primitive Sequence Encoding (PRISE) that combines continuous action quantization with BPE to learn powerful action abstractions. We empirically show that high-level skills discovered by PRISE from a multitask set of robotic manipulation demonstrations significantly boost the performance of both multitask imitation learning as well as few-shot imitation learning on unseen tasks. Our code will be released at https://github.com/FrankZheng2022/PRISE.
We propose a neural network-based real-time volume rendering method for realistic and efficient rendering of volumetric media. The traditional volume rendering method uses path tracing to solve the radiation transfer equation, which requires a huge amount of calculation and cannot achieve real-time rendering. Therefore, this paper uses neural networks to simulate the iterative integration process of solving the radiative transfer equation to speed up the volume rendering of volume media. Specifically, the paper first performs data processing on the volume medium to generate a variety of sampling features, including density features, transmittance features and phase features. The hierarchical transmittance fields are fed into a 3D-CNN network to compute more important transmittance features. Secondly, the diffuse reflection sampling template and the highlight sampling template are used to layer the three types of sampling features into the network. This method can pay more attention to light scattering, highlights and shadows, and then select important channel features through the attention module. Finally, the scattering distribution of the center points of all sampling templates is predicted through the backbone neural network. This method can achieve realistic volumetric media rendering effects and greatly increase the rendering speed while maintaining rendering quality, which is of great significance for real-time rendering applications. Experimental results indicate that our method outperforms previous methods.
In day-to-day communication, people often approximate the truth - for example, rounding the time or omitting details - in order to be maximally helpful to the listener. How do large language models (LLMs) handle such nuanced trade-offs? To address this question, we use psychological models and experiments designed to characterize human behavior to analyze LLMs. We test a range of LLMs and explore how optimization for human preferences or inference-time reasoning affects these trade-offs. We find that reinforcement learning from human feedback improves both honesty and helpfulness, while chain-of-thought prompting skews LLMs towards helpfulness over honesty. Finally, GPT-4 Turbo demonstrates human-like response patterns including sensitivity to the conversational framing and listener's decision context. Our findings reveal the conversational values internalized by LLMs and suggest that even these abstract values can, to a degree, be steered by zero-shot prompting.
Within the field of Humanities, there is a recognized need for educational innovation, as there are currently no reported tools available that enable individuals to interact with their environment to create an enhanced learning experience in the humanities (e.g., immersive spaces). This project proposes a solution to address this gap by integrating technology and promoting the development of teaching methodologies in the humanities, specifically by incorporating emotional monitoring during the learning process of humanistic context inside an immersive space. In order to achieve this goal, a real-time emotion detection EEG-based system was developed to interpret and classify specific emotions. These emotions aligned with the early proposal by Descartes (Passions), including admiration, love, hate, desire, joy, and sadness. This system aims to integrate emotional data into the Neurohumanities Lab interactive platform, creating a comprehensive and immersive learning environment. This work developed a ML, real-time emotion detection model that provided Valence, Arousal, and Dominance (VAD) estimations every 5 seconds. Using PCA, PSD, RF, and Extra-Trees, the best 8 channels and their respective best band powers were extracted; furthermore, multiple models were evaluated using shift-based data division and cross-validations. After assessing their performance, Extra-Trees achieved a general accuracy of 96%, higher than the reported in the literature (88% accuracy). The proposed model provided real-time predictions of VAD variables and was adapted to classify Descartes' six main passions. However, with the VAD values obtained, more than 15 emotions can be classified (reported in the VAD emotion mapping) and extend the range of this application.
Electronic health record (EHR) is more and more popular, and it comes with applying machine learning solutions to resolve various problems in the domain. This growing research area also raises the need for EHRs accessibility. Medical Information Mart for Intensive Care (MIMIC) dataset is a popular, public, and free EHR dataset in a raw format that has been used in numerous studies. However, despite of its popularity, it is lacking benchmarking work, especially with recent state of the art works in the field of deep learning with time-series tabular data. The aim of this work is to fill this lack by providing a benchmark for latest version of MIMIC dataset, MIMIC-IV. We also give a detailed literature survey about studies that has been already done for MIIMIC-III.
We propose Diffusion Inference-Time T-Optimization (DITTO), a general-purpose frame-work for controlling pre-trained text-to-music diffusion models at inference-time via optimizing initial noise latents. Our method can be used to optimize through any differentiable feature matching loss to achieve a target (stylized) output and leverages gradient checkpointing for memory efficiency. We demonstrate a surprisingly wide-range of applications for music generation including inpainting, outpainting, and looping as well as intensity, melody, and musical structure control - all without ever fine-tuning the underlying model. When we compare our approach against related training, guidance, and optimization-based methods, we find DITTO achieves state-of-the-art performance on nearly all tasks, including outperforming comparable approaches on controllability, audio quality, and computational efficiency, thus opening the door for high-quality, flexible, training-free control of diffusion models. Sound examples can be found at https://DITTO-Music.github.io/web/.
Analyzing facial features and expressions is a complex task in computer vision. The human face is intricate, with significant shape, texture, and appearance variations. In medical contexts, facial structures that differ from the norm, such as those affected by paralysis, are particularly important to study and require precise analysis. One area of interest is the subtle movements involved in blinking, a process that is not yet fully understood and needs high-resolution, time-specific analysis for detailed understanding. However, a significant challenge is that many advanced computer vision techniques demand programming skills, making them less accessible to medical professionals who may not have these skills. The Jena Facial Palsy Toolbox (JeFaPaTo) has been developed to bridge this gap. It utilizes cutting-edge computer vision algorithms and offers a user-friendly interface for those without programming expertise. This toolbox is designed to make advanced facial analysis more accessible to medical experts, simplifying integration into their workflow. The state of the eye closure is of high interest to medical experts, e.g., in the context of facial palsy or Parkinson's disease. Due to facial nerve damage, the eye-closing process might be impaired and could lead to many undesirable side effects. Hence, more than a simple distinction between open and closed eyes is required for a detailed analysis. Factors such as duration, synchronicity, velocity, complete closure, the time between blinks, and frequency over time are highly relevant. Such detailed analysis could help medical experts better understand the blinking process, its deviations, and possible treatments for better eye care.
Deep learning offers a promising solution to improve spectrum access techniques by utilizing data-driven approaches to manage and share limited spectrum resources for emerging applications. For several of these applications, the sensitive wireless data (such as spectrograms) are stored in a shared database or multistakeholder cloud environment and are therefore prone to privacy leaks. This paper aims to address such privacy concerns by examining the representative case study of shared database scenarios in 5G Open Radio Access Network (O-RAN) networks where we have a shared database within the near-real-time (near-RT) RAN intelligent controller. We focus on securing the data that can be used by machine learning (ML) models for spectrum sharing and interference mitigation applications without compromising the model and network performances. The underlying idea is to leverage a (i) Shuffling-based learnable encryption technique to encrypt the data, following which, (ii) employ a custom Vision transformer (ViT) as the trained ML model that is capable of performing accurate inferences on such encrypted data. The paper offers a thorough analysis and comparisons with analogous convolutional neural networks (CNN) as well as deeper architectures (such as ResNet-50) as baselines. Our experiments showcase that the proposed approach significantly outperforms the baseline CNN with an improvement of 24.5% and 23.9% for the percent accuracy and F1-Score respectively when operated on encrypted data. Though deeper ResNet-50 architecture is obtained as a slightly more accurate model, with an increase of 4.4%, the proposed approach boasts a reduction of parameters by 99.32%, and thus, offers a much-improved prediction time by nearly 60%.