Recently, incorporating natural language instructions into reinforcement learning (RL) to learn semantically meaningful representations and foster generalization has caught many concerns. However, the semantical information in language instructions is usually entangled with task-specific state information, which hampers the learning of semantically invariant and reusable representations. In this paper, we propose a method to learn such representations called element randomization, which extracts task-relevant but environment-agnostic semantics from instructions using a set of environments with randomized elements, e.g., topological structures or textures, yet the same language instruction. We theoretically prove the feasibility of learning semantically invariant representations through randomization. In practice, we accordingly develop a hierarchy of policies, where a high-level policy is designed to modulate the behavior of a goal-conditioned low-level policy by proposing subgoals as semantically invariant representations. Experiments on challenging long-horizon tasks show that (1) our low-level policy reliably generalizes to tasks against environment changes; (2) our hierarchical policy exhibits extensible generalization in unseen new tasks that can be decomposed into several solvable sub-tasks; and (3) by storing and replaying language trajectories as succinct policy representations, the agent can complete tasks in a one-shot fashion, i.e., once one successful trajectory has been attained.
We present a convolutional-recurrent neural network architecture with long short-term memory for real-time processing and classification of digital sensor data. The network implicitly performs typical signal processing tasks such as filtering and peak detection, and learns time-resolved embeddings of the input signal. We use a prototype multi-sensor wearable device to collect over 180h of photoplethysmography (PPG) data sampled at 20Hz, of which 36h are during atrial fibrillation (AFib). We use end-to-end learning to achieve state-of-the-art results in detecting AFib from raw PPG data. For classification labels output every 0.8s, we demonstrate an area under ROC curve of 0.9999, with false positive and false negative rates both below $2\times 10^{-3}$. This constitutes a significant improvement on previous results utilising domain-specific feature engineering, such as heart rate extraction, and brings large-scale atrial fibrillation screenings within imminent reach.