We present DeepIA, a deep neural network (DNN) framework for enabling fast and reliable initial access (IA) for AI-driven beyond 5G and 6G millimeter-wave (mmWave) networks. DeepIA reduces the beam sweep time compared to a conventional exhaustive search-based IA process by utilizing only a subset of the available beams. DeepIA maps received signal strengths (RSSs) obtained from a subset of beams to the beam that is best oriented to the receiver. In both line-of-sight (LoS) and non-line-of-sight (NLoS) conditions, DeepIA reduces the IA time and outperforms the conventional IA's beam prediction accuracy. We show that the beam prediction accuracy of DeepIA saturates with the number of beams used for IA and depends on the particular selection of the beams. In LoS conditions, the selection of the beams is consequential and improves the accuracy by up to 70%. In NLoS conditions, it improves accuracy by up to 35%. We find that averaging multiple RSS snapshots further reduces the number of beams needed and achieves more than 95% accuracy in both LoS and NLoS conditions. Finally, we evaluate the beam prediction time of DeepIA through an embedded hardware implementation and show the improvement over conventional beam sweeping.
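The data flow around the DNN can be sketched in a few lines. This is a minimal illustration, not the DeepIA implementation: the function names are hypothetical, RSS values are assumed to be per-beam floats averaged directly, and the trained classifier that maps the subset measurements to a beam index is left out.

```python
from statistics import mean


def exhaustive_best_beam(rss_all_beams):
    """Conventional IA baseline: sweep every beam and pick the strongest one."""
    return max(range(len(rss_all_beams)), key=lambda i: rss_all_beams[i])


def select_subset(rss_all_beams, beam_subset):
    """Keep only the RSS values of the beams that are actually swept."""
    return [rss_all_beams[i] for i in beam_subset]


def average_snapshots(snapshots):
    """Average several RSS snapshots beam by beam to reduce measurement noise."""
    if not snapshots:
        raise ValueError("need at least one snapshot")
    n_beams = len(snapshots[0])
    if any(len(s) != n_beams for s in snapshots):
        raise ValueError("all snapshots must cover the same beams")
    return [mean(per_beam) for per_beam in zip(*snapshots)]


# Hypothetical measurements: two snapshots over a two-beam subset.
features = average_snapshots([[-60.0, -70.0], [-62.0, -74.0]])
# `features` would be the input to a trained classifier (not shown here)
# whose output is the predicted index of the best beam.
```

Sweeping only the beams in the subset is what shortens the sweep; averaging trades a few extra snapshots for less noisy inputs.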
Recent theoretical work studies sample-efficient reinforcement learning (RL) extensively in two settings: learning interactively in the environment (online RL), or learning from an offline dataset (offline RL). However, existing algorithms and theories for learning near-optimal policies in these two settings are rather different and disconnected. Towards bridging this gap, this paper initiates the theoretical study of policy finetuning, that is, online RL where the learner has additional access to a "reference policy" $\mu$ close to the optimal policy $\pi_\star$ in a certain sense. We consider the policy finetuning problem in episodic Markov Decision Processes (MDPs) with $S$ states, $A$ actions, and horizon length $H$. We first design a sharp offline reduction algorithm -- which simply executes $\mu$ and runs offline policy optimization on the collected dataset -- that finds an $\varepsilon$ near-optimal policy within $\widetilde{O}(H^3SC^\star/\varepsilon^2)$ episodes, where $C^\star$ is the single-policy concentrability coefficient between $\mu$ and $\pi_\star$. This offline result is the first that matches the sample complexity lower bound in this setting, and resolves a recent open question in offline RL. We then establish an $\Omega(H^3S\min\{C^\star, A\}/\varepsilon^2)$ sample complexity lower bound for any policy finetuning algorithm, including those that can adaptively explore the environment. This implies that -- perhaps surprisingly -- the optimal policy finetuning algorithm is either offline reduction or a purely online RL algorithm that does not use $\mu$. Finally, we design a new hybrid offline/online algorithm for policy finetuning that achieves better sample complexity than both vanilla offline reduction and purely online RL algorithms, in a relaxed setting where $\mu$ only satisfies concentrability partially up to a certain time step.
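The "either/or" conclusion can be read off by comparing the bounds. The following is a sketch of the reasoning, assuming (as the stated conclusion implies) that a purely online algorithm attains $\widetilde{O}(H^3SA/\varepsilon^2)$ episodes in this setting:

$$
\min\Big\{\underbrace{\widetilde{O}\big(H^3SC^\star/\varepsilon^2\big)}_{\text{offline reduction}},\ \underbrace{\widetilde{O}\big(H^3SA/\varepsilon^2\big)}_{\text{purely online RL}}\Big\} \;=\; \widetilde{O}\big(H^3S\min\{C^\star, A\}/\varepsilon^2\big).
$$

The right-hand side matches the $\Omega(H^3S\min\{C^\star, A\}/\varepsilon^2)$ lower bound up to logarithmic factors, so choosing offline reduction when $C^\star \le A$ and a purely online algorithm otherwise cannot be improved in general.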
The LEAP submission for the DIHARD-III challenge is described in this paper. The proposed system is composed of a speech bandwidth classifier and diarization systems fine-tuned separately for narrowband and wideband speech. We use an end-to-end speaker diarization system for the narrowband conversational telephone speech recordings. For the wideband multi-speaker recordings, we use a neural embedding based clustering approach, similar to the baseline system. The embeddings, called x-vectors, are extracted from a time-delay neural network and then clustered with the graph-based path integral clustering (PIC) approach. The LEAP system showed 24% and 18% relative improvements for Track-1 and Track-2, respectively, over the baseline system provided by the organizers. This paper describes the challenge submission, the post-evaluation analysis, and the improvements observed on the DIHARD-III dataset.
We consider an online revenue maximization problem over a finite time horizon subject to lower and upper bounds on cost. At each period, an agent receives a context vector sampled i.i.d. from an unknown distribution and needs to make a decision adaptively. The revenue and cost functions depend on the context vector as well as some fixed but possibly unknown parameter vector to be learned. We propose a novel offline benchmark and a new algorithm that mixes an online dual mirror descent scheme with a generic parameter learning process. When the parameter vector is known, we demonstrate an $O(\sqrt{T})$ regret result as well as an $O(\sqrt{T})$ bound on the possible constraint violations. When the parameter is not known and must be learned, we demonstrate that the regret and constraint violations are the sums of the previous $O(\sqrt{T})$ terms plus terms that directly depend on the convergence of the learning process.
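To make the dual update concrete, here is a minimal sketch of online dual descent under strong simplifying assumptions that are not taken from the abstract: a single upper bound on cost only (no lower bound), known revenue and cost functions (no parameter learning), a finite action set, and the Euclidean mirror map, so the mirror step reduces to a projected subgradient step. All names are hypothetical.

```python
import math
import random


def dual_descent(contexts, actions, revenue, cost, rho, eta):
    """Online dual descent for one cost budget of rho per period.

    At each period the agent picks the action maximizing the dual-adjusted
    reward revenue - lam * cost, then moves the dual variable lam toward
    feasibility and projects it back onto lam >= 0.
    """
    lam = 0.0
    total_revenue = 0.0
    total_cost = 0.0
    for ctx in contexts:
        action = max(actions, key=lambda a: revenue(ctx, a) - lam * cost(ctx, a))
        total_revenue += revenue(ctx, action)
        total_cost += cost(ctx, action)
        # Subgradient step on the dual, followed by projection onto [0, inf).
        lam = max(0.0, lam + eta * (cost(ctx, action) - rho))
    return total_revenue, total_cost, lam


# Toy instance (assumed): accept (1) or reject (0) a request worth `ctx`.
T = 1000
rng = random.Random(0)
contexts = [rng.random() for _ in range(T)]
total_revenue, total_cost, lam = dual_descent(
    contexts,
    actions=(0, 1),
    revenue=lambda ctx, a: ctx * a,
    cost=lambda ctx, a: float(a),
    rho=0.5,
    eta=1.0 / math.sqrt(T),
)
```

In this toy instance the dual variable never exceeds $1 + \eta/2$, so the total cost overshoots the budget $\rho T$ by at most about $1/\eta = \sqrt{T}$, which mirrors the $O(\sqrt{T})$ constraint violation scale quoted above.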
Full-color imaging is significant in digital pathology. Compared with a grayscale or pseudo-color image, which contains only contrast information, a full-color image allows the target object to be identified and detected better by using color texture information. Fourier ptychographic microscopy (FPM) is a high-throughput computational imaging technique that breaks the tradeoff between high resolution (HR) and large field-of-view (FOV), which eliminates the artifacts of scanning and stitching in digital pathology and improves its imaging efficiency. However, conventional full-color digital pathology based on FPM is still time-consuming because the experiment must be repeated at three wavelengths. A color transfer FPM approach, termed CFPM, was reported. The color texture information of a low-resolution (LR) full-color pathologic image is directly transferred to the HR grayscale FPM image captured at only a single wavelength. The color space of FPM based on the standard CIE-XYZ color model and display based on the standard RGB (sRGB) color space were established. Different FPM colorization schemes were analyzed and compared on thirty different biological samples. The average root-mean-square errors (RMSEs) of the conventional method and CFPM compared with the ground truth are 5.3% and 5.7%, respectively. Therefore, the acquisition time is reduced by 2/3 at the cost of only 0.4% in precision. The CFPM method is also compatible with advanced fast FPM approaches to reduce computation time further.
The separation assurance task will be extremely challenging for air traffic controllers in a complex and high-density airspace environment. In our previous work, deep reinforcement learning (DRL) was used to develop an autonomous separation assurance framework in which the learned model advised speed maneuvers. To improve the safety of this model in unseen environments with uncertainties, in this work we propose a safety module for DRL in autonomous separation assurance applications. The proposed module directly addresses both model uncertainty and state uncertainty to improve safety. Our safety module consists of two sub-modules: (1) the state safety sub-module is based on the execution-time data augmentation method to introduce state disturbances in the model input state; (2) the model safety sub-module is a Monte-Carlo dropout extension that learns the posterior distribution of the DRL model policy. We demonstrate the effectiveness of the two sub-modules in an open-source air traffic simulator with challenging environment settings. Extensive numerical experiments show that the proposed safety sub-modules help the DRL agent significantly improve its safety performance in an autonomous separation assurance task.
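The idea of execution-time data augmentation can be illustrated with a short sketch. Everything specific here is an assumption rather than the authors' design: the Gaussian disturbance model, the majority-vote aggregation rule, the toy policy, and all names are hypothetical.

```python
import random
from collections import Counter


def augmented_action(policy, state, n_samples, noise_std, rng):
    """Query the policy on disturbed copies of the state and take a majority vote.

    Returns the winning action and the fraction of samples that agreed with it,
    which can serve as a rough confidence signal.
    """
    votes = Counter()
    for _ in range(n_samples):
        disturbed = [x + rng.gauss(0.0, noise_std) for x in state]
        votes[policy(disturbed)] += 1
    action, count = votes.most_common(1)[0]
    return action, count / n_samples


def toy_policy(state):
    """Stand-in for a learned policy: slow down when an intruder is close."""
    distance = state[0]
    return "slow_down" if distance < 5.0 else "hold_speed"


action, agreement = augmented_action(
    toy_policy, [10.0], n_samples=50, noise_std=0.1, rng=random.Random(0)
)
```

A similar voting loop could be run over stochastic forward passes with dropout kept active to probe model uncertainty; that variant is likewise only an assumed reading of the Monte-Carlo dropout sub-module.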
The distributed inference framework is an emerging technology for real-time applications empowered by cutting-edge deep machine learning (ML) on resource-constrained Internet of things (IoT) devices. In distributed inference, computational tasks are offloaded from the IoT device to other devices or the edge server via lossy IoT networks. However, narrow-band and lossy IoT networks cause non-negligible packet losses and retransmissions, resulting in non-negligible communication latency. This study solves the problem of the incremental retransmission latency caused by packet loss in a lossy IoT network. We propose a split inference with no retransmissions (SI-NR) method that achieves high accuracy without any retransmissions, even when packet loss occurs. In SI-NR, the key idea is to train the ML model by emulating the packet loss by a dropout method, which randomly drops the output of hidden units in a DNN layer. This enables the SI-NR system to obtain robustness against packet losses. Our ML experimental evaluation reveals that SI-NR obtains accurate predictions without packet retransmission at a packet loss rate of 60%.
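The training-time emulation can be sketched as follows. This is a minimal illustration, assuming that each lost packet corresponds to one hidden-unit output and that the surviving values are rescaled as in standard inverted dropout; the function name and the per-element loss model are hypothetical and not taken from the paper.

```python
import random


def emulate_packet_loss(activations, loss_rate, rng=None, rescale=True):
    """Zero each activation independently with probability loss_rate.

    Applied to the output of the split layer during training, this mimics
    what the receiving side sees when packets are lost and never resent.
    """
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    rng = rng or random.Random()
    # Inverted-dropout scaling keeps the expected activation unchanged.
    scale = 1.0 / (1.0 - loss_rate) if rescale else 1.0
    return [
        a * scale if rng.random() >= loss_rate else 0.0
        for a in activations
    ]


# Example: emulate the 60% loss rate on a hypothetical layer output.
layer_output = [1.0] * 10000
received = emulate_packet_loss(layer_output, loss_rate=0.6, rng=random.Random(0))
```

At inference time no emulation is needed: whatever the network drops is simply treated as zero, which is the situation the model has already been trained on.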
In this paper, we address the anomaly detection problem where the objective is to find the anomalous processes among a given set of processes. To this end, the decision-making agent probes a subset of processes at every time instant and obtains a potentially erroneous estimate of the binary variable that indicates whether or not the corresponding process is anomalous. The agent continues to probe the processes until it obtains a sufficient number of measurements to reliably identify the anomalous processes. In this context, we develop a sequential selection algorithm that decides which processes to probe at every instant to detect the anomalies with an accuracy exceeding a desired value while minimizing the delay in making the decision and the total number of measurements taken. Our algorithm is based on active inference, which is a general framework to make sequential decisions in order to maximize the notion of free energy. We define the free energy using the objectives of the selection policy and implement the active inference framework using a deep neural network approximation. Using numerical experiments, we compare our algorithm with the state-of-the-art method based on deep actor-critic reinforcement learning and demonstrate the superior performance of our algorithm.
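The measurement model and stopping rule can be made concrete with a small sketch. It covers only the belief update for a single process, not the active inference selection policy, and it assumes a known, symmetric probability that a probe reports the wrong value; the names are hypothetical.

```python
def update_posterior(prior, observation, flip_prob):
    """Bayes update of P(process is anomalous) after one noisy binary probe.

    observation is 1 if the probe reports "anomalous", 0 otherwise;
    flip_prob is the assumed probability that the report is wrong.
    """
    like_anomalous = (1.0 - flip_prob) if observation == 1 else flip_prob
    like_normal = flip_prob if observation == 1 else (1.0 - flip_prob)
    numerator = prior * like_anomalous
    return numerator / (numerator + (1.0 - prior) * like_normal)


def probe_until_confident(observations, prior, flip_prob, confidence):
    """Consume probes until the belief about the process is confident enough.

    Returns the final belief and the number of probes used.
    """
    belief = prior
    used = 0
    for obs in observations:
        belief = update_posterior(belief, obs, flip_prob)
        used += 1
        if max(belief, 1.0 - belief) >= confidence:
            break
    return belief, used


belief, used = probe_until_confident([1, 1, 1, 1], prior=0.5, flip_prob=0.2, confidence=0.95)
```

The trade-off named in the abstract is visible here: a higher confidence threshold raises accuracy but requires more probes and a longer delay.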
Compared to the conventional metasurface design, machine learning-based methods have recently created an inspiring platform for an inverse realization of the metasurfaces. Here, we use a deep neural network (DNN) to generate the desired output unit cell structures in an ultra-wide working frequency band for both TE and TM polarized waves. To automatically generate metasurfaces in a wide range of working frequencies from 4 to 45 GHz, we deliberately design an 8 ring-shaped pattern in such a way that the unit cells generated in the dataset can produce single or multiple notches in the desired working frequency band. Compared to the general approach, whereby the final metasurface structure may be formed by any randomly distributed "0" and "1", we propose here a restricted output structure. Restricting the output reduces the number of calculations and increases the learning speed. Moreover, we show that the accuracy of the network reaches 91%. Obtaining the final unit cell directly, without any time-consuming optimization algorithm, for both TE and TM polarized waves and with high average accuracy promises an effective strategy for metasurface design; the designer need only focus on the design goal.
To assess the effectiveness of any medical intervention, researchers must conduct a time-intensive and highly manual literature review. NLP systems can help to automate or assist in parts of this expensive process. In support of this goal, we release MS^2 (Multi-Document Summarization of Medical Studies), a dataset of over 470k documents and 20k summaries derived from the scientific literature. This dataset facilitates the development of systems that can assess and aggregate contradictory evidence across multiple studies, and is the first large-scale, publicly available multi-document summarization dataset in the biomedical domain. We experiment with a summarization system based on BART, with promising early results. We formulate our summarization inputs and targets in both free text and structured forms and modify a recently proposed metric to assess the quality of our system's generated summaries. Data and models are available at https://github.com/allenai/ms2