Urinalysis is a standard diagnostic test to detect urinary system related problems. The automation of urinalysis will reduce the overall diagnostic time. Recent studies used urine microscopic datasets for designing deep learning based algorithms to classify and detect urine cells. But these datasets are not publicly available for further research. To alleviate the need for urine datsets, we prepare our urine sediment microscopic image (UMID) dataset comprising of around 3700 cell annotations and 3 categories of cells namely RBC, pus and epithelial cells. We discuss the several challenges involved in preparing the dataset and the annotations. We make the dataset publicly available.
Despite the rapid progress of neuromorphic computing, inadequate capacity and insufficient representation power of spiking neural networks (SNNs) severely restrict their application scope in practice. Residual learning and shortcuts have been evidenced as an important approach for training deep neural networks, but rarely did previous work assess their applicability to the characteristics of spike-based communication and spatiotemporal dynamics. In this paper, we first identify that this negligence leads to impeded information flow and accompanying degradation problem in previous residual SNNs. Then we propose a novel SNN-oriented residual block, MS-ResNet, which is able to significantly extend the depth of directly trained SNNs, e.g. up to 482 layers on CIFAR-10 and 104 layers on ImageNet, without observing any slight degradation problem. We validate the effectiveness of MS-ResNet on both frame-based and neuromorphic datasets, and MS-ResNet104 achieves a superior result of 76.02% accuracy on ImageNet, the first time in the domain of directly trained SNNs. Great energy efficiency is also observed that on average only one spike per neuron is needed to classify an input sample. We believe our powerful and scalable models will provide a strong support for further exploration of SNNs.
We consider information dissemination over a network of gossiping agents (nodes). In this model, a source keeps the most up-to-date information about a time-varying binary state of the world, and $n$ receiver nodes want to follow the information at the source as accurately as possible. When the information at the source changes, the source first sends updates to a subset of $m\leq n$ nodes. After that, the nodes share their local information during the gossiping period to disseminate the information further. The nodes then estimate the information at the source using the majority rule at the end of the gossiping period. To analyze information dissemination, we introduce a new error metric to find the average percentage of nodes that can accurately obtain the most up-to-date information at the source. We characterize the equations necessary to obtain the steady-state distribution for the average error and then analyze the system behavior under both high and low gossip rates. In the high gossip rate, in which each node can access other nodes' information more frequently, we show that the nodes update their information based on the majority of the information in the network. In the low gossip rate, we introduce and analyze the gossip gain, which is the reduction at the average error due to gossiping. In particular, we develop an adaptive policy that the source can use to determine its current transmission capacity $m$ based on its past transmission rates and the accuracy of the information at the nodes. In numerical results, we show that when the source's transmission capacity $m$ is limited, gossiping can be harmful as it causes incorrect information to disseminate. We then find the optimal gossip rates to minimize the average error for a fixed $m$. Finally, we illustrate the outperformance of our adaptive policy compared to the constant $m$-selection policy even for the high gossip rates.
The ability to recover from an unexpected external perturbation is a fundamental motor skill in bipedal locomotion. An effective response includes the ability to not just recover balance and maintain stability but also to fall in a safe manner when balance recovery is physically infeasible. For robots associated with bipedal locomotion, such as humanoid robots and assistive robotic devices that aid humans in walking, designing controllers which can provide this stability and safety can prevent damage to robots or prevent injury related medical costs. This is a challenging task because it involves generating highly dynamic motion for a high-dimensional, non-linear and under-actuated system with contacts. Despite prior advancements in using model-based and optimization methods, challenges such as requirement of extensive domain knowledge, relatively large computational time and limited robustness to changes in dynamics still make this an open problem. In this thesis, to address these issues we develop learning-based algorithms capable of synthesizing push recovery control policies for two different kinds of robots : Humanoid robots and assistive robotic devices that assist in bipedal locomotion. Our work can be branched into two closely related directions : 1) Learning safe falling and fall prevention strategies for humanoid robots and 2) Learning fall prevention strategies for humans using a robotic assistive devices. To achieve this, we introduce a set of Deep Reinforcement Learning (DRL) algorithms to learn control policies that improve safety while using these robots.
As an important type of reinforcement learning algorithms, actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies. In the first nested-loop design, actor's one update of policy is followed by an entire loop of critic's updates of the value function, and the finite-sample analysis of such AC and NAC algorithms have been recently well established. The second two time-scale design, in which actor and critic update simultaneously but with different learning rates, has much fewer tuning parameters than the nested-loop design and is hence substantially easier to implement. Although two time-scale AC and NAC have been shown to converge in the literature, the finite-sample convergence rate has not been established. In this paper, we provide the first such non-asymptotic convergence rate for two time-scale AC and NAC under Markovian sampling and with actor having general policy class approximation. We show that two time-scale AC requires the overall sample complexity at the order of $\mathcal{O}(\epsilon^{-2.5}\log^3(\epsilon^{-1}))$ to attain an $\epsilon$-accurate stationary point, and two time-scale NAC requires the overall sample complexity at the order of $\mathcal{O}(\epsilon^{-4}\log^2(\epsilon^{-1}))$ to attain an $\epsilon$-accurate global optimal point. We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling and for analyzing the convergence rate of the linear critic with dynamically changing base functions and transition kernel.
Recently, an event-based end-to-end model (SEDT) has been proposed for sound event detection (SED) and achieves competitive performance. However, compared with the frame-based model, it requires more training data with temporal annotations to improve the localization ability. Synthetic data is an alternative, but it suffers from a great domain gap with real recordings. Inspired by the great success of UP-DETR in object detection, we propose to self-supervisedly pre-train SEDT (SP-SEDT) by detecting random patches (only cropped along the time axis). Experiments on the DCASE2019 task4 dataset show the proposed SP-SEDT can outperform fine-tuned frame-based model. The ablation study is also conducted to investigate the impact of different loss functions and patch size.
Unmanned Aerial Vehicles (UAVs) have recently attracted significant attention due to their outstanding ability to be used in different sectors and serve in difficult and dangerous areas. Moreover, the advancements in computer vision and artificial intelligence have increased the use of UAVs in various applications and solutions, such as forest fires detection and borders monitoring. However, using deep neural networks (DNNs) with UAVs introduces several challenges of processing deeper networks and complex models, which restricts their on-board computation. In this work, we present a strategy aiming at distributing inference requests to a swarm of resource-constrained UAVs that classifies captured images on-board and finds the minimum decision-making latency. We formulate the model as an optimization problem that minimizes the latency between acquiring images and making the final decisions. The formulated optimization solution is an NP-hard problem. Hence it is not adequate for online resource allocation. Therefore, we introduce an online heuristic solution, namely DistInference, to find the layers placement strategy that gives the best latency among the available UAVs. The proposed approach is general enough to be used for different low decision-latency applications as well as for all CNN types organized into the pipeline of layers (e.g., VGG) or based on residual blocks (e.g., ResNet).
We study the role of magnitude structured pruning as an architecture search to speed up the inference time of a deep noise suppression (DNS) model. While deep learning approaches have been remarkably successful in enhancing audio quality, their increased complexity inhibits their deployment in real-time applications. We achieve up to a 7.25X inference speedup over the baseline, with a smooth model performance degradation. Ablation studies indicate that our proposed network re-parameterization (i.e., size per layer) is the major driver of the speedup, and that magnitude structured pruning does comparably to directly training a model in the smaller size. We report inference speed because a parameter reduction does not necessitate speedup, and we measure model quality using an accurate non-intrusive objective speech quality metric.
The creation of detailed 3D models is relevant for a wide range of applications such as navigation in three-dimensional space, construction planning or disaster assessment. However, the complex processing and long execution time for detailed 3D reconstructions require the original database to be reduced in order to obtain a result in reasonable time. In this paper we therefore present our framework iVS3D for intelligent pre-processing of image sequences. Our software is able to down sample entire videos to a specific frame rate, as well as to resize and crop the individual images. Furthermore, thanks to our modular architecture, it is easy to develop and integrate plugins with additional algorithms. We provide three plugins as baseline methods that enable an intelligent selection of suitable images and can enrich them with additional information. To filter out images affected by motion blur, we developed a plugin that detects these frames and also searches the spatial neighbourhood for suitable images as replacements. The second plugin uses optical flow to detect redundant images caused by a temporarily stationary camera. In our experiments, we show how this approach leads to a more balanced image sampling if the camera speed varies, and that excluding such redundant images leads to a time saving of 8.1\percent for our sequences. A third plugin makes it possible to exclude challenging image regions from the 3D reconstruction by performing semantic segmentation. As we think that the community can greatly benefit from such an approach, we will publish our framework and the developed plugins open source using the MIT licence to allow co-development and easy extension.