Tracking a car or a person in a city is crucial for urban safety management. How can we complete the task with minimal number of spatiotemporal searches from massive camera records? This paper proposes a strategy named IHMs (Intermediate Searching at Heuristic Moments): each step we figure out which moment is the best to search according to a heuristic indicator, then at that moment search locations one by one in descending order of predicted appearing probabilities, until a search hits; iterate this step until we get the object's current location. Five searching strategies are compared in experiments, and IHMs is validated to be most efficient, which can save up to 1/3 total costs. This result provides an evidence that "searching at intermediate moments can save cost".
The COVID-19 is sweeping the world with deadly consequences. Its contagious nature and clinical similarity to other pneumonias make separating subjects contracted with COVID-19 and non-COVID-19 viral pneumonia a priority and a challenge. However, COVID-19 testing has been greatly limited by the availability and cost of existing methods, even in developed countries like the US. Intrigued by the wide availability of routine blood tests, we propose to leverage them for COVID-19 testing using the power of machine learning. Two proven-robust machine learning model families, random forests (RFs) and support vector machines (SVMs), are employed to tackle the challenge. Trained on blood data from 208 moderate COVID-19 subjects and 86 subjects with non-COVID-19 moderate viral pneumonia, the best result is obtained in an SVM-based classifier with an accuracy of 84%, a sensitivity of 88%, a specificity of 80%, and a precision of 92%. The results are found explainable from both machine learning and medical perspectives. A privacy-protected web portal is set up to help medical personnel in their practice and the trained models are released for developers to further build other applications. We hope our results can help the world fight this pandemic and welcome clinical verification of our approach on larger populations.
3D instance segmentation, with a variety of applications in robotics and augmented reality, is in large demands these days. Unlike 2D images that are projective observations of the environment, 3D models provide metric reconstruction of the scenes without occlusion or scale ambiguity. In this paper, we define "3D occupancy size", as the number of voxels occupied by each instance. It owns advantages of robustness in prediction, on which basis, OccuSeg, an occupancy-aware 3D instance segmentation scheme is proposed. Our multi-task learning produces both occupancy signal and embedding representations, where the training of spatial and feature embeddings varies with their difference in scale-aware. Our clustering scheme benefits from the reliable comparison between the predicted occupancy size and the clustered occupancy size, which encourages hard samples being correctly clustered and avoids over segmentation. The proposed approach achieves state-of-the-art performance on 3 real-world datasets, i.e. ScanNetV2, S3DIS and SceneNN, while maintaining high efficiency.
Convective storms are one of the severe weather hazards found during the warm season. Doppler weather radar is the only operational instrument that can frequently sample the detailed structure of convective storm which has a small spatial scale and short lifetime. For the challenging task of short-term convective storm forecasting, 3-D radar images contain information about the processes in convective storm. However, effectively extracting such information from multisource raw data has been problematic due to a lack of methodology and computation limitations. Recent advancements in deep learning techniques and graphics processing units now make it possible. This article investigates the feasibility and performance of an end-to-end deep learning nowcasting method. The nowcasting problem was transformed into a classification problem first, and then, a deep learning method that uses a convolutional neural network was presented to make predictions. On the first layer of CNN, a cross-channel 3D convolution was proposed to fuse 3D raw data. The CNN method eliminates the handcrafted feature engineering, i.e., the process of using domain knowledge of the data to manually design features. Operationally produced historical data of the Beijing-Tianjin-Hebei region in China was used to train the nowcasting system and evaluate its performance; 3737332 samples were collected in the training data set. The experimental results show that the deep learning method improves nowcasting skills compared with traditional machine learning methods.
Very short-term convective storm forecasting, termed nowcasting, has long been an important issue and has attracted substantial interest. Existing nowcasting methods rely principally on radar images and are limited in terms of nowcasting storm initiation and growth. Real-time re-analysis of meteorological data supplied by numerical models provides valuable information about three-dimensional (3D), atmospheric, boundary layer thermal dynamics, such as temperature and wind. To mine such data, we here develop a convolution-recurrent, hybrid deep-learning method with the following characteristics: (1) the use of cell-based oversampling to increase the number of training samples; this mitigates the class imbalance issue; (2) the use of both raw 3D radar data and 3D meteorological data re-analyzed via multi-source 3D convolution without any need for handcraft feature engineering; and (3) the stacking of convolutional neural networks on a long short-term memory encoder/decoder that learns the spatiotemporal patterns of convective processes. Experimental results demonstrated that our method performs better than other extrapolation methods. Qualitative analysis yielded encouraging nowcasting results.
Neural network language model (NNLM) is an essential component of industrial ASR systems. One important challenge of training an NNLM is to leverage between scaling the learning process and handling big data. Conventional approaches such as block momentum provides a blockwise model update filtering (BMUF) process and achieves almost linear speedups with no performance degradation for speech recognition. However, it needs to calculate the model average from all computing nodes (e.g., GPUs) and when the number of computing nodes is large, the learning suffers from the severe communication latency. As a consequence, BMUF is not suitable under restricted network conditions. In this paper, we present a decentralized BMUF process, in which the model is split into different components, each of which is updated by communicating to some randomly chosen neighbor nodes with the same component, followed by a BMUF-like process. We apply this method to several LSTM language modeling tasks. Experimental results show that our approach achieves consistently better performance than conventional BMUF. In particular, we obtain a lower perplexity than the single-GPU baseline on the wiki-text-103 benchmark using 4 GPUs. In addition, no performance degradation is observed when scaling to 8 and 16 GPUs.
The success of speech assistants requires precise recognition of a number of entities on particular contexts. A common solution is to train a class-based n-gram language model and then expand the classes into specific words or phrases. However, when the class has a huge list, e.g., more than 20 million songs, a fully expansion will cause memory explosion. Worse still, the list items in the class need to be updated frequently, which requires a dynamic model updating technique. In this work, we propose to train pruned language models for the word classes to replace the slots in the root n-gram. We further propose to use a novel technique, named Difference Language Model (DLM), to correct the bias from the pruned language models. Once the decoding graph is built, we only need to recalculate the DLM when the entities in word classes are updated. Results show that the proposed method consistently and significantly outperforms the conventional approaches on all datasets, esp. for large lists, which the conventional approaches cannot handle.