Text detection, the key technology for understanding scene text, has become an attractive research topic. For detecting various scene texts, researchers propose plenty of detectors with different advantages: detection-based models enjoy fast detection speed, and segmentation-based algorithms are not limited by text shapes. However, for most intelligent systems, the detector needs to detect arbitrary-shaped texts with high speed and accuracy simultaneously. Thus, in this study, we design an efficient pipeline named as MT, which can detect adhesive arbitrary-shaped texts with only a single binary mask in the inference stage. This paper presents the contributions on three aspects: (1) a light-weight detection framework is designed to speed up the inference process while keeping high detection accuracy; (2) a multi-perspective feature module is proposed to learn more discriminative representations to segment the mask accurately; (3) a multi-factor constraints IoU minimization loss is introduced for training the proposed model. The effectiveness of MT is evaluated on four real-world scene text datasets, and it surpasses all the state-of-the-art competitors to a large extent.
Personalization and active learning are key aspects to successful learning. These aspects are important to address in intelligent educational applications, as they help systems to adapt and close the gap between students with varying abilities, which becomes increasingly important in the context of online and distance learning. We run a comparative head-to-head study of learning outcomes for two popular online learning platforms: Platform A, which follows a traditional model delivering content over a series of lecture videos and multiple-choice quizzes, and Platform B, which creates a personalized learning environment and provides problem-solving exercises and personalized feedback. We report on the results of our study using pre- and post-assessment quizzes with participants taking courses on an introductory data science topic on two platforms. We observe a statistically significant increase in the learning outcomes on Platform B, highlighting the impact of well-designed and well-engineered technology supporting active learning and problem-based learning in online education. Moreover, the results of the self-assessment questionnaire, where participants reported on perceived learning gains, suggest that participants using Platform B improve their metacognition.
Interleaving learning is a human learning technique where a learner interleaves the studies of multiple topics, which increases long-term retention and improves ability to transfer learned knowledge. Inspired by the interleaving learning technique of humans, in this paper we explore whether this learning methodology is beneficial for improving the performance of machine learning models as well. We propose a novel machine learning framework referred to as interleaving learning (IL). In our framework, a set of models collaboratively learn a data encoder in an interleaving fashion: the encoder is trained by model 1 for a while, then passed to model 2 for further training, then model 3, and so on; after trained by all models, the encoder returns back to model 1 and is trained again, then moving to model 2, 3, etc. This process repeats for multiple rounds. Our framework is based on multi-level optimization consisting of multiple inter-connected learning stages. An efficient gradient-based algorithm is developed to solve the multi-level optimization problem. We apply interleaving learning to search neural architectures for image classification on CIFAR-10, CIFAR-100, and ImageNet. The effectiveness of our method is strongly demonstrated by the experimental results.
Manipulation of small-scale matter is a fundamental topic in micro- and nanorobotics. Numerous magnetic robotic systems have been developed for the manipulation of microparticles in an ambient environment, liquid as well as on the air-liquid interface. Those systems move intrinsically magnetic or magnetically tagged objects by inducing a magnetic torque or force. However, most of the materials found in nature are non-magnetic. Here, we report a novel ferrofluidic manipulator for automatic two-dimensional manipulation of non-magnetic objects floating on top of a ferrofluid. The manipulation system employs eight centimeter-scale solenoids, which can move non-magnetic particles floating on the air-liquid interface by deforming the air-ferrofluid interface. Using linear programming, we can control the motion of non-magnetic particles with a predefined trajectory of a line, square, and circle with a precision of 57.4+/-33.6 um, 74+/-44.4 um, and 67.2+/-38.6 um, respectively. The ferrofluidic manipulator is versatile with the materials and the shapes of the objects under manipulation. We have successfully manipulated particles made of polyethylene, polystyrene, a silicon chip, and poppy and sesame seeds.
The utilization of deep learning techniques in generating various contents (such as image, text, etc.) has become a trend. Especially music, the topic of this paper, has attracted widespread attention of countless researchers.The whole process of producing music can be divided into three stages, corresponding to the three levels of music generation: score generation produces scores, performance generation adds performance characteristics to the scores, and audio generation converts scores with performance characteristics into audio by assigning timbre or generates music in audio format directly. Previous surveys have explored the network models employed in the field of automatic music generation. However, the development history, the model evolution, as well as the pros and cons of same music generation task have not been clearly illustrated. This paper attempts to provide an overview of various composition tasks under different music generation levels, covering most of the currently popular music generation tasks using deep learning. In addition, we summarize the datasets suitable for diverse tasks, discuss the music representations, the evaluation methods as well as the challenges under different levels, and finally point out several future directions.
Recent developments in data science in general and machine learning in particular have transformed the way experts envision the future of surgery. Surgical data science is a new research field that aims to improve the quality of interventional healthcare through the capture, organization, analysis and modeling of data. While an increasing number of data-driven approaches and clinical applications have been studied in the fields of radiological and clinical data science, translational success stories are still lacking in surgery. In this publication, we shed light on the underlying reasons and provide a roadmap for future advances in the field. Based on an international workshop involving leading researchers in the field of surgical data science, we review current practice, key achievements and initiatives as well as available standards and tools for a number of topics relevant to the field, namely (1) technical infrastructure for data acquisition, storage and access in the presence of regulatory constraints, (2) data annotation and sharing and (3) data analytics. Drawing from this extensive review, we present current challenges for technology development and (4) describe a roadmap for faster clinical translation and exploitation of the full potential of surgical data science.
Convolutional neural networks (CNN) have achieved great success in analyzing tropical cyclones (TC) with satellite images in several tasks, such as TC intensity estimation. In contrast, TC structure, which is conventionally described by a few parameters estimated subjectively by meteorology specialists, is still hard to be profiled objectively and routinely. This study applies CNN on satellite images to create the entire TC structure profiles, covering all the structural parameters. By utilizing the meteorological domain knowledge to construct TC wind profiles based on historical structure parameters, we provide valuable labels for training in our newly released benchmark dataset. With such a dataset, we hope to attract more attention to this crucial issue among data scientists. Meanwhile, a baseline is established with a specialized convolutional model operating on polar-coordinates. We discovered that it is more feasible and physically reasonable to extract structural information on polar-coordinates, instead of Cartesian coordinates, according to a TC's rotational and spiral natures. Experimental results on the released benchmark dataset verified the robustness of the proposed model and demonstrated the potential for applying deep learning techniques for this barely developed yet important topic.
Multi-view subspace clustering is an important and hot topic in machine learning field, which aims to promote clustering results based on multi-view data, which are collected from different domains or various measurements. In this paper, we propose a novel tensor-based intrinsic subspace representation learning for multi-view clustering. Specifically, to investigate the underlying subspace representation, the rank preserving decomposition accompanied with the tensor-singular value decomposition based low-rank tensor constraint is introduced and applied on the subspace representation matrices of multiple views. The specific information of different views can be considered by the rank preserving decomposition and the high-order correlations of multi-view data are fully explored by the low-rank tensor constraint in our method. Based on the learned subspace representation, clustering results can be obtained by employing the standard spectral clustering algorithm. The objective function is efficiently optimized by utilizing the augmented Lagrangian multiplier based alternating direction minimization algorithm. Experimental results on nine real-world datasets illustrate the superiority of our method compared to several state-of-the-arts.
There is a need to build intelligence in operating machinery and use data analysis on monitored signals in order to quantify the health of the operating system and self-diagnose any initiations of fault. Built-in control procedures can automatically take corrective actions in order to avoid catastrophic failure when a fault is diagnosed. This paper presents a Temporal Clustering Network (TCN) capability for processing acceleration measurement(s) made on the operating system (i.e. machinery foundation, machinery casing, etc.), or any other type of temporal signals, and determine based on the monitored signal when a fault is at its onset. The new capability uses: one-dimensional convolutional neural networks (1D-CNN) for processing the measurements; unsupervised learning (i.e. no labeled signals from the different operating conditions and no signals at pristine vs. damaged conditions are necessary for training the 1D-CNN); clustering (i.e. grouping signals in different clusters reflective of the operating conditions); and statistical analysis for identifying fault signals that are not members of any of the clusters associated with the pristine operating conditions. A case study demonstrating its operation is included in the paper. Finally topics for further research are identified.