Unsupervised learning has caught up with, and in some cases surpassed, supervised learning in general object classification (GOC) and person re-identification (re-ID). However, unsupervised fine-grained visual classification (FGVC) remains more challenging than GOC and person re-ID. To bridge the gap between unsupervised and supervised learning for FGVC, we investigate the essential factors (feature extraction, clustering, and contrastive learning) behind the performance gap between supervised and unsupervised FGVC. Furthermore, we propose a simple, effective, and practical method, termed UFCL, to narrow this gap. Three key issues are addressed: First, we introduce a robust and powerful backbone, ResNet50-IBN, which offers domain adaptation ability when transferring ImageNet pre-trained models to FGVC tasks. Second, we adopt HDBSCAN instead of DBSCAN for clustering, which produces better clusters for adjacent categories with fewer hyper-parameters. Third, we propose a weighted feature agent and its updating mechanism for contrastive learning with pseudo labels that inevitably contain noise, which improves the optimization of the network parameters. The effectiveness of UFCL is verified on the CUB-200-2011, Oxford-Flowers, Oxford-Pets, Stanford-Dogs, Stanford-Cars and FGVC-Aircraft datasets. Under the unsupervised FGVC setting, we achieve state-of-the-art results and analyze the key factors and important parameters to provide practical guidance.
Influenced by the deep penetration of new-generation information technology, power systems have gradually evolved into highly coupled cyber-physical systems (CPS). Among the many possible network attacks on power CPS, false data injection attacks (FDIAs) are the most serious. Considering that existing knowledge-driven detection of FDIAs has long remained passive and ignores the advantages of data-driven active feature capture, this paper proposes an active-passive hybrid detection method for power CPS FDIAs based on an improved adaptive Kalman filter (AKF) and convolutional neural networks (CNN). First, we analyze the shortcomings of the traditional AKF algorithm in terms of filtering divergence and computation speed. A state estimation algorithm based on a non-negative positive-definite adaptive Kalman filter (NDAKF) is developed, and a passive FDIA detection method is constructed with similarity Euclidean-distance detection and residual detection at its core. Then, combining the advantages of the gated recurrent unit (GRU) and CNN in temporal memory and feature expression, an active FDIA detection method based on a GRU-CNN hybrid neural network is proposed. Finally, the results of joint knowledge-driven and data-driven parallel detection are used to define a mixed fixed-calculation formula, and an active-passive hybrid FDIA detection method is established, considering the characteristic constraints of the parallel mode. A simulation example of power CPS FDIAs verifies the effectiveness and accuracy of the proposed method.
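The residual-detection idea at the core of the passive branch can be sketched with a scalar Kalman filter: flag any measurement whose normalized innovation exceeds a threshold. This is a generic illustration of residual-based FDIA detection, not the paper's NDAKF; the state model, noise levels and threshold are assumptions:

```python
import numpy as np

def kf_residual_detect(z, q=1e-4, r=0.01, thresh=4.0):
    """Flag measurements whose normalized innovation (residual)
    exceeds `thresh` standard deviations -- the core of residual-based
    passive FDIA detection, shown here for a scalar random-walk state."""
    x, p = z[0], 1.0                       # state estimate and covariance
    flags = []
    for zk in z:
        p = p + q                          # predict (random-walk model)
        s = p + r                          # innovation covariance
        nu = zk - x                        # innovation / residual
        flags.append(abs(nu) / np.sqrt(s) > thresh)
        k = p / s                          # Kalman gain
        x = x + k * nu                     # measurement update
        p = (1 - k) * p
    return np.array(flags)

rng = np.random.default_rng(1)
z = 1.0 + rng.normal(0, 0.1, 200)          # clean measurements
z[100] += 2.0                              # one injected false datum
flags = kf_residual_detect(z)
```

A stealthy FDIA crafted to stay inside this residual band would evade the test, which is exactly why the abstract pairs it with a data-driven GRU-CNN branch.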
Wind power ramp events significantly threaten power grid safety. To improve ramp prediction accuracy, a hybrid wavelet deep belief network algorithm with adaptive feature selection (WDBNAFS) is proposed. First, the characteristics of wind power are analyzed. Then, wavelet decomposition is applied to the time series, and an adaptive feature selection algorithm is proposed to select the inputs of the prediction model. Finally, a deep belief network is employed to predict wind power ramp events, and the proposed WDBNAFS is tested in experiments on practical data. The simulation results demonstrate that the prediction accuracy of the proposed algorithm exceeds 90%.
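The wavelet front end can be illustrated with a hand-rolled Haar decomposition that splits a wind-power series into a coarse approximation plus per-level detail coefficients, which would then feed the feature selection stage. This is a minimal stand-in; the paper's wavelet basis and depth are not specified here:

```python
import numpy as np

def haar_dwt(x, levels=2):
    """Haar wavelet decomposition: returns the final approximation and a
    list of detail-coefficient arrays, one per level (orthonormal, so
    signal energy is preserved across the coefficients)."""
    details = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))  # high-frequency detail
        approx = (even + odd) / np.sqrt(2)         # low-frequency trend
    return approx, details

t = np.linspace(0, 1, 64)
power = np.sin(2 * np.pi * t) + 0.5 * (t > 0.5)    # toy ramp at t = 0.5
approx, details = haar_dwt(power, levels=2)
```

The detail coefficients concentrate the abrupt ramp energy at the ramp location, which is what makes wavelet features useful inputs for ramp-event prediction.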
The reliability and precision of the dynamic database are vital for the optimal operation and global control of integrated energy systems. State estimation is one of the effective ways to obtain accurate states. A novel robust dynamic state estimation methodology for integrated natural gas and electric power systems is proposed based on the Kalman filter. To take full advantage of measurement redundancy and predictions for improving estimation accuracy, a dynamic state estimation model coupling the gas and power systems through gas turbine units is established. The exponential smoothing technique and a physical gas model are integrated into the Kalman filter. Additionally, a time-varying scalar matrix is proposed to suppress bad data in the Kalman filter algorithm. The proposed method is applied to an integrated gas and power system formed by GasLib-40 and the IEEE 39-bus system with five gas turbine units. The simulation results show that the method obtains accurate dynamic states under three different measurement error conditions, and its filtering performance is better than that of separate estimation methods. Moreover, the proposed method remains robust when the measurements contain bad data.
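The exponential smoothing used for the prediction step can be sketched with Holt's linear method, which forecasts each state one step ahead from a smoothed level and trend. This is a generic illustration of the technique, not the paper's exact coupled gas-power model; the series and smoothing constants are made up:

```python
import numpy as np

def holt_forecast(y, alpha=0.5, beta=0.3):
    """Holt's linear exponential smoothing: returns one-step-ahead
    predictions, the kind of prior a dynamic state estimator combines
    with measurements in the Kalman update."""
    level, trend = y[0], y[1] - y[0]
    preds = [level]                        # no forecast before the first point
    for yt in y[1:]:
        pred = level + trend               # one-step-ahead prediction
        preds.append(pred)
        level_new = alpha * yt + (1 - alpha) * pred
        trend = beta * (level_new - level) + (1 - beta) * trend
        level = level_new
    return np.array(preds)

y = np.linspace(10.0, 20.0, 21)            # slowly ramping nodal state
preds = holt_forecast(y)
```

For an exactly linear trajectory with consistent initialization, the forecasts track the series exactly; on noisy measurements they provide the smoothed prior that the filter then corrects.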
The ever-growing demand for and complexity of machine learning are putting pressure on hyper-parameter tuning systems: while the evaluation cost of models continues to increase, the scalability of state-of-the-art systems is becoming a crucial bottleneck. In this paper, inspired by our experience deploying hyper-parameter tuning in a real-world production application and by the limitations of existing systems, we propose Hyper-Tune, an efficient and robust distributed hyper-parameter tuning framework. Compared with existing systems, Hyper-Tune highlights multiple system optimizations, including (1) automatic resource allocation, (2) asynchronous scheduling, and (3) a multi-fidelity optimizer. We conduct extensive evaluations on benchmark datasets and a large-scale real-world production dataset. Empirically, with the aid of these optimizations, Hyper-Tune outperforms competitive hyper-parameter tuning systems across a wide range of scenarios, including XGBoost, CNN, RNN, and architectural hyper-parameters for neural networks. Compared with the state-of-the-art BOHB and A-BOHB, Hyper-Tune achieves up to 11.2x and 5.1x speedups, respectively.
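The multi-fidelity idea behind optimizers like BOHB (and the one Hyper-Tune builds on) can be sketched with plain successive halving: evaluate many configurations at a small budget, keep the best fraction, and re-evaluate the survivors at a larger budget. The toy objective and parameter names below are illustrative, not Hyper-Tune's API:

```python
import random

def successive_halving(configs, evaluate, min_budget=1, eta=3, rounds=3):
    """Evaluate all configs at the current budget, keep the best 1/eta,
    multiply the budget by eta, and repeat. `evaluate(cfg, budget)`
    returns a loss to minimize and is supplied by the user."""
    budget = min_budget
    for _ in range(rounds):
        scored = sorted(configs, key=lambda c: evaluate(c, budget))
        configs = scored[: max(1, len(scored) // eta)]
        budget *= eta
    return configs[0]

# Toy objective: loss is distance to the "true" best learning rate,
# plus a noise floor that shrinks as the training budget grows.
def evaluate(cfg, budget):
    return abs(cfg["lr"] - 0.1) + 1.0 / budget

random.seed(0)
configs = [{"lr": random.uniform(0.001, 1.0)} for _ in range(27)]
best = successive_halving(configs, evaluate, rounds=3)
```

With 27 starting configurations and eta = 3, the pool shrinks 27 → 9 → 3 → 1 while per-configuration budget grows, spending most compute on the most promising candidates; asynchronous scheduling removes the synchronization barrier between these rounds.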
The layout of a mobile screen is a critical data source for UI design research and semantic understanding of the screen. However, UI layouts in existing datasets are often noisy, have mismatches with their visual representation, or consist of generic or app-specific types that are difficult to analyze and model. In this paper, we propose the CLAY pipeline, which uses a deep learning approach for denoising UI layouts, allowing us to automatically improve existing mobile UI layout datasets at scale. Our pipeline takes both the screenshot and the raw UI layout, and annotates the raw layout by removing incorrect nodes and assigning a semantically meaningful type to each node. To experiment with our data-cleaning pipeline, we create the CLAY dataset of 59,555 human-annotated screen layouts, based on screenshots and raw layouts from Rico, a public mobile UI corpus. Our deep models achieve high accuracy, with F1 scores of 82.7% for detecting layout objects that do not have a valid visual representation and 85.9% for recognizing object types, significantly outperforming a heuristic baseline. Our work lays a foundation for creating large-scale, high-quality UI layout datasets for data-driven mobile UI research and reduces the need for manual labeling efforts, which are prohibitively expensive.
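A heuristic baseline for the invalid-node-removal task can be sketched as follows: keep a raw layout node only if some visually detected object overlaps it sufficiently (by intersection-over-union). This is a hand-written stand-in for CLAY's learned denoising, with made-up boxes; it is the kind of heuristic the paper's models outperform:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def keep_valid_nodes(layout_boxes, detected_boxes, thresh=0.5):
    """Drop layout nodes with no sufficiently overlapping visual object,
    i.e. nodes that lack a valid visual representation."""
    return [b for b in layout_boxes
            if any(iou(b, d) >= thresh for d in detected_boxes)]

# Two nodes backed by on-screen objects, one invisible stray node.
layout = [(0, 0, 100, 40), (0, 50, 100, 90), (300, 300, 301, 301)]
visual = [(2, 1, 98, 42), (0, 48, 100, 92)]
kept = keep_valid_nodes(layout, visual)
```

Pure geometric overlap cannot recover semantic types or handle nodes that render but mean something else, which is why the pipeline pairs screenshot pixels with the layout tree in a learned model.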
Magnetic resonance imaging (MRI) is an important non-invasive clinical tool that can produce high-resolution, reproducible images. However, high-quality MR images require long scanning times, which leads to exhaustion and discomfort of patients and induces more artefacts from voluntary movements and involuntary physiological movements. To accelerate the scanning process, methods based on k-space undersampling and deep-learning-based reconstruction have been popularised. This work introduces SwinMR, a novel Swin-transformer-based method for fast MRI reconstruction. The network consists of an input module (IM), a feature extraction module (FEM) and an output module (OM). The IM and OM are 2D convolutional layers, and the FEM is composed of a cascade of residual Swin transformer blocks (RSTBs) and 2D convolutional layers. Each RSTB consists of a series of Swin transformer layers (STLs). Unlike the multi-head self-attention (MSA) of the original transformer, which operates over the whole image space, the (shifted-)window multi-head self-attention (W-MSA/SW-MSA) of the STL is computed within shifted local windows. A novel multi-channel loss using the sensitivity maps is proposed, which is shown to preserve more textures and details. We performed a series of comparative and ablation studies on the Calgary-Campinas public brain MR dataset and conducted a downstream segmentation experiment on the Multi-modal Brain Tumour Segmentation Challenge 2017 dataset. The results demonstrate that SwinMR achieves high-quality reconstruction compared with other benchmark methods, and that it is robust to different undersampling masks, noise interruption and different datasets. The code is publicly available at https://github.com/ayanglab/SwinMR.
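The acquisition setting SwinMR reconstructs from can be sketched with retrospective Cartesian undersampling: transform an image to k-space, keep a fully sampled low-frequency band plus randomly chosen phase-encode lines, then zero-fill and inverse-transform. The phantom, acceleration factor and centre fraction below are illustrative, not the paper's masks:

```python
import numpy as np

def undersample_kspace(image, accel=4, center_frac=0.08, seed=0):
    """Simulate accelerated acquisition: random k-space columns at rate
    1/accel plus a fully sampled centre band, then a zero-filled
    inverse FFT (the aliased input a reconstruction network receives)."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    k = np.fft.fftshift(np.fft.fft2(image))
    mask = rng.random(w) < 1.0 / accel             # random phase-encode lines
    c = int(w * center_frac / 2)
    mask[w // 2 - c: w // 2 + c] = True            # keep the k-space centre
    zero_filled = np.fft.ifft2(np.fft.ifftshift(k * mask[None, :]))
    return np.abs(zero_filled), mask

image = np.zeros((64, 64))
image[16:48, 16:48] = 1.0                          # toy square phantom
recon, mask = undersample_kspace(image)
```

The zero-filled result exhibits the aliasing artefacts that the reconstruction network is trained to remove; in the multi-coil case the sensitivity maps used in the proposed loss weight each coil's view of this same k-space.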
Few-shot learning (learning from only a few samples) is one of the most important capacities of the human brain. However, current artificial intelligence systems struggle to achieve this ability, as do biologically plausible spiking neural networks (SNNs). Datasets for traditional few-shot learning provide little temporal information, and the absence of neuromorphic datasets has hindered the development of few-shot learning for SNNs. Here, we provide the first such neuromorphic dataset, N-Omniglot, recorded with a Dynamic Vision Sensor (DVS). It contains 1623 categories of handwritten characters, with only 20 samples per class. N-Omniglot fills the need for a neuromorphic dataset for SNNs, offering high sparseness and strong temporal coherence. The dataset also provides a powerful challenge and a suitable benchmark for developing SNN algorithms in the few-shot learning domain, owing to the chronological information of the strokes. We also provide spiking versions of the improved nearest neighbor, convolutional network, SiameseNet, and meta-learning algorithms for verification.
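The nearest-neighbor baseline mentioned above can be sketched on toy event data: collapse a DVS event stream into per-time-bin spike counts and classify a query by its closest support example. The event format, binning and "characters" here are made-up illustrations; real N-Omniglot samples are far richer event tensors:

```python
import numpy as np

def spike_count_features(events, n_bins=10, window=100.0):
    """Turn a list of (timestamp, pixel) DVS events into a normalized
    per-time-bin spike-count histogram -- a minimal temporal feature."""
    t = np.array([e[0] for e in events], dtype=float) / window
    hist, _ = np.histogram(t, bins=n_bins, range=(0, 1))
    return hist / max(hist.sum(), 1)

def nearest_neighbor(query, support):
    """1-NN over the support set: the simplest few-shot baseline."""
    dists = {label: np.linalg.norm(query - f) for label, f in support.items()}
    return min(dists, key=dists.get)

# Two toy "characters" distinguished only by stroke timing:
early = [(i, 0) for i in range(20)]            # events in t = 0..19
late = [(80 + i, 0) for i in range(20)]        # events in t = 80..99
support = {"A": spike_count_features(early), "B": spike_count_features(late)}
query = spike_count_features([(82 + i, 0) for i in range(20)])
label = nearest_neighbor(query, support)
```

Because the two classes differ only in when spikes occur, any feature that discards timing would confuse them; this is the chronological stroke information that makes the dataset a meaningful temporal benchmark.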
Planar object tracking plays an important role in AI applications such as robotics, visual servoing, and visual SLAM. Although previous planar trackers work well in most scenarios, tracking remains challenging under rapid motion and large transformations between consecutive frames. The essential reason is that the condition number of such a non-linear system varies unstably as the search range of the homography parameter space grows. To this end, we propose a novel Homography Decomposition Networks (HDN) approach that drastically reduces and stabilizes the condition number by decomposing the homography transformation into two groups. Specifically, a similarity transformation estimator is designed to predict the first group robustly with a deep convolutional equivariant network. Taking advantage of the high-confidence scale and rotation estimates, the residual transformation is estimated by a simple regression model. Furthermore, the proposed end-to-end network is trained in a semi-supervised fashion. Extensive experiments show that our approach outperforms state-of-the-art planar tracking methods by a large margin on the challenging POT, UCSB and POIC datasets.
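The two-group decomposition can be illustrated numerically: given an estimated similarity part S (scale, rotation, translation), the remaining residual is obtained by factoring it out of the full homography as H = H_res @ S. This is a numeric sketch of the factorization only; in HDN both factors are predicted by learned networks:

```python
import numpy as np

def similarity_matrix(scale, theta, tx, ty):
    """Build a 2D similarity transform (scale, rotation, translation)
    as a 3x3 homogeneous matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[scale * c, -scale * s, tx],
                     [scale * s,  scale * c, ty],
                     [0.0,        0.0,       1.0]])

def decompose_homography(H, scale, theta, tx, ty):
    """Factor a homography into an estimated similarity part S and the
    residual transform H_res, so that H = H_res @ S."""
    S = similarity_matrix(scale, theta, tx, ty)
    H_res = H @ np.linalg.inv(S)
    return S, H_res

# Sanity check: if the homography IS a similarity and the estimate is
# exact, the residual collapses to the identity.
H = similarity_matrix(1.2, 0.3, 5.0, -2.0)
S, H_res = decompose_homography(H, 1.2, 0.3, 5.0, -2.0)
```

When the similarity estimate absorbs the large scale and rotation, the residual stays close to identity, which is the better-conditioned regression problem the second-stage model solves.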