Chromosome karyotype analysis is of great clinical importance in the diagnosis and treatment of diseases, especially for genetic diseases. Since manual analysis is highly time and effort consuming, computer-assisted automatic chromosome karyotype analysis based on images is routinely used to improve the efficiency and accuracy of the analysis. Due to the strip shape of the chromosomes, they easily get overlapped with each other when imaged, significantly affecting the accuracy of the analysis afterward. Conventional overlapping chromosome segmentation methods are usually based on manually tagged features, hence, the performance of which is easily affected by the quality, such as resolution and brightness, of the images. To address the problem, in this paper, we present an adversarial multiscale feature learning framework to improve the accuracy and adaptability of overlapping chromosome segmentation. Specifically, we first adopt the nested U-shape network with dense skip connections as the generator to explore the optimal representation of the chromosome images by exploiting multiscale features. Then we use the conditional generative adversarial network (cGAN) to generate images similar to the original ones, the training stability of which is enhanced by applying the least-square GAN objective. Finally, we employ Lovasz-Softmax to help the model converge in a continuous optimization setting. Comparing with the established algorithms, the performance of our framework is proven superior by using public datasets in eight evaluation criteria, showing its great potential in overlapping chromosome segmentation
As countries look towards re-opening of economic activities amidst the ongoing COVID-19 pandemic, ensuring public health has been challenging. While contact tracing only aims to track past activities of infected users, one path to safe reopening is to develop reliable spatiotemporal risk scores to indicate the propensity of the disease. Existing works which aim to develop risk scores either rely on compartmental model-based reproduction numbers (which assume uniform population mixing) or develop coarse-grain spatial scores based on reproduction number (R0) and macro-level density-based mobility statistics. Instead, in this paper, we develop a Hawkes process-based technique to assign relatively fine-grain spatial and temporal risk scores by leveraging high-resolution mobility data based on cell-phone originated location signals. While COVID-19 risk scores also depend on a number of factors specific to an individual, including demography and existing medical conditions, the primary mode of disease transmission is via physical proximity and contact. Therefore, we focus on developing risk scores based on location density and mobility behaviour. We demonstrate the efficacy of the developed risk scores via simulation based on real-world mobility data. Our results show that fine-grain spatiotemporal risk scores based on high-resolution mobility data can provide useful insights and facilitate safe re-opening.
Compressed sensing (CS) computed tomography has been proven to be important for several clinical applications, such as sparse-view computed tomography (CT), digital tomosynthesis and interior tomography. Traditional compressed sensing focuses on the design of handcrafted prior regularizers, which are usually image-dependent and time-consuming. Inspired by recently proposed deep learning-based CT reconstruction models, we extend the state-of-the-art LEARN model to a dual-domain version, dubbed LEARN++. Different from existing iteration unrolling methods, which only involve projection data in the data consistency layer, the proposed LEARN++ model integrates two parallel and interactive subnetworks to perform image restoration and sinogram inpainting operations on both the image and projection domains simultaneously, which can fully explore the latent relations between projection data and reconstructed images. The experimental results demonstrate that the proposed LEARN++ model achieves competitive qualitative and quantitative results compared to several state-of-the-art methods in terms of both artifact reduction and detail preservation.
Trajectory prediction for scenes with multiple agents and entities is a challenging problem in numerous domains such as traffic prediction, pedestrian tracking and path planning. We present a general architecture to address this challenge which models the crucial inductive biases of motion, namely, inertia, relative motion, intents and interactions. Specifically, we propose a relational model to flexibly model interactions between agents in diverse environments. Since it is well-known that human decision making is fuzzy by nature, at the core of our model lies a novel attention mechanism which models interactions by making continuous-valued (fuzzy) decisions and learning the corresponding responses. Our architecture demonstrates significant performance gains over existing state-of-the-art predictive models in diverse domains such as human crowd trajectories, US freeway traffic, NBA sports data and physics datasets. We also present ablations and augmentations to understand the decision-making process and the source of gains in our model.
Spectral computed tomography (CT) can reconstruct spectral images from different energy bins using photon counting detectors (PCDs). However, due to the limited photons and counting rate in the corresponding spectral fraction, the reconstructed spectral images usually suffer from severe noise. In this paper, a fourth-order nonlocal tensor decomposition model for spectral CT image reconstruction (FONT-SIR) method is proposed. Similar patches are collected in both spatial and spectral dimensions simultaneously to form the basic tensor unit. Additionally, principal component analysis (PCA) is applied to extract latent features from the patches for a robust and efficient similarity measure. Then, low-rank and sparsity decomposition is performed on the produced fourth-order tensor unit, and the weighted nuclear norm and total variation (TV) norm are used to enforce the low-rank and sparsity constraints, respectively. The alternating direction method of multipliers (ADMM) is adopted to optimize the objective function. The experimental results with our proposed FONT-SIR demonstrates a superior qualitative and quantitative performance for both simulated and real data sets relative to several state-of-the-art methods, in terms of noise suppression and detail preservation.
Current mainstream of CT reconstruction methods based on deep learning usually needs to fix the scanning geometry and dose level, which will significantly aggravate the training cost and need more training data for clinical application. In this paper, we propose a parameter-dependent framework (PDF) which trains data with multiple scanning geometries and dose levels simultaneously. In the proposed PDF, the geometry and dose level are parameterized and fed into two multi-layer perceptrons (MLPs). The MLPs are leveraged to modulate the feature maps of CT reconstruction network, which condition the network outputs on different scanning geometries and dose levels. The experiments show that our proposed method can obtain competing performance similar to the original network trained with specific geometry and dose level, which can efficiently save the extra training cost for multiple scanning geometries and dose levels.
One of the key challenges of machine learning (ML) based intrusion detection system (IDS) is the expensive computational complexity which is largely due to redundant, incomplete, and irrelevant features contain in the IDS datasets. To overcome such challenge and ensure building an efficient and more accurate IDS models, many researchers utilize preprocessing techniques such as normalization and feature selection in a hybrid modeling approach. In this work, we propose a hybrid IDS modeling approach with an algorithm for feature selection (FS) and another for building an IDS. The FS algorithm is a wrapper-based with a decision tree as the feature evaluator. The propose FS method is used in combination with some selected ML algorithms to build IDS models using the UNSW-NB15 dataset. Some IDS models are built as a baseline in a single modeling approach using the full features of the dataset. We evaluate the effectiveness of our propose method by comparing it with the baseline models and also with state-of-the-art works. Our method achieves the best DR of 97.95% and shown to be quite effective in comparison to state-of-the-art works. We, therefore, recommend its usage especially in IDS modeling with the UNSW-NB15 dataset.
The dissemination of fake news intended to deceive people, influence public opinion and manipulate social outcomes, has become a pressing problem on social media. Moreover, information sharing on social media facilitates diffusion of viral information cascades. In this work, we focus on understanding and leveraging diffusion dynamics of false and legitimate contents in order to facilitate network interventions for fake news mitigation. We analyze real-world Twitter datasets comprising fake and true news cascades, to understand differences in diffusion dynamics and user behaviours with regards to fake and true contents. Based on the analysis, we model the diffusion as a mixture of Independent Cascade models (MIC) with parameters $\theta_T, \theta_F$ over the social network graph; and derive unsupervised inference techniques for parameter estimation of the diffusion mixture model from observed, unlabeled cascades. Users influential in the propagation of true and fake contents are identified using the inferred diffusion dynamics. Characteristics of the identified influential users reveal positive correlation between influential users identified for fake news and their relative appearance in fake news cascades. Identified influential users tend to be related to topics of more viral information cascades than less viral ones; and identified fake news influential users have relatively fewer counts of direct followers, compared to the true news influential users. Intervention analysis on nodes and edges demonstrates capacity of the inferred diffusion dynamics in supporting network interventions for mitigation.
Most existing crowd counting systems rely on the availability of the object location annotation which can be expensive to obtain. To reduce the annotation cost, one attractive solution is to leverage a large number of unlabeled images to build a crowd counting model in semi-supervised fashion. This paper tackles the semi-supervised crowd counting problem from the perspective of feature learning. Our key idea is to leverage the unlabeled images to train a generic feature extractor rather than the entire network of a crowd counter. The rationale of this design is that learning the feature extractor can be more reliable and robust towards the inevitable noisy supervision generated from the unlabeled data. Also, on top of a good feature extractor, it is possible to build a density map regressor with much fewer density map annotations. Specifically, we proposed a novel semi-supervised crowd counting method which is built upon two innovative components: (1) a set of inter-related binary segmentation tasks are derived from the original density map regression task as the surrogate prediction target; (2) the surrogate target predictors are learned from both labeled and unlabeled data by utilizing a proposed self-training scheme which fully exploits the underlying constraints of these binary segmentation tasks. Through experiments, we show that the proposed method is superior over the existing semisupervised crowd counting method and other representative baselines.
Human activity, which usually consists of several actions, generally covers interactions among persons and or objects. In particular, human actions involve certain spatial and temporal relationships, are the components of more complicated activity, and evolve dynamically over time. Therefore, the description of a single human action and the modeling of the evolution of successive human actions are two major issues in human activity recognition. In this paper, we develop a method for human activity recognition that tackles these two issues. In the proposed method, an activity is divided into several successive actions represented by spatio temporal patterns, and the evolution of these actions are captured by a sequential model. A refined comprehensive spatio temporal graph is utilized to represent a single action, which is a qualitative representation of a human action incorporating both the spatial and temporal relations of the participant objects. Next, a discrete hidden Markov model is applied to model the evolution of action sequences. Moreover, a fully automatic partition method is proposed to divide a long-term human activity video into several human actions based on variational objects and qualitative spatial relations. Finally, a hierarchical decomposition of the human body is introduced to obtain a discriminative representation for a single action. Experimental results on the Cornell Activity Dataset demonstrate the efficiency and effectiveness of the proposed approach, which will enable long videos of human activity to be better recognized.