Bronchoscopy inspection as a follow-up procedure from the radiological imaging plays a key role in lung disease diagnosis and determining treatment plans for the patients. Doctors needs to make a decision whether to biopsy the patients timely when performing bronchoscopy. However, the doctors also needs to be very selective with biopsies as biopsies may cause uncontrollable bleeding of the lung tissue which is life-threaten. To help doctors to be more selective on biopsies and provide a second opinion on diagnosis, in this work, we propose a computer-aided diagnosis (CAD) system for lung diseases including cancers and tuberculosis (TB). The system is developed based on transfer learning. We propose a novel transfer learning method: sentential fine-tuning . Compared to traditional fine-tuning methods, our methods achieves the best performance. We obtained a overall accuracy of 77.0% a dataset of 81 normal cases, 76 tuberculosis cases and 277 lung cancer cases while the other traditional transfer learning methods achieve an accuracy of 73% and 68%. . The detection accuracy of our method for cancers, TB and normal cases are 87%, 54% and 91% respectively. This indicates that the CAD system has potential to improve lung disease diagnosis accuracy in bronchoscopy and it also might be used to be more selective with biopsies.
In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation. With a novel attentional generative network, the AttnGAN can synthesize fine-grained details at different subregions of the image by paying attentions to the relevant words in the natural language description. In addition, a deep attentional multimodal similarity model is proposed to compute a fine-grained image-text matching loss for training the generator. The proposed AttnGAN significantly outperforms the previous state of the art, boosting the best reported inception score by 14.14% on the CUB dataset and 170.25% on the more challenging COCO dataset. A detailed analysis is also performed by visualizing the attention layers of the AttnGAN. It for the first time shows that the layered attentional GAN is able to automatically select the condition at the word level for generating different parts of the image.
Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications. Samples generated by existing text-to-image approaches can roughly reflect the meaning of the given descriptions, but they fail to contain necessary details and vivid object parts. In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) to generate 256x256 photo-realistic images conditioned on text descriptions. We decompose the hard problem into more manageable sub-problems through a sketch-refinement process. The Stage-I GAN sketches the primitive shape and colors of the object based on the given text description, yielding Stage-I low-resolution images. The Stage-II GAN takes Stage-I results and text descriptions as inputs, and generates high-resolution images with photo-realistic details. It is able to rectify defects in Stage-I results and add compelling details with the refinement process. To improve the diversity of the synthesized images and stabilize the training of the conditional-GAN, we introduce a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold. Extensive experiments and comparisons with state-of-the-arts on benchmark datasets demonstrate that the proposed method achieves significant improvements on generating photo-realistic images conditioned on text descriptions.
Inspired by classic generative adversarial networks (GAN), we propose a novel end-to-end adversarial neural network, called SegAN, for the task of medical image segmentation. Since image segmentation requires dense, pixel-level labeling, the single scalar real/fake output of a classic GAN's discriminator may be ineffective in producing stable and sufficient gradient feedback to the networks. Instead, we use a fully convolutional neural network as the segmentor to generate segmentation label maps, and propose a novel adversarial critic network with a multi-scale $L_1$ loss function to force the critic and segmentor to learn both global and local features that capture long- and short-range spatial relationships between pixels. In our SegAN framework, the segmentor and critic networks are trained in an alternating fashion in a min-max game: The critic takes as input a pair of images, (original_image $*$ predicted_label_map, original_image $*$ ground_truth_label_map), and then is trained by maximizing a multi-scale loss function; The segmentor is trained with only gradients passed along by the critic, with the aim to minimize the multi-scale loss function. We show that such a SegAN framework is more effective and stable for the segmentation task, and it leads to better performance than the state-of-the-art U-net segmentation method. We tested our SegAN method using datasets from the MICCAI BRATS brain tumor segmentation challenge. Extensive experimental results demonstrate the effectiveness of the proposed SegAN with multi-scale loss: on BRATS 2013 SegAN gives performance comparable to the state-of-the-art for whole tumor and tumor core segmentation while achieves better precision and sensitivity for Gd-enhance tumor core segmentation; on BRATS 2015 SegAN achieves better performance than the state-of-the-art in both dice score and precision.
Solving constrained optimization problems by multi-objective evolutionary algorithms has scored tremendous achievements in the last decade. Standard multi-objective schemes usually aim at minimizing the objective function and also the degree of constraint violation simultaneously. This paper proposes a new multi-objective method for solving constrained optimization problems. The new method keeps two standard objectives: the original objective function and the sum of degrees of constraint violation. But besides them, four more objectives are added. One is based on the feasible rule. The other three come from the penalty functions. This paper conducts an initial experimental study on thirteen benchmark functions. A simplified version of CMODE is applied to solving multi-objective optimization problems. Our initial experimental results confirm our expectation that adding more helper functions could be useful. The performance of SMODE with more helper functions (four or six) is better than that with only two helper functions.
Structure information is ubiquitous in natural scene images and it plays an important role in scene representation. In this paper, third order structure statistics (TOSS) and fourth order structure statistics (FOSS) are exploited to encode higher order structure information. Afterwards, based on the radial and normal slice of TOSS and FOSS, we propose the high order structure feature: third order structure feature (TOSF) and fourth order structure feature (FOSF). It is well known that scene images are well characterized by particular arrangements of their local structures, we divide the scene image into the non-overlapping sub-regions and compute the proposed higher order structural features among them. Then a scene classification is performed by using SVM classifier with these higher order structure features. The experimental results show that higher order structure statistics can deliver image structure information well and its spatial envelope has strong discriminative ability.
In object recognition, Fisher vector (FV) representation is one of the state-of-art image representations ways at the expense of dense, high dimensional features and increased computation time. A simplification of FV is attractive, so we propose Sparse Fisher vector (SFV). By incorporating locality strategy, we can accelerate the Fisher coding step in image categorization which is implemented from a collective of local descriptors. Combining with pooling step, we explore the relationship between coding step and pooling step to give a theoretical explanation about SFV. Experiments on benchmark datasets have shown that SFV leads to a speedup of several-fold of magnitude compares with FV, while maintaining the categorization performance. In addition, we demonstrate how SFV preserves the consistence in representation of similar local features.
We consider the data-driven dictionary learning problem. The goal is to seek an over-complete dictionary from which every training signal can be best approximated by a linear combination of only a few codewords. This task is often achieved by iteratively executing two operations: sparse coding and dictionary update. In the literature, there are two benchmark mechanisms to update a dictionary. The first approach, such as the MOD algorithm, is characterized by searching for the optimal codewords while fixing the sparse coefficients. In the second approach, represented by the K-SVD method, one codeword and the related sparse coefficients are simultaneously updated while all other codewords and coefficients remain unchanged. We propose a novel framework that generalizes the aforementioned two methods. The unique feature of our approach is that one can update an arbitrary set of codewords and the corresponding sparse coefficients simultaneously: when sparse coefficients are fixed, the underlying optimization problem is similar to that in the MOD algorithm; when only one codeword is selected for update, it can be proved that the proposed algorithm is equivalent to the K-SVD method; and more importantly, our method allows us to update all codewords and all sparse coefficients simultaneously, hence the term simultaneous codeword optimization (SimCO). Under the proposed framework, we design two algorithms, namely, primitive and regularized SimCO. We implement these two algorithms based on a simple gradient descent mechanism. Simulations are provided to demonstrate the performance of the proposed algorithms, as compared with two baseline algorithms MOD and K-SVD. Results show that regularized SimCO is particularly appealing in terms of both learning performance and running speed.