Jichen Yang

Segment Anything Model for Medical Image Analysis: an Experimental Study

Apr 25, 2023
Maciej A. Mazurowski, Haoyu Dong, Hanxue Gu, Jichen Yang, Nicholas Konz, Yixin Zhang

Training segmentation models for medical images continues to be challenging due to the limited availability and high acquisition cost of data annotations. The Segment Anything Model (SAM) is a foundation model trained on over 1 billion annotations, predominantly for natural images, that is intended to segment a user-defined object of interest interactively. Despite its impressive performance on natural images, it is unclear how well the model transfers to medical image domains. Here, we perform an extensive evaluation of SAM's ability to segment medical images on a collection of 11 medical imaging datasets spanning various modalities and anatomies. In our experiments, we generated point prompts using a standard method that simulates interactive segmentation. Experimental results show that SAM's performance with a single prompt varies widely by task and dataset, from an IoU of 0.1135 on a spine MRI dataset to 0.8650 on a hip X-ray dataset. Performance is high for tasks involving well-circumscribed objects with unambiguous prompts and poorer in many other scenarios, such as tumor segmentation. When multiple prompts are provided, performance improves only slightly overall, but more so for datasets where the object is not contiguous. An additional comparison to RITM showed much better performance for SAM with one prompt but similar performance of the two methods for a larger number of prompts. We conclude that SAM shows impressive performance on some datasets given the zero-shot learning setup, but poor to moderate performance on multiple others. While SAM as a model and as a learning paradigm might be impactful in the medical imaging domain, extensive research is needed to identify the proper ways of adapting it to this domain.
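
A minimal sketch of the single-point-prompt evaluation described above, using the public `segment_anything` package. The prompt-placement heuristic (the most interior pixel of the ground-truth mask, found via a distance transform) and the variable names `image` and `gt_mask` are illustrative assumptions, not necessarily the paper's exact protocol; see the linked repository for the authors' implementation.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from segment_anything import sam_model_registry, SamPredictor

# `image` (HxWx3 uint8) and `gt_mask` (HxW bool) are assumed to be loaded
# already, e.g. a rendered MRI slice and its annotation.

def interior_point(mask: np.ndarray) -> np.ndarray:
    """Pick the pixel deepest inside the mask as an unambiguous point prompt."""
    dist = distance_transform_edt(mask)
    y, x = np.unravel_index(np.argmax(dist), dist.shape)
    return np.array([[x, y]])  # SAM expects (x, y) order

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 0.0

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(image)

masks, scores, _ = predictor.predict(
    point_coords=interior_point(gt_mask),
    point_labels=np.array([1]),  # 1 marks a foreground point
    multimask_output=True,       # SAM returns three candidate masks
)
best = masks[np.argmax(scores)]  # keep the highest-scoring candidate
print(f"IoU: {iou(best, gt_mask):.4f}")
```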

* Link to our code: https://github.com/mazurowski-lab/segment-anything-medical 

Deep Learning for Classification of Thyroid Nodules on Ultrasound: Validation on an Independent Dataset

Jul 27, 2022
Jingxi Weng, Benjamin Wildman-Tobriner, Mateusz Buda, Jichen Yang, Lisa M. Ho, Brian C. Allen, Wendy L. Ehieli, Chad M. Miller, Jikai Zhang, Maciej A. Mazurowski

Objectives: To apply a previously validated deep learning algorithm to a new thyroid nodule ultrasound image dataset and compare its performance with that of radiologists. Methods: A prior study presented an algorithm able to detect thyroid nodules and then classify malignancy using two ultrasound images. A multi-task deep convolutional neural network was trained on 1278 nodules and originally tested on 99 separate nodules, with results comparable to those of radiologists. The algorithm was further tested on 378 nodules imaged with ultrasound machines from manufacturers and product types different from those used for the training cases. Four experienced radiologists were asked to evaluate the nodules for comparison with the deep learning algorithm. Results: The area under the curve (AUC) of the deep learning algorithm and of the four radiologists was calculated with parametric binormal estimation. For the deep learning algorithm, the AUC was 0.70 (95% CI: 0.64-0.75). The AUCs of the radiologists were 0.66 (95% CI: 0.61-0.71), 0.67 (95% CI: 0.62-0.73), 0.68 (95% CI: 0.63-0.73), and 0.66 (95% CI: 0.61-0.71). Conclusion: On the new test dataset, the deep learning algorithm achieved performance similar to that of all four radiologists.
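
As a rough illustration of the binormal AUC estimate mentioned above: the model assumes malignant and benign scores follow two Gaussians, giving AUC = Phi(a / sqrt(1 + b^2)) with a = (mu1 - mu0)/sigma1 and b = sigma0/sigma1. The moment-based fit and synthetic scores below are illustrative simplifications; the study likely used a maximum-likelihood binormal fit, which differs in detail.

```python
import numpy as np
from scipy.stats import norm

def binormal_auc(pos_scores, neg_scores):
    """Binormal AUC from moment estimates of two Gaussian score distributions."""
    mu1, s1 = np.mean(pos_scores), np.std(pos_scores, ddof=1)
    mu0, s0 = np.mean(neg_scores), np.std(neg_scores, ddof=1)
    a, b = (mu1 - mu0) / s1, s0 / s1
    return norm.cdf(a / np.sqrt(1.0 + b**2))

# Synthetic scores for illustration only, not study data:
rng = np.random.default_rng(0)
pos = rng.normal(0.8, 0.5, 120)   # hypothetical malignant-nodule scores
neg = rng.normal(0.0, 1.0, 258)   # hypothetical benign-nodule scores
print(f"binormal AUC ~ {binormal_auc(pos, neg):.3f}")
```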

Deep learning-based algorithm for assessment of knee osteoarthritis severity in radiographs matches performance of radiologists

Jul 25, 2022
Albert Swiecicki, Nianyi Li, Jonathan O'Donnell, Nicholas Said, Jichen Yang, Richard C. Mather, William A. Jiranek, Maciej A. Mazurowski

A fully automated deep learning algorithm matched the performance of radiologists in assessing knee osteoarthritis severity in radiographs using the Kellgren-Lawrence (KL) grading system. Our goal was to develop an automated deep learning-based algorithm that jointly uses posterior-anterior (PA) and lateral (LAT) views of knee radiographs to assess knee osteoarthritis severity according to the KL grading system. We used a dataset of 9739 exams from 2802 patients from the Multicenter Osteoarthritis Study (MOST). The dataset was divided into a training set of 2040 patients, a validation set of 259 patients, and a test set of 503 patients. A novel deep learning-based method assessed knee OA in two steps: (1) localization of the knee joints in the images and (2) classification according to the KL grading system. Our method used both PA and LAT views as input to the model. The scores generated by the algorithm were compared to the grades provided in the MOST dataset for the entire test set, as well as to grades provided by 5 radiologists at our institution for a subset of the test set. The model obtained a multi-class accuracy of 71.90% on the entire test set when compared to the ratings provided in the MOST dataset; the quadratic weighted Kappa coefficient for this set was 0.9066. The average quadratic weighted Kappa between all pairs of radiologists from our institution who took part in the study was 0.748, and the average quadratic weighted Kappa between the algorithm and these radiologists was 0.769. The proposed model demonstrated equivalence to musculoskeletal radiologists in KL classification, but clearly superior reproducibility. Our model also agreed with radiologists at our institution to the same extent as the radiologists agreed with each other. The algorithm could be used to provide reproducible assessment of knee osteoarthritis severity.
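
For reference, the quadratic weighted Kappa used above can be computed directly with scikit-learn. A minimal sketch with hypothetical KL grades (illustration only, not study data):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical KL grades (0-4) for the same knees from two raters,
# e.g. the model vs. a radiologist; values are for illustration only.
model_grades       = [0, 1, 2, 2, 3, 4, 1, 0, 2, 3]
radiologist_grades = [0, 1, 2, 3, 3, 4, 1, 1, 2, 3]

kappa = cohen_kappa_score(model_grades, radiologist_grades, weights="quadratic")
print(f"quadratic weighted kappa = {kappa:.4f}")
```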

* Computers in Biology and Medicine, Volume 133, June 2021, 104334 

Automated Grading of Radiographic Knee Osteoarthritis Severity Combined with Joint Space Narrowing

Mar 16, 2022
Hanxue Gu, Keyu Li, Roy J. Colglazier, Jichen Yang, Michael Lebhar, Jonathan O'Donnell, William A. Jiranek, Richard C. Mather, Rob J. French, Nicholas Said, Jikai Zhang, Christine Park, Maciej A. Mazurowski

The assessment of knee osteoarthritis (KOA) severity on knee X-rays is a central criterion for the use of total knee arthroplasty. However, this assessment suffers from imprecise standards and remarkably high inter-reader variability. An algorithmic, automated assessment of KOA severity could improve the overall outcomes of knee replacement procedures by increasing the appropriateness of their use. We propose a novel deep learning-based five-step algorithm to automatically grade KOA from posterior-anterior (PA) views of radiographs: (1) image preprocessing, (2) localization of the knee joints in the image using the YOLO v3-Tiny model, (3) initial assessment of osteoarthritis severity using a convolutional neural network-based classifier, (4) segmentation of the joints and calculation of the joint space narrowing (JSN), and (5) combination of the JSN and the initial assessment to determine a final Kellgren-Lawrence (KL) score. Furthermore, by displaying the segmentation masks used to make the assessment, our algorithm demonstrates a higher degree of transparency than typical "black box" deep learning classifiers. We perform a comprehensive evaluation using two public datasets and one dataset from our institution, and show that our algorithm reaches state-of-the-art performance. We also collected ratings from multiple radiologists at our institution and showed that our algorithm performs at the radiologist level. The software has been made publicly available at https://github.com/MaciejMazurowski/osteoarthritis-classification.
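
A minimal sketch of how step (4) might measure joint space narrowing from the segmentation masks: take the per-column vertical gap between the femur's lower edge and the tibia's upper edge and report the minimum. This is an illustrative reconstruction under assumed mask conventions, not the authors' exact implementation.

```python
import numpy as np

def min_joint_space_width(femur: np.ndarray, tibia: np.ndarray) -> float:
    """femur, tibia: HxW boolean masks from the segmentation step.
    Returns the minimum vertical gap in pixels (multiply by the pixel
    spacing to convert to millimeters)."""
    widths = []
    for col in range(femur.shape[1]):
        f_rows = np.flatnonzero(femur[:, col])
        t_rows = np.flatnonzero(tibia[:, col])
        if f_rows.size and t_rows.size:
            gap = t_rows.min() - f_rows.max()  # rows increase downward
            if gap > 0:
                widths.append(gap)
    return float(min(widths)) if widths else float("nan")
```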

Key-Sparse Transformer with Cascaded Cross-Attention Block for Multimodal Speech Emotion Recognition

Jun 22, 2021
Weidong Chen, Xiaofeng Xing, Xiangmin Xu, Jichen Yang, Jianxin Pang

Speech emotion recognition (SER) is a challenging and important research topic that plays a critical role in human-computer interaction. Multimodal inputs can improve performance, as more emotional information is available for recognition. However, existing studies learn from all the information in a sample, even though only a small portion of it relates to emotion. Moreover, under the multimodal framework, the interaction between different modalities is shallow and insufficient. In this paper, a key-sparse Transformer is proposed for efficient SER that focuses only on emotion-related information. Furthermore, a cascaded cross-attention block, specially designed for the multimodal framework, is introduced to achieve deep interaction between different modalities. The proposed method is evaluated on the IEMOCAP corpus, and the experimental results show that it outperforms state-of-the-art approaches.
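
A minimal sketch of one plausible reading of the key-sparse idea: keep only the top-k most relevant keys per query and mask the rest before the softmax, so attention concentrates on emotion-relevant frames. This is an illustrative interpretation in PyTorch, not the authors' exact layer.

```python
import torch
import torch.nn.functional as F

def key_sparse_attention(q, k, v, top_k: int):
    """q, k, v: (batch, heads, seq, dim); keep only top_k keys per query."""
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5   # (B, H, Tq, Tk)
    kth = scores.topk(top_k, dim=-1).values[..., -1:]      # k-th largest score
    scores = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 4, 50, 64)  # toy shapes: batch=2, heads=4
out = key_sparse_attention(q, k, v, top_k=10)
print(out.shape)  # torch.Size([2, 4, 50, 64])
```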

Data Augmentation with Signal Companding for Detection of Logical Access Attacks

Feb 12, 2021
Rohan Kumar Das, Jichen Yang, Haizhou Li

Recent advances in voice conversion (VC) and text-to-speech (TTS) make it possible to produce natural-sounding speech that poses a threat to automatic speaker verification (ASV) systems. To this end, research on spoofing countermeasures has gained attention as a way to protect ASV systems from such attacks. While advanced spoofing countermeasures can detect spoofing attacks of a known nature, they are much less effective against unknown attacks. In this work, we propose a novel data augmentation technique using a-law and mu-law based signal companding. We believe the proposed method has an edge over traditional data augmentation because it adds small perturbations or quantization noise. The studies are conducted on the ASVspoof 2019 logical access corpus using a light convolutional neural network based system. We find that the proposed companding-based data augmentation outperforms state-of-the-art spoofing countermeasures, demonstrating an ability to handle attacks of an unknown nature.
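
A minimal sketch of the augmentation idea: compand each waveform with the standard a-law and mu-law telephony formulas and treat the companded copies as additional training data. Normalization details and the exact pipeline in the paper may differ.

```python
import numpy as np

def mu_law(x: np.ndarray, mu: float = 255.0) -> np.ndarray:
    """mu-law companding of a waveform normalized to [-1, 1]."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def a_law(x: np.ndarray, A: float = 87.6) -> np.ndarray:
    """A-law companding of a waveform normalized to [-1, 1]."""
    ax = np.abs(x)
    out = np.where(ax < 1.0 / A,
                   A * ax,
                   1.0 + np.log(np.maximum(A * ax, 1.0)))  # maximum() guards log(0)
    return np.sign(x) * out / (1.0 + np.log(A))

# Each companded copy can join the training set alongside the original:
wave = np.random.default_rng(0).uniform(-1, 1, 16000)  # stand-in for real audio
augmented = [wave, mu_law(wave), a_law(wave)]
```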

* 5 pages, Accepted for publication in International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2021 

GridTracer: Automatic Mapping of Power Grids using Deep Learning and Overhead Imagery

Jan 16, 2021
Bohao Huang, Jichen Yang, Artem Streltsov, Kyle Bradbury, Leslie M. Collins, Jordan Malof

Energy system information valuable for electricity access planning, such as the locations and connectivity of electricity transmission and distribution towers (collectively termed the power grid), is often incomplete, outdated, or altogether unavailable. Furthermore, conventional means of collecting this information are costly and limited. We propose to automatically map the grid in overhead remotely sensed imagery using deep learning. Towards this goal, we develop and publicly release a large dataset (263 km^2) of overhead imagery with ground truth for the power grid; to our knowledge, this is the first dataset of its kind in the public domain. Additionally, we propose scoring metrics and baseline algorithms for two grid-mapping tasks: (1) tower recognition and (2) power line interconnection (i.e., estimating a graph representation of the grid). We hope the availability of the training data, scoring metrics, and baselines will facilitate rapid progress on this important problem and help decision-makers address the energy needs of societies around the world.
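
A minimal sketch of the graph representation targeted by task (2): detected towers become nodes and predicted tower-to-tower connections become edges. The use of networkx and all values below are hypothetical illustrations, not the paper's scoring code.

```python
import networkx as nx

towers = {0: (120, 340), 1: (480, 310), 2: (860, 295)}  # id -> pixel (x, y)
predicted_links = [(0, 1), (1, 2)]                       # hypothetical model output

grid = nx.Graph()
for tower_id, xy in towers.items():
    grid.add_node(tower_id, pos=xy)   # node per recognized tower
grid.add_edges_from(predicted_links)  # edge per predicted power line

print(grid.number_of_nodes(), grid.number_of_edges())  # 3 2
```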

Towards Noise-Robust Neural Networks via Progressive Adversarial Training

Sep 17, 2019
Hang Yu, Aishan Liu, Xianglong Liu, Jichen Yang, Chongzhi Zhang

Adversarial examples, inputs intentionally designed to mislead deep neural networks, have attracted great attention in the past few years. Although a series of defense strategies have been developed and have achieved encouraging model robustness, most of them remain vulnerable to the corruptions more commonly witnessed in the real world, e.g., Gaussian noise and blur. In this paper, we show theoretically and empirically that there exists an inherent connection between adversarial robustness and corruption robustness. Based on this discovery, we propose a more powerful training method named Progressive Adversarial Training (PAT), which adds diversified adversarial noise progressively during training and thereby obtains a model robust to both adversarial examples and corruptions through higher training data complexity. We also show theoretically that PAT promises better generalization ability. Experimental evaluation on MNIST, CIFAR-10, and SVHN shows that PAT enhances the robustness and generalization of state-of-the-art network structures, performing well across the board compared to various augmentation methods. Moreover, we propose Mixed Test to evaluate model generalization ability more fairly.
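
A minimal sketch of the progressive idea: an adversarial perturbation budget that grows over training epochs, so the model sees increasingly strong noise. Plain FGSM serves as the inner attack for brevity; PAT as described in the paper is more elaborate, with diversified noise types.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """One-step FGSM perturbation of a batch within an L-infinity budget."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def train_progressive(model, loader, optimizer, epochs, eps_max=8 / 255):
    for epoch in range(epochs):
        eps = eps_max * (epoch + 1) / epochs  # progressively growing budget
        for x, y in loader:
            x_adv = fgsm(model, x, y, eps)
            optimizer.zero_grad()
            F.cross_entropy(model(x_adv), y).backward()
            optimizer.step()
```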

I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

Apr 16, 2019
Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Huy Dat Tran, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-Francois Bonastre, Chenglin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas Evans

The I4U consortium was established to facilitate joint entries to the NIST speaker recognition evaluations (SRE). The latest such joint submission was to SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of the I4U consortium's participation in the NIST SRE series. The primary objective of the current paper is to summarize the results and lessons learned from the twelve sub-systems and their fusion submitted to SRE'18. It is also our intention to present a shared view of the advancements, progress, and major paradigm shifts that we have witnessed as SRE participants in the past decade, from SRE'08 to SRE'18. In this regard, we have seen, among other changes, a paradigm shift from supervector representations to deep speaker embeddings, and a switch of research challenge from channel compensation to domain adaptation.

* 5 pages 