This paper achieves state-of-the-art results for the ICD code prediction task on the MIMIC-III dataset. We use Clinical BERT embeddings (Alsentzer et al., 2019) together with text augmentation and label balancing to improve F1 scores for both ICD chapter codes and ICD disease codes. We attribute the improved performance mainly to a novel text augmentation that shuffles the order of sentences during training. Whereas the top-32 ICD code prediction of Keyang Xu et al. reports an F1 score of 0.76, we achieve a final F1 score of 0.75 on the larger label set of the top 50 ICD codes.
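The sentence-order augmentation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `shuffle_sentences` and `augment`, the regex-based sentence splitter, and the `copies` parameter are all assumptions made for illustration, and a real clinical pipeline would likely need a splitter that handles abbreviations such as "Dr." correctly.

```python
import random
import re

# Naive sentence boundary: whitespace that follows '.', '!' or '?'.
# Assumption for illustration only; clinical abbreviations will be split too.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def shuffle_sentences(note, rng):
    """Return the note with its sentences in a random order."""
    sentences = [s for s in _SENTENCE_BOUNDARY.split(note.strip()) if s]
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    return " ".join(shuffled)


def augment(notes, labels, copies=2, seed=0):
    """Append `copies` shuffled variants of every note, reusing its label.

    The labels are unchanged because sentence order does not alter which
    codes apply to the document as a whole.
    """
    rng = random.Random(seed)
    out_notes, out_labels = list(notes), list(labels)
    for note, label in zip(notes, labels):
        for _ in range(copies):
            out_notes.append(shuffle_sentences(note, rng))
            out_labels.append(label)
    return out_notes, out_labels


if __name__ == "__main__":
    notes = ["Patient admitted with chest pain. ECG showed ST elevation. Started on aspirin."]
    labels = [["410"]]  # hypothetical ICD-9 code list for the example note
    aug_notes, aug_labels = augment(notes, labels, copies=2, seed=0)
    for text, codes in zip(aug_notes, aug_labels):
        print(codes, text)
```

Each augmented copy contains exactly the same sentences as the original note, so the model sees the same evidence in a different order. The augmentation would be applied to the training split only, so that evaluation notes stay unmodified.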
This paper explores whether drug reviews and social media can be leveraged as alternative sources for pharmacovigilance of adverse drug reactions (ADRs). We examine the performance of BERT alongside two variants: BioBERT [7], which is trained on biomedical papers, and Clinical BERT [8], which is trained on clinical notes. Eight BERT models were fine-tuned and compared across three tasks to evaluate their relative performance on ADR-related problems. The tasks are sentiment classification of drug reviews, detection of ADR mentions in Twitter posts, and named entity recognition of ADRs in Twitter posts. BERT demonstrates its flexibility with high performance across all three pharmacovigilance-related tasks.