Wray Buntine

Transformer over Pre-trained Transformer for Neural Text Segmentation with Enhanced Topic Coherence

Oct 14, 2021
Kelvin Lo, Yuan Jin, Weicong Tan, Ming Liu, Lan Du, Wray Buntine

This paper proposes a transformer-over-transformer framework, called Transformer$^2$, to perform neural text segmentation. It consists of two components: bottom-level sentence encoders using pre-trained transformers, and an upper-level transformer-based segmentation model that operates on the sentence embeddings. The bottom-level component transfers the pre-trained knowledge learned from large external corpora under both single and pair-wise supervised NLP tasks to model the sentence embeddings for the documents. Given the sentence embeddings, the upper-level transformer is trained to recover the segmentation boundaries as well as the topic labels of each sentence. Equipped with a multi-task loss and the pre-trained knowledge, Transformer$^2$ can better capture the semantic coherence within the same segments. Our experiments show that (1) Transformer$^2$ surpasses state-of-the-art text segmentation models in terms of a commonly used semantic coherence measure; (2) in most cases, both single and pair-wise pre-trained knowledge contribute to the model performance; (3) bottom-level sentence encoders pre-trained on specific languages yield better performance than those pre-trained on specific domains.

All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training

Apr 12, 2021
Islam Nassar, Samitha Herath, Ehsan Abbasnejad, Wray Buntine, Gholamreza Haffari

Pseudo-labeling is a key component in semi-supervised learning (SSL). It relies on iteratively using the model to generate artificial labels for the unlabeled data to train against. A common property among its various methods is that they rely only on the model's prediction to make labeling decisions, without considering any prior knowledge about the visual similarity among the classes. In this paper, we demonstrate that this degrades the quality of pseudo-labeling, as it poorly represents visually similar classes in the pool of pseudo-labeled data. We propose SemCo, a method which leverages label semantics and co-training to address this problem. We train two classifiers with two different views of the class labels: one classifier uses the one-hot view of the labels and disregards any potential similarity among the classes, while the other uses a distributed view of the labels and groups potentially similar classes together. We then co-train the two classifiers to learn based on their disagreements. We show that our method achieves state-of-the-art performance across various SSL tasks, including a 5.6% accuracy improvement on the Mini-ImageNet dataset with 1000 labeled examples. We also show that our method requires a smaller batch size and fewer training iterations to reach its best performance. We make our code available at https://github.com/islam-nassar/semco.

* Accepted in CVPR2021
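
The generic pseudo-labeling step that the abstract above takes as its starting point can be sketched as follows. This is a minimal illustration of confidence-thresholded pseudo-labeling in general, not SemCo itself; the function name `select_pseudo_labels`, the threshold value, and the example probabilities are all assumptions made for the sketch.

```python
def select_pseudo_labels(probabilities, threshold=0.9):
    """Keep unlabeled examples whose top class probability reaches `threshold`.

    Returns a list of (example_index, predicted_class) pairs.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        confidence = max(probs)
        if confidence >= threshold:
            # The artificial label is the model's most probable class.
            selected.append((index, probs.index(confidence)))
    return selected


# Hypothetical model outputs for three unlabeled images over three classes.
predictions = [
    [0.95, 0.03, 0.02],  # confident: kept with label 0
    [0.40, 0.35, 0.25],  # uncertain: skipped
    [0.05, 0.03, 0.92],  # confident: kept with label 2
]
pseudo_labels = select_pseudo_labels(predictions, threshold=0.9)
```

Because the decision depends only on the model's own confidence, two visually similar classes that split the probability mass between them rarely clear the threshold, which is the weakness the abstract points to.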

Topic Modelling Meets Deep Neural Networks: A Survey

Feb 28, 2021
He Zhao, Dinh Phung, Viet Huynh, Yuan Jin, Lan Du, Wray Buntine

Topic modelling has been a successful technique for text analysis for almost twenty years. When topic modelling met deep neural networks, there emerged a new and increasingly popular research area, neural topic models, with over a hundred models developed and a wide range of applications in neural language understanding such as text generation, summarisation and language models. There is a need to summarise research developments and discuss open problems and future directions. In this paper, we provide a focused yet comprehensive overview of neural topic models for interested researchers in the AI community, to help them navigate and innovate in this fast-growing research area. To the best of our knowledge, ours is the first review focusing on this specific topic.

* A review on Neural Topic Models

SQAPlanner: Generating Data-Informed Software Quality Improvement Plans

Feb 19, 2021
Dilini Rajapaksha, Chakkrit Tantithamthavorn, Jirayus Jiarpakdee, Christoph Bergmeir, John Grundy, Wray Buntine

Software Quality Assurance (SQA) planning aims to define proactive plans, such as defining maximum file size, to prevent the occurrence of software defects in future releases. To aid this, defect prediction models have been proposed to generate insights about the most important factors that are associated with software quality. Such insights derived from traditional defect models are far from actionable, i.e., practitioners still do not know what they should do or avoid to decrease the risk of having defects, or what the risk threshold is for each metric. A lack of actionable guidance and risk thresholds can lead to inefficient and ineffective SQA planning processes. In this paper, we investigate practitioners' perceptions of current SQA planning activities and the current challenges of such activities, and propose four types of guidance to support SQA planning. We then propose and evaluate our AI-Driven SQAPlanner approach, a novel approach for generating four types of guidance and their associated risk thresholds in the form of rule-based explanations for the predictions of defect prediction models. Finally, we develop and evaluate an information visualization for our SQAPlanner approach. Through the use of a qualitative survey and empirical evaluation, our results lead us to conclude that SQAPlanner is needed, effective, stable, and practically applicable. We also find that 80% of our survey respondents perceived that our visualization is more actionable. Thus, our SQAPlanner paves the way for novel research in actionable software analytics, i.e., generating actionable guidance on what practitioners should and should not do to decrease the risk of having defects to support SQA planning.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. 24 pages

Temporal Cascade and Structural Modelling of EHRs for Granular Readmission Prediction

Feb 04, 2021
Bhagya Hettige, Weiqing Wang, Yuan-Fang Li, Suong Le, Wray Buntine

Predicting (1) when a patient's next hospital admission will occur and (2) what will happen in that admission, by mining electronic health record (EHR) data, can provide granular readmission predictions to assist clinical decision making. Recurrent neural network (RNN) and point process models are usually employed in modelling temporal sequential data. Simple RNN models assume that sequences of hospital visits follow strict causal dependencies between consecutive visits. However, in the real world, a patient may have multiple co-existing chronic medical conditions, i.e., multimorbidity, which results in a cascade of visits where a non-immediate historical visit can be most influential to the next visit. Although a point process (e.g., Hawkes process) is able to model a cascade temporal relationship, it strongly relies on a prior generative process assumption. We propose a novel model, MEDCAS, to address these challenges. MEDCAS combines the strengths of RNN-based models and point processes by integrating point processes in modelling visit types and time gaps into an attention-based sequence-to-sequence learning model, which is able to capture the temporal cascade relationships. To supplement the patients with short visit sequences, a structural modelling technique with graph-based methods is used to construct the markers of the point process in MEDCAS. Extensive experiments on three real-world EHR datasets have been performed and the results demonstrate that MEDCAS outperforms state-of-the-art models in both tasks.
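
To make the cascade idea concrete, here is a minimal sketch of the conditional intensity of a univariate Hawkes process with an exponential kernel, $\lambda(t) = \mu + \sum_{t_i < t} \alpha e^{-\beta (t - t_i)}$. It illustrates the classical point process mentioned in the abstract, not MEDCAS; the parameter values and the visit times are assumptions chosen for illustration.

```python
import math


def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity at time t given the times of earlier events.

    mu is the base rate; each past event adds alpha * exp(-beta * elapsed).
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i)) for t_i in event_times if t_i < t
    )
    return mu + excitation


# Hypothetical visit times (arbitrary units) for one patient.
visits = [0.0, 2.0, 2.5]
rate_now = hawkes_intensity(3.0, visits)
```

Every earlier visit contributes to the current rate, not only the most recent one, which is the cascade behaviour that a strictly step-by-step dependency misses. The fixed exponential kernel is an example of the prior generative assumption the abstract refers to.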

Discriminative, Generative and Self-Supervised Approaches for Target-Agnostic Learning

Nov 12, 2020
Yuan Jin, Wray Buntine, Francois Petitjean, Geoffrey I. Webb

Supervised learning, characterized by both discriminative and generative learning, seeks to predict the values of single (or sometimes multiple) predefined target attributes based on a predefined set of predictor attributes. For applications where the information available and predictions to be made may vary from instance to instance, we propose the task of target-agnostic learning, where arbitrary disjoint sets of attributes can be used as predictors and as targets for each to-be-predicted instance. For this task, we survey a wide range of techniques available for handling missing values, self-supervised training and pseudo-likelihood training, and adapt them to a suite of algorithms that are suitable for the task. We conduct extensive experiments on this suite of algorithms on a large collection of categorical, continuous and discretized datasets, and report their performance in terms of both classification and regression errors. We also report the training and prediction time of these algorithms when handling large-scale datasets. Both generative and self-supervised learning models are shown to perform well at the task, although their characteristics on the different types of data are quite different. Nevertheless, our derived theorem for the pseudo-likelihood theory also shows that they are related for inferring a joint distribution model based on the pseudo-likelihood training.

Collective Wisdom: Improving Low-resource Neural Machine Translation using Adaptive Knowledge Distillation

Oct 12, 2020
Fahimeh Saleh, Wray Buntine, Gholamreza Haffari

Scarcity of parallel sentence-pairs poses a significant hurdle for training high-quality Neural Machine Translation (NMT) models in bilingually low-resource scenarios. A standard approach is transfer learning, which involves taking a model trained on a high-resource language-pair and fine-tuning it on the data of the low-resource MT condition of interest. However, it is generally not clear which high-resource language-pair offers the best transfer learning for the target MT setting. Furthermore, different transferred models may have complementary semantic and/or syntactic strengths, hence using only one model may be sub-optimal. In this paper, we tackle this problem using knowledge distillation, where we propose to distill the knowledge of an ensemble of teacher models to a single student model. As the quality of these teacher models varies, we propose an effective adaptive knowledge distillation approach to dynamically adjust the contribution of the teacher models during the distillation process. Experiments on transferring from a collection of six language pairs from IWSLT to five low-resource language-pairs from TED Talks demonstrate the effectiveness of our approach, achieving up to +0.9 BLEU score improvement compared to strong baselines.
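
One way to picture multi-teacher distillation with adaptive weights is the sketch below. The abstract does not give the weighting rule, so the choice here (a softmax over negative per-teacher losses, so that a teacher that fits the current batch better gets more weight) is an assumption for illustration and should not be read as the paper's method. All function names and numbers are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_weights(teacher_losses, temperature=1.0):
    """Assumed rule: lower loss on the current batch means higher weight."""
    return softmax([-loss / temperature for loss in teacher_losses])


def distillation_target(teacher_probs, weights):
    """Weighted mixture of the teachers' next-token distributions."""
    vocab_size = len(teacher_probs[0])
    return [
        sum(w * probs[v] for w, probs in zip(weights, teacher_probs))
        for v in range(vocab_size)
    ]


def distillation_loss(student_probs, target):
    """Cross-entropy of the student against the mixed teacher target."""
    return -sum(t * math.log(s) for t, s in zip(target, student_probs) if t > 0)


# Two hypothetical teachers over a three-token vocabulary.
teacher_probs = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]]
weights = teacher_weights([1.0, 2.0])  # teacher 0 fits this batch better
target = distillation_target(teacher_probs, weights)
loss = distillation_loss([0.5, 0.3, 0.2], target)
```

Recomputing `weights` per batch is what makes the contribution of each teacher dynamic rather than fixed for the whole training run.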

Neural Sinkhorn Topic Model

Aug 12, 2020
He Zhao, Dinh Phung, Viet Huynh, Trung Le, Wray Buntine

In this paper, we present a new topic modelling approach via the theory of optimal transport (OT). Specifically, we represent a document with two distributions: a distribution over the words (doc-word distribution) and a distribution over the topics (doc-topic distribution). For one document, the doc-word distribution is the observed, sparse, low-level representation of the content, while the doc-topic distribution is the latent, dense, high-level representation of the same content. Learning a topic model can then be viewed as a process of minimising the transportation of the semantic information from one distribution to the other. This new viewpoint leads to a novel OT-based topic modelling framework, which enjoys appealing simplicity, effectiveness, and efficiency. Extensive experiments show that our framework significantly outperforms several state-of-the-art models in terms of both topic quality and document representations.
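
The Sinkhorn algorithm that the title alludes to computes an entropy-regularised transport plan between two distributions by alternately rescaling rows and columns. The sketch below shows that general algorithm in plain Python, not the paper's topic model; the tiny doc-topic and doc-word vectors, the cost matrix, and the regularisation strength `eps` are assumptions for illustration.

```python
import math


def sinkhorn(a, b, cost, eps=0.5, n_iters=200):
    """Entropy-regularised optimal transport plan between histograms a and b.

    cost[i][j] is the cost of moving mass from bin i of a to bin j of b.
    """
    n, m = len(a), len(b)
    # Gibbs kernel: cheap moves get large entries.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows to match a, then columns to match b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical two-topic and two-word distributions for one document.
doc_topic = [0.7, 0.3]
doc_word = [0.4, 0.6]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(doc_topic, doc_word, cost)
transport_cost = sum(
    plan[i][j] * cost[i][j] for i in range(2) for j in range(2)
)
```

The row sums of `plan` approach `doc_topic` and the column sums approach `doc_word`, so `transport_cost` measures how much work it takes to move mass from one distribution to the other under the given costs.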

Hands-on Bayesian Neural Networks -- a Tutorial for Deep Learning Users

Jul 14, 2020
Laurent Valentin Jospin, Wray Buntine, Farid Boussaid, Hamid Laga, Mohammed Bennamoun

Modern deep learning methods have equipped researchers and engineers with incredibly powerful tools to tackle problems that previously seemed impossible. However, since deep learning methods operate as black boxes, the uncertainty associated with their predictions is often challenging to quantify. Bayesian statistics offers a formalism to understand and quantify the uncertainty associated with deep neural network predictions. This paper provides a tutorial for researchers and scientists who are using machine learning, especially deep learning, with an overview of the relevant literature and a complete toolset to design, implement, train, use and evaluate Bayesian neural networks.

* 35 pages, 15 figures

$\mathtt{MedGraph}$: Structural and Temporal Representation Learning of Electronic Medical Records

Dec 08, 2019
Bhagya Hettige, Yuan-Fang Li, Weiqing Wang, Suong Le, Wray Buntine

Electronic medical record (EMR) data contains historical sequences of patient visits, and each visit contains rich information, such as patient demographics, hospital utilisation and medical codes, including diagnosis, procedure and medication codes. Most existing EMR embedding methods capture visit-code associations by constructing input visit representations as binary vectors with a static vocabulary of medical codes. With this limited representation, they fail to encapsulate rich attribute information of visits (demographics and utilisation information) and/or codes (e.g., medical code descriptions). Furthermore, current work considers visits of the same patient as discrete-time events and ignores time gaps between them. However, the time gaps between visits reflect the dynamics of the patient's medical history and induce varying influences on future visits. To address these limitations, we present $\mathtt{MedGraph}$, a supervised EMR embedding method that captures two types of information: (1) the visit-code associations in an attributed bipartite graph, and (2) the temporal sequencing of visits through point processes. $\mathtt{MedGraph}$ produces Gaussian embeddings for visits and codes to model the uncertainty. We evaluate the performance of $\mathtt{MedGraph}$ through an extensive experimental study and show that $\mathtt{MedGraph}$ outperforms state-of-the-art EMR embedding methods in several medical risk prediction tasks.
