Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Pascale Fung

Shammie

SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study

Apr 14, 2022

Samuel Cahyawijaya, Tiezheng Yu, Zihan Liu, Tiffany T. W. Mak, Xiaopu Zhou, Nancy Y. Ip, Pascale Fung

Figure 1 for SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study

Figure 2 for SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study

Figure 3 for SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study

Figure 4 for SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study

Abstract:Self-supervised pre-training methods have brought remarkable breakthroughs in the understanding of text, image, and speech. Recent developments in genomics has also adopted these pre-training methods for genome understanding. However, they focus only on understanding haploid sequences, which hinders their applicability towards understanding genetic variations, also known as single nucleotide polymorphisms (SNPs), which is crucial for genome-wide association study. In this paper, we introduce SNP2Vec, a scalable self-supervised pre-training approach for understanding SNP. We apply SNP2Vec to perform long-sequence genomics modeling, and we evaluate the effectiveness of our approach on predicting Alzheimer's disease risk in a Chinese cohort. Our approach significantly outperforms existing polygenic risk score methods and all other baselines, including the model that is trained entirely with haploid sequences. We release our code and dataset on https://github.com/HLTCHKUST/snp2vec.

Via

Access Paper or Ask Questions

Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension

Apr 12, 2022

Holy Lovenia, Bryan Wilie, Willy Chung, Min Zeng, Samuel Cahyawijaya, Su Dan, Pascale Fung

Figure 1 for Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension

Figure 2 for Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension

Figure 3 for Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension

Figure 4 for Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension

Abstract:Task-adaptive pre-training (TAPT) alleviates the lack of labelled data and provides performance lift by adapting unlabelled data to downstream task. Unfortunately, existing adaptations mainly involve deterministic rules that cannot generalize well. Here, we propose Clozer, a sequence-tagging based cloze answer extraction method used in TAPT that is extendable for adaptation on any cloze-style machine reading comprehension (MRC) downstream tasks. We experiment on multiple-choice cloze-style MRC tasks, and show that Clozer performs significantly better compared to the oracle and state-of-the-art in escalating TAPT effectiveness in lifting model performance, and prove that Clozer is able to recognize the gold answers independently of any heuristics.

Via

Access Paper or Ask Questions

Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation

Mar 30, 2022

Wenliang Dai, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Pascale Fung

Figure 1 for Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation

Figure 2 for Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation

Figure 3 for Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation

Figure 4 for Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation

Abstract:The recent large-scale vision-language pre-training (VLP) of dual-stream architectures (e.g., CLIP) with a tremendous amount of image-text pair data, has shown its superiority on various multimodal alignment tasks. Despite its success, the resulting models are not capable of multimodal generative tasks due to the weak text encoder. To tackle this problem, we propose to augment the dual-stream VLP model with a textual pre-trained language model (PLM) via vision-language knowledge distillation (VLKD), enabling the capability for multimodal generation. VLKD is pretty data- and computation-efficient compared to the pre-training from scratch. Experimental results show that the resulting model has strong zero-shot performance on multimodal generation tasks, such as open-ended visual question answering and image captioning. For example, it achieves 44.5% zero-shot accuracy on the VQAv2 dataset, surpassing the previous state-of-the-art zero-shot model with $7\times$ fewer parameters. Furthermore, the original textual language understanding and generation ability of the PLM is maintained after VLKD, which makes our model versatile for both multimodal and unimodal tasks.

* Accepted to ACL 2022

Via

Access Paper or Ask Questions

Read before Generate! Faithful Long Form Question Answering with Machine Reading

Mar 01, 2022

Dan Su, Xiaoguang Li, Jindi Zhang, Lifeng Shang, Xin Jiang, Qun Liu, Pascale Fung

Figure 1 for Read before Generate! Faithful Long Form Question Answering with Machine Reading

Figure 2 for Read before Generate! Faithful Long Form Question Answering with Machine Reading

Figure 3 for Read before Generate! Faithful Long Form Question Answering with Machine Reading

Figure 4 for Read before Generate! Faithful Long Form Question Answering with Machine Reading

Abstract:Long-form question answering (LFQA) aims to generate a paragraph-length answer for a given question. While current work on LFQA using large pre-trained model for generation are effective at producing fluent and somewhat relevant content, one primary challenge lies in how to generate a faithful answer that has less hallucinated content. We propose a new end-to-end framework that jointly models answer generation and machine reading. The key idea is to augment the generation model with fine-grained, answer-related salient information which can be viewed as an emphasis on faithful facts. State-of-the-art results on two LFQA datasets, ELI5 and MS MARCO, demonstrate the effectiveness of our method, in comparison with strong baselines on automatic and human evaluation metrics. A detailed analysis further proves the competency of our methods in generating fluent, relevant, and more faithful answers.

* long paper, accepted to ACL 2022 findings

Via

Access Paper or Ask Questions

VScript: Controllable Script Generation with Audio-Visual Presentation

Mar 01, 2022

Ziwei Ji, Yan Xu, I-Tsun Cheng, Samuel Cahyawijaya, Rita Frieske, Etsuko Ishii, Min Zeng, Andrea Madotto, Pascale Fung

Figure 1 for VScript: Controllable Script Generation with Audio-Visual Presentation

Figure 2 for VScript: Controllable Script Generation with Audio-Visual Presentation

Figure 3 for VScript: Controllable Script Generation with Audio-Visual Presentation

Figure 4 for VScript: Controllable Script Generation with Audio-Visual Presentation

Abstract:Automatic script generation could save a considerable amount of resources and offer inspiration to professional scriptwriters. We present VScript, a controllable pipeline that generates complete scripts including dialogues and scene descriptions, and presents visually using video retrieval and aurally using text-to-speech for spoken dialogue. With an interactive interface, our system allows users to select genres and input starting words that control the theme and development of the generated script. We adopt a hierarchical structure, which generates the plot, then the script and its audio-visual presentation. We also introduce a novel approach to plot-guided dialogue generation by treating it as an inverse dialogue summarization. Experiment results show that our approach outperforms the baselines on both automatic and human evaluations, especially in terms of genre control.

Via

Access Paper or Ask Questions

QA4QG: Using Question Answering to Constrain Multi-Hop Question Generation

Feb 14, 2022

Dan Su, Peng Xu, Pascale Fung

Figure 1 for QA4QG: Using Question Answering to Constrain Multi-Hop Question Generation

Figure 2 for QA4QG: Using Question Answering to Constrain Multi-Hop Question Generation

Figure 3 for QA4QG: Using Question Answering to Constrain Multi-Hop Question Generation

Figure 4 for QA4QG: Using Question Answering to Constrain Multi-Hop Question Generation

Abstract:Multi-hop question generation (MQG) aims to generate complex questions which require reasoning over multiple pieces of information of the input passage. Most existing work on MQG has focused on exploring graph-based networks to equip the traditional Sequence-to-sequence framework with reasoning ability. However, these models do not take full advantage of the constraint between questions and answers. Furthermore, studies on multi-hop question answering (QA) suggest that Transformers can replace the graph structure for multi-hop reasoning. Therefore, in this work, we propose a novel framework, QA4QG, a QA-augmented BART-based framework for MQG. It augments the standard BART model with an additional multi-hop QA module to further constrain the generated question. Our results on the HotpotQA dataset show that QA4QG outperforms all state-of-the-art models, with an increase of 8 BLEU-4 and 8 ROUGE points compared to the best results previously reported. Our work suggests the advantage of introducing pre-trained language models and QA module for the MQG task.

* 4 pages, accepted by ICASSP2022

Via

Access Paper or Ask Questions

Survey of Hallucination in Natural Language Generation

Feb 08, 2022

Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Andrea Madotto, Pascale Fung

Figure 1 for Survey of Hallucination in Natural Language Generation

Figure 2 for Survey of Hallucination in Natural Language Generation

Figure 3 for Survey of Hallucination in Natural Language Generation

Figure 4 for Survey of Hallucination in Natural Language Generation

Abstract:Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent natural language generation, naturally leading to development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However, it is also investigated that such generation includes hallucinated texts, which makes the performances of text generation fail to meet users' expectations in many real-world scenarios. In order to address this issue, studies in evaluation and mitigation methods of hallucinations have been presented in various tasks, but have not been reviewed in a combined manner. In this survey, we provide a broad overview of the research progress and challenges in the hallucination problem of NLG. The survey is organized into two big divisions: (i) a general overview of metrics, mitigation methods, and future directions; (ii) task-specific research progress for hallucinations in a large set of downstream tasks: abstractive summarization, dialogue generation, generative question answering, data-to-text generation, and machine translation. This survey could facilitate collaborative efforts among researchers in these tasks.

Via

Access Paper or Ask Questions

Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset

Jan 17, 2022

Tiezheng Yu, Rita Frieske, Peng Xu, Samuel Cahyawijaya, Cheuk Tung Shadow Yiu, Holy Lovenia, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma(+2 more)

Figure 1 for Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset

Figure 2 for Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset

Figure 3 for Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset

Figure 4 for Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset

Abstract:Automatic speech recognition (ASR) on low resource languages improves the access of linguistic minorities to technological advantages provided by artificial intelligence (AI). In this paper, we address the problem of data scarcity for the Hong Kong Cantonese language by creating a new Cantonese dataset. Our dataset, Multi-Domain Cantonese Corpus (MDCC), consists of 73.6 hours of clean read speech paired with transcripts, collected from Cantonese audiobooks from Hong Kong. It comprises philosophy, politics, education, culture, lifestyle and family domains, covering a wide range of topics. We also review all existing Cantonese datasets and analyze them according to their speech type, data source, total size and availability. We further conduct experiments with Fairseq S2T Transformer, a state-of-the-art ASR model, on the biggest existing dataset, Common Voice zh-HK, and our proposed MDCC, and the results show the effectiveness of our dataset. In addition, we create a powerful and robust Cantonese ASR model by applying multi-dataset learning on MDCC and Common Voice zh-HK.

Via

Access Paper or Ask Questions

CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition

Jan 11, 2022

Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J. Barezi, Peng Xu, Cheuk Tung Shadow Yiu, Rita Frieske, Holy Lovenia, Genta Indra Winata, Qifeng Chen(+3 more)

Figure 1 for CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition

Figure 2 for CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition

Figure 3 for CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition

Figure 4 for CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition

Abstract:With the rise of deep learning and intelligent vehicle, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities. In-car smart assistants should be able to process general as well as car-related commands and perform corresponding actions, which eases driving and improves safety. However, there is a data scarcity issue for low resource languages, hindering the development of research and applications. In this paper, we introduce a new dataset, Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR), for in-car command recognition in the Cantonese language with both video and audio data. It consists of 4,984 samples (8.3 hours) of 200 in-car commands recorded by 30 native Cantonese speakers. Furthermore, we augment our dataset using common in-car background noises to simulate real environments, producing a dataset 10 times larger than the collected one. We provide detailed statistics of both the clean and the augmented versions of our dataset. Moreover, we implement two multimodal baselines to demonstrate the validity of CI-AVSR. Experiment results show that leveraging the visual signal improves the overall performance of the model. Although our best model can achieve a considerable quality on the clean test set, the speech recognition quality on the noisy data is still inferior and remains as an extremely challenging task for real in-car speech recognition systems. The dataset and code will be released at https://github.com/HLTCHKUST/CI-AVSR.

* 6 pages

Via

Access Paper or Ask Questions

ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

Jan 07, 2022

Holy Lovenia, Samuel Cahyawijaya, Genta Indra Winata, Peng Xu, Xu Yan, Zihan Liu, Rita Frieske, Tiezheng Yu, Wenliang Dai, Elham J. Barezi(+4 more)

Figure 1 for ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

Figure 2 for ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

Figure 3 for ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

Figure 4 for ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

Abstract:Code-switching is a speech phenomenon when a speaker switches language during a conversation. Despite the spontaneous nature of code-switching in conversational spoken language, most existing works collect code-switching data through read speech instead of spontaneous speech. ASCEND (A Spontaneous Chinese-English Dataset) introduces a high-quality resource of spontaneous multi-turn conversational dialogue Chinese-English code-switching corpus collected in Hong Kong. We report ASCEND's design and procedure of collecting the speech data, including the annotations in this work. ASCEND includes 23 bilinguals that are fluent in both Chinese and English and consists of 10.62 hours clean speech corpus. We also conduct a baseline experiment using pre-trained wav2vec 2.0 models, achieving the best performance of 22.69% character error rate and 27.05% mixed error rate.

Via

Access Paper or Ask Questions