
Samuel Cahyawijaya


GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Jun 23, 2022
Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez-Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahima Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, Yufang Hou

Via arXiv

NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages

May 31, 2022
Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, Sebastian Ruder

Via arXiv

SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study

Apr 14, 2022
Samuel Cahyawijaya, Tiezheng Yu, Zihan Liu, Tiffany T. W. Mak, Xiaopu Zhou, Nancy Y. Ip, Pascale Fung

Via arXiv

Can Question Rewriting Help Conversational Question Answering?

Apr 13, 2022
Etsuko Ishii, Yan Xu, Samuel Cahyawijaya, Bryan Wilie

Via arXiv

Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension

Apr 12, 2022
Holy Lovenia, Bryan Wilie, Willy Chung, Min Zeng, Samuel Cahyawijaya, Su Dan, Pascale Fung

Via arXiv

One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia

Mar 24, 2022
Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya, Ade Romadhony, Rahmad Mahendra, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, Sebastian Ruder

Via arXiv

VScript: Controllable Script Generation with Audio-Visual Presentation

Mar 01, 2022
Ziwei Ji, Yan Xu, I-Tsun Cheng, Samuel Cahyawijaya, Rita Frieske, Etsuko Ishii, Min Zeng, Andrea Madotto, Pascale Fung

Via arXiv

Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset

Jan 17, 2022
Tiezheng Yu, Rita Frieske, Peng Xu, Samuel Cahyawijaya, Cheuk Tung Shadow Yiu, Holy Lovenia, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

Via arXiv

CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition

Jan 11, 2022
Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J. Barezi, Peng Xu, Cheuk Tung Shadow Yiu, Rita Frieske, Holy Lovenia, Genta Indra Winata, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

Via arXiv

ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

Jan 07, 2022
Holy Lovenia, Samuel Cahyawijaya, Genta Indra Winata, Peng Xu, Xu Yan, Zihan Liu, Rita Frieske, Tiezheng Yu, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

Via arXiv