Ni Putu Intan Maharani

Domain-Specific Language Model Post-Training for Indonesian Financial NLP

Oct 15, 2023

Ni Putu Intan Maharani, Yoga Yustiawan, Fauzy Caesar Rochim, Ayu Purwarianti

Abstract: BERT and IndoBERT have achieved impressive performance on several NLP tasks. There have been several investigations into their adaptation to specialized domains, especially for the English language. We focus on the financial domain and the Indonesian language, performing post-training on a pre-trained IndoBERT for the financial domain using a small-scale Indonesian financial corpus. In this paper, we construct an Indonesian self-supervised financial corpus, an Indonesian financial sentiment analysis dataset, and an Indonesian financial topic classification dataset, and release a family of BERT models for financial NLP. We also evaluate the effectiveness of domain-specific post-training on sentiment analysis and topic classification tasks. Our findings indicate that post-training increases the effectiveness of a language model when it is fine-tuned on domain-specific downstream tasks.

* Accepted in ICEEI 2023 (International Conference on Electrical Engineering and Informatics 2023)

Low-Resource Clickbait Spoiling for Indonesian via Question Answering

Oct 12, 2023

Ni Putu Intan Maharani, Ayu Purwarianti, Alham Fikri Aji

Abstract: Clickbait spoiling aims to generate a short text that satisfies the curiosity induced by a clickbait post. As it is a newly introduced task, the dataset is only available in English so far. Our contributions include the construction of a manually labeled clickbait spoiling corpus in Indonesian and an evaluation of cross-lingual zero-shot question answering-based models for tackling clickbait spoiling in a low-resource language like Indonesian. We utilize a selection of multilingual language models. The experimental results suggest that the XLM-RoBERTa (large) model outperforms other models for phrase and passage spoilers, while the mDeBERTa (base) model outperforms other models for multipart spoilers.

* Accepted in ICAICTA 2023 (10th International Conference on Advanced Informatics: Concepts, Theory and Applications)
