The rapid advancement of large language models (LLMs) has enabled natural language processing capabilities similar to those of humans, and LLMs are being widely utilized across various societal domains such as education and healthcare. While the versatility of these models has increased, they have the potential to generate subjective and normative language, leading to discriminatory treatment or outcomes among social groups, especially due to online offensive language. In this paper, we define such harm as societal bias and assess ethnic, gender, and racial biases in a model fine-tuned with Korean comments using Bidirectional Encoder Representations from Transformers (KcBERT) and KOLD data through template-based Masked Language Modeling (MLM). To quantitatively evaluate biases, we employ LPBS and CBS metrics. Compared to KcBERT, the fine-tuned model shows a reduction in ethnic bias but demonstrates significant changes in gender and racial biases. Based on these results, we propose two methods to mitigate societal bias. Firstly, a data balancing approach during the pre-training phase adjusts the uniformity of data by aligning the distribution of the occurrences of specific words and converting surrounding harmful words into non-harmful words. Secondly, during the in-training phase, we apply Debiasing Regularization by adjusting dropout and regularization, confirming a decrease in training loss. Our contribution lies in demonstrating that societal bias exists in Korean language models due to language-dependent characteristics.
In this multi-task learning study on simultaneous analysis of emotions and their underlying causes in conversational contexts, deep neural network methods were employed to effectively process and train large labeled datasets. However, these approaches are typically limited to conducting context analyses across the entire corpus because they rely on one of the two methods: word- or sentence-level embedding. The former struggles with polysemy and homonyms, whereas the latter causes information loss when processing long sentences. In this study, we overcome the limitations of previous embeddings by utilizing both word- and sentence-level embeddings. Furthermore, we propose the emotion-causality recognition in conversation (ECRC) model, which is based on a novel graph structure, thereby leveraging the strengths of both embedding methods. This model uniquely integrates the bidirectional long short-term memory (Bi-LSTM) and graph neural network (GCN) models for Korean conversation analysis. Compared with models that rely solely on one embedding method, the proposed model effectively structures abstract concepts, such as language features and relationships, thereby minimizing information loss. To assess model performance, we compared the multi-task learning results of three deep neural network models with varying graph structures. Additionally, we evaluated the proposed model using Korean and English datasets. The experimental results show that the proposed model performs better in emotion and causality multi-task learning (74.62% and 75.30%, respectively) when node and edge characteristics are incorporated into the graph structure. Similar results were recorded for the Korean ECC and Wellness datasets (74.62% and 73.44%, respectively) with 71.35% on the IEMOCAP English dataset.