Minjin Kim


Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents

Oct 22, 2023
Hyungjoo Chae, Yongho Song, Kai Tzu-iunn Ong, Taeyoon Kwon, Minjin Kim, Youngjae Yu, Dongha Lee, Dongyeop Kang, Jinyoung Yeo


Human-like chatbots require commonsense reasoning to comprehend and respond to the implicit information present in conversations. Achieving coherent and informative responses grounded in such reasoning, however, is a non-trivial task. Even for large language models (LLMs), identifying and aggregating key evidence in a single hop presents a substantial challenge, because such evidence is scattered across multiple turns of a conversation and must therefore be integrated over multiple hops. Hence, our focus is to facilitate such multi-hop reasoning over a dialogue context, namely dialogue chain-of-thought (CoT) reasoning. To this end, we propose a knowledge distillation framework that leverages LLMs as unreliable teachers and selectively distills consistent and helpful rationales via alignment filters. We further present DOCTOR, a DialOgue Chain-of-ThOught Reasoner that provides reliable CoT rationales for response generation. Extensive experiments show that enhancing dialogue agents with high-quality rationales from DOCTOR significantly improves the quality of their responses.
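
As a rough illustration of the selective distillation step described above, here is a minimal, hypothetical Python sketch of rationale filtering. The callables `teacher_rationales` and `response_likelihood` are assumed stand-ins, not the paper's actual components, and the helpfulness criterion is a simplified proxy for the alignment filters.

```python
# Hypothetical sketch: keep only teacher rationales that make the gold
# response more likely -- a rough proxy for consistency/helpfulness filters.
from typing import Callable, List, Tuple

def distill_rationales(
    dialogues: List[Tuple[List[str], str]],                   # (context turns, gold response)
    teacher_rationales: Callable[[List[str]], List[str]],     # LLM teacher: context -> candidate CoTs
    response_likelihood: Callable[[List[str], str, str], float],  # score of response given context+rationale
    threshold: float = 0.3,
) -> List[Tuple[List[str], str, str]]:
    kept = []
    for context, gold in dialogues:
        for rationale in teacher_rationales(context):
            # Helpfulness-style filter: the rationale should raise the
            # likelihood of the gold response relative to no rationale.
            gain = (response_likelihood(context, rationale, gold)
                    - response_likelihood(context, "", gold))
            if gain > threshold:
                kept.append((context, rationale, gold))
    return kept

# Toy stand-ins so the sketch runs end to end.
toy = [(["A: I failed my exam.", "B: Oh no."], "Do you want to talk about it?")]
rationales = lambda ctx: ["Speaker A is upset and may need emotional support."]
likelihood = lambda ctx, r, resp: 0.9 if r else 0.2
print(distill_rationales(toy, rationales, likelihood))
```

The surviving (context, rationale, response) triples would then serve as training data for a student reasoner such as DOCTOR.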

* 25 pages, 8 figures, Accepted to EMNLP 2023 

Evidence-empowered Transfer Learning for Alzheimer's Disease

Mar 03, 2023
Kai Tzu-iunn Ong, Hana Kim, Minjin Kim, Jinseong Jang, Beomseok Sohn, Yoon Seong Choi, Dosik Hwang, Seong Jae Hwang, Jinyoung Yeo


Transfer learning has been widely used to mitigate data scarcity in the field of Alzheimer's disease (AD). Conventional transfer learning relies on re-using models trained on AD-irrelevant tasks such as natural image classification, but this often leads to negative transfer due to the discrepancy between the non-medical source domain and the target medical domain. To address this, we present evidence-empowered transfer learning for AD diagnosis. Unlike conventional approaches, we leverage an AD-relevant auxiliary task, namely morphological change prediction, without requiring additional MRI data. Through this auxiliary task, the diagnosis model learns evidential and transferable knowledge from morphological features in MRI scans. Experimental results demonstrate that our framework not only improves detection performance regardless of model capacity, but is also more data-efficient and faithful.
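
To illustrate the kind of auxiliary-task setup described above, here is a minimal PyTorch sketch of a shared encoder with a diagnosis head and a morphological-change head. The 3D-CNN encoder, head shapes, and loss weighting are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch of joint training with an AD-relevant auxiliary task.
import torch
import torch.nn as nn

class EvidenceEmpoweredNet(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Shared encoder over single-channel MRI volumes (assumed layout).
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        self.diagnosis_head = nn.Linear(feat_dim, 2)  # AD vs. control
        self.morph_head = nn.Linear(feat_dim, 1)      # morphological-change score

    def forward(self, mri: torch.Tensor):
        z = self.encoder(mri)
        return self.diagnosis_head(z), self.morph_head(z)

model = EvidenceEmpoweredNet()
mri = torch.randn(2, 1, 32, 32, 32)  # toy batch of MRI volumes
diag_logits, morph_pred = model(mri)
# Joint loss: diagnosis + (assumed) weighted auxiliary regression term.
loss = nn.CrossEntropyLoss()(diag_logits, torch.tensor([0, 1])) \
     + 0.5 * nn.MSELoss()(morph_pred.squeeze(-1), torch.randn(2))
loss.backward()
```

Because both heads share one encoder, gradients from the auxiliary task shape the same features used for diagnosis, which is the mechanism by which AD-relevant evidence transfers.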

* Accepted to IEEE International Symposium on Biomedical Imaging (ISBI) 2023 

TUTORING: Instruction-Grounded Conversational Agent for Language Learners

Feb 24, 2023
Hyungjoo Chae, Minjin Kim, Chaehyeong Kim, Wonseok Jeong, Hyejoong Kim, Junmyung Lee, Jinyoung Yeo


In this paper, we propose Tutoring bot, a generative chatbot trained on a large-scale dataset of tutor-student conversations for English-language learning. To mimic a human tutor's behavior in language education, the tutor bot leverages diverse educational instructions and grounds its responses in each instruction, which serves as additional input context for response generation. Since a single instruction generally spans multiple dialogue turns to give the student sufficient speaking practice, the tutor bot must detect when the current instruction should be kept or switched to the next one. To that end, the tutor bot is trained via a multi-task learning scheme to not only generate responses but also simultaneously infer its teaching action and its progress in the current conversation. Our Tutoring bot is deployed under a non-commercial use license at https://tutoringai.com.
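
A minimal sketch of the multi-task scheme described above, assuming a shared encoder-decoder that yields per-token states for generation and a pooled dialogue state for classification; the head names and label sets are hypothetical, not taken from the paper.

```python
# Hypothetical multi-task heads: response generation + teaching action
# + instruction progress (keep vs. switch), over shared hidden states.
import torch
import torch.nn as nn

class TutorBotHeads(nn.Module):
    def __init__(self, hidden: int = 256, vocab: int = 1000,
                 n_actions: int = 8, n_progress: int = 2):
        super().__init__()
        self.lm_head = nn.Linear(hidden, vocab)             # next-token logits
        self.action_head = nn.Linear(hidden, n_actions)     # e.g. "ask", "correct", ...
        self.progress_head = nn.Linear(hidden, n_progress)  # keep vs. switch instruction

    def forward(self, h_tokens: torch.Tensor, h_dialogue: torch.Tensor):
        return (self.lm_head(h_tokens),
                self.action_head(h_dialogue),
                self.progress_head(h_dialogue))

heads = TutorBotHeads()
h_tok = torch.randn(2, 10, 256)  # per-token states from a shared backbone
h_dlg = torch.randn(2, 256)      # pooled dialogue state
lm_logits, action_logits, progress_logits = heads(h_tok, h_dlg)
# Sum of the three task losses (equal weighting assumed for illustration).
loss = (nn.CrossEntropyLoss()(lm_logits.reshape(-1, 1000), torch.randint(0, 1000, (20,)))
        + nn.CrossEntropyLoss()(action_logits, torch.randint(0, 8, (2,)))
        + nn.CrossEntropyLoss()(progress_logits, torch.randint(0, 2, (2,))))
loss.backward()
```

At inference time, the progress head's "switch" prediction is what would trigger moving the conversation on to the next instruction.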


Kernel-convoluted Deep Neural Networks with Data Augmentation

Dec 24, 2020
Minjin Kim, Young-geun Kim, Dongha Kim, Yongdai Kim, Myunghee Cho Paik


The Mixup method (Zhang et al. 2018), which uses linearly interpolated data, has emerged as an effective data augmentation tool that improves generalization performance and robustness to adversarial examples. Its motivation is to curtail undesirable oscillations: Mixup implicitly constrains the model to behave linearly between observed data points, thereby promoting smoothness. In this work, we formally investigate this premise, propose a way to impose smoothness constraints explicitly, and extend it to incorporate implicit model constraints. First, we derive a new function class composed of kernel-convoluted models (KCM), in which the smoothness constraint is imposed directly by locally averaging the original functions with a kernel. Second, we propose incorporating the Mixup method into KCM to expand the domain over which smoothness is imposed. For both KCM and KCM combined with Mixup, we provide risk analyses under certain conditions on the kernels. We show that the upper bound on the excess risk decays no more slowly than that of the original function class. The upper bound for KCM with Mixup remains dominated by that of KCM provided the Mixup perturbation vanishes faster than \(O(n^{-1/2})\), where \(n\) is the sample size. Using the CIFAR-10 and CIFAR-100 datasets, our experiments demonstrate that KCM with Mixup outperforms the Mixup method in terms of generalization and robustness to adversarial examples.
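
As a sketch of the construction described above, with notation assumed rather than taken verbatim from the paper: the kernel-convoluted model averages an original function \(f\) around each input with a kernel \(k_h\) of bandwidth \(h\), and Mixup supplies interpolated inputs that widen the region where this smoothing applies.

```latex
% Kernel-convoluted model: smooth f by locally averaging with kernel k_h.
\[
  f_{k_h}(x) \;=\; \int f(x - u)\, k_h(u)\, \mathrm{d}u
            \;=\; \mathbb{E}_{U \sim k_h}\!\left[ f(x - U) \right].
\]
% Combining with Mixup, training additionally sees interpolated inputs
\[
  \tilde{x} \;=\; \lambda x_i + (1 - \lambda)\, x_j,
  \qquad \lambda \sim \mathrm{Beta}(\alpha, \alpha),
\]
% which widens the region over which the averaged model is encouraged
% to behave smoothly.
```

In practice the convolution would presumably be approximated by sampling perturbations \(U\) at training time, in the same spirit as standard noise-based augmentation.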
