Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Minseok Choi

ExpGuard: LLM Content Moderation in Specialized Domains

Mar 03, 2026

Minseok Choi, Dongjin Kim, Seungbin Yang, Subin Kim, Youngjun Kwak, Juyoung Oh, Jaegul Choo, Jungmin Son

Abstract:With the growing deployment of large language models (LLMs) in real-world applications, establishing robust safety guardrails to moderate their inputs and outputs has become essential to ensure adherence to safety policies. Current guardrail models predominantly address general human-LLM interactions, rendering LLMs vulnerable to harmful and adversarial content within domain-specific contexts, particularly those rich in technical jargon and specialized concepts. To address this limitation, we introduce ExpGuard, a robust and specialized guardrail model designed to protect against harmful prompts and responses across financial, medical, and legal domains. In addition, we present ExpGuardMix, a meticulously curated dataset comprising 58,928 labeled prompts paired with corresponding refusal and compliant responses, from these specific sectors. This dataset is divided into two subsets: ExpGuardTrain, for model training, and ExpGuardTest, a high-quality test set annotated by domain experts to evaluate model robustness against technical and domain-specific content. Comprehensive evaluations conducted on ExpGuardTest and eight established public benchmarks reveal that ExpGuard delivers competitive performance across the board while demonstrating exceptional resilience to domain-specific adversarial attacks, surpassing state-of-the-art models such as WildGuard by up to 8.9% in prompt classification and 15.3% in response classification. To encourage further research and development, we open-source our code, data, and model, enabling adaptation to additional domains and supporting the creation of increasingly robust guardrail models.

* ICLR 2026

Via

Access Paper or Ask Questions

Physics-Informed Laplace Neural Operator for Solving Partial Differential Equations

Feb 13, 2026

Heechang Kim, Qianying Cao, Hyomin Shin, Seungchul Lee, George Em Karniadakis, Minseok Choi

Abstract:Neural operators have emerged as fast surrogate solvers for parametric partial differential equations (PDEs). However, purely data-driven models often require extensive training data and can generalize poorly, especially in small-data regimes and under unseen (out-of-distribution) input functions that are not represented in the training data. To address these limitations, we propose the Physics-Informed Laplace Neural Operator (PILNO), which enhances the Laplace Neural Operator (LNO) by embedding governing physics into training through PDE, boundary condition, and initial condition residuals. To improve expressivity, we first introduce an Advanced LNO (ALNO) backbone that retains a pole-residue transient representation while replacing the steady-state branch with an FNO-style Fourier multiplier. To make physics-informed training both data-efficient and robust, PILNO further leverages (i) virtual inputs: an unlabeled ensemble of input functions spanning a broad spectral range that provides abundant physics-only supervision and explicitly targets out-of-distribution (OOD) regimes; and (ii) temporal-causality weighting: a time-decaying reweighting of the physics residual that prioritizes early-time dynamics and stabilizes optimization for time-dependent PDEs. Across four representative benchmarks -- Burgers' equation, Darcy flow, a reaction-diffusion system, and a forced KdV equation -- PILNO consistently improves accuracy in small-data settings (e.g., N_train <= 27), reduces run-to-run variability across random seeds, and achieves stronger OOD generalization than purely data-driven baselines.

* 38 pages,19 figures

Via

Access Paper or Ask Questions

NavFormer: IGRF Forecasting in Moving Coordinate Frames

Jan 14, 2026

Yoontae Hwang, Dongwoo Lee, Minseok Choi, Yong Sup Ihn, Daham Kim, Deok-Young Lee

Abstract:Triad magnetometer components change with sensor attitude even when the IGRF total intensity target stays invariant. NavFormer forecasts this invariant target with rotation invariant scalar features and a Canonical SPD module that stabilizes the spectrum of window level second moments of the triads without sign discontinuities. The module builds a canonical frame from a Gram matrix per window and applies state dependent spectral scaling in the original coordinates. Experiments across five flights show lower error than strong baselines in standard training, few shot training, and zero shot transfer. The code is available at: https://anonymous.4open.science/r/NavFormer-Robust-IGRF-Forecasting-for-Autonomous-Navigators-0765

Via

Access Paper or Ask Questions

Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning

Oct 17, 2024

Minseok Choi, ChaeHun Park, Dohyun Lee, Jaegul Choo

Figure 1 for Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning

Figure 2 for Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning

Figure 3 for Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning

Figure 4 for Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning

Abstract:Large language models (LLMs) serve as giant information stores, often including personal or copyrighted data, and retraining them from scratch is not a viable option. This has led to the development of various fast, approximate unlearning techniques to selectively remove knowledge from LLMs. Prior research has largely focused on minimizing the probabilities of specific token sequences by reversing the language modeling objective. However, these methods still leave LLMs vulnerable to adversarial attacks that exploit indirect references. In this work, we examine the limitations of current unlearning techniques in effectively erasing a particular type of indirect prompt: multi-hop queries. Our findings reveal that existing methods fail to completely remove multi-hop knowledge when one of the intermediate hops is unlearned. To address this issue, we propose MUNCH, a simple uncertainty-based approach that breaks down multi-hop queries into subquestions and leverages the uncertainty of the unlearned model in final decision-making. Empirical results demonstrate the effectiveness of our framework, and MUNCH can be easily integrated with existing unlearning techniques, making it a flexible and useful solution for enhancing unlearning processes.

* 16 pages, 5 figures

Via

Access Paper or Ask Questions

Online-Score-Aided Federated Learning: Taming the Resource Constraints in Wireless Networks

Aug 12, 2024

Md Ferdous Pervej, Minseok Choi, Andreas F. Molisch

Abstract:While FL is a widely popular distributed ML strategy that protects data privacy, time-varying wireless network parameters and heterogeneous system configurations of the wireless device pose significant challenges. Although the limited radio and computational resources of the network and the clients, respectively, are widely acknowledged, two critical yet often ignored aspects are (a) wireless devices can only dedicate a small chunk of their limited storage for the FL task and (b) new training samples may arrive in an online manner in many practical wireless applications. Therefore, we propose a new FL algorithm called OSAFL, specifically designed to learn tasks relevant to wireless applications under these practical considerations. Since it has long been proven that under extreme resource constraints, clients may perform an arbitrary number of local training steps, which may lead to client drift under statistically heterogeneous data distributions, we leverage normalized gradient similarities and exploit weighting clients' updates based on optimized scores that facilitate the convergence rate of the proposed OSAFL algorithm. Our extensive simulation results on two different tasks -- each with three different datasets -- with four popular ML models validate the effectiveness of OSAFL compared to six existing state-of-the-art FL baselines.

* Under review for possible publication in IEEE Transactions on Wireless Communications (TWC)

Via

Access Paper or Ask Questions

Protecting Privacy Through Approximating Optimal Parameters for Sequence Unlearning in Language Models

Jun 20, 2024

Dohyun Lee, Daniel Rim, Minseok Choi, Jaegul Choo

Abstract:Although language models (LMs) demonstrate exceptional capabilities on various tasks, they are potentially vulnerable to extraction attacks, which represent a significant privacy risk. To mitigate the privacy concerns of LMs, machine unlearning has emerged as an important research area, which is utilized to induce the LM to selectively forget about some of its training data. While completely retraining the model will guarantee successful unlearning and privacy assurance, it is impractical for LMs, as it would be time-consuming and resource-intensive. Prior works efficiently unlearn the target token sequences, but upon subsequent iterations, the LM displays significant degradation in performance. In this work, we propose Privacy Protection via Optimal Parameters (POP), a novel unlearning method that effectively forgets the target token sequences from the pretrained LM by applying optimal gradient updates to the parameters. Inspired by the gradient derivation of complete retraining, we approximate the optimal training objective that successfully unlearns the target sequence while retaining the knowledge from the rest of the training data. Experimental results demonstrate that POP exhibits remarkable retention performance post-unlearning across 9 classification and 4 dialogue benchmarks, outperforming the state-of-the-art by a large margin. Furthermore, we introduce Remnant Memorization Accuracy that quantifies privacy risks based on token likelihood and validate its effectiveness through both qualitative and quantitative analyses.

* Accepted to ACL2024 findings

Via

Access Paper or Ask Questions

SNAP: Unlearning Selective Knowledge in Large Language Models with Negative Instructions

Jun 18, 2024

Minseok Choi, Daniel Rim, Dohyun Lee, Jaegul Choo

Abstract:Instruction-following large language models (LLMs), such as ChatGPT, have become increasingly popular with the general audience, many of whom are incorporating them into their daily routines. However, these LLMs inadvertently disclose personal or copyrighted information, which calls for a machine unlearning method to remove selective knowledge. Previous attempts sought to forget the link between the target information and its associated entities, but it rather led to generating undesirable responses about the target, compromising the end-user experience. In this work, we propose SNAP, an innovative framework designed to selectively unlearn information by 1) training an LLM with negative instructions to generate obliterated responses, 2) augmenting hard positives to retain the original LLM performance, and 3) applying the novel Wasserstein regularization to ensure adequate deviation from the initial weights of the LLM. We evaluate our framework on various NLP benchmarks and demonstrate that our approach retains the original LLM capabilities, while successfully unlearning the specified information.

* 16 pages, 5 figures

Via

Access Paper or Ask Questions

Cross-Lingual Unlearning of Selective Knowledge in Multilingual Language Models

Jun 18, 2024

Minseok Choi, Kyunghyun Min, Jaegul Choo

Figure 1 for Cross-Lingual Unlearning of Selective Knowledge in Multilingual Language Models

Figure 2 for Cross-Lingual Unlearning of Selective Knowledge in Multilingual Language Models

Figure 3 for Cross-Lingual Unlearning of Selective Knowledge in Multilingual Language Models

Figure 4 for Cross-Lingual Unlearning of Selective Knowledge in Multilingual Language Models

Abstract:Pretrained language models memorize vast amounts of information, including private and copyrighted data, raising significant safety concerns. Retraining these models after excluding sensitive data is prohibitively expensive, making machine unlearning a viable, cost-effective alternative. Previous research has focused on machine unlearning for monolingual models, but we find that unlearning in one language does not necessarily transfer to others. This vulnerability makes models susceptible to low-resource language attacks, where sensitive information remains accessible in less dominant languages. This paper presents a pioneering approach to machine unlearning for multilingual language models, selectively erasing information across different languages while maintaining overall performance. Specifically, our method employs an adaptive unlearning scheme that assigns language-dependent weights to address different language performances of multilingual language models. Empirical results demonstrate the effectiveness of our framework compared to existing unlearning baselines, setting a new standard for secure and adaptable multilingual language models.

* 15 pages, 5 figures

Via

Access Paper or Ask Questions

PairEval: Open-domain Dialogue Evaluation with Pairwise Comparison

Apr 01, 2024

ChaeHun Park, Minseok Choi, Dohyun Lee, Jaegul Choo

Figure 1 for PairEval: Open-domain Dialogue Evaluation with Pairwise Comparison

Figure 2 for PairEval: Open-domain Dialogue Evaluation with Pairwise Comparison

Figure 3 for PairEval: Open-domain Dialogue Evaluation with Pairwise Comparison

Figure 4 for PairEval: Open-domain Dialogue Evaluation with Pairwise Comparison

Abstract:Building a reliable and automated evaluation metric is a necessary but challenging problem for open-domain dialogue systems. Recent studies proposed evaluation metrics that assess generated responses by considering their relevance to previous dialogue histories. Although effective, these metrics evaluate individual responses directly rather than considering their relative quality compared to other responses. To handle this, we propose PairEval, a novel dialogue evaluation metric for assessing responses by comparing their quality against responses in different conversations. PairEval is built on top of open-sourced and moderate-size language models, and we make them specialized in pairwise comparison between dialogue responses. Extensive experiments on multiple benchmarks demonstrate that our metric exhibits a higher correlation with human judgments than baseline metrics. We also find that the proposed comparative metric is more robust in detecting common failures from open-domain dialogue systems, including repetition and speaker insensitivity.

Via

Access Paper or Ask Questions

SimCKP: Simple Contrastive Learning of Keyphrase Representations

Oct 12, 2023

Minseok Choi, Chaeheon Gwak, Seho Kim, Si Hyeong Kim, Jaegul Choo

Figure 1 for SimCKP: Simple Contrastive Learning of Keyphrase Representations

Figure 2 for SimCKP: Simple Contrastive Learning of Keyphrase Representations

Figure 3 for SimCKP: Simple Contrastive Learning of Keyphrase Representations

Figure 4 for SimCKP: Simple Contrastive Learning of Keyphrase Representations

Abstract:Keyphrase generation (KG) aims to generate a set of summarizing words or phrases given a source document, while keyphrase extraction (KE) aims to identify them from the text. Because the search space is much smaller in KE, it is often combined with KG to predict keyphrases that may or may not exist in the corresponding document. However, current unified approaches adopt sequence labeling and maximization-based generation that primarily operate at a token level, falling short in observing and scoring keyphrases as a whole. In this work, we propose SimCKP, a simple contrastive learning framework that consists of two stages: 1) An extractor-generator that extracts keyphrases by learning context-aware phrase-level representations in a contrastive manner while also generating keyphrases that do not appear in the document; 2) A reranker that adapts scores for each generated phrase by likewise aligning their representations with the corresponding document. Experimental results on multiple benchmark datasets demonstrate the effectiveness of our proposed approach, which outperforms the state-of-the-art models by a significant margin.

* Accepted to Findings of EMNLP 2023

Via

Access Paper or Ask Questions