Jaemin Jo

Measuring the Depth of LLM Unlearning via Activation Patching

May 23, 2026

Jaeung Lee, Dohyun Kim, Jaemin Jo

Abstract: Large language model (LLM) unlearning has emerged as a crucial post-hoc mechanism for privacy protection and AI safety, yet auditing whether target knowledge is truly erased remains challenging. Existing output-level metrics fail to detect when this knowledge remains recoverable from internal representations. Recent white-box studies reveal such residual knowledge but often rely on auxiliary training or dataset-specific adaptations, leaving no generalizable metric. To address these limitations, we propose the Unlearning Depth Score (UDS), a metric that quantifies the mechanistic depth of unlearning via activation patching. UDS first identifies layers that encode the target knowledge using a retain model baseline, then measures how much of it is erased in the unlearned model on a 0-1 scale. In a meta-evaluation across 20 metrics on 150 unlearned models spanning 8 methods, UDS achieves the highest faithfulness and robustness, confirming our causal approach as the most reliable for unlearning evaluation. Case studies further reveal that white-box metrics can disagree at the layer level and that erasure depth varies across examples. We provide guidelines for integrating UDS into existing benchmarking frameworks and streamlining the evaluation pipeline. Code and data are available at https://github.com/gnueaj/unlearning-depth-score

* 18 pages
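
The abstract does not state the UDS formula, so the following is only a toy sketch of the two-step idea it describes: pick the layers where a retain-model baseline shows the target knowledge is encoded, then score how much of it is gone in the unlearned model on a 0-1 scale. The function name, the per-layer "drop" inputs, the threshold, and the weighted-ratio aggregation are all assumptions made for illustration and are not the authors' definition; the linked repository holds the actual implementation.

```python
def depth_score_sketch(retain_drop, unlearned_drop, threshold=0.05):
    """Toy 0-1 erasure score (illustrative assumption, not the paper's formula).

    retain_drop[l]:    hypothetical drop in the target answer's score when layer l
                       of a knowledgeable model is patched with retain-model activations.
    unlearned_drop[l]: the same drop when patching with unlearned-model activations.
    """
    if len(retain_drop) != len(unlearned_drop):
        raise ValueError("both inputs need one value per layer")

    weighted, total = 0.0, 0.0
    for r, u in zip(retain_drop, unlearned_drop):
        if r <= threshold:
            continue  # step 1: skip layers where the baseline shows no target knowledge
        erased = min(max(u / r, 0.0), 1.0)  # step 2: erased fraction, clipped to [0, 1]
        weighted += r * erased
        total += r
    # None signals that no layer passed the threshold, so the score is undefined
    return weighted / total if total > 0 else None


# Hypothetical per-layer values for a three-layer model
full = depth_score_sketch([0.0, 0.4, 0.6], [0.0, 0.4, 0.6])
none = depth_score_sketch([0.0, 0.4, 0.6], [0.0, 0.0, 0.0])
half = depth_score_sketch([0.0, 0.4, 0.6], [0.0, 0.2, 0.3])
```

Weighting each layer by its baseline drop is one possible design choice: layers that carry more of the knowledge then count more toward the final score.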

GhostUMAP2: Measuring and Analyzing (r,d)-Stability of UMAP

Jul 23, 2025

Myeongwon Jung, Takanori Fujiwara, Jaemin Jo

Abstract: Despite the widespread use of Uniform Manifold Approximation and Projection (UMAP), the impact of its stochastic optimization process on the results remains underexplored. We observed that it often produces unstable results where the projections of data points are determined mostly by chance rather than reflecting neighboring structures. To address this limitation, we introduce (r,d)-stability to UMAP: a framework that analyzes the stochastic positioning of data points in the projection space. To assess how stochastic elements, specifically initial projection positions and negative sampling, impact UMAP results, we introduce "ghosts", or duplicates of data points representing potential positional variations due to stochasticity. We define a data point's projection as (r,d)-stable if its ghosts perturbed within a circle of radius r in the initial projection remain confined within a circle of radius d for their final positions. To efficiently compute the ghost projections, we develop an adaptive dropping scheme that reduces runtime by up to 60% compared to an unoptimized baseline while maintaining approximately 90% of unstable points. We also present a visualization tool that supports the interactive exploration of the (r,d)-stability of data points. Finally, we demonstrate the effectiveness of our framework by examining the stability of projections of real-world datasets and present usage guidelines for the effective use of our framework.
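
The (r,d)-stability definition above can be illustrated with a short sketch. It assumes 2D projections, assumes both circles are centered on the original point's position, and uses hypothetical helper names (`make_ghosts`, `is_rd_stable`); it does not run UMAP and is not the GhostUMAP2 implementation, so the final positions in the usage lines are made-up values.

```python
import math
import random


def make_ghosts(initial, r, n_ghosts, rng):
    """Sample ghost start positions uniformly inside a circle of radius r."""
    x, y = initial
    ghosts = []
    for _ in range(n_ghosts):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        radius = r * math.sqrt(rng.random())  # sqrt keeps the density uniform over the disc
        ghosts.append((x + radius * math.cos(angle), y + radius * math.sin(angle)))
    return ghosts


def is_rd_stable(final_point, final_ghosts, d):
    """True if every ghost ends within distance d of the point's final position.

    Assumption: the circle of radius d is centered on the original point.
    """
    return all(math.dist(final_point, g) <= d for g in final_ghosts)


rng = random.Random(0)
starts = make_ghosts((0.0, 0.0), r=0.1, n_ghosts=8, rng=rng)

# Hypothetical final positions, standing in for the result of the optimization
final_ghosts = [(1.0, 1.02), (0.98, 1.0), (1.01, 0.99)]
stable = is_rd_stable((1.0, 1.0), final_ghosts, d=0.05)
unstable = is_rd_stable((1.0, 1.0), final_ghosts + [(3.0, -2.0)], d=0.05)
```

In this toy case the first three ghosts stay close to the point, while the added ghost at (3.0, -2.0) lands far outside the radius-d circle and makes the point unstable.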

ZADU: A Python Library for Evaluating the Reliability of Dimensionality Reduction Embeddings

Aug 11, 2023

Hyeon Jeon, Aeri Cho, Jinhwa Jang, Soohyun Lee, Jake Hyun, Hyung-Kwon Ko, Jaemin Jo, Jinwook Seo

Abstract: Dimensionality reduction (DR) techniques inherently distort the original structure of input high-dimensional data, producing imperfect low-dimensional embeddings. Diverse distortion measures have thus been proposed to evaluate the reliability of DR embeddings. However, implementing and executing distortion measures in practice has so far been time-consuming and tedious. To address this issue, we present ZADU, a Python library that provides distortion measures. ZADU is not only easy to install and execute but also enables comprehensive evaluation of DR embeddings through three key features. First, the library covers a wide range of distortion measures. Second, it automatically optimizes the execution of distortion measures, substantially reducing the running time required to execute multiple measures. Last, the library reports how individual points contribute to the overall distortions, facilitating the detailed analysis of DR embeddings. By simulating a real-world scenario of optimizing DR embeddings, we verify that our optimization scheme substantially reduces the time required to execute distortion measures. Finally, as an application of ZADU, we present another library called ZADUVis that allows users to easily create distortion visualizations that depict the extent to which each region of an embedding suffers from distortions.

* 2023 IEEE Visualization and Visual Analytics (IEEE VIS 2023) Short paper
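
To make "distortion measure" concrete, here is a minimal pure-Python sketch of Trustworthiness, one widely used measure of this kind. It is written from the standard rank-based definition and does not use or reproduce ZADU's API; the function names are hypothetical, and the brute-force neighbor search is only suitable for tiny inputs.

```python
import math


def neighbor_ranks(points):
    """For each point i, map every other index j to its distance rank from i (1 = nearest)."""
    ranks = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(p, points[j]))
        ranks.append({j: rank for rank, j in enumerate(order, start=1)})
    return ranks


def trustworthiness(high, low, k):
    """Penalize points that are k-nearest neighbors in the embedding but not in the original data."""
    n = len(high)
    if len(low) != n or not 0 < k < n / 2:
        raise ValueError("need matching sizes and 0 < k < n/2")
    high_rank = neighbor_ranks(high)
    low_rank = neighbor_ranks(low)
    penalty = 0
    for i in range(n):
        for j, r_low in low_rank[i].items():
            if r_low <= k and high_rank[i][j] > k:
                penalty += high_rank[i][j] - k  # how far j really is from i in the original space
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty


# Six points on a line, embedded once faithfully and once in scrambled order
high = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
faithful = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
scrambled = [(0.0,), (5.0,), (1.0,), (4.0,), (2.0,), (3.0,)]

score_faithful = trustworthiness(high, faithful, k=1)
score_scrambled = trustworthiness(high, scrambled, k=1)
```

The faithful embedding keeps every neighborhood intact and scores 1.0, while the scrambled one introduces false neighbors and scores lower. The per-point terms of `penalty` are also the kind of pointwise contribution that the abstract says the library can report.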

Uniform Manifold Approximation with Two-phase Optimization

May 01, 2022

Hyeon Jeon, Hyung-Kwon Ko, Soohyun Lee, Jaemin Jo, Jinwook Seo

Abstract: We introduce Uniform Manifold Approximation with Two-phase Optimization (UMATO), a dimensionality reduction (DR) technique that improves UMAP to capture the global structure of high-dimensional data more accurately. In UMATO, optimization is divided into two phases so that the resulting embeddings can depict the global structure reliably while preserving the local structure with sufficient accuracy. In the first phase, hub points are identified and projected to construct a skeletal layout for the global structure. In the second phase, the remaining points are added to the embedding, preserving the regional characteristics of local areas. Through quantitative experiments, we found that UMATO (1) outperformed widely used DR techniques in preserving the global structure while (2) producing competitive accuracy in representing the local structure. We also verified that UMATO is preferable in terms of robustness over diverse initialization methods, number of epochs, and subsampling techniques.

* Under review. Hyeon Jeon and Hyung-Kwon Ko equally contributed to this work

Measuring and Explaining the Inter-Cluster Reliability of Multidimensional Projections

Jul 22, 2021

Hyeon Jeon, Hyung-Kwon Ko, Jaemin Jo, Youngtaek Kim, Jinwook Seo

Abstract: We propose Steadiness and Cohesiveness, two novel metrics to measure the inter-cluster reliability of multidimensional projection (MDP), specifically how well the inter-cluster structures are preserved between the original high-dimensional space and the low-dimensional projection space. Measuring inter-cluster reliability is crucial as it directly affects how well inter-cluster tasks (e.g., identifying cluster relationships in the original space from a projected view) can be conducted; however, despite the importance of inter-cluster tasks, we found that previous metrics, such as Trustworthiness and Continuity, fail to measure inter-cluster reliability. Our metrics consider two aspects of the inter-cluster reliability: Steadiness measures the extent to which clusters in the projected space form clusters in the original space, and Cohesiveness measures the opposite. They extract random clusters with arbitrary shapes and positions in one space and evaluate how much the clusters are stretched or dispersed in the other space. Furthermore, our metrics can quantify pointwise distortions, allowing for the visualization of inter-cluster reliability in a projection, which we call a reliability map. Through quantitative experiments, we verify that our metrics precisely capture the distortions that harm inter-cluster reliability while previous metrics have difficulty capturing the distortions. A case study also demonstrates that our metrics and the reliability map 1) support users in selecting the proper projection techniques or hyperparameters and 2) prevent misinterpretation while performing inter-cluster tasks, thus allowing an adequate identification of inter-cluster structure.

* IEEE Transactions on Visualization and Computer Graphics (TVCG, Proc. VIS 2021), to appear

IntelliCAT: Intelligent Machine Translation Post-Editing with Quality Estimation and Translation Suggestion

May 25, 2021

Dongjun Lee, Junhyeong Ahn, Heesoo Park, Jaemin Jo

Abstract: We present IntelliCAT, an interactive translation interface with neural models that streamline the post-editing process on machine translation output. We leverage two quality estimation (QE) models at different granularities: sentence-level QE, to predict the quality of each machine-translated sentence, and word-level QE, to locate the parts of the machine-translated sentence that need correction. Additionally, we introduce a novel translation suggestion model conditioned on both the left and right contexts, providing alternatives for specific words or phrases for correction. Finally, with word alignments, IntelliCAT automatically preserves the original document's styles in the translated document. The experimental results show that post-editing based on the proposed QE and translation suggestions can significantly improve translation quality. Furthermore, a user study reveals that three features provided in IntelliCAT significantly accelerate the post-editing task, achieving a 52.9% speedup in translation time compared to translating from scratch. The interface is publicly available at https://intellicat.beringlab.com/.

* ACL 2021 (system demonstration)
