Abstract:With the increasing demand for data sharing across platforms and organizations, ensuring the privacy and security of sensitive information has become a critical challenge. This paper introduces "TableGuard". An innovative approach to data obfuscation tailored for relational databases. Building on the principles and techniques developed in prior work on context-sensitive obfuscation, TableGuard applies these methods to ensure that API calls return only obfuscated data, thereby safeguarding privacy when sharing data with third parties. TableGuard leverages advanced context-sensitive obfuscation techniques to replace sensitive data elements with contextually appropriate alternatives. By maintaining the relational integrity and coherence of the data, our approach mitigates the risks of cognitive dissonance and data leakage. We demonstrate the implementation of TableGuard using a BERT based transformer model, which identifies and obfuscates sensitive entities within relational tables. Our evaluation shows that TableGuard effectively balances privacy protection with data utility, minimizing information loss while ensuring that the obfuscated data remains functionally useful for downstream applications. The results highlight the importance of domain-specific obfuscation strategies and the role of context length in preserving data integrity. The implications of this research are significant for organizations that need to share data securely with external parties. TableGuard offers a robust framework for implementing privacy-preserving data sharing mechanisms, thereby contributing to the broader field of data privacy and security.
Abstract:In this study, we evaluate the performance of multiple state-of-the-art SRGAN (Super Resolution Generative Adversarial Network) models, ESRGAN, Real-ESRGAN and EDSR, on a benchmark dataset of real-world images which undergo degradation using a pipeline. Our results show that some models seem to significantly increase the resolution of the input images while preserving their visual quality, this is assessed using Tesseract OCR engine. We observe that EDSR-BASE model from huggingface outperforms the remaining candidate models in terms of both quantitative metrics and subjective visual quality assessments with least compute overhead. Specifically, EDSR generates images with higher peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) values and are seen to return high quality OCR results with Tesseract OCR engine. These findings suggest that EDSR is a robust and effective approach for single-image super-resolution and may be particularly well-suited for applications where high-quality visual fidelity is critical and optimized compute.
Abstract:Protecting sensitive information is crucial in today's world of Large Language Models (LLMs) and data-driven services. One common method used to preserve privacy is by using data perturbation techniques to reduce overreaching utility of (sensitive) Personal Identifiable Information (PII) data while maintaining its statistical and semantic properties. Data perturbation methods often result in significant information loss, making them impractical for use. In this paper, we propose 'Life of PII', a novel Obfuscation Transformer framework for transforming PII into faux-PII while preserving the original information, intent, and context as much as possible. Our approach includes an API to interface with the given document, a configuration-based obfuscator, and a model based on the Transformer architecture, which has shown high context preservation and performance in natural language processing tasks and LLMs. Our Transformer-based approach learns mapping between the original PII and its transformed faux-PII representation, which we call "obfuscated" data. Our experiments demonstrate that our method, called Life of PII, outperforms traditional data perturbation techniques in terms of both utility preservation and privacy protection. We show that our approach can effectively reduce utility loss while preserving the original information, offering greater flexibility in the trade-off between privacy protection and data utility. Our work provides a solution for protecting PII in various real-world applications.
Abstract:This technical report summarizes submissions and compiles from Actor-Action video classification challenge held as a final project in CSC 249/449 Machine Vision course (Spring 2020) at University of Rochester