Abstract: Graph neural networks (GNNs) are fundamental tools in graph machine learning. The performance of GNNs relies crucially on the availability of informative node features, which can be limited or absent in real-life datasets and applications. A natural remedy is to augment the node features with embeddings computed from eigenvectors of the graph Laplacian matrix. While Laplacian spectral embeddings are the common default, since they capture meaningful graph connectivity information, we ask whether spectral embeddings derived from alternative graph matrices can also provide useful representations for learning. We introduce Interpolated Laplacian Embeddings (ILEs), which are derived from a simple yet expressive family of graph matrices. Using tools from spectral graph theory, we offer a straightforward interpretation of the structural information that ILEs capture. We demonstrate through simulations and experiments on real-world datasets that feature augmentation via ILEs can improve performance across commonly used GNN architectures. Our work offers a simple, practical approach that broadens the practitioner's spectral augmentation toolkit when node features are limited.
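For context, the sketch below illustrates spectral feature augmentation of the kind the abstract describes. The exact ILE construction is not specified in the abstract, so the interpolated matrix M(alpha) = alpha*L + (1-alpha)*A and the function names are illustrative assumptions, not the paper's definition.

```python
# Hedged sketch of spectral feature augmentation. The interpolation
# M(alpha) = alpha * L + (1 - alpha) * A is a hypothetical stand-in for the
# paper's ILE family, which is not defined in the abstract.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh


def interpolated_spectral_embedding(adj: sp.spmatrix, k: int = 8, alpha: float = 0.5) -> np.ndarray:
    """Return k-dimensional spectral embeddings of an interpolated graph matrix."""
    adj = sp.csr_matrix(adj, dtype=np.float64)
    degrees = np.asarray(adj.sum(axis=1)).ravel()
    laplacian = sp.diags(degrees) - adj               # combinatorial Laplacian L = D - A
    matrix = alpha * laplacian + (1.0 - alpha) * adj  # assumed interpolation between L and A
    # Eigenvectors of the smallest-magnitude eigenvalues capture coarse connectivity
    # structure for Laplacian-like matrices.
    _, eigvecs = eigsh(matrix, k=k, which="SM")
    return eigvecs


def augment_features(x: np.ndarray, adj: sp.spmatrix, k: int = 8, alpha: float = 0.5) -> np.ndarray:
    """Concatenate spectral embeddings onto existing node features (or use them alone)."""
    emb = interpolated_spectral_embedding(adj, k=k, alpha=alpha)
    return emb if x is None else np.concatenate([x, emb], axis=1)
```

In practice the augmented feature matrix would simply replace the original node features when training a standard GNN; alpha = 1 recovers plain Laplacian eigenvector features.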
Abstract: Large Language Models (LLMs) are now trained on massive datasets, some of which include sensitive information. This poses a serious privacy concern because attacks such as Membership Inference Attacks (MIAs) can reveal whether specific data was used during training. While knowledge distillation compresses LLMs into efficient, smaller student models, its impact on privacy remains underexplored. In this paper, we investigate how knowledge distillation affects model robustness against MIAs. We focus on two questions. First, how well is private data protected in teacher and student models? Second, how can we strengthen privacy preservation against MIAs in knowledge distillation? Through comprehensive experiments, we show that while teacher and student models achieve similar overall MIA accuracy, teacher models better protect member data, the primary target of MIAs, whereas student models better protect non-member data. To address this vulnerability in student models, we propose five privacy-preserving distillation methods and demonstrate that they successfully reduce the student models' vulnerability to MIAs, with ensembling further stabilizing their robustness, offering a reliable approach for distilling more secure and efficient student models. Our implementation source code is available at https://github.com/richardcui18/MIA_in_KD.
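The abstract does not specify which MIA variant is evaluated, so the sketch below shows only a standard loss-threshold baseline as an assumed illustration of how member vs. non-member protection could be measured for a teacher and a distilled student model; the function name and balanced-accuracy metric are choices made here, not the paper's protocol.

```python
# Hedged sketch of a loss-threshold Membership Inference Attack (MIA) baseline.
# Inputs are per-example losses computed separately on training ("member") and
# held-out ("non-member") data for whichever model (teacher or student) is attacked.
import numpy as np


def loss_threshold_mia(member_losses: np.ndarray, nonmember_losses: np.ndarray) -> float:
    """Predict 'member' when a sample's loss falls below a threshold and
    return the best balanced attack accuracy over all candidate thresholds."""
    losses = np.concatenate([member_losses, nonmember_losses])
    labels = np.concatenate([np.ones(len(member_losses)), np.zeros(len(nonmember_losses))])
    best_acc = 0.0
    for t in np.unique(losses):
        preds = (losses <= t).astype(float)       # low loss -> guess "member"
        tpr = preds[labels == 1].mean()           # members correctly flagged
        tnr = 1.0 - preds[labels == 0].mean()     # non-members correctly rejected
        best_acc = max(best_acc, 0.5 * (tpr + tnr))
    return best_acc
```

Comparing the returned attack accuracy for the teacher and the student on the same member/non-member split gives one simple way to quantify the member vs. non-member protection gap the abstract describes.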