This paper investigates physical-layer security (PLS) enabled by graph neural networks (GNNs). We propose a two-stage heterogeneous GNN (HGNN) to maximize the secrecy energy efficiency (SEE) of a reconfigurable intelligent surface (RIS)-assisted multiple-input single-output (MISO) system that serves multiple legitimate users (LUs) in the presence of multiple eavesdroppers (Eves). The first stage formulates the system as a bipartite graph with three types of nodes (RIS reflecting elements, LUs, and Eves) and generates the RIS phase shift matrix. The second stage models the system as a fully connected graph with two types of nodes (LUs and Eves) and produces the beamforming and artificial noise (AN) vectors. Both stages adopt an HGNN integrated with a multi-head attention mechanism, and the second stage supports two output methods: a beam-direct approach and a model-based approach. The two-stage HGNN is trained in an unsupervised manner and is designed to scale with the number of RIS reflecting elements, LUs, and Eves. Numerical results demonstrate that the proposed two-stage HGNN outperforms state-of-the-art GNNs in RIS-aided PLS scenarios. Compared with convex optimization algorithms, it reduces the average running time by three orders of magnitude with a performance loss of less than $4\%$. Extensive simulations further validate the scalability of the two-stage HGNN.