Modal identification is crucial for structural health monitoring and structural control, providing critical insights into structural dynamics and performance. This study presents a novel deep learning framework that integrates graph neural networks (GNNs), transformers, and a physics-informed loss function to achieve modal decomposition and identification across a population of structures. The transformer module decomposes multi-degrees-of-freedom (MDOF) structural dynamic measurements into single-degree-of-freedom (SDOF) modal responses, facilitating the identification of natural frequencies and damping ratios. Concurrently, the GNN captures the structural configurations and identifies mode shapes corresponding to the decomposed SDOF modal responses. The proposed model is trained in a purely physics-informed and unsupervised manner, leveraging modal decomposition theory and the independence of structural modes to guide learning without the need for labeled data. Validation through numerical simulations and laboratory experiments demonstrates its effectiveness in accurately decomposing dynamic responses and identifying modal properties from sparse structural dynamic measurements, regardless of variations in external loads or structural configurations. Comparative analyses against established modal identification techniques and model variations further underscore its superior performance, positioning it as a favorable approach for population-based structural health monitoring.