Abstract:In data-driven scientific discovery, a challenge lies in classifying well-characterized phenomena while identifying novel anomalies. Current semi-supervised clustering algorithms do not always fully address this duality, often assuming that supervisory signals are globally representative. Consequently, methods often enforce rigid constraints that suppress unanticipated patterns or require a pre-specified number of clusters, rendering them ineffective for genuine novelty detection. To bridge this gap, we introduce CLiMB (CLustering in Multiphase Boundaries), a domain-informed framework decoupling the exploitation of prior knowledge from the exploration of unknown structures. Using a sequential two-phase approach, CLiMB first anchors known clusters using constrained partitioning, and subsequently applies density-based clustering to residual data to reveal arbitrary topologies. We demonstrate this framework on RR Lyrae stars data from the Gaia Data Release 3. CLiMB attains an Adjusted Rand Index of 0.829 with 90% seed coverage in recovering known Milky Way substructures, drastically outperforming heuristic and constraint-based baselines, which stagnate below 0.20. Furthermore, sensitivity analysis confirms CLiMB's superior data efficiency, showing monotonic improvement as knowledge increases. Finally, the framework successfully isolates three dynamical features (Shiva, Shakti, and the Galactic Disk) in the unlabelled field, validating its potential for scientific discovery.
Abstract:RR Lyrae stars (RRLs) are old pulsating variables widely used as metallicity tracers due to the correlation between their metal abundances and light curve morphology. With ESA Gaia DR3 providing light curves for about 270,000 RRLs, there is a pressing need for scalable methods to estimate their metallicities from photometric data. We introduce a unified deep learning framework that estimates metallicities for both fundamental-mode (RRab) and first-overtone (RRc) RRLs using Gaia G-band light curves. This approach extends our previous work on RRab stars to include RRc stars, aiming for high predictive accuracy and broad generalization across both pulsation types. The model is based on a Gated Recurrent Unit (GRU) neural network optimized for time-series extrinsic regression. Our pipeline includes preprocessing steps such as phase folding, smoothing, and sample weighting, and uses photometric metallicities from the literature as training targets. The architecture is designed to handle morphological differences between RRab and RRc light curves without requiring separate models. On held-out validation sets, our GRU model achieves strong performance: for RRab stars, MAE = 0.0565 dex, RMSE = 0.0765 dex, R^2 = 0.9401; for RRc stars, MAE = 0.0505 dex, RMSE = 0.0720 dex, R^2 = 0.9625. These results show the effectiveness of deep learning for large-scale photometric metallicity estimation and support its application to studies of stellar populations and Galactic structure.




Abstract:Astronomy is entering an unprecedented era of Big Data science, driven by missions like the ESA's Gaia telescope, which aims to map the Milky Way in three dimensions. Gaia's vast dataset presents a monumental challenge for traditional analysis methods. The sheer scale of this data exceeds the capabilities of manual exploration, necessitating the utilization of advanced computational techniques. In response to this challenge, we developed a novel approach leveraging deep learning to estimate the metallicity of fundamental mode (ab-type) RR Lyrae stars from their light curves in the Gaia optical G-band. Our study explores applying deep learning techniques, particularly advanced neural network architectures, in predicting photometric metallicity from time-series data. Our deep learning models demonstrated notable predictive performance, with a low mean absolute error (MAE) of 0.0565, the root mean square error (RMSE) achieved is 0.0765 and a high $R^2$ regression performance of 0.9401 measured by cross-validation. The weighted mean absolute error (wMAE) is 0.0563, while the weighted root mean square error (wRMSE) is 0.0763. These results showcase the effectiveness of our approach in accurately estimating metallicity values. Our work underscores the importance of deep learning in astronomical research, particularly with large datasets from missions like Gaia. By harnessing the power of deep learning methods, we can provide precision in analyzing vast datasets, contributing to more precise and comprehensive insights into complex astronomical phenomena.