Abstract:Deep learning models benefit from increasing data diversity and volume, motivating synthetic data augmentation to improve existing datasets. However, existing evaluation metrics for synthetic data typically calculate latent feature similarity, which is difficult to interpret and does not always correlate with the contribution to downstream tasks. We propose a vision-language grounded framework for interpretable synthetic data augmentation and evaluation in remote sensing. Our approach combines generative models, semantic segmentation and image captioning with vision and language models. Based on this framework, we introduce ARAS400k: A large-scale Remote sensing dataset Augmented with Synthetic data for segmentation and captioning, containing 100k real images and 300k synthetic images, each paired with segmentation maps and descriptions. ARAS400k enables the automated evaluation of synthetic data by analyzing semantic composition, minimizing caption redundancy, and verifying cross-modal consistency between visual structures and language descriptions. Experimental results indicate that while models trained exclusively on synthetic data reach competitive performance levels, those trained with augmented data (a combination of real and synthetic images) consistently outperform real-data baselines. Consequently, this work establishes a scalable benchmark for remote sensing tasks, specifically in semantic segmentation and image captioning. The dataset is available at zenodo.org/records/18890661 and the code base at github.com/caglarmert/ARAS400k.




Abstract:Contemporary deep learning models have demonstrated promising results across various applications within seismology and earthquake engineering. These models rely primarily on utilizing ground motion records for tasks such as earthquake event classification, localization, earthquake early warning systems, and structural health monitoring. However, the extent to which these models effectively learn from these complex time-series signals has not been thoroughly analyzed. In this study, our objective is to evaluate the degree to which auxiliary information, such as seismic phase arrival times or seismic station distribution within a network, dominates the process of deep learning from ground motion records, potentially hindering its effectiveness. We perform a hyperparameter search on two deep learning models to assess their effectiveness in deep learning from ground motion records while also examining the impact of auxiliary information on model performance. Experimental results reveal a strong reliance on the highly correlated P and S phase arrival information. Our observations highlight a potential gap in the field, indicating an absence of robust methodologies for deep learning of single-station ground motion recordings independent of any auxiliary information.




Abstract:Endoscopic imaging is commonly used to diagnose Ulcerative Colitis (UC) and classify its severity. It has been shown that deep learning based methods are effective in automated analysis of these images and can potentially be used to aid medical doctors. Unleashing the full potential of these methods depends on the availability of large amount of labeled images; however, obtaining and labeling these images are quite challenging. In this paper, we propose a active learning based generative augmentation method. The method involves generating a large number of synthetic samples by training using a small dataset consisting of real endoscopic images. The resulting data pool is narrowed down by using active learning methods to select the most informative samples, which are then used to train a classifier. We demonstrate the effectiveness of our method through experiments on a publicly available endoscopic image dataset. The results show that using synthesized samples in conjunction with active learning leads to improved classification performance compared to using only the original labeled examples and the baseline classification performance of 68.1% increases to 74.5% in terms of Quadratic Weighted Kappa (QWK) Score. Another observation is that, attaining equivalent performance using only real data necessitated three times higher number of images.