Rony Abecidan

CRIStAL

Tackle CSM in JPEG Steganalysis with Data Adaptation

May 19, 2026

Rony Abecidan, Vincent Itier, Jérémie Boulanger, Patrick Bas, Tomáš Pevný

Abstract: Steganalysis models excel on benchmark datasets but struggle in the wild when the analyzed images are produced by a processing pipeline unseen during training. This problem, known as Cover Source Mismatch (CSM), is particularly hard in realistic settings where practitioners (1) have access to only a small, unlabeled dataset, (2) are unsure of the processing techniques applied to these images, and (3) lack information on the proportion of covers and stegos in that set. To answer this challenge, we introduce TADA (Target Alignment through Data Adaptation), a framework that learns to emulate the unknown processing pipeline from a small unlabeled target set. This architecture is trained with a loss combining residual covariance alignment, residual distribution matching, and an $\ell^2$ loss constraining the emulator to produce realistic images. Across toy and operational targets, TADA yields substantial gains in robustness to CSM and improves operational generalization compared to strong holistic and atomistic baselines. Additional resources are available at this link: https://github.com/RonyAbecidan/TADA

* ACM Workshop on Information Hiding and Multimedia Security, (IH&MMSec '26), Jun 2026, Florence, Italy
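
The three-term loss named in the abstract can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the covariance term is written as a squared Frobenius distance between covariance matrices, the distribution-matching term as a Gaussian-kernel MMD, and the weights `w_cov`, `w_dist`, `w_l2` and all function names are hypothetical. Residuals are assumed to be arrays of shape (samples, features).

```python
import numpy as np

def covariance_alignment(src, tgt):
    # Assumed form: squared Frobenius distance between feature covariances.
    c_src = np.cov(src, rowvar=False)
    c_tgt = np.cov(tgt, rowvar=False)
    return float(np.sum((c_src - c_tgt) ** 2))

def rbf_mmd2(src, tgt, bandwidth=1.0):
    # Assumed form: biased estimate of the squared MMD with a Gaussian kernel.
    def kernel(a, b):
        sq = (np.sum(a ** 2, axis=1)[:, None]
              + np.sum(b ** 2, axis=1)[None, :]
              - 2.0 * a @ b.T)
        return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))
    return float(kernel(src, src).mean()
                 + kernel(tgt, tgt).mean()
                 - 2.0 * kernel(src, tgt).mean())

def l2_fidelity(emulated, original):
    # Mean squared difference keeping the emulated image close to its input.
    return float(np.mean((emulated - original) ** 2))

def combined_loss(src_res, tgt_res, emulated, original,
                  w_cov=1.0, w_dist=1.0, w_l2=1.0):
    # Hypothetical weighted sum of the three terms listed in the abstract.
    return (w_cov * covariance_alignment(src_res, tgt_res)
            + w_dist * rbf_mmd2(src_res, tgt_res)
            + w_l2 * l2_fidelity(emulated, original))
```

With identical source and target residuals and an unchanged image, every term vanishes, so the sketch only penalizes an emulator whose outputs drift from the target statistics or from the input image.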

Pick the Largest Margin for Robust Detection of Splicing

Sep 05, 2024

Julien Simon de Kergunic, Rony Abecidan, Patrick Bas, Vincent Itier

Abstract: Despite advancements in splicing detection, practitioners still struggle to fully leverage forensic tools from the literature due to a critical issue: deep learning-based detectors are extremely sensitive to the instances they were trained on. Simple post-processing applied to evaluation images can easily degrade their performance, leading to a lack of confidence in splicing detectors in operational contexts. In this study, we show that a deep splicing detector behaves differently against unknown post-processing operations for different learned weights, even if it achieves similar performance on a test set drawn from the same distribution as its training set. We connect this observation to the fact that different training runs produce different latent spaces that separate training samples differently. Our experiments reveal a strong correlation between the distributions of latent margins and the ability of the detector to generalize to post-processed images. We thus provide practitioners with a way to build deep detectors that are more robust than others against post-processing operations: train the architecture under different conditions and pick the model maximizing the latent space margin.
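
The selection rule described at the end of the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the margin is taken as the signed distance of each latent vector to a linear decision boundary, labels are assumed to be in {-1, +1}, the median is a hypothetical summary of the margin distribution, and the function names are invented for this example.

```python
import numpy as np

def latent_margins(latents, labels, weight, bias):
    # Signed distance of each latent vector to the hyperplane w.z + b = 0.
    # Positive values mean the sample lies on the correct side.
    scores = latents @ weight + bias
    return labels * scores / np.linalg.norm(weight)

def pick_largest_margin(candidates, summary=np.median):
    # candidates maps a run name to (latents, labels, weight, bias).
    # The run whose summarized training margin is largest is returned.
    scores = {name: float(summary(latent_margins(*args)))
              for name, args in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

In practice each candidate would come from one training run of the same architecture (different seeds or hyperparameters), with latents taken from the layer feeding the final linear classifier.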

Blind Data Adaptation to tackle Covariate Shift in Operational Steganalysis

May 29, 2024

Rony Abecidan, Vincent Itier, Jérémie Boulanger, Patrick Bas, Tomáš Pevný

Abstract: The proliferation of image manipulation for unethical purposes poses significant challenges in social networks. One particularly concerning method is image steganography, which allows individuals to hide illegal information in digital images without arousing suspicion. Such a technique poses severe security risks, making it crucial to develop effective steganalysis methods able to detect images manipulated for clandestine communication. Although significant advances have been achieved with machine learning models, a critical issue remains: the disparity between the controlled datasets used to train steganalysis models and the real-world datasets of forensic practitioners, which severely undermines the practical effectiveness of standardized steganalysis models. In this paper, we address this issue by focusing on a realistic scenario where practitioners lack crucial information about the limited target set of images under analysis, including details about their development process and even whether it contains manipulated images or not. By leveraging geometric alignment and distribution matching of source and target residuals, we develop TADA (Target Alignment through Data Adaptation), a novel methodology for emulating sources aligned with specific targets in steganalysis, which is also relevant for highly unbalanced targets. The emulator is a light convolutional network trained to align distributions of image residuals. Experimental validation demonstrates the potential of our strategy over traditional methods fighting covariate shift in steganalysis.

Leveraging Data Geometry to Mitigate CSM in Steganalysis

Oct 06, 2023

Rony Abecidan, Vincent Itier, Jérémie Boulanger, Patrick Bas, Tomáš Pevný

Abstract: In operational scenarios, steganographers use sets of covers from various sensors and processing pipelines that differ significantly from those used by researchers to train steganalysis models. This leads to an inevitable performance gap when dealing with out-of-distribution covers, commonly referred to as Cover Source Mismatch (CSM). In this study, we consider the scenario where all test images are processed using the same pipeline, but knowledge regarding both the labels and the balance between cover and stego is missing. Our objective is to identify a training dataset that allows for maximum generalization to our target. By exploring a grid of processing pipelines fostering CSM, we discovered a geometrical metric, based on the chordal distance between subspaces spanned by DCTr features, that exhibits high correlation with operational regret while being unaffected by the cover-stego balance. Our contribution lies in the development of a strategy that enables the selection or derivation of customized training datasets, enhancing the overall generalization performance for a given target. Experimental validation highlights that our geometry-based optimization strategy outperforms traditional atomistic methods given reasonable assumptions. Additional resources are available at github.com/RonyAbecidan/LeveragingGeometrytoMitigateCSM.

* IEEE International Workshop on Information Forensics and Security (WIFS 2023), Dec 2023, Nuremberg, Germany
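
For readers unfamiliar with the chordal distance between subspaces mentioned in the abstract, the sketch below computes it from principal angles: with cosines $\cos\theta_i$ given by the singular values of $U_a^\top U_b$, the distance is $\sqrt{\sum_i \sin^2\theta_i}$. How the subspaces are built from DCTr features is not stated in the abstract; the centering step, the subspace dimension `k`, and the function names are assumptions for illustration.

```python
import numpy as np

def principal_subspace(features, k):
    # Orthonormal basis (d x k) spanned by the top-k right singular
    # directions of the centered feature matrix (samples x d).
    centered = features - features.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def chordal_distance(basis_a, basis_b):
    # Singular values of A^T B are the cosines of the principal angles
    # between the two subspaces (both bases must be orthonormal).
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)
    return float(np.sqrt(np.sum(1.0 - cosines ** 2)))
```

Two identical subspaces give a distance of 0, and two mutually orthogonal k-dimensional subspaces give $\sqrt{k}$, so a candidate training source could be ranked by its distance to the target subspace.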

Using Set Covering to Generate Databases for Holistic Steganalysis

Nov 07, 2022

Rony Abecidan, Vincent Itier, Jérémie Boulanger, Patrick Bas, Tomáš Pevný

Abstract: Within an operational framework, covers used by a steganographer are likely to come from different sensors and different processing pipelines than the ones used by researchers for training their steganalysis models. Thus, a performance gap is unavoidable when it comes to out-of-distribution covers, an extremely frequent scenario called Cover Source Mismatch (CSM). Here, we explore a grid of processing pipelines to study the origins of CSM, to better understand it, and to better tackle it. A greedy set-covering algorithm is used to select representative pipelines minimizing the maximum regret between the representative and the pipelines within the set. Our main contribution is a methodology for generating relevant bases able to tackle operational CSM. Experimental validation highlights that, for a given number of training samples, our set-covering selection is a better strategy than selecting random pipelines or using all the available pipelines. Our analysis also shows that parameters such as denoising, sharpening, and downsampling are very important to foster diversity. Finally, different benchmarks for classical and wild databases show the good generalization property of the extracted databases. Additional resources are available at github.com/RonyAbecidan/HolisticSteganalysisWithSetCovering.

* IEEE International Workshop on Information Forensics and Security (WIFS 2022), Dec 2022, Shanghai, China
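
The greedy set-covering idea in the abstract can be sketched in a few lines of plain Python. The formulation below is an assumption, not the paper's exact algorithm: `regret[i][j]` is taken to be the regret of a detector trained on pipeline `i` and tested on pipeline `j`, a pipeline `i` is said to cover `j` when that regret is at most a hypothetical `threshold`, and the standard greedy rule picks, at each step, the pipeline covering the most still-uncovered ones.

```python
def greedy_cover(regret, threshold):
    n = len(regret)
    # covers[i]: pipelines whose regret stays within the threshold
    # when the detector is trained on pipeline i.
    covers = [{j for j in range(n) if regret[i][j] <= threshold}
              for i in range(n)]
    uncovered = set(range(n))
    chosen = []
    while uncovered:
        # Pick the pipeline covering the most uncovered pipelines
        # (ties are broken by the lowest index).
        best = max(range(n), key=lambda i: len(covers[i] & uncovered))
        gained = covers[best] & uncovered
        if not gained:
            raise ValueError("threshold too small: some pipelines cannot be covered")
        chosen.append(best)
        uncovered -= gained
    return chosen
```

By construction, every pipeline ends up within `threshold` regret of at least one chosen representative, so lowering the threshold trades a larger training base for a smaller worst-case regret.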
