Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ruolin Shen

Align and Surpass Human Camouflaged Perception: Visual Refocus Reinforcement Fine-Tuning

May 26, 2025

Ruolin Shen, Xiaozhong Ji, Kai WU, Jiangning Zhang, Yijun He, HaiHua Yang, Xiaobin Hu, Xiaoyu Sun

Figure 1 for Align and Surpass Human Camouflaged Perception: Visual Refocus Reinforcement Fine-Tuning

Figure 2 for Align and Surpass Human Camouflaged Perception: Visual Refocus Reinforcement Fine-Tuning

Figure 3 for Align and Surpass Human Camouflaged Perception: Visual Refocus Reinforcement Fine-Tuning

Figure 4 for Align and Surpass Human Camouflaged Perception: Visual Refocus Reinforcement Fine-Tuning

Abstract:Current multi-modal models exhibit a notable misalignment with the human visual system when identifying objects that are visually assimilated into the background. Our observations reveal that these multi-modal models cannot distinguish concealed objects, demonstrating an inability to emulate human cognitive processes which effectively utilize foreground-background similarity principles for visual analysis. To analyze this hidden human-model visual thinking discrepancy, we build a visual system that mimicks human visual camouflaged perception to progressively and iteratively `refocus' visual concealed content. The refocus is a progressive guidance mechanism enabling models to logically localize objects in visual images through stepwise reasoning. The localization process of concealed objects requires hierarchical attention shifting with dynamic adjustment and refinement of prior cognitive knowledge. In this paper, we propose a visual refocus reinforcement framework via the policy optimization algorithm to encourage multi-modal models to think and refocus more before answering, and achieve excellent reasoning abilities to align and even surpass human camouflaged perception systems. Our extensive experiments on camouflaged perception successfully demonstrate the emergence of refocus visual phenomena, characterized by multiple reasoning tokens and dynamic adjustment of the detection box. Besides, experimental results on both camouflaged object classification and detection tasks exhibit significantly superior performance compared to Supervised Fine-Tuning (SFT) baselines.

* Project Website: \url{https://github.com/HUuxiaobin/VRRF}

Via

Access Paper or Ask Questions

deepregression: a Flexible Neural Network Framework for Semi-Structured Deep Distributional Regression

Apr 06, 2021

David Rügamer, Ruolin Shen, Christina Bukas, Lisa Barros de Andrade e Sousa, Dominik Thalmeier, Nadja Klein, Chris Kolb, Florian Pfisterer, Philipp Kopper, Bernd Bischl(+1 more)

Figure 1 for deepregression: a Flexible Neural Network Framework for Semi-Structured Deep Distributional Regression

Figure 2 for deepregression: a Flexible Neural Network Framework for Semi-Structured Deep Distributional Regression

Figure 3 for deepregression: a Flexible Neural Network Framework for Semi-Structured Deep Distributional Regression

Figure 4 for deepregression: a Flexible Neural Network Framework for Semi-Structured Deep Distributional Regression

Abstract:This paper describes the implementation of semi-structured deep distributional regression, a flexible framework to learn distributions based on a combination of additive regression models and deep neural networks. deepregression is implemented in both R and Python, using the deep learning libraries TensorFlow and PyTorch, respectively. The implementation consists of (1) a modular neural network building system for the combination of various statistical and deep learning approaches, (2) an orthogonalization cell to allow for an interpretable combination of different subnetworks as well as (3) pre-processing steps necessary to initialize such models. The software package allows to define models in a user-friendly manner using distribution definitions via a formula environment that is inspired by classical statistical model frameworks such as mgcv. The packages' modular design and functionality provides a unique resource for rapid and reproducible prototyping of complex statistical and deep learning models while simultaneously retaining the indispensable interpretability of classical statistical models.

Via

Access Paper or Ask Questions