Abstract:Large language models are increasingly integrated into news recommendation systems, raising concerns about their role in spreading misinformation. In humans, visual content is known to boost credibility and shareability of information, yet its effect on vision-language models (VLMs) remains unclear. We present the first study examining how images influence VLMs' propensity to reshare news content, whether this effect varies across model families, and how persona conditioning and content attributes modulate this behavior. To support this analysis, we introduce two methodological contributions: a jailbreaking-inspired prompting strategy that elicits resharing decisions from VLMs while simulating users with antisocial traits and political alignments; and a multimodal dataset of fact-checked political news from PolitiFact, paired with corresponding images and ground-truth veracity labels. Experiments across model families reveal that image presence increases resharing rates by 4.8% for true news and 15.0% for false news. Persona conditioning further modulates this effect: Dark Triad traits amplify resharing of false news, whereas Republican-aligned profiles exhibit reduced veracity sensitivity. Of all the tested models, only Claude-3-Haiku demonstrates robustness to visual misinformation. These findings highlight emerging risks in multimodal model behavior and motivate the development of tailored evaluation frameworks and mitigation strategies for personalized AI systems. Code and dataset are available at: https://github.com/3lis/misinfo_vlm
Abstract:This paper proposes a strategy for visual prediction in the context of autonomous driving. Humans, when not distracted or drunk, are still the best drivers you can currently find. For this reason we take inspiration from two theoretical ideas about the human mind and its neural organization. The first idea concerns how the brain uses a hierarchical structure of neuron ensembles to extract abstract concepts from visual experience and code them into compact representations. The second idea suggests that these neural perceptual representations are not neutral but functional to the prediction of the future state of affairs in the environment. Similarly, the prediction mechanism is not neutral but oriented to the current planning of a future action. We identify within the deep learning framework two artificial counterparts of the aforementioned neurocognitive theories. We find a correspondence between the first theoretical idea and the architecture of convolutional autoencoders, while we translate the second theory into a training procedure that learns compact representations which are not neutral but oriented to driving tasks, from two distinct perspectives. From a static perspective, we force groups of neural units in the compact representations to distinctly represent specific concepts crucial to the driving task. From a dynamic perspective, we encourage the compact representations to be predictive of how the current road scenario will change in the future. We successfully learn compact representations that use as few as 16 neural units for each of the two basic driving concepts we consider: car and lane. We prove the efficiency of our proposed perceptual representations on the SYNTHIA dataset. Our source code is available at https://github.com/3lis/rnn_vae