Francesco Pinto

MAC: Multi-Agent Constitution Learning

Mar 16, 2026
Via arXiv

VMDT: Decoding the Trustworthiness of Video Foundation Models

Nov 07, 2025
Via arXiv

Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust

Jul 02, 2025
Via arXiv

Where You Place the Norm Matters: From Prejudiced to Neutral Initializations

May 16, 2025
Via arXiv

AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration

Mar 20, 2025
Via arXiv

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

Mar 19, 2025
Via arXiv

SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations

Dec 09, 2024
Via arXiv

Copyright-Protected Language Generation via Adaptive Model Fusion

Dec 09, 2024
Via arXiv

Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models

Nov 09, 2024
Via arXiv

Focus On This, Not That! Steering LLMs With Adaptive Feature Specification

Oct 30, 2024
Via arXiv