
Juntao Dai

VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment

Mar 05, 2026

Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment

Feb 18, 2026

What, Whether and How? Unveiling Process Reward Models for Thinking with Images Reasoning

Feb 09, 2026

AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

Jan 26, 2026

VLA-Arena: An Open-Source Framework for Benchmarking Vision-Language-Action Models

Dec 27, 2025

The Singapore Consensus on Global AI Safety Research Priorities

Jun 25, 2025

A Game-Theoretic Negotiation Framework for Cross-Cultural Consensus in LLMs

Jun 16, 2025

SafeLawBench: Towards Safe Alignment of Large Language Models

Jun 07, 2025

InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback

May 29, 2025

The Mirage of Multimodality: Where Truth is Tested and Honesty Unravels

May 26, 2025