Xiuying Chen

M3MAD-Bench: Are Multi-Agent Debates Really Effective Across Domains and Modalities?

Jan 06, 2026 (via arXiv)

AI Security Beyond Core Domains: Resume Screening as a Case Study of Adversarial Vulnerabilities in Specialized LLM Applications

Dec 23, 2025 (via arXiv)

When Personalization Tricks Detectors: The Feature-Inversion Trap in Machine-Generated Text Detection

Oct 14, 2025 (via arXiv)

A Symbolic Adversarial Learning Framework for Evolving Fake News Generation and Detection

Aug 27, 2025 (via arXiv)

MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge Graph

Aug 17, 2025 (via arXiv)

SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models

May 29, 2025 (via arXiv)

CulFiT: A Fine-grained Cultural-aware LLM Training Paradigm via Multilingual Critique Data Synthesis

May 26, 2025 (via arXiv)

VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration

May 26, 2025 (via arXiv)

Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models

May 24, 2025 (via arXiv)

ManipLVM-R1: Reinforcement Learning for Reasoning in Embodied Manipulation with Large Vision-Language Models

May 22, 2025 (via arXiv)