Jiayang Song

MMR-AD: A Large-Scale Multimodal Dataset for Benchmarking General Anomaly Detection with Multimodal Large Language Models

Apr 13, 2026

Towards Understanding Retrieval Accuracy and Prompt Quality in RAG Systems

Nov 29, 2024

LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation

Oct 07, 2024

LeCov: Multi-level Testing Criteria for Large Language Models

Aug 20, 2024

MORTAR: A Model-based Runtime Action Repair Framework for AI-enabled Cyber-Physical Systems

Aug 07, 2024

Active Testing of Large Language Model via Multi-Stage Sampling

Aug 07, 2024

Multilingual Blending: LLM Safety Alignment Evaluation with Language Mixture

Jul 10, 2024

GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model

Jun 06, 2024

Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward

Apr 12, 2024

LUNA: A Model-Based Universal Analysis Framework for Large Language Models

Oct 22, 2023