Ronghui Mu

Principal Eigenvalue Regularization for Improved Worst-Class Certified Robustness of Smoothed Classifiers

Mar 21, 2025
Via arXiv

Invariant Correlation of Representation with Label

Jul 01, 2024
Via arXiv

Safeguarding Large Language Models: A Survey

Jun 03, 2024
Via arXiv

Towards Fairness-Aware Adversarial Learning

Feb 27, 2024
Via arXiv

Building Guardrails for Large Language Models

Feb 02, 2024
Via arXiv

Reward Certification for Policy Smoothed Reinforcement Learning

Dec 12, 2023
Via arXiv

A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation

May 19, 2023
Via arXiv

Randomized Adversarial Training via Taylor Expansion

Mar 19, 2023
Via arXiv

Certified Policy Smoothing for Cooperative Multi-Agent Reinforcement Learning

Dec 22, 2022
Via arXiv

3DVerifier: Efficient Robustness Verification for 3D Point Cloud Models

Jul 15, 2022
Via arXiv