Yaodong Yang

SAE-V: Interpreting Multimodal Models for Enhanced Alignment

Feb 22, 2025
Via arXiv

Model Evolution Framework with Genetic Algorithm for Multi-Task Reinforcement Learning

Feb 19, 2025
Via arXiv

Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer

Feb 04, 2025
Via arXiv

RedStar: Does Scaling Long-CoT Data Unlock Better Slow-Reasoning Systems?

Jan 20, 2025
Via arXiv

Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction

Jan 09, 2025
Via arXiv

Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability

Dec 24, 2024
Via arXiv

Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback

Dec 20, 2024
Via arXiv

Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation

Dec 15, 2024
Via arXiv

RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors

Dec 14, 2024
Via arXiv

Random Feature Models with Learnable Activation Functions

Nov 29, 2024
Via arXiv