Songyang Gao

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

Apr 09, 2024
Via arXiv

The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis

Apr 01, 2024
Via arXiv

EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models

Mar 18, 2024
Via arXiv

ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages

Feb 16, 2024
Via arXiv

Navigating the OverKill in Large Language Models

Jan 31, 2024
Via arXiv

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

Jan 21, 2024
Via arXiv

RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning

Jan 19, 2024
Via arXiv

ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios

Jan 14, 2024
Via arXiv

Secrets of RLHF in Large Language Models Part II: Reward Modeling

Jan 12, 2024
Via arXiv

LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment

Dec 18, 2023
Via arXiv