Tuo Zhao

Robust Reinforcement Learning from Corrupted Human Feedback

Jun 21, 2024
Via arXiv

RoseLoRA: Row and Column-wise Sparse Low-rank Adaptation of Pre-trained Language Model for Knowledge Editing and Fine-tuning

Jun 16, 2024
Via arXiv

Adaptive Preference Scaling for Reinforcement Learning with Human Feedback

Jun 04, 2024
Via arXiv

To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO

Apr 06, 2024
Via arXiv

Stochastic Constrained Decentralized Optimization for Machine Learning with Fewer Data Oracles: a Gradient Sliding Approach

Apr 03, 2024
Via arXiv

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM

Mar 11, 2024
Via arXiv

BlendFilter: Advancing Retrieval-Augmented Large Language Models via Query Generation Blending and Knowledge Filtering

Feb 16, 2024
Via arXiv

Data Diversity Matters for Robust Instruction Tuning

Nov 21, 2023
Via arXiv

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs

Nov 03, 2023
Via arXiv

Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms

Oct 30, 2023
Via arXiv