Yuandong Tian

Towards General-Purpose Model-Free Reinforcement Learning

Jan 27, 2025

Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback

Jan 18, 2025

Tensor-GaLore: Memory-Efficient Training via Gradient Tensor Decomposition

Jan 04, 2025

AdvPrefix: An Objective for Nuanced LLM Jailbreaks

Dec 13, 2024

Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking

Dec 12, 2024

Training Large Language Models to Reason in a Continuous Latent Space

Dec 09, 2024

Towards Full Delegation: Designing Ideal Agentic Behaviors for Travel Planning

Nov 21, 2024

On the Surprising Effectiveness of Attention Transfer for Vision Transformers

Nov 14, 2024

MagicPIG: LSH Sampling for Efficient LLM Generation

Oct 21, 2024

To the Globe (TTG): Towards Language-Driven Guaranteed Travel Planning

Oct 21, 2024