Siyan Zhao

The performances of the Chinese and U.S. Large Language Models on the Topic of Chinese Culture

Jan 07, 2026
Via arXiv

SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models

Oct 10, 2025
Via arXiv

d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning

Apr 16, 2025
Via arXiv

Multi-fidelity Reinforcement Learning Control for Complex Dynamical Systems

Apr 08, 2025
Via arXiv

Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs

Feb 13, 2025
Via arXiv

MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants

Dec 17, 2024
Via arXiv

DODT: Enhanced Online Decision Transformer Learning through Dreamer's Actor-Critic Trajectory Forecasting

Oct 15, 2024
Via arXiv

Probing the Decision Boundaries of In-context Learning in Large Language Models

Jun 17, 2024
Via arXiv

Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models

Apr 15, 2024
Via arXiv

Group Preference Optimization: Few-Shot Alignment of Large Language Models

Oct 17, 2023
Via arXiv