
Jihun Yun

Pruning and Distilling Mixture-of-Experts into Dense Language Models

May 27, 2026
Via arXiv

AMUSE: Anytime Muon with Stable Gradient Evaluation

May 21, 2026
Via arXiv

Uniform Spectral Growth and Convergence of Muon in LoRA-Style Matrix Factorization

Feb 06, 2026
Via arXiv

THINKSAFE: Self-Generated Safety Alignment for Reasoning Models

Jan 30, 2026
Via arXiv

Coverage Improvement and Fast Convergence of On-policy Preference Learning

Jan 13, 2026
Via arXiv

Unraveling Zeroth-Order Optimization through the Lens of Low-Dimensional Structured Perturbations

Jan 31, 2025
Via arXiv

LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding

Oct 04, 2024
Via arXiv

TEDDY: Trimming Edges with Degree-based Discrimination strategY

Feb 02, 2024
Via arXiv

Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss

Sep 05, 2021
Via arXiv

A General Family of Stochastic Proximal Gradient Methods for Deep Learning

Jul 15, 2020
Via arXiv