Minki Kang

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

Jun 16, 2026
Via arXiv

TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration

Jun 03, 2026
Via arXiv

OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources

May 28, 2026
Via arXiv

Agent Explorative Policy Optimization for Multimodal Agentic Reasoning

May 27, 2026
Via arXiv

Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents

Apr 15, 2026
Via arXiv

THINKSAFE: Self-Generated Safety Alignment for Reasoning Models

Jan 30, 2026
Via arXiv

Rethinking Reward Models for Multi-Domain Test-Time Scaling

Oct 02, 2025
Via arXiv

ACON: Optimizing Context Compression for Long-horizon LLM Agents

Oct 01, 2025
Via arXiv

Distilling LLM Agent into Small Models with Retrieval and Code Tools

May 23, 2025
Via arXiv

T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models

Apr 07, 2025
Via arXiv