Sen Hu

EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines

Jan 14, 2026
Via arXiv

Controlled Self-Evolution for Algorithmic Code Optimization

Jan 13, 2026
Via arXiv

MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences

Jan 13, 2026
Via arXiv

CloneMem: Benchmarking Long-Term Memory for AI Clones

Jan 11, 2026
Via arXiv

Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

Jan 11, 2026
Via arXiv

RealMem: Benchmarking LLMs in Real-World Memory-Driven Interaction

Jan 11, 2026
Via arXiv

KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions

Jan 08, 2026
Via arXiv

Does Memory Need Graphs? A Unified Framework and Empirical Analysis for Long-Term Dialog Memory

Jan 07, 2026
Via arXiv

Octopus: Agentic Multimodal Reasoning with Six-Capability Orchestration

Nov 19, 2025
Via arXiv

GitTaskBench: A Benchmark for Code Agents Solving Real-World Tasks Through Code Repository Leveraging

Aug 26, 2025
Via arXiv