OpenAI


BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese

May 01, 2025
Via arXiv

WebThinker: Empowering Large Reasoning Models with Deep Research Capability

Apr 30, 2025
Via arXiv

Why Compress What You Can Generate? When GPT-4o Generation Ushers in Image Compression Fields

Apr 30, 2025
Via arXiv

ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning

Apr 30, 2025
Via arXiv

Real-World Gaps in AI Governance Research

Apr 30, 2025
Via arXiv

Simple Visual Artifact Detection in Sora-Generated Videos

Apr 30, 2025
Via arXiv

Automatic Legal Writing Evaluation of LLMs

Apr 29, 2025
Via arXiv

The Leaderboard Illusion

Apr 29, 2025
Via arXiv

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Apr 28, 2025
Via arXiv

AutoP2C: An LLM-Based Agent Framework for Code Repository Generation from Multimodal Content in Academic Papers

Apr 28, 2025
Via arXiv