Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities

Jan 29, 2026
Via arXiv

Evolution of Benchmark: Black-Box Optimization Benchmark Design through Large Language Model

Jan 29, 2026
Via arXiv

RATE: Reviewer Profiling and Annotation-free Training for Expertise Ranking in Peer Review Systems

Jan 27, 2026
Via arXiv

Enhancing Academic Paper Recommendations Using Fine-Grained Knowledge Entities and Multifaceted Document Embeddings

Jan 27, 2026
Via arXiv

CanaryBench: Stress Testing Privacy Leakage in Cluster-Level Conversation Summaries

Jan 25, 2026
Via arXiv

Beyond the Rabbit Hole: Mapping the Relational Harms of QAnon Radicalization

Jan 25, 2026
Via arXiv

PingPong: A Natural Benchmark for Multi-Turn Code-Switching Dialogues

Jan 24, 2026
Via arXiv

Building a Bridge between the Two Schools: Realizing a Practical Path to Include Literacy-based Skills within the STEM Curricula

Jan 24, 2026
Via arXiv

Beyond Factual QA: Mentorship-Oriented Question Answering over Long-Form Multilingual Content

Jan 23, 2026
Via arXiv

DMV-AVP: Distributed Multi-Vehicle Autonomous Valet Parking using Autoware

Jan 22, 2026
Via arXiv