Besmira Nushi

NVIDIA Nemotron 3: Efficient and Open Intelligence

Dec 24, 2025

Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Dec 23, 2025

Detecting Data Contamination in LLMs via In-Context Learning

Oct 30, 2025

Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness

Oct 02, 2025

The Singapore Consensus on Global AI Safety Research Priorities

Jun 25, 2025

LogiPlan: A Structured Benchmark for Logical Planning and Relational Reasoning in LLMs

Jun 12, 2025

Phi-4-reasoning Technical Report

Apr 30, 2025

Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead

Mar 31, 2025

MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation

Jan 07, 2025

BENCHAGENTS: Automated Benchmark Creation with Agent Interaction

Oct 29, 2024