Ido Shahaf

Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery

Jan 27, 2026

NVIDIA Nemotron 3: Efficient and Open Intelligence

Dec 24, 2025

Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Dec 23, 2025

Llama-Nemotron: Efficient Reasoning Models

May 02, 2025

FFN Fusion: Rethinking Sequential Computation in Large Language Models

Mar 24, 2025

Puzzle: Distillation-Based NAS for Inference-Optimized LLMs

Dec 03, 2024