Nikunj Saunshi

Reasoning with Latent Thoughts: On the Power of Looped Transformers

Feb 24, 2025

Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding

Feb 17, 2025

StagFormer: Time Staggering Transformer Decoding for Running Layers In Parallel

Jan 26, 2025

On the Role of Depth and Looping for In-Context Learning with Task Diversity

Oct 29, 2024

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs

Oct 24, 2024

Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?

Oct 10, 2024

On the Inductive Bias of Stacking Towards Improving Reasoning

Sep 27, 2024

Landscape-Aware Growing: The Power of a Little LAG

Jun 04, 2024

Efficient Stagewise Pretraining via Progressive Subnetworks

Feb 08, 2024

Reasoning in Large Language Models Through Symbolic Math Word Problems

Aug 03, 2023