Sanjiv Kumar

On student-teacher deviations in distillation: does it pay to disobey?

Jan 30, 2023
Vaishnavh Nagarajan, Aditya Krishna Menon, Srinadh Bhojanapalli, Hossein Mobahi, Sanjiv Kumar

Supervision Complexity and its Role in Knowledge Distillation

Jan 28, 2023
Hrayr Harutyunyan, Ankit Singh Rawat, Aditya Krishna Menon, Seungyeon Kim, Sanjiv Kumar

Leveraging Importance Weights in Subset Selection

Jan 28, 2023
Gui Citovsky, Giulia DeSalvo, Sanjiv Kumar, Srikumar Ramalingam, Afshin Rostamizadeh, Yunjuan Wang

EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval

Jan 27, 2023
Seungyeon Kim, Ankit Singh Rawat, Manzil Zaheer, Sadeep Jayasumana, Veeranjaneyulu Sadhanala, Wittawat Jitkrittum, Aditya Krishna Menon, Rob Fergus, Sanjiv Kumar

Automating Nearest Neighbor Search Configuration with Constrained Optimization

Jan 04, 2023
Philip Sun, Ruiqi Guo, Sanjiv Kumar

Large Language Models with Controllable Working Memory

Nov 09, 2022
Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix Yu, Sanjiv Kumar

Preserving In-Context Learning ability in Large Language Model Fine-tuning

Nov 01, 2022
Yihan Wang, Si Si, Daliang Li, Michal Lukasik, Felix Yu, Cho-Jui Hsieh, Inderjit S Dhillon, Sanjiv Kumar

When does mixup promote local linearity in learned representations?

Oct 28, 2022
Arslan Chaudhry, Aditya Krishna Menon, Andreas Veit, Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar

Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers

Oct 12, 2022
Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat, Sashank J. Reddi, Ke Ye, Felix Chern, Felix Yu, Ruiqi Guo, Sanjiv Kumar

Decoupled Context Processing for Context Augmented Language Modeling

Oct 11, 2022
Zonglin Li, Ruiqi Guo, Sanjiv Kumar
