Simin Fan

GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining

May 26, 2025
Via arXiv

NeuralGrok: Accelerate Grokking by Neural Gradient Transformation

Apr 24, 2025
Via arXiv

HyperINF: Unleashing the HyperPower of the Schulz's Method for Data Influence Estimation

Oct 07, 2024
Via arXiv

Dynamic Gradient Alignment for Online Data Mixing

Oct 03, 2024
Via arXiv

Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants

Aug 07, 2024
Via arXiv

Deep Grokking: Would Deep Neural Networks Generalize Better?

May 29, 2024
Via arXiv

MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

Nov 27, 2023
Via arXiv

Irreducible Curriculum for Language Model Pretraining

Oct 23, 2023
Via arXiv

DoGE: Domain Reweighting with Generalization Estimation

Oct 23, 2023
Via arXiv

Towards Process-Oriented, Modular, and Versatile Question Generation that Meets Educational Needs

Apr 30, 2022
Via arXiv