Yao Fu

ProTrain: Efficient LLM Training via Memory-Aware Techniques

Jun 12, 2024
via arXiv

Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis

May 14, 2024
via arXiv

Long Context Alignment with Short Instructions and Synthesized Positions

May 07, 2024
via arXiv

Retrieval Head Mechanistically Explains Long-Context Factuality

Apr 24, 2024
via arXiv

Toward Inference-optimal Mixture-of-Expert Large Language Models

Apr 03, 2024
via arXiv

AutoGuide: Automated Generation and Selection of State-Aware Guidelines for Large Language Model Agents

Mar 13, 2024
via arXiv

Data Engineering for Scaling Language Models to 128K Context

Feb 15, 2024
via arXiv

Critical Data Size of Language Models from a Grokking Perspective

Feb 06, 2024
via arXiv

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

Jan 29, 2024
via arXiv

ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models

Jan 25, 2024
via arXiv