
Junyang Lin


Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning

Dec 19, 2024

ExecRepoBench: Multi-level Executable Code Completion Evaluation

Dec 16, 2024

ProcessBench: Identifying Process Errors in Mathematical Reasoning

Dec 10, 2024

Evaluating and Aligning CodeLLMs on Human Preference

Dec 06, 2024

CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy

Dec 03, 2024

P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs

Nov 14, 2024

Language Models can Self-Lengthen to Generate Long Texts

Oct 31, 2024

Aligning CodeLLMs with Direct Preference Optimization

Oct 24, 2024

Aligning Large Language Models via Self-Steering Optimization

Oct 22, 2024

Rethinking Data Selection at Scale: Random Selection is Almost All You Need

Oct 12, 2024