Dawei Zhu

Language models can learn implicit multi-hop reasoning, but only if they have lots of training data

May 23, 2025

MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

May 12, 2025

Same evaluation, more tokens: On the effect of input length for machine translation evaluation using Large Language Models

May 03, 2025

Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision

Feb 28, 2025

LongAttn: Selecting Long-context Training Data via Token-level Attention

Feb 24, 2025

MMTEB: Massive Multilingual Text Embedding Benchmark

Feb 19, 2025

AFRIDOC-MT: Document-level MT Corpus for African Languages

Jan 10, 2025

More Tokens, Lower Precision: Towards the Optimal Token-Precision Trade-off in KV Cache Compression

Dec 17, 2024

Instruction-Tuned LLMs Succeed in Document-Level MT Without Fine-Tuning -- But BLEU Turns a Blind Eye

Oct 29, 2024

AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories

Oct 10, 2024