Xiaozhe Ren

Pangu DeepDiver: Adaptive Search Intensity Scaling via Open-Web Reinforcement Learning

May 30, 2025

Self-Adjust Softmax

Feb 25, 2025

SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator

Dec 16, 2024

Scaling Law for Language Models Training Considering Batch Size

Dec 02, 2024

DAPE V2: Process Attention Score as Feature Map for Length Extrapolation

Oct 07, 2024

CAPE: Context-Adaptive Positional Encoding for Length Extrapolation

May 23, 2024

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Mar 07, 2024

A Survey of Reasoning with Foundation Models

Dec 26, 2023

EdgeFM: Leveraging Foundation Model for Open-set Learning on the Edge

Nov 23, 2023

CAME: Confidence-guided Adaptive Memory Efficient Optimization

Jul 05, 2023