Ya Wang

Searth Transformer: A Transformer Architecture Incorporating Earth's Geospheric Physical Priors for Global Mid-Range Weather Forecasting

Jan 14, 2026

Acquiring Common Chinese Emotional Events Using Large Language Model

Nov 07, 2025

VGR: Visual Grounded Reasoning

Jun 16, 2025

Visual Perturbation and Adaptive Hard Negative Contrastive Learning for Compositional Reasoning in Vision-Language Models

May 21, 2025

Efficient Pretraining Length Scaling

Apr 21, 2025

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization

Mar 06, 2025

FwNet-ECA: Facilitating Window Attention with Global Receptive Fields through Fourier Filtering Operations

Feb 25, 2025

Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models

Feb 21, 2025

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Jan 28, 2025

Explainable Fuzzy Neural Network with Multi-Fidelity Reinforcement Learning for Micro-Architecture Design Space Exploration

Dec 14, 2024