Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xudong Gao

Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought

May 21, 2025

Ao Liu, Botong Zhou, Can Xu, Chayse Zhou, ChenChen Zhang, Chengcheng Xu, Chenhao Wang, Decheng Wu, Dengpeng Wu, Dian Jiao(+239 more)

Abstract:As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It synergistically combines Mamba's long-sequence processing efficiency with Transformer's superior contextual understanding. Hunyuan-TurboS features an adaptive long-short chain-of-thought (CoT) mechanism, dynamically switching between rapid responses for simple queries and deep "thinking" modes for complex problems, optimizing computational resources. Architecturally, this 56B activated (560B total) parameter model employs 128 layers (Mamba2, Attention, FFN) with an innovative AMF/MF block pattern. Faster Mamba2 ensures linear complexity, Grouped-Query Attention minimizes KV cache, and FFNs use an MoE structure. Pre-trained on 16T high-quality tokens, it supports a 256K context length and is the first industry-deployed large-scale Mamba model. Our comprehensive post-training strategy enhances capabilities via Supervised Fine-Tuning (3M instructions), a novel Adaptive Long-short CoT Fusion method, Multi-round Deliberation Learning for iterative improvement, and a two-stage Large-scale Reinforcement Learning process targeting STEM and general instruction-following. Evaluations show strong performance: overall top 7 rank on LMSYS Chatbot Arena with a score of 1356, outperforming leading models like Gemini-2.0-Flash-001 (1352) and o4-mini-2025-04-16 (1345). TurboS also achieves an average of 77.9% across 23 automated benchmarks. Hunyuan-TurboS balances high performance and efficiency, offering substantial capabilities at lower inference costs than many reasoning models, establishing a new paradigm for efficient large-scale pre-trained models.

Via

Access Paper or Ask Questions

On the Convergence of Sigmoid and tanh Fuzzy General Grey Cognitive Maps

Sep 09, 2024

Xudong Gao, Xiao Guang Gao, Jia Rong, Ni Li, Yifeng Niu, Jun Chen

Figure 1 for On the Convergence of Sigmoid and tanh Fuzzy General Grey Cognitive Maps

Figure 2 for On the Convergence of Sigmoid and tanh Fuzzy General Grey Cognitive Maps

Figure 3 for On the Convergence of Sigmoid and tanh Fuzzy General Grey Cognitive Maps

Figure 4 for On the Convergence of Sigmoid and tanh Fuzzy General Grey Cognitive Maps

Abstract:Fuzzy General Grey Cognitive Map (FGGCM) and Fuzzy Grey Cognitive Map (FGCM) are extensions of Fuzzy Cognitive Map (FCM) in terms of uncertainty. FGGCM allows for the processing of general grey number with multiple intervals, enabling FCM to better address uncertain situations. Although the convergence of FCM and FGCM has been discussed in many literature, the convergence of FGGCM has not been thoroughly explored. This paper aims to fill this research gap. First, metrics for the general grey number space and its vector space is given and proved using the Minkowski inequality. By utilizing the characteristic that Cauchy sequences are convergent sequences, the completeness of these two space is demonstrated. On this premise, utilizing Banach fixed point theorem and Browder-Gohde-Kirk fixed point theorem, combined with Lagrange's mean value theorem and Cauchy's inequality, deduces the sufficient conditions for FGGCM to converge to a unique fixed point when using tanh and sigmoid functions as activation functions. The sufficient conditions for the kernels and greyness of FGGCM to converge to a unique fixed point are also provided separately. Finally, based on Web Experience and Civil engineering FCM, designed corresponding FGGCM with sigmoid and tanh as activation functions by modifying the weights to general grey numbers. By comparing with the convergence theorems of FCM and FGCM, the effectiveness of the theorems proposed in this paper was verified. It was also demonstrated that the convergence theorems of FCM are special cases of the theorems proposed in this paper. The study for convergence of FGGCM is of great significance for guiding the learning algorithm of FGGCM, which is needed for designing FGGCM with specific fixed points, lays a solid theoretical foundation for the application of FGGCM in fields such as control, prediction, and decision support systems.

Via

Access Paper or Ask Questions

HB-net: Holistic bursting cell cluster integrated network for occluded multi-objects recognition

Oct 18, 2023

Xudong Gao, Xiao Guang Gao, Jia Rong, Xiaowei Chen, Xiang Liao, Jun Chen

Abstract:Within the realm of image recognition, a specific category of multi-label classification (MLC) challenges arises when objects within the visual field may occlude one another, demanding simultaneous identification of both occluded and occluding objects. Traditional convolutional neural networks (CNNs) can tackle these challenges; however, those models tend to be bulky and can only attain modest levels of accuracy. Leveraging insights from cutting-edge neural science research, specifically the Holistic Bursting (HB) cell, this paper introduces a pioneering integrated network framework named HB-net. Built upon the foundation of HB cell clusters, HB-net is designed to address the intricate task of simultaneously recognizing multiple occluded objects within images. Various Bursting cell cluster structures are introduced, complemented by an evidence accumulation mechanism. Testing is conducted on multiple datasets comprising digits and letters. The results demonstrate that models incorporating the HB framework exhibit a significant $2.98\%$ enhancement in recognition accuracy compared to models without the HB framework ($1.0298$ times, $p=0.0499$). Although in high-noise settings, standard CNNs exhibit slightly greater robustness when compared to HB-net models, the models that combine the HB framework and EA mechanism achieve a comparable level of accuracy and resilience to ResNet50, despite having only three convolutional layers and approximately $1/30$ of the parameters. The findings of this study offer valuable insights for improving computer vision algorithms. The essential code is provided at https://github.com/d-lab438/hb-net.git.

Via

Access Paper or Ask Questions