Mang Wang

Baichuan 2: Open Large-scale Language Models

Sep 20, 2023
Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, JunTao Dai, Kun Fang, Lei Su, Liang Song, Lifeng Liu, Liyun Ru, Luyao Ma, Mang Wang, Mickel Liu, MingAn Lin, Nuolan Nie, Peidong Guo, Ruiyang Sun, Tao Zhang, Tianpeng Li, Tianyu Li, Wei Cheng, Weipeng Chen, Xiangrong Zeng, Xiaochuan Wang, Xiaoxi Chen, Xin Men, Xin Yu, Xuehai Pan, Yanjun Shen, Yiding Wang, Yiyu Li, Youxin Jiang, Yuchen Gao, Yupeng Zhang, Zenan Zhou, Zhiying Wu

Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks from just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, the most powerful LLMs are closed-source or limited in their capability for languages other than English. In this technical report, we present Baichuan 2, a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch on 2.6 trillion tokens. Baichuan 2 matches or outperforms other open-source models of similar size on public benchmarks such as MMLU, CMMLU, GSM8K, and HumanEval, and excels in vertical domains such as medicine and law. We will release all pre-training model checkpoints to help the research community better understand the training dynamics of Baichuan 2.

* Baichuan 2 technical report. Github: https://github.com/baichuan-inc/Baichuan2 
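
As a quick-start sketch, the released checkpoints can presumably be loaded through Hugging Face Transformers; the hub id "baichuan-inc/Baichuan2-7B-Base" and the generation settings below are assumptions inferred from the organization name, so check the repository above for the actual released names:

```python
# Minimal sketch: loading a Baichuan 2 checkpoint via Hugging Face Transformers.
# The hub id is an assumption inferred from the GitHub organization name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baichuan-inc/Baichuan2-7B-Base"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", trust_remote_code=True
)

inputs = tokenizer("The three primary colors are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```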

Progressive Learning without Forgetting

Nov 28, 2022
Tao Feng, Hangjie Yuan, Mang Wang, Ziyuan Huang, Ang Bian, Jianzhou Zhang

Learning from changing tasks and sequential experience without forgetting the obtained knowledge is a challenging problem for artificial neural networks. In this work, we focus on two challenging problems in the paradigm of Continual Learning (CL) without involving any old data: (i) the accumulation of catastrophic forgetting caused by the gradually fading knowledge space from which the model learns previous knowledge; (ii) the uncontrolled tug-of-war dynamics between stability and plasticity during the learning of new tasks. To tackle these problems, we present Progressive Learning without Forgetting (PLwF) and a credit-assignment regime in the optimizer. PLwF densely introduces model functions from previous tasks to construct a knowledge space that contains the most reliable knowledge on each task as well as the distribution information of different tasks, while credit assignment controls the tug-of-war dynamics by removing gradient conflict through projection. Extensive ablative experiments demonstrate the effectiveness of PLwF and credit assignment. In comparison with other CL methods, we report notably better results even without relying on any raw data.
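
The credit-assignment idea of removing gradient conflict through projection can be illustrated with a PCGrad-style sketch: when the new-task gradient opposes an old-task gradient (negative dot product), its conflicting component is projected out. This is an illustrative reconstruction of the general technique, not necessarily PLwF's exact rule:

```python
# Illustrative sketch of resolving gradient conflict by projection (PCGrad-style).
# Assumption: conflict is detected by a negative dot product, and the conflicting
# component of the new-task gradient is removed; PLwF's rule may differ.
import torch

def project_conflicting(g_new: torch.Tensor, g_old: torch.Tensor) -> torch.Tensor:
    """Return g_new with any component that opposes g_old projected out."""
    dot = torch.dot(g_new, g_old)
    if dot < 0:  # gradients conflict: tug-of-war between stability and plasticity
        g_new = g_new - dot / (g_old.norm() ** 2 + 1e-12) * g_old
    return g_new

g_old = torch.tensor([1.0, 0.0])   # gradient preserving previous-task knowledge
g_new = torch.tensor([-1.0, 1.0])  # conflicting new-task gradient
print(project_conflicting(g_new, g_old))  # tensor([0., 1.]) -- conflict removed
```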

Overcoming Catastrophic Forgetting in Incremental Object Detection via Elastic Response Distillation

Apr 05, 2022
Tao Feng, Mang Wang, Hangjie Yuan

Traditional object detectors are ill-equipped for incremental learning: fine-tuning a well-trained detection model on new data alone leads to catastrophic forgetting. Knowledge distillation is a flexible way to mitigate this. In Incremental Object Detection (IOD), previous work mainly distills a combination of features and responses, but leaves the information contained in responses under-explored. In this paper, we propose a response-based incremental distillation method, dubbed Elastic Response Distillation (ERD), which elastically learns responses from both the classification head and the regression head. First, our method transfers category knowledge while equipping the student detector with the ability to retain localization information during incremental learning. In addition, we evaluate the quality of all locations and select valuable responses with the Elastic Response Selection (ERS) strategy. Finally, we show that knowledge from different responses should be assigned different importance during incremental distillation. Extensive experiments on MS COCO demonstrate that our method achieves state-of-the-art results, substantially narrowing the performance gap towards full training.

* Accepted by CVPR 2022. arXiv admin note: substantial text overlap with arXiv:2110.13471 
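
A minimal sketch of response-level distillation over the two detection heads, assuming temperature-softened KL on classification logits and smooth-L1 on box regression responses; ERD's elastic weighting and the ERS selection step are simplified to a uniform loss here:

```python
# Minimal sketch of response distillation for a detector's two heads.
# Assumptions: softened-KL for the classification head, smooth-L1 for the
# regression head; this is a generic simplification, not ERD's exact losses.
import torch
import torch.nn.functional as F

def response_distillation_loss(
    student_cls, teacher_cls, student_reg, teacher_reg, tau: float = 2.0
):
    # Classification head: match the teacher's softened category distribution.
    kl = F.kl_div(
        F.log_softmax(student_cls / tau, dim=-1),
        F.softmax(teacher_cls / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2
    # Regression head: retain the teacher's localization responses.
    reg = F.smooth_l1_loss(student_reg, teacher_reg)
    return kl + reg

s_cls, t_cls = torch.randn(8, 80), torch.randn(8, 80)  # 80 COCO classes
s_reg, t_reg = torch.randn(8, 4), torch.randn(8, 4)    # box deltas
print(response_distillation_loss(s_cls, t_cls, s_reg, t_reg))
```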

Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics

Feb 01, 2022
Hangjie Yuan, Mang Wang, Dong Ni, Liangpeng Xu

Human-Object Interaction (HOI) detection is an essential task for understanding human-centric images from a fine-grained perspective. Although end-to-end HOI detection models thrive, their paradigm of parallel human/object detection and verb class prediction loses the merit of two-stage methods: the object-guided hierarchy, in which the object in an HOI triplet gives direct clues to the verb to be predicted. In this paper, we aim to boost end-to-end models with object-guided statistical priors. Specifically, we propose a Verb Semantic Model (VSM) and use semantic aggregation to profit from this object-guided hierarchy. A Similarity KL (SKL) loss is proposed to optimize VSM to align with the HOI dataset's priors. To overcome the static-semantic-embedding problem, we generate cross-modality-aware visual and semantic features via Cross-Modal Calibration (CMC). Combined, these modules compose the Object-guided Cross-modal Calibration Network (OCN). Experiments on two popular HOI detection benchmarks demonstrate the significance of incorporating statistical prior knowledge and produce state-of-the-art performance. Further analysis indicates that the proposed modules serve as a stronger verb predictor and a superior way of utilizing prior knowledge. The code is available at https://github.com/JacobYuan7/OCN-HOI-Benchmark.

* Accepted to AAAI2022 
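
The abstract does not give the exact form of the SKL loss, but the general idea of aligning predicted verb distributions with object-conditioned dataset priors can be sketched with a symmetric KL term; the details below (including the HICO-DET verb count) are assumptions for illustration:

```python
# Hedged sketch of aligning a verb-semantic model with object-conditioned
# dataset priors via a symmetric KL term. The paper's Similarity KL (SKL) loss
# may have a different form; this shows only the general alignment idea.
import torch
import torch.nn.functional as F

def symmetric_kl(pred_logits: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
    """pred_logits: [B, num_verbs]; prior: [B, num_verbs], rows summing to 1."""
    log_p = F.log_softmax(pred_logits, dim=-1)
    p = log_p.exp()
    log_q = (prior + 1e-12).log()
    kl_pq = (p * (log_p - log_q)).sum(-1)      # KL(pred || prior)
    kl_qp = (prior * (log_q - log_p)).sum(-1)  # KL(prior || pred)
    return (kl_pq + kl_qp).mean()

num_verbs = 117  # e.g. the HICO-DET verb vocabulary
logits = torch.randn(4, num_verbs)
prior = torch.softmax(torch.randn(4, num_verbs), dim=-1)  # stand-in for dataset stats
print(symmetric_kl(logits, prior))
```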

Response-based Distillation for Incremental Object Detection

Oct 26, 2021
Tao Feng, Mang Wang

Traditional object detectors are ill-equipped for incremental learning: fine-tuning a well-trained detection model on new data alone leads to catastrophic forgetting. Knowledge distillation is a straightforward way to mitigate this. In Incremental Object Detection (IOD), previous work mainly focuses on feature-level knowledge distillation, while the detector's responses have not been fully explored. In this paper, we propose a fully response-based incremental distillation method that learns responses from both detection bounding boxes and classification predictions. First, our method transfers category knowledge while equipping the student model with the ability to retain localization knowledge during incremental learning. In addition, we evaluate the quality of all locations and provide valuable responses via an adaptive pseudo-label selection (APS) strategy. Finally, we show that knowledge from different responses should be assigned different importance during incremental distillation. Extensive experiments on MS COCO demonstrate significant advantages of our method, which substantially narrows the performance gap towards full training.
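
The APS strategy is only described at a high level here, but selecting teacher responses by an adaptive confidence threshold conveys the idea; the specific mean-plus-std rule below is an assumption, not the paper's stated criterion:

```python
# Illustrative sketch of selecting which teacher responses to distill, assuming
# an adaptive per-image confidence threshold (mean + std of the teacher's max
# class scores). The actual APS rule may differ.
import torch

def select_responses(teacher_cls_scores: torch.Tensor) -> torch.Tensor:
    """teacher_cls_scores: [num_boxes, num_classes] softmax scores.
    Returns a boolean mask over boxes worth distilling."""
    conf = teacher_cls_scores.max(dim=-1).values  # per-box confidence
    threshold = conf.mean() + conf.std()          # adaptive, image-specific
    return conf > threshold

scores = torch.softmax(torch.randn(100, 80), dim=-1)  # 100 boxes, 80 classes
mask = select_responses(scores)
print(f"kept {int(mask.sum())} of {len(mask)} teacher responses")
```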

Spatio-Temporal Dynamic Inference Network for Group Activity Recognition

Aug 26, 2021
Hangjie Yuan, Dong Ni, Mang Wang

Group activity recognition aims to understand the activity performed by a group of people; modeling complex spatio-temporal interactions is the key to solving it. Previous methods are limited to reasoning on a predefined graph, which ignores the inherent person-specific interaction context. Moreover, they adopt inference schemes that are computationally expensive and prone to over-smoothing. In this paper, we achieve spatio-temporal person-specific inference with the proposed Dynamic Inference Network (DIN), which is composed of a Dynamic Relation (DR) module and a Dynamic Walk (DW) module. We first initialize interaction fields on a primary spatio-temporal graph. Within each interaction field, we apply DR to predict the relation matrix and DW to predict the dynamic walk offsets in a joint-processing manner, thus forming a person-specific interaction graph. By updating features on this specific graph, a person can possess a global-level interaction field from a local initialization. Experiments indicate the effectiveness of both modules. Moreover, DIN achieves significant improvement over previous state-of-the-art methods on two popular datasets under the same setting, while incurring much lower computational overhead in the reasoning module.

* Accepted to ICCV2021 
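
A minimal sketch of the "predict the graph, then update on it" idea behind the DR module, assuming the relation matrix is predicted from pairwise feature differences; DIN's actual modules additionally operate on local spatio-temporal interaction fields with learned walk offsets, which are omitted here:

```python
# Minimal sketch of a person-specific dynamic relation update. Assumption: the
# relation matrix is scored from pairwise feature differences and used for one
# round of residual feature aggregation; DIN's DR/DW modules are richer.
import torch
import torch.nn as nn

class DynamicRelation(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.rel = nn.Linear(dim, 1)       # scores each pairwise interaction
        self.update = nn.Linear(dim, dim)  # transforms aggregated features

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [B, N, C] features for N persons.
        diff = x.unsqueeze(2) - x.unsqueeze(1)              # [B, N, N, C] pairwise
        relation = self.rel(diff).squeeze(-1).softmax(-1)   # person-specific graph
        aggregated = torch.einsum("bij,bjc->bic", relation, x)
        return x + self.update(aggregated)                  # residual graph update

x = torch.randn(2, 12, 256)  # 2 clips, 12 persons, 256-d features
print(DynamicRelation(256)(x).shape)  # torch.Size([2, 12, 256])
```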