Qinkai Zheng

OctoPack: Instruction Tuning Code Large Language Models

Aug 14, 2023
Niklas Muennighoff, Qian Liu, Armel Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro von Werra, Shayne Longpre

Finetuning large language models (LLMs) on instructions leads to vast performance improvements on natural language tasks. We apply instruction tuning using code, leveraging the natural structure of Git commits, which pair code changes with human instructions. We compile CommitPack: 4 terabytes of Git commits across 350 programming languages. We benchmark CommitPack against other natural and synthetic code instructions (xP3x, Self-Instruct, OASST) on the 16B parameter StarCoder model, and achieve state-of-the-art performance among models not trained on OpenAI outputs, on the HumanEval Python benchmark (46.2% pass@1). We further introduce HumanEvalPack, expanding the HumanEval benchmark to a total of 3 coding tasks (Code Repair, Code Explanation, Code Synthesis) across 6 languages (Python, JavaScript, Java, Go, C++, Rust). Our models, OctoCoder and OctoGeeX, achieve the best performance across HumanEvalPack among all permissive models, demonstrating CommitPack's benefits in generalizing to a wider set of languages and natural coding tasks. Code, models and data are freely available at https://github.com/bigcode-project/octopack.

* 57 pages (9 main), 39 figures, 16 tables 
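
The pass@1 figure quoted in the OctoPack abstract is a functional-correctness metric. The abstract does not say how it is computed; the sketch below assumes the unbiased pass@k estimator that was introduced together with HumanEval, where n samples are drawn per problem and c of them pass the unit tests. The function names and the toy counts are illustrative, not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them pass."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that a random size-k subset
    # of the n samples contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem outcomes, not figures from the paper.
toy_results = [(20, 10), (20, 0), (20, 20), (20, 5)]
score = benchmark_pass_at_k(toy_results, k=1)
```

For k = 1 the estimator reduces to the fraction of passing samples per problem, averaged over problems.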

CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X

Mar 30, 2023
Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, Teng Su, Zhilin Yang, Jie Tang

Large pre-trained code generation models, such as OpenAI Codex, can generate syntactically and functionally correct code, making programmers more productive and bringing the pursuit of artificial general intelligence closer. In this paper, we introduce CodeGeeX, a multilingual model with 13 billion parameters for code generation. CodeGeeX is pre-trained on 850 billion tokens of 23 programming languages as of June 2022. Our extensive experiments suggest that CodeGeeX outperforms multilingual code models of similar scale on both code generation and code translation on HumanEval-X. Building upon HumanEval (Python only), we develop the HumanEval-X benchmark for evaluating multilingual models by hand-writing the solutions in C++, Java, JavaScript, and Go. In addition, we build CodeGeeX-based extensions for Visual Studio Code, JetBrains, and Cloud Studio, generating 4.7 billion tokens for tens of thousands of active users per week. Our user study demonstrates that CodeGeeX can help to increase coding efficiency for 83.4% of its users. Finally, CodeGeeX is publicly accessible; in Sep. 2022, we open-sourced its code, model weights (the version of 850B tokens), API, extensions, and HumanEval-X at https://github.com/THUDM/CodeGeeX.

GIPA++: A General Information Propagation Algorithm for Graph Learning

Jan 19, 2023
Houyi Li, Zhihong Chen, Zhao Li, Qinkai Zheng, Peng Zhang, Shuigeng Zhou

Graph neural networks (GNNs) have been widely used in graph-structured data computation, showing promising performance in various applications such as node classification, link prediction, and network recommendation. Existing works mainly focus on node-wise correlation when doing attention-based weighted aggregation of neighboring nodes, such as the dot product of the dense vectors of two nodes. This may cause conflicting noise in nodes to spread during information propagation. To solve this problem, we propose a General Information Propagation Algorithm (GIPA for short), which exploits more fine-grained information fusion, including bit-wise and feature-wise correlations based on edge features, in its propagation. Specifically, the bit-wise correlation calculates the element-wise attention weight through a multi-layer perceptron (MLP) based on the dense representations of two nodes and their edge; the feature-wise correlation is based on the one-hot representations of node attribute features for feature selection. We evaluate the performance of GIPA on the Open Graph Benchmark proteins (OGBN-proteins for short) dataset and the Alipay dataset of Alibaba. Experimental results reveal that GIPA outperforms the state-of-the-art models in terms of prediction accuracy, e.g., GIPA achieves an average ROC-AUC of $0.8901\pm 0.0011$, which is better than that of all the existing methods listed in the OGBN-proteins leaderboard.

* Accepted by DASFAA 2023. arXiv admin note: substantial text overlap with arXiv:2105.06035
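
The bit-wise correlation described in the GIPA++ abstract (an MLP that turns the dense representations of two nodes and their edge into one attention weight per feature dimension) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the two-layer MLP, the ReLU and sigmoid activations, the sum aggregation, the random weights, and all function names. It is not the authors' implementation.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def elementwise_attention(h_src, h_dst, e, w1, b1, w2, b2):
    """Two-layer MLP mapping [h_src, h_dst, e] to one weight per feature dimension."""
    hidden = [max(0.0, z) for z in linear(h_src + h_dst + e, w1, b1)]  # ReLU (assumed)
    return [1.0 / (1.0 + math.exp(-z)) for z in linear(hidden, w2, b2)]  # sigmoid gate (assumed)


def propagate(h, edges, edge_feat, params):
    """Sum gated neighbor messages into each destination node (sum aggregation is assumed)."""
    dim = len(next(iter(h.values())))
    out = {node: [0.0] * dim for node in h}
    for src, dst in edges:
        gate = elementwise_attention(h[src], h[dst], edge_feat[(src, dst)], *params)
        for d in range(dim):
            # Each feature dimension of the message gets its own weight.
            out[dst][d] += gate[d] * h[src][d]
    return out


# Toy graph with random, untrained weights (all values hypothetical).
rng = random.Random(0)
dim, edge_dim, hidden_dim = 3, 2, 4


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


params = (
    rand_matrix(hidden_dim, 2 * dim + edge_dim), [0.0] * hidden_dim,
    rand_matrix(dim, hidden_dim), [0.0] * dim,
)
h = {0: [1.0, 2.0, 3.0], 1: [0.5, -1.0, 0.0], 2: [2.0, 0.0, 1.0]}
edges = [(0, 1), (2, 1), (1, 0)]  # directed (source, destination) pairs
edge_feat = {edge: [1.0, 0.0] for edge in edges}
new_h = propagate(h, edges, edge_feat, params)
```

In contrast to dot-product attention, which yields a single scalar per edge, the gate here is a vector, so different feature dimensions of the same neighbor can be weighted differently.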

Graph Robustness Benchmark: Benchmarking the Adversarial Robustness of Graph Machine Learning

Nov 08, 2021
Qinkai Zheng, Xu Zou, Yuxiao Dong, Yukuo Cen, Da Yin, Jiarong Xu, Yang Yang, Jie Tang

Adversarial attacks on graphs have posed a major threat to the robustness of graph machine learning (GML) models. Naturally, there is an ever-escalating arms race between attackers and defenders. However, the strategies of the two sides are often not fairly compared under the same realistic conditions. To bridge this gap, we present the Graph Robustness Benchmark (GRB) with the goal of providing a scalable, unified, modular, and reproducible evaluation for the adversarial robustness of GML models. GRB standardizes the process of attacks and defenses by 1) developing scalable and diverse datasets, 2) modularizing the attack and defense implementations, and 3) unifying the evaluation protocol in refined scenarios. By leveraging the GRB pipeline, end-users can focus on the development of robust GML models with automated data processing and experimental evaluations. To support open and reproducible research on graph adversarial learning, GRB also hosts public leaderboards across different scenarios. As a starting point, we conduct extensive experiments to benchmark baseline techniques. GRB is open-source and welcomes contributions from the community. Datasets, code, and leaderboards are available at https://cogdl.ai/grb/home.

* 21 pages, 12 figures, NeurIPS 2021 Datasets and Benchmarks Track 

TDGIA: Effective Injection Attacks on Graph Neural Networks

Jun 12, 2021
Xu Zou, Qinkai Zheng, Yuxiao Dong, Xinyu Guan, Evgeny Kharlamov, Jialiang Lu, Jie Tang

Graph Neural Networks (GNNs) have achieved promising performance in various real-world applications. However, recent studies have shown that GNNs are vulnerable to adversarial attacks. In this paper, we study a recently introduced realistic attack scenario on graphs: the graph injection attack (GIA). In the GIA scenario, the adversary is not able to modify the existing link structure and node attributes of the input graph; instead, the attack is performed by injecting adversarial nodes into it. We present an analysis of the topological vulnerability of GNNs under the GIA setting, based on which we propose the Topological Defective Graph Injection Attack (TDGIA) for effective injection attacks. TDGIA first introduces the topological defective edge selection strategy to choose the original nodes for connecting with the injected ones. It then designs the smooth feature optimization objective to generate the features for the injected nodes. Extensive experiments on large-scale datasets show that TDGIA can consistently and significantly outperform various attack baselines in attacking dozens of defense GNN models. Notably, the performance drop on target GNNs resulting from TDGIA is more than double the damage brought by the best attack solution among hundreds of submissions on KDD-CUP 2020.

* KDD 2021 research track paper 
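
The GIA constraint stated in the TDGIA abstract (existing links and node attributes stay fixed; only new nodes and their edges are added) can be made concrete with a small sketch. The selection rule below, which targets the lowest-degree original nodes, and the all-zero features are placeholders chosen for illustration; they stand in for, and are not, TDGIA's topological defective edge selection and smooth feature optimization. Integer node ids and all function names are likewise assumptions.

```python
def inject_nodes(adj, features, n_inject, edges_per_node, feature_dim):
    """Return a new graph with injected nodes; the input graph is left untouched."""
    new_adj = {node: set(neigh) for node, neigh in adj.items()}
    new_feat = {node: list(f) for node, f in features.items()}
    # Placeholder selection rule (assumption): lowest-degree original nodes first.
    targets = sorted(adj, key=lambda node: (len(adj[node]), node))
    next_id = max(adj) + 1
    for i in range(n_inject):
        injected = next_id + i
        chosen = [targets[(i * edges_per_node + j) % len(targets)]
                  for j in range(edges_per_node)]
        new_adj[injected] = set(chosen)
        for t in chosen:
            new_adj[t].add(injected)  # only edges touching an injected node are added
        new_feat[injected] = [0.0] * feature_dim  # placeholder features, not optimized
    return new_adj, new_feat


def original_edges_preserved(adj, new_adj):
    """GIA constraint: edges among original nodes are neither removed nor added."""
    originals = set(adj)
    return all(new_adj[node] & originals == adj[node] for node in adj)


# Toy undirected graph stored as adjacency sets (hypothetical data).
adj = {0: {1, 2}, 1: {0}, 2: {0, 3}, 3: {2}}
features = {node: [1.0, 0.0] for node in adj}
new_adj, new_feat = inject_nodes(adj, features, n_inject=2, edges_per_node=2, feature_dim=2)
```

The check in `original_edges_preserved` is what separates injection from graph modification attacks, which are allowed to rewrite edges among existing nodes.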

GIPA: General Information Propagation Algorithm for Graph Learning

May 13, 2021
Qinkai Zheng, Houyi Li, Peng Zhang, Zhixiong Yang, Guowei Zhang, Xintan Zeng, Yongchao Liu

Graph neural networks (GNNs) have been popularly used in analyzing graph-structured data, showing promising results in various applications such as node classification, link prediction and network recommendation. In this paper, we present a new graph attention neural network, namely GIPA, for attributed graph data learning. GIPA consists of three key components: attention, feature propagation and aggregation. Specifically, the attention component introduces a new multi-layer perceptron based multi-head to generate better non-linear feature mapping and representation than conventional implementations such as dot-product. The propagation component considers not only node features but also edge features, which differs from existing GNNs that merely consider node features. The aggregation component uses a residual connection to generate the final embedding. We evaluate the performance of GIPA using the Open Graph Benchmark proteins (ogbn-proteins for short) dataset. The experimental results reveal that GIPA can beat the state-of-the-art models in terms of prediction accuracy, e.g., GIPA achieves an average ROC-AUC of $0.8700\pm 0.0010$ and outperforms all the previous methods listed in the ogbn-proteins leaderboard.

* 4 pages, 1 figure, technical report 

Mitigating Advanced Adversarial Attacks with More Advanced Gradient Obfuscation Techniques

May 27, 2020
Han Qiu, Yi Zeng, Qinkai Zheng, Tianwei Zhang, Meikang Qiu, Gerard Memmi

Deep Neural Networks (DNNs) are well known to be vulnerable to Adversarial Examples (AEs). Considerable effort has gone into escalating the arms race between attackers and defenders. Recently, advanced gradient-based attack techniques were proposed (e.g., BPDA and EOT), which have defeated a considerable number of existing defense methods. To date, there are still no satisfactory solutions that can effectively and efficiently defend against those attacks. In this paper, we make a steady step towards mitigating those advanced gradient-based attacks with two major contributions. First, we perform an in-depth analysis of the root causes of those attacks, and propose four properties that can break the fundamental assumptions of those attacks. Second, we identify a set of operations that can meet those properties. By integrating these operations, we design two preprocessing functions that can invalidate these powerful attacks. Extensive evaluations indicate that our solutions can effectively mitigate all existing standard and advanced attack techniques, and beat 11 state-of-the-art defense solutions published in top-tier conferences over the past 2 years. The defender can employ our solutions to constrain the attack success rate below 7% for the strongest attacks even when the adversary has spent dozens of GPU hours.

Investigating Image Applications Based on Spatial-Frequency Transform and Deep Learning Techniques

Mar 20, 2020
Qinkai Zheng, Han Qiu, Gerard Memmi, Isabelle Bloch

This is the report for the PRIM project at Telecom Paris. It covers applications based on spatial-frequency transforms and deep learning techniques, and presents two main works. The first is an enhanced JPEG compression method based on deep learning: we propose a novel method that improves JPEG compression by transmitting less image data at the sender's end. At the receiver's end, we propose a DC recovery algorithm together with the deep residual learning framework to recover images with high quality. The second work concerns defenses against adversarial examples based on signal processing. We propose the wavelet extension method to extend image data features, which makes it more difficult to generate adversarial examples. We further adopt wavelet denoising to reduce the influence of the adversarial perturbations. With intensive experiments, we demonstrate that both works are effective in their application scenarios.
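
The wavelet denoising step mentioned in this report can be illustrated with a minimal sketch. The report does not state which wavelet, decomposition depth, or thresholding rule is used; the code below assumes a one-level orthonormal Haar transform with soft thresholding, applied to a single row of pixel values rather than a full 2D image. All names and values are illustrative.

```python
import math


def haar_forward(signal):
    """One-level orthonormal Haar transform of an even-length sequence."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft_threshold(x, t):
    """Shrink x toward zero by t; values with |x| <= t become zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def denoise(signal, threshold):
    """Suppress small detail coefficients, which is where low-amplitude perturbations sit."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, [soft_threshold(d, threshold) for d in detail])


# Hypothetical pixel row: two flat regions with a small perturbation on each.
row = [10.0, 10.2, 50.0, 49.8]
cleaned = denoise(row, threshold=1.0)
```

With this threshold the small within-pair differences are removed, so each pair collapses to its mean, while the large jump between the two regions is carried by the approximation coefficients and survives.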
