Lei Shen

A Group Fairness Lens for Large Language Models

Dec 24, 2023

Guanqun Bi, Lei Shen, Yuqiang Xie, Yanan Cao, Tiangang Zhu, Xiaodong He

Abstract: The rapid advancement of large language models (LLMs) has revolutionized various applications but has also raised crucial concerns about their potential to perpetuate biases and unfairness when deployed in social media contexts. Evaluating LLMs' potential biases and fairness has become crucial, yet existing methods rely on limited prompts that focus on just a few groups and lack a comprehensive categorical perspective. In this paper, we propose evaluating LLM biases from a group fairness lens using a novel hierarchical schema characterizing diverse social groups. Specifically, we construct a dataset, GFair, encapsulating target-attribute combinations across multiple dimensions. In addition, we introduce statement organization, a new open-ended text generation task, to uncover complex biases in LLMs. Extensive evaluations of popular LLMs reveal inherent safety concerns. To mitigate LLM biases from a group fairness perspective, we pioneer a novel chain-of-thought method, GF-Think. Experimental results demonstrate its efficacy in mitigating bias in LLMs to achieve fairness.

* Work in progress

RPG-Palm: Realistic Pseudo-data Generation for Palmprint Recognition

Aug 08, 2023

Lei Shen, Jianlong Jin, Ruixin Zhang, Huaen Li, Kai Zhao, Yingyi Zhang, Jingyun Zhang, Shouhong Ding, Yang Zhao, Wei Jia

Abstract: Palmprints have recently shown great potential in recognition applications as a privacy-friendly and stable biometric. However, the lack of large-scale public palmprint datasets limits further research and development of palmprint recognition. In this paper, we propose a novel realistic pseudo-palmprint generation (RPG) model to synthesize palmprints with massive identities. We first introduce a conditional modulation generator to improve the intra-class diversity. Then an identity-aware loss is proposed to ensure identity consistency against unpaired training. We further improve the Bézier palm crease generation strategy to guarantee identity independence. Extensive experimental results demonstrate that synthetic pretraining significantly boosts the recognition model performance. For example, our model improves on the state-of-the-art BézierPalm by more than 5% and 14% in terms of TAR@FAR=1e-6 under the 1:1 and 1:3 open-set protocols, respectively. When accessing only 10% of the real training data, our method still outperforms ArcFace with 100% real training data, indicating that we are closer to real-data-free palmprint recognition.

* 12 pages, 8 figures
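The TAR@FAR metric quoted in the abstract above can be illustrated with a minimal sketch. It assumes similarity scores where higher means "more likely the same identity"; the function name `tar_at_far` and the toy scores are hypothetical and are not taken from the paper or its evaluation code.

```python
def tar_at_far(genuine, impostor, far):
    """Return (TAR, threshold) for a false accept rate budget `far`.

    genuine  -- similarity scores of same-identity pairs
    impostor -- similarity scores of different-identity pairs
    A pair is accepted when its score is strictly above the threshold.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")

    # How many impostor pairs may be accepted without exceeding the budget.
    allowed = int(far * len(impostor))
    ranked = sorted(impostor, reverse=True)

    if allowed >= len(ranked):
        threshold = float("-inf")  # budget is loose enough to accept everything
    else:
        # Strictly above this score there are at most `allowed` impostors.
        threshold = ranked[allowed]

    accepted = sum(1 for score in genuine if score > threshold)
    return accepted / len(genuine), threshold


if __name__ == "__main__":
    # Toy scores, for illustration only.
    genuine_scores = [0.9, 0.8, 0.4, 0.7]
    impostor_scores = [0.1, 0.2, 0.5, 0.3]
    print(tar_at_far(genuine_scores, impostor_scores, far=0.25))
```

At a budget such as FAR=1e-6, the impostor set needs on the order of a million pairs or more before the threshold can admit even a single false accept, which is why this operating point is considered strict.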

MedAugment: Universal Automatic Data Augmentation Plug-in for Medical Image Analysis

Jun 30, 2023

Zhaoshan Liu, Qiujie Lv, Yifan Li, Ziduo Yang, Lei Shen

Abstract: Data augmentation (DA) techniques have been widely implemented in the computer vision field to relieve data shortages, while DA in medical image analysis (MIA) is still mostly experience-driven. Here, we develop a plug-and-use DA method, named MedAugment, to introduce automatic DA to the MIA field. To account for the differences between natural images and medical images, we divide the augmentation space into a pixel augmentation space and a spatial augmentation space. A novel operation sampling strategy is also proposed for sampling DA operations from the two spaces. To demonstrate the performance and universality of MedAugment, we conduct extensive experiments on four classification datasets and three segmentation datasets. The results show that MedAugment outperforms most state-of-the-art DA methods. This work shows that the plug-and-use MedAugment may benefit the MIA community. Code is available at https://github.com/NUS-Tim/MedAugment_Pytorch.

* 24 pages, 6 figures

DiffusEmp: A Diffusion Model-Based Framework with Multi-Grained Control for Empathetic Response Generation

Jun 02, 2023

Guanqun Bi, Lei Shen, Yanan Cao, Meng Chen, Yuqiang Xie, Zheng Lin, Xiaodong He

Abstract: Empathy is a crucial factor in open-domain conversations, as it naturally shows one's care for and understanding of others. Though several methods have been proposed to generate empathetic responses, existing works often lead to monotonous empathy, that is, generic and safe expressions. In this paper, we propose to use explicit control to guide the empathy expression and design a framework, DiffusEmp, based on a conditional diffusion language model to unify the utilization of dialogue context and attribute-oriented control signals. Specifically, communication mechanism, intent, and semantic frame are imported as multi-grained signals that control the empathy realization from coarse to fine levels. We then design a specific masking strategy to reflect the relationship between multi-grained signals and response tokens, and integrate it into the diffusion model to influence the generative process. Experimental results on a benchmark dataset EmpatheticDialogue show that our framework outperforms competitive baselines in terms of controllability, informativeness, and diversity without the loss of context-relatedness.

* accepted by ACL 2023 main conference (Oral)

Is Translation Helpful? An Empirical Analysis of Cross-Lingual Transfer in Low-Resource Dialog Generation

May 21, 2023

Lei Shen, Shuai Yu, Xiaoyu Shen

Abstract: Cross-lingual transfer is important for developing high-quality chatbots in multiple languages due to the strongly imbalanced distribution of language resources. A typical approach is to leverage off-the-shelf machine translation (MT) systems to utilize either the training corpus or developed models from high-resource languages. In this work, we investigate whether it is helpful to utilize MT at all in this task. To do so, we simulate a low-resource scenario assuming access to limited Chinese dialog data in the movie domain and large amounts of English dialog data from multiple domains. Experiments show that leveraging English dialog corpora can indeed improve the naturalness, relevance, and cross-domain transferability in Chinese. However, directly using English dialog corpora in their original form is, surprisingly, better than using their translated versions. As the topics and wording habits in daily conversations are strongly culture-dependent, MT can reinforce the bias from high-resource languages, yielding unnatural generations in the target language. Considering the cost of translating large amounts of text and the strong effects of the translation quality, we suggest future research should rather focus on utilizing the original English data for cross-lingual transfer in dialog generation. We perform extensive human evaluations and ablation studies. The analysis results, together with the collected dataset, are presented to draw attention towards this area and benefit future research.

CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X

Mar 30, 2023

Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li (+3 more)

Abstract: Large pre-trained code generation models, such as OpenAI Codex, can generate syntactically and functionally correct code, making programmers more productive and bringing our pursuit of artificial general intelligence closer. In this paper, we introduce CodeGeeX, a multilingual model with 13 billion parameters for code generation. CodeGeeX is pre-trained on 850 billion tokens of 23 programming languages as of June 2022. Our extensive experiments suggest that CodeGeeX outperforms multilingual code models of similar scale for both the tasks of code generation and translation on HumanEval-X. Building upon HumanEval (Python only), we develop the HumanEval-X benchmark for evaluating multilingual models by hand-writing the solutions in C++, Java, JavaScript, and Go. In addition, we build CodeGeeX-based extensions on Visual Studio Code, JetBrains, and Cloud Studio, generating 4.7 billion tokens for tens of thousands of active users per week. Our user study demonstrates that CodeGeeX can help to increase coding efficiency for 83.4% of its users. Finally, CodeGeeX is publicly accessible, and in Sep. 2022 we open-sourced its code, model weights (the version of 850B tokens), API, extensions, and HumanEval-X at https://github.com/THUDM/CodeGeeX.

DistilPose: Tokenized Pose Regression with Heatmap Distillation

Mar 16, 2023

Suhang Ye, Yingyi Zhang, Jie Hu, Liujuan Cao, Shengchuan Zhang, Lei Shen, Jun Wang, Shouhong Ding, Rongrong Ji

Abstract: In the field of human pose estimation, regression-based methods have been dominant in terms of speed, while heatmap-based methods are far ahead in terms of performance. How to take advantage of both schemes remains a challenging problem. In this paper, we propose a novel human pose estimation framework termed DistilPose, which bridges the gaps between heatmap-based and regression-based methods. Specifically, DistilPose maximizes the transfer of knowledge from the teacher model (heatmap-based) to the student model (regression-based) through Token-distilling Encoder (TDE) and Simulated Heatmaps. TDE aligns the feature spaces of heatmap-based and regression-based models by introducing tokenization, while Simulated Heatmaps transfer explicit guidance (distribution and confidence) from teacher heatmaps into student models. Extensive experiments show that the proposed DistilPose can significantly improve the performance of the regression-based models while maintaining efficiency. Specifically, on the MSCOCO validation dataset, DistilPose-S obtains 71.6% mAP with 5.36M parameters, 2.38 GFLOPs, and 40.2 FPS, which saves 12.95x and 7.16x in computational cost and is 4.9x faster than its teacher model, with only a 0.9-point performance drop. Furthermore, DistilPose-L obtains 74.4% mAP on the MSCOCO validation dataset, achieving a new state of the art among predominant regression-based models.

* accepted by CVPR2023

CECT: Controllable Ensemble CNN and Transformer for COVID-19 image classification by capturing both local and global image features

Feb 05, 2023

Zhaoshan Liu, Lei Shen

Abstract: Purpose: Most computer vision models are developed based on either a convolutional neural network (CNN) or a transformer; the former captures local features, while the latter captures global features. To relieve the model performance limitations caused by the lack of global or local features, respectively, we develop a novel classification network named CECT, a controllable ensemble of CNN and transformer. Methods: The proposed CECT is composed of a CNN-based encoder block, a deconvolution-ensemble decoder block, and a transformer-based classification block. Different from conventional CNN- or transformer-based methods, our CECT can capture features at both multi-local and global scales, and the contribution of local features at different scales can be controlled with the proposed ensemble coefficients. Results: We evaluate CECT on two public COVID-19 datasets, and it outperforms other state-of-the-art methods on all evaluation metrics. Conclusion: With its remarkable feature capture ability, we believe CECT can also be used in other medical image classification scenarios to assist diagnosis.

* 20 pages, 5 figures

Differentially Private Natural Language Models: Recent Advances and Future Directions

Jan 22, 2023

Lijie Hu, Ivan Habernal, Lei Shen, Di Wang

Abstract: Recent developments in deep learning have led to great success in various natural language processing (NLP) tasks. However, these applications may involve data that contain sensitive information. Therefore, how to achieve good performance while also protecting the privacy of sensitive data is a crucial challenge in NLP. To preserve privacy, Differential Privacy (DP), which can prevent reconstruction attacks and protect against potential side knowledge, is becoming a de facto technique for private data analysis. In recent years, NLP in DP models (DP-NLP) has been studied from different perspectives and deserves a comprehensive review. In this paper, we provide the first systematic review of recent advances in DP deep learning models in NLP. In particular, we first discuss some differences and additional challenges of DP-NLP compared with standard DP deep learning. Then we investigate some existing work on DP-NLP and present its recent developments from two aspects: gradient perturbation based methods and embedding vector perturbation based methods. We also discuss some challenges and future directions of this topic.
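To make "gradient perturbation" in the abstract above concrete, here is a minimal sketch of one generic DP-SGD-style aggregation step: clip each per-example gradient to a fixed L2 norm, sum, add Gaussian noise scaled to that norm, and average. This is an illustration under stated assumptions, not a method taken from the survey; the function names, the `max_norm` and `noise_multiplier` parameters, and the toy gradients are all hypothetical, and no privacy accounting is performed.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale `grad` down so its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def noisy_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip per-example gradients, add Gaussian noise to their sum, average.

    The noise standard deviation is noise_multiplier * max_norm, i.e. it is
    calibrated to the largest change any single example can cause in the sum.
    """
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
    print(noisy_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single training example can move the update, and the added noise then masks that bounded contribution. Turning a given noise level into a concrete privacy guarantee requires a separate accounting step that this sketch omits.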

Coordinating Cross-modal Distillation for Molecular Property Prediction

Nov 30, 2022

Hao Zhang, Nan Zhang, Ruixin Zhang, Lei Shen, Yingyi Zhang, Meng Liu

Abstract: In recent years, molecular graph representation learning (GRL) has drawn increasing attention in molecular property prediction (MPP) problems. Existing graph methods have demonstrated that 3D geometric information is significant for better performance in MPP. However, accurate 3D structures are often costly and time-consuming to obtain, limiting the large-scale application of GRL. An intuitive solution is to train with 3D to 2D knowledge distillation and predict with only 2D inputs. However, some challenging problems remain open for 3D to 2D distillation. One is that the 3D view is quite distinct from the 2D view, and the other is that the gradient magnitudes of atoms in distillation are discrepant and unstable due to the variable molecular size. To address these challenging problems, we propose a distillation framework that contains global molecular distillation and local atom distillation. We also provide a theoretical insight to justify how to coordinate atom and molecular information, which tackles the drawback of variable molecular size for atom information distillation. Experimental results on two popular molecular datasets demonstrate that our proposed model achieves superior performance over other methods. Specifically, on the largest MPP dataset, PCQM4Mv2, which serves as an "ImageNet Large Scale Visual Recognition Challenge" in the field of graph ML, the proposed method achieved a 6.9% improvement compared with the best works. We also obtained fourth place with an MAE of 0.0734 on the test-challenge set for the OGB-LSC 2022 Graph Regression Task. We will release the code soon.
