Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Cheng Li

Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China

The Good, The Bad, and Why: Unveiling Emotions in Generative AI

Dec 19, 2023

Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Xinyi Wang, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, Xing Xie

Figure 1 for The Good, The Bad, and Why: Unveiling Emotions in Generative AI

Figure 2 for The Good, The Bad, and Why: Unveiling Emotions in Generative AI

Figure 3 for The Good, The Bad, and Why: Unveiling Emotions in Generative AI

Figure 4 for The Good, The Bad, and Why: Unveiling Emotions in Generative AI

Abstract:Emotion significantly impacts our daily behaviors and interactions. While recent generative AI models, such as large language models, have shown impressive performance in various tasks, it remains unclear whether they truly comprehend emotions. This paper aims to address this gap by incorporating psychological theories to gain a holistic understanding of emotions in generative AI models. Specifically, we propose three approaches: 1) EmotionPrompt to enhance AI model performance, 2) EmotionAttack to impair AI model performance, and 3) EmotionDecode to explain the effects of emotional stimuli, both benign and malignant. Through extensive experiments involving language and multi-modal models on semantic understanding, logical reasoning, and generation tasks, we demonstrate that both textual and visual EmotionPrompt can boost the performance of AI models while EmotionAttack can hinder it. Additionally, EmotionDecode reveals that AI models can comprehend emotional stimuli akin to the mechanism of dopamine in the human brain. Our work heralds a novel avenue for exploring psychology to enhance our understanding of generative AI models. This paper is an extended version of our previous work EmotionPrompt (arXiv:2307.11760).

* Technical report; an extension to EmotionPrompt (arXiv:2307.11760); 34 pages

Via

Access Paper or Ask Questions

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

Gemini Team, Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth(+930 more)

Abstract:This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of Gemini models in cross-modal reasoning and language understanding will enable a wide variety of use cases and we discuss our approach toward deploying them responsibly to users.

Via

Access Paper or Ask Questions

Collaborative Learning for Annotation-Efficient Volumetric MR Image Segmentation

Dec 18, 2023

Yousuf Babiker M. Osman, Cheng Li, Weijian Huang, Shanshan Wang

Figure 1 for Collaborative Learning for Annotation-Efficient Volumetric MR Image Segmentation

Figure 2 for Collaborative Learning for Annotation-Efficient Volumetric MR Image Segmentation

Figure 3 for Collaborative Learning for Annotation-Efficient Volumetric MR Image Segmentation

Figure 4 for Collaborative Learning for Annotation-Efficient Volumetric MR Image Segmentation

Abstract:Background: Deep learning has presented great potential in accurate MR image segmentation when enough labeled data are provided for network optimization. However, manually annotating 3D MR images is tedious and time-consuming, requiring experts with rich domain knowledge and experience. Purpose: To build a deep learning method exploring sparse annotations, namely only a single 2D slice label for each 3D training MR image. Population: 3D MR images of 150 subjects from two publicly available datasets were included. Among them, 50 (1,377 image slices) are for prostate segmentation. The other 100 (8,800 image slices) are for left atrium segmentation. Five-fold cross-validation experiments were carried out utilizing the first dataset. For the second dataset, 80 subjects were used for training and 20 were used for testing. Assessment: A collaborative learning method by integrating the strengths of semi-supervised and self-supervised learning schemes was developed. The method was trained using labeled central slices and unlabeled non-central slices. Segmentation performance on testing set was reported quantitatively and qualitatively. Results: Compared to FS-LCS, MT, UA-MT, DCT-Seg, ICT, and AC-MT, the proposed method achieved a substantial improvement in segmentation accuracy, increasing the mean B-IoU significantly by more than 10.0% for prostate segmentation (proposed method B-IoU: 70.3% vs. ICT B-IoU: 60.3%) and by more than 6.0% for left atrium segmentation (proposed method B-IoU: 66.1% vs. ICT B-IoU: 60.1%).

* Paper accepted by Journal of Magnetic Resonance Imaging

Via

Access Paper or Ask Questions

Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search

Nov 26, 2023

Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang

Abstract:Increasingly complex and diverse deep neural network (DNN) models necessitate distributing the execution across multiple devices for training and inference tasks, and also require carefully planned schedules for performance. However, existing practices often rely on predefined schedules that may not fully exploit the benefits of emerging diverse model-aware operator placement strategies. Handcrafting high-efficiency schedules can be challenging due to the large and varying schedule space. This paper presents Tessel, an automated system that searches for efficient schedules for distributed DNN training and inference for diverse operator placement strategies. To reduce search costs, Tessel leverages the insight that the most efficient schedules often exhibit repetitive pattern (repetend) across different data inputs. This leads to a two-phase approach: repetend construction and schedule completion. By exploring schedules for various operator placement strategies, Tessel significantly improves both training and inference performance. Experiments with representative DNN models demonstrate that Tessel achieves up to 5.5x training performance speedup and up to 38% inference latency reduction.

* The paper is accepted by HPCA 2024

Via

Access Paper or Ask Questions

Does Role-Playing Chatbots Capture the Character Personalities? Assessing Personality Traits for Role-Playing Chatbots

Oct 30, 2023

Xintao Wang, Quan Tu, Yaying Fei, Ziang Leng, Cheng Li

Figure 1 for Does Role-Playing Chatbots Capture the Character Personalities? Assessing Personality Traits for Role-Playing Chatbots

Figure 2 for Does Role-Playing Chatbots Capture the Character Personalities? Assessing Personality Traits for Role-Playing Chatbots

Figure 3 for Does Role-Playing Chatbots Capture the Character Personalities? Assessing Personality Traits for Role-Playing Chatbots

Figure 4 for Does Role-Playing Chatbots Capture the Character Personalities? Assessing Personality Traits for Role-Playing Chatbots

Abstract:The emergence of large-scale pretrained language models has revolutionized the capabilities of new AI application, especially in the realm of crafting chatbots with distinct personas. Given the "stimulus-response" nature of chatbots, this paper unveils an innovative open-ended interview-style approach for personality assessment on role-playing chatbots, which offers a richer comprehension of their intrinsic personalities. We conduct personality assessments on 32 role-playing chatbots created by the ChatHaruhi library, across both the Big Five and MBTI dimensions, and measure their alignment with human perception. Evaluation results underscore that modern role-playing chatbots based on LLMs can effectively portray personality traits of corresponding characters, with an alignment rate of 82.8% compared with human-perceived personalities. Besides, we also suggest potential strategies for shaping chatbots' personalities. Hence, this paper serves as a cornerstone study for role-playing chatbots that intersects computational linguistics and psychology. Our resources are available at https://github.com/LC1332/Chat-Haruhi-Suzumiya

* A Personality Traits Test Over ChatHaruhi

Via

Access Paper or Ask Questions

Automated Evaluation of Personalized Text Generation using Large Language Models

Oct 17, 2023

Yaqing Wang, Jiepu Jiang, Mingyang Zhang, Cheng Li, Yi Liang, Qiaozhu Mei, Michael Bendersky

Figure 1 for Automated Evaluation of Personalized Text Generation using Large Language Models

Figure 2 for Automated Evaluation of Personalized Text Generation using Large Language Models

Figure 3 for Automated Evaluation of Personalized Text Generation using Large Language Models

Figure 4 for Automated Evaluation of Personalized Text Generation using Large Language Models

Abstract:Personalized text generation presents a specialized mechanism for delivering content that is specific to a user's personal context. While the research progress in this area has been rapid, evaluation still presents a challenge. Traditional automated metrics such as BLEU and ROUGE primarily measure lexical similarity to human-written references, and are not able to distinguish personalization from other subtle semantic aspects, thus falling short of capturing the nuances of personalized generated content quality. On the other hand, human judgments are costly to obtain, especially in the realm of personalized evaluation. Inspired by these challenges, we explore the use of large language models (LLMs) for evaluating personalized text generation, and examine their ability to understand nuanced user context. We present AuPEL, a novel evaluation method that distills three major semantic aspects of the generated text: personalization, quality and relevance, and automatically measures these aspects. To validate the effectiveness of AuPEL, we design carefully controlled experiments and compare the accuracy of the evaluation judgments made by LLMs versus that of judgements made by human annotators, and conduct rigorous analyses of the consistency and sensitivity of the proposed metric. We find that, compared to existing evaluation metrics, AuPEL not only distinguishes and ranks models based on their personalization abilities more accurately, but also presents commendable consistency and efficiency for this task. Our work suggests that using LLMs as the evaluators of personalized text generation is superior to traditional text similarity metrics, even though interesting new challenges still remain.

Via

Access Paper or Ask Questions

CrossZoom: Simultaneously Motion Deblurring and Event Super-Resolving

Sep 29, 2023

Chi Zhang, Xiang Zhang, Mingyuan Lin, Cheng Li, Chu He, Wen Yang, Gui-Song Xia, Lei Yu

Figure 1 for CrossZoom: Simultaneously Motion Deblurring and Event Super-Resolving

Figure 2 for CrossZoom: Simultaneously Motion Deblurring and Event Super-Resolving

Figure 3 for CrossZoom: Simultaneously Motion Deblurring and Event Super-Resolving

Figure 4 for CrossZoom: Simultaneously Motion Deblurring and Event Super-Resolving

Abstract:Even though the collaboration between traditional and neuromorphic event cameras brings prosperity to frame-event based vision applications, the performance is still confined by the resolution gap crossing two modalities in both spatial and temporal domains. This paper is devoted to bridging the gap by increasing the temporal resolution for images, i.e., motion deblurring, and the spatial resolution for events, i.e., event super-resolving, respectively. To this end, we introduce CrossZoom, a novel unified neural Network (CZ-Net) to jointly recover sharp latent sequences within the exposure period of a blurry input and the corresponding High-Resolution (HR) events. Specifically, we present a multi-scale blur-event fusion architecture that leverages the scale-variant properties and effectively fuses cross-modality information to achieve cross-enhancement. Attention-based adaptive enhancement and cross-interaction prediction modules are devised to alleviate the distortions inherent in Low-Resolution (LR) events and enhance the final results through the prior blur-event complementary information. Furthermore, we propose a new dataset containing HR sharp-blurry images and the corresponding HR-LR event streams to facilitate future research. Extensive qualitative and quantitative experiments on synthetic and real-world datasets demonstrate the effectiveness and robustness of the proposed method. Codes and datasets are released at https://bestrivenzc.github.io/CZ-Net/.

Via

Access Paper or Ask Questions

Automatic Prompt Rewriting for Personalized Text Generation

Sep 29, 2023

Cheng Li, Mingyang Zhang, Qiaozhu Mei, Weize Kong, Michael Bendersky

Figure 1 for Automatic Prompt Rewriting for Personalized Text Generation

Figure 2 for Automatic Prompt Rewriting for Personalized Text Generation

Figure 3 for Automatic Prompt Rewriting for Personalized Text Generation

Figure 4 for Automatic Prompt Rewriting for Personalized Text Generation

Abstract:Facilitated by large language models (LLMs), personalized text generation has become a rapidly growing research direction. Most existing studies focus on designing specialized models for a particular domain, or they require fine-tuning the LLMs to generate personalized text. We consider a typical scenario in which the large language model, which generates personalized output, is frozen and can only be accessed through APIs. Under this constraint, all one can do is to improve the input text (i.e., text prompts) sent to the LLM, a procedure that is usually done manually. In this paper, we propose a novel method to automatically revise prompts for personalized text generation. The proposed method takes the initial prompts generated by a state-of-the-art, multistage framework for personalized generation and rewrites a few critical components that summarize and synthesize the personal context. The prompt rewriter employs a training paradigm that chains together supervised learning (SL) and reinforcement learning (RL), where SL reduces the search space of RL and RL facilitates end-to-end training of the rewriter. Using datasets from three representative domains, we demonstrate that the rewritten prompts outperform both the original prompts and the prompts optimized via supervised learning or reinforcement learning alone. In-depth analysis of the rewritten prompts shows that they are not only human readable, but also able to guide manual revision of prompts when there is limited resource to employ reinforcement learning to train the prompt rewriter, or when it is costly to deploy an automatic prompt rewriter for inference.

Via

Access Paper or Ask Questions

Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Masked Contrastive Learning

Sep 18, 2023

Weijian Huang, Cheng Li, Hao Yang, Jiarun Liu, Shanshan Wang

Abstract:Recently, multi-modal vision-language foundation models have gained significant attention in the medical field. While these models offer great opportunities, they still face a number of challenges, such as the requirement for fine-grained knowledge understanding in computer-aided diagnosis and capability of utilizing very limited or no task-specific labeled data in real-world clinical applications. In this study, we present MaCo, a novel multi-modal medical foundation model that explores masked contrastive learning to achieve granular alignment and zero-shot learning for a variety of medical imaging tasks. MaCo incorporates a correlation weighting mechanism to adjust the correlation between masked image patches and their corresponding reports, thereby enhancing the representation learning capabilities. We evaluate MaCo on six well-known open-source X-ray datasets, and the experimental results show it outperforms seven state-of-the-art approaches for classification, segmentation, and zero-shot phase grounding, demonstrating its great potential to promote a wide range of medical image analysis tasks.

Via

Access Paper or Ask Questions

PBFormer: Capturing Complex Scene Text Shape with Polynomial Band Transformer

Aug 29, 2023

Ruijin Liu, Ning Lu, Dapeng Chen, Cheng Li, Zejian Yuan, Wei Peng

Figure 1 for PBFormer: Capturing Complex Scene Text Shape with Polynomial Band Transformer

Figure 2 for PBFormer: Capturing Complex Scene Text Shape with Polynomial Band Transformer

Figure 3 for PBFormer: Capturing Complex Scene Text Shape with Polynomial Band Transformer

Figure 4 for PBFormer: Capturing Complex Scene Text Shape with Polynomial Band Transformer

Abstract:We present PBFormer, an efficient yet powerful scene text detector that unifies the transformer with a novel text shape representation Polynomial Band (PB). The representation has four polynomial curves to fit a text's top, bottom, left, and right sides, which can capture a text with a complex shape by varying polynomial coefficients. PB has appealing features compared with conventional representations: 1) It can model different curvatures with a fixed number of parameters, while polygon-points-based methods need to utilize a different number of points. 2) It can distinguish adjacent or overlapping texts as they have apparent different curve coefficients, while segmentation-based or points-based methods suffer from adhesive spatial positions. PBFormer combines the PB with the transformer, which can directly generate smooth text contours sampled from predicted curves without interpolation. A parameter-free cross-scale pixel attention (CPA) module is employed to highlight the feature map of a suitable scale while suppressing the other feature maps. The simple operation can help detect small-scale texts and is compatible with the one-stage DETR framework, where no postprocessing exists for NMS. Furthermore, PBFormer is trained with a shape-contained loss, which not only enforces the piecewise alignment between the ground truth and the predicted curves but also makes curves' positions and shapes consistent with each other. Without bells and whistles about text pre-training, our method is superior to the previous state-of-the-art text detectors on the arbitrary-shaped text datasets.

* 9 pages, 8 figures, accepted by ACM MM 2023

Via

Access Paper or Ask Questions