Peixin Qin

SCOP: Evaluating the Comprehension Process of Large Language Models from a Cognitive View

Jun 05, 2025

Yongjie Xiao, Hongru Liang, Peixin Qin, Yao Zhang, Wenqiang Lei

Abstract: Despite the great potential of large language models (LLMs) in machine comprehension, it is still concerning to fully rely on them in real-world scenarios. This is probably because there is no rational explanation for whether the comprehension process of LLMs is aligned with that of experts. In this paper, we propose SCOP to carefully examine how LLMs perform during the comprehension process from a cognitive view. Specifically, it is equipped with a systematic definition of five requisite skills in the comprehension process, a strict framework to construct testing data for these skills, and a detailed analysis of advanced open-source and closed-source LLMs using the testing data. With SCOP, we find that it is still challenging for LLMs to perform an expert-level comprehension process. Even so, we notice that LLMs share some similarities with experts, e.g., performing better at comprehending local information than global information. Further analysis reveals that LLMs can be somewhat unreliable: they might reach correct answers through flawed comprehension processes. Based on SCOP, we suggest that one direction for improving LLMs is to focus more on the comprehension process, ensuring all comprehension skills are thoroughly developed during training.

* arXiv admin note: text overlap with arXiv:2004.14535 by other authors

Beyond Persuasion: Towards Conversational Recommender System with Credible Explanations

Sep 22, 2024

Peixin Qin, Chen Huang, Yang Deng, Wenqiang Lei, Tat-Seng Chua

Abstract: With the aid of large language models, current conversational recommender systems (CRSs) have gained strong abilities to persuade users to accept recommended items. While these CRSs are highly persuasive, they can mislead users by incorporating non-credible information in their explanations, ultimately damaging the long-term trust between users and the CRS. To address this, we propose a simple yet effective method, called PC-CRS, to enhance the credibility of a CRS's explanations during persuasion. It guides explanation generation through our proposed credibility-aware persuasive strategies and then gradually refines explanations via post-hoc self-reflection. Experimental results demonstrate the efficacy of PC-CRS in promoting persuasive and credible explanations. Further analysis reveals why current methods produce non-credible explanations and shows the potential of credible explanations to improve recommendation accuracy.

* Findings of EMNLP 2024

CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models

May 20, 2024

Tong Zhang, Peixin Qin, Yang Deng, Chen Huang, Wenqiang Lei, Junhong Liu, Dingnan Jin, Hongru Liang, Tat-Seng Chua

Abstract: Large language models (LLMs) are increasingly used to meet user information needs, but their effectiveness in handling user queries that contain various types of ambiguity remains unknown, ultimately risking user trust and satisfaction. To this end, we introduce CLAMBER, a benchmark for evaluating LLMs using a well-organized taxonomy. Building upon the taxonomy, we construct ~12K high-quality data samples to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs. Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries, even when enhanced by chain-of-thought (CoT) and few-shot prompting. These techniques may make LLMs overconfident and yield only marginal improvements in identifying ambiguity. Furthermore, current LLMs fall short in generating high-quality clarifying questions due to a lack of conflict resolution and inaccurate utilization of inherent knowledge. In this paper, CLAMBER provides guidance and promotes further research on proactive and trustworthy LLMs. Our dataset is available at https://github.com/zt991211/CLAMBER

* Accepted to ACL 2024

Concept -- An Evaluation Protocol on Conversation Recommender Systems with System-centric and User-centric Factors

Apr 06, 2024

Chen Huang, Peixin Qin, Yang Deng, Wenqiang Lei, Jiancheng Lv, Tat-Seng Chua

Abstract: The conversational recommendation system (CRS) has been criticized regarding its user experience in real-world scenarios, despite significant recent progress in academia. Existing evaluation protocols for CRS may prioritize system-centric factors such as effectiveness and fluency in conversation while neglecting user-centric aspects. Thus, we propose a new and inclusive evaluation protocol, Concept, which integrates both system- and user-centric factors. We conceptualise three key characteristics that represent such factors and further divide them into six primary abilities. To implement Concept, we adopt an LLM-based user simulator and evaluator with scoring rubrics tailored to each primary ability. Our protocol, Concept, serves a dual purpose. First, it provides an overview of the pros and cons of current CRS models. Second, it pinpoints the problem of low usability in the "omnipotent" ChatGPT and offers a comprehensive reference guide for evaluating CRS, thereby laying the foundation for CRS improvement.

* 27 pages, 18 tables, and 10 figures

Towards Equipping Transformer with the Ability of Systematic Compositionality

Dec 12, 2023

Chen Huang, Peixin Qin, Wenqiang Lei, Jiancheng Lv

Abstract: One of the key factors in language productivity and human cognition is the ability of systematic compositionality, which refers to understanding unseen compositions of seen primitives. However, recent evidence reveals that Transformers have difficulty generalizing to composed contexts built from seen primitives. To this end, we take the first step by proposing a compositionality-aware Transformer called CAT and two novel pre-training tasks to facilitate systematic compositionality. We tentatively provide a successful implementation of a multi-layer CAT on the basis of the widely used BERT. The experimental results demonstrate that CAT outperforms baselines on compositionality-aware tasks with minimal impact on effectiveness on standard language understanding tasks.

* Accepted to AAAI 2024. Paper with appendix
