Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chen Huang

Breaking the Stigma! Unobtrusively Probe Symptoms in Depression Disorder Diagnosis Dialogue

Jan 25, 2025

Jieming Cao, Chen Huang, Yanan Zhang, Ruibo Deng, Jincheng Zhang, Wenqiang Lei

Figure 1 for Breaking the Stigma! Unobtrusively Probe Symptoms in Depression Disorder Diagnosis Dialogue

Figure 2 for Breaking the Stigma! Unobtrusively Probe Symptoms in Depression Disorder Diagnosis Dialogue

Figure 3 for Breaking the Stigma! Unobtrusively Probe Symptoms in Depression Disorder Diagnosis Dialogue

Figure 4 for Breaking the Stigma! Unobtrusively Probe Symptoms in Depression Disorder Diagnosis Dialogue

Abstract:Stigma has emerged as one of the major obstacles to effectively diagnosing depression, as it prevents users from open conversations about their struggles. This requires advanced questioning skills to carefully probe the presence of specific symptoms in an unobtrusive manner. While recent efforts have been made on depression-diagnosis-oriented dialogue systems, they largely ignore this problem, ultimately hampering their practical utility. To this end, we propose a novel and effective method, UPSD$^{4}$, developing a series of strategies to promote a sense of unobtrusiveness within the dialogue system and assessing depression disorder by probing symptoms. We experimentally show that UPSD$^{4}$ demonstrates a significant improvement over current baselines, including unobtrusiveness evaluation of dialogue content and diagnostic accuracy. We believe our work contributes to developing more accessible and user-friendly tools for addressing the widespread need for depression diagnosis.

* Findings of NAACL 2025

Via

Access Paper or Ask Questions

How to Enable Effective Cooperation Between Humans and NLP Models: A Survey of Principles, Formalizations, and Beyond

Jan 10, 2025

Chen Huang, Yang Deng, Wenqiang Lei, Jiancheng Lv, Tat-Seng Chua, Jimmy Xiangji Huang

Figure 1 for How to Enable Effective Cooperation Between Humans and NLP Models: A Survey of Principles, Formalizations, and Beyond

Figure 2 for How to Enable Effective Cooperation Between Humans and NLP Models: A Survey of Principles, Formalizations, and Beyond

Figure 3 for How to Enable Effective Cooperation Between Humans and NLP Models: A Survey of Principles, Formalizations, and Beyond

Figure 4 for How to Enable Effective Cooperation Between Humans and NLP Models: A Survey of Principles, Formalizations, and Beyond

Abstract:With the advancement of large language models (LLMs), intelligent models have evolved from mere tools to autonomous agents with their own goals and strategies for cooperating with humans. This evolution has birthed a novel paradigm in NLP, i.e., human-model cooperation, that has yielded remarkable progress in numerous NLP tasks in recent years. In this paper, we take the first step to present a thorough review of human-model cooperation, exploring its principles, formalizations, and open challenges. In particular, we introduce a new taxonomy that provides a unified perspective to summarize existing approaches. Also, we discuss potential frontier areas and their corresponding challenges. We regard our work as an entry point, paving the way for more breakthrough research in this regard.

* 23 pages

Via

Access Paper or Ask Questions

Cross-model Transferability among Large Language Models on the Platonic Representations of Concepts

Jan 02, 2025

Youcheng Huang, Chen Huang, Duanyu Feng, Wenqiang Lei, Jiancheng Lv

Figure 1 for Cross-model Transferability among Large Language Models on the Platonic Representations of Concepts

Figure 2 for Cross-model Transferability among Large Language Models on the Platonic Representations of Concepts

Figure 3 for Cross-model Transferability among Large Language Models on the Platonic Representations of Concepts

Figure 4 for Cross-model Transferability among Large Language Models on the Platonic Representations of Concepts

Abstract:Understanding the inner workings of Large Language Models (LLMs) is a critical research frontier. Prior research has shown that a single LLM's concept representations can be captured as steering vectors (SVs), enabling the control of LLM behavior (e.g., towards generating harmful content). Our work takes a novel approach by exploring the intricate relationships between concept representations across different LLMs, drawing an intriguing parallel to Plato's Allegory of the Cave. In particular, we introduce a linear transformation method to bridge these representations and present three key findings: 1) Concept representations across different LLMs can be effectively aligned using simple linear transformations, enabling efficient cross-model transfer and behavioral control via SVs. 2) This linear transformation generalizes across concepts, facilitating alignment and control of SVs representing different concepts across LLMs. 3) A weak-to-strong transferability exists between LLM concept representations, whereby SVs extracted from smaller LLMs can effectively control the behavior of larger LLMs.

Via

Access Paper or Ask Questions

Intuitive Axial Augmentation Using Polar-Sine-Based Piecewise Distortion for Medical Slice-Wise Segmentation

Dec 04, 2024

Yiqin Zhang, Qingkui Chen, Chen Huang, Zhengjie Zhang, Meiling Chen, Zhibing Fu

Abstract:Most data-driven models for medical image analysis rely on universal augmentations to improve performance. Experimental evidence has confirmed their effectiveness, but the unclear mechanism underlying them poses a barrier to the widespread acceptance and trust in such methods within the medical community. We revisit and acknowledge the unique characteristics of medical images apart from traditional digital images, and consequently, proposed a medical-specific augmentation algorithm that is more elastic and aligns well with radiology scan procedure. The method performs piecewise affine with sinusoidal distorted ray according to radius on polar coordinates, thus simulating uncertain postures of human lying flat on the scanning table. Our method could generate human visceral distribution without affecting the fundamental relative position on axial plane. Two non-adaptive algorithms, namely Meta-based Scan Table Removal and Similarity-Guided Parameter Search, are introduced to bolster robustness of our augmentation method. Experiments show our method improves accuracy across multiple famous segmentation frameworks without requiring more data samples. Our preview code is available in: https://github.com/MGAMZ/PSBPD.

Via

Access Paper or Ask Questions

GraphOTTER: Evolving LLM-based Graph Reasoning for Complex Table Question Answering

Dec 02, 2024

Qianlong Li, Chen Huang, Shuai Li, Yuanxin Xiang, Deng Xiong, Wenqiang Lei

Figure 1 for GraphOTTER: Evolving LLM-based Graph Reasoning for Complex Table Question Answering

Figure 2 for GraphOTTER: Evolving LLM-based Graph Reasoning for Complex Table Question Answering

Figure 3 for GraphOTTER: Evolving LLM-based Graph Reasoning for Complex Table Question Answering

Figure 4 for GraphOTTER: Evolving LLM-based Graph Reasoning for Complex Table Question Answering

Abstract:Complex Table Question Answering involves providing accurate answers to specific questions based on intricate tables that exhibit complex layouts and flexible header locations. Despite considerable progress having been made in the LLM era, the reasoning processes of existing methods are often implicit, feeding the entire table into prompts, making it difficult to effectively filter out irrelevant information in the table. To this end, we propose GraphOTTER that explicitly establishes the reasoning process to pinpoint the correct answers. In particular, GraphOTTER leverages a graph-based representation, transforming the complex table into an undirected graph. It then conducts step-by-step reasoning on the graph, with each step guided by a set of pre-defined intermediate reasoning actions. As such, it constructs a clear reasoning path and effectively identifies the answer to a given question. Comprehensive experiments on two benchmark datasets and two LLM backbones demonstrate the effectiveness of GraphOTTER. Further analysis indicates that its success may be attributed to the ability to efficiently filter out irrelevant information, thereby focusing the reasoning process on the most pertinent data. Our code and experimental datasets are available at \url{https://github.com/JDing0521/GraphOTTER}.

* COLING 2025, code is available at https://github.com/JDing0521/GraphOTTER

Via

Access Paper or Ask Questions

Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP

Oct 31, 2024

Chen Huang, Skyler Seto, Samira Abnar, David Grangier, Navdeep Jaitly, Josh Susskind

Figure 1 for Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP

Figure 2 for Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP

Figure 3 for Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP

Figure 4 for Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP

Abstract:Large pretrained vision-language models like CLIP have shown promising generalization capability, but may struggle in specialized domains (e.g., satellite imagery) or fine-grained classification (e.g., car models) where the visual concepts are unseen or under-represented during pretraining. Prompt learning offers a parameter-efficient finetuning framework that can adapt CLIP to downstream tasks even when limited annotation data are available. In this paper, we improve prompt learning by distilling the textual knowledge from natural language prompts (either human- or LLM-generated) to provide rich priors for those under-represented concepts. We first obtain a prompt ``summary'' aligned to each input image via a learned prompt aggregator. Then we jointly train a prompt generator, optimized to produce a prompt embedding that stays close to the aggregated summary while minimizing task loss at the same time. We dub such prompt embedding as Aggregate-and-Adapted Prompt Embedding (AAPE). AAPE is shown to be able to generalize to different downstream data distributions and tasks, including vision-language understanding tasks (e.g., few-shot classification, VQA) and generation tasks (image captioning) where AAPE achieves competitive performance. We also show AAPE is particularly helpful to handle non-canonical and OOD examples. Furthermore, AAPE learning eliminates LLM-based inference cost as required by baselines, and scales better with data and LLM model size.

* NeurIPS 2024

Via

Access Paper or Ask Questions

Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention

Oct 14, 2024

Dejia Xu, Yifan Jiang, Chen Huang, Liangchen Song, Thorsten Gernoth, Liangliang Cao, Zhangyang Wang, Hao Tang

Abstract:In recent years there have been remarkable breakthroughs in image-to-video generation. However, the 3D consistency and camera controllability of generated frames have remained unsolved. Recent studies have attempted to incorporate camera control into the generation process, but their results are often limited to simple trajectories or lack the ability to generate consistent videos from multiple distinct camera paths for the same scene. To address these limitations, we introduce Cavia, a novel framework for camera-controllable, multi-view video generation, capable of converting an input image into multiple spatiotemporally consistent videos. Our framework extends the spatial and temporal attention modules into view-integrated attention modules, improving both viewpoint and temporal consistency. This flexible design allows for joint training with diverse curated data sources, including scene-level static videos, object-level synthetic multi-view dynamic videos, and real-world monocular dynamic videos. To our best knowledge, Cavia is the first of its kind that allows the user to precisely specify camera motion while obtaining object motion. Extensive experiments demonstrate that Cavia surpasses state-of-the-art methods in terms of geometric consistency and perceptual quality. Project Page: https://ir1d.github.io/Cavia/

* Project Page: https://ir1d.github.io/Cavia/

Via

Access Paper or Ask Questions

Beyond Persuasion: Towards Conversational Recommender System with Credible Explanations

Sep 22, 2024

Peixin Qin, Chen Huang, Yang Deng, Wenqiang Lei, Tat-Seng Chua

Figure 1 for Beyond Persuasion: Towards Conversational Recommender System with Credible Explanations

Figure 2 for Beyond Persuasion: Towards Conversational Recommender System with Credible Explanations

Figure 3 for Beyond Persuasion: Towards Conversational Recommender System with Credible Explanations

Figure 4 for Beyond Persuasion: Towards Conversational Recommender System with Credible Explanations

Abstract:With the aid of large language models, current conversational recommender system (CRS) has gaining strong abilities to persuade users to accept recommended items. While these CRSs are highly persuasive, they can mislead users by incorporating incredible information in their explanations, ultimately damaging the long-term trust between users and the CRS. To address this, we propose a simple yet effective method, called PC-CRS, to enhance the credibility of CRS's explanations during persuasion. It guides the explanation generation through our proposed credibility-aware persuasive strategies and then gradually refines explanations via post-hoc self-reflection. Experimental results demonstrate the efficacy of PC-CRS in promoting persuasive and credible explanations. Further analysis reveals the reason behind current methods producing incredible explanations and the potential of credible explanations to improve recommendation accuracy.

* Findings of EMNLP 2024

Via

Access Paper or Ask Questions

On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization

Sep 05, 2024

Yong Lin, Skyler Seto, Maartje ter Hoeve, Katherine Metcalf, Barry-John Theobald, Xuan Wang, Yizhe Zhang, Chen Huang, Tong Zhang

Figure 1 for On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization

Figure 2 for On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization

Figure 3 for On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization

Figure 4 for On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization

Abstract:Reinforcement Learning from Human Feedback (RLHF) is an effective approach for aligning language models to human preferences. Central to RLHF is learning a reward function for scoring human preferences. Two main approaches for learning a reward model are 1) training an EXplicit Reward Model (EXRM) as in RLHF, and 2) using an implicit reward learned from preference data through methods such as Direct Preference Optimization (DPO). Prior work has shown that the implicit reward model of DPO (denoted as DPORM) can approximate an EXRM in the limit. DPORM's effectiveness directly implies the optimality of the learned policy, and also has practical implication for LLM alignment methods including iterative DPO. However, it is unclear how well DPORM empirically matches the performance of EXRM. This work studies the accuracy at distinguishing preferred and rejected answers for both DPORM and EXRM. Our findings indicate that even though DPORM fits the training dataset comparably, it generalizes less effectively than EXRM, especially when the validation datasets contain distribution shifts. Across five out-of-distribution settings, DPORM has a mean drop in accuracy of 3% and a maximum drop of 7%. These findings highlight that DPORM has limited generalization ability and substantiates the integration of an explicit reward model in iterative DPO approaches.

* 12 pages, 8 tables, 2 figures

Via

Access Paper or Ask Questions

How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks

Jul 03, 2024

Etai Littwin, Omid Saremi, Madhu Advani, Vimal Thilak, Preetum Nakkiran, Chen Huang, Joshua Susskind

Figure 1 for How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks

Figure 2 for How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks

Figure 3 for How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks

Figure 4 for How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks

Abstract:Two competing paradigms exist for self-supervised learning of data representations. Joint Embedding Predictive Architecture (JEPA) is a class of architectures in which semantically similar inputs are encoded into representations that are predictive of each other. A recent successful approach that falls under the JEPA framework is self-distillation, where an online encoder is trained to predict the output of the target encoder, sometimes using a lightweight predictor network. This is contrasted with the Masked AutoEncoder (MAE) paradigm, where an encoder and decoder are trained to reconstruct missing parts of the input in the data space rather, than its latent representation. A common motivation for using the JEPA approach over MAE is that the JEPA objective prioritizes abstract features over fine-grained pixel information (which can be unpredictable and uninformative). In this work, we seek to understand the mechanism behind this empirical observation by analyzing the training dynamics of deep linear models. We uncover a surprising mechanism: in a simplified linear setting where both approaches learn similar representations, JEPAs are biased to learn high-influence features, i.e., features characterized by having high regression coefficients. Our results point to a distinct implicit bias of predicting in latent space that may shed light on its success in practice.

* Technical report

Via

Access Paper or Ask Questions