Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Louis-Philippe Morency

Shammie

LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment

Jan 07, 2025

Gaoussou Youssouf Kebe, Jeffrey M. Girard, Einat Liebenthal, Justin Baker, Fernando De la Torre, Louis-Philippe Morency

Abstract:This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment using the Montgomery-Asberg Depression Rating Scale (MADRS). We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews. Our approach, tested on 236 real-world interviews from the Context-Adaptive Multimodal Informatics (CAMI) dataset, demonstrates strong correlations with clinician assessments. The Qwen 2.5--72b model achieves near-human level agreement across most MADRS items, with Intraclass Correlation Coefficients (ICC) closely approaching those between human raters. We provide a comprehensive analysis of model performance across different MADRS items, highlighting strengths and current limitations. Our findings suggest that LLMs, with appropriate prompting, can serve as efficient tools for mental health assessment, potentially increasing accessibility in resource-limited settings. However, challenges remain, particularly in assessing symptoms that rely on non-verbal cues, underscoring the need for multimodal approaches in future work.

Via

Access Paper or Ask Questions

Isolated Causal Effects of Natural Language

Oct 18, 2024

Victoria Lin, Louis-Philippe Morency, Eli Ben-Michael

Figure 1 for Isolated Causal Effects of Natural Language

Figure 2 for Isolated Causal Effects of Natural Language

Figure 3 for Isolated Causal Effects of Natural Language

Figure 4 for Isolated Causal Effects of Natural Language

Abstract:As language technologies become widespread, it is important to understand how variations in language affect reader perceptions -- formalized as the isolated causal effect of some focal language-encoded intervention on an external outcome. A core challenge of estimating isolated effects is the need to approximate all non-focal language outside of the intervention. In this paper, we introduce a formal estimation framework for isolated causal effects and explore how different approximations of non-focal language impact effect estimates. Drawing on the principle of omitted variable bias, we present metrics for evaluating the quality of isolated effect estimation and non-focal language approximation along the axes of fidelity and overlap. In experiments on semi-synthetic and real-world data, we validate the ability of our framework to recover ground truth isolated effects, and we demonstrate the utility of our proposed metrics as measures of quality for both isolated effect estimates and non-focal language approximations.

Via

Access Paper or Ask Questions

IoT-LM: Large Multisensory Language Models for the Internet of Things

Jul 13, 2024

Shentong Mo, Russ Salakhutdinov, Louis-Philippe Morency, Paul Pu Liang

Figure 1 for IoT-LM: Large Multisensory Language Models for the Internet of Things

Figure 2 for IoT-LM: Large Multisensory Language Models for the Internet of Things

Figure 3 for IoT-LM: Large Multisensory Language Models for the Internet of Things

Figure 4 for IoT-LM: Large Multisensory Language Models for the Internet of Things

Abstract:The Internet of Things (IoT) network integrating billions of smart physical devices embedded with sensors, software, and communication technologies is a critical and rapidly expanding component of our modern world. The IoT ecosystem provides a rich source of real-world modalities such as motion, thermal, geolocation, imaging, depth, sensors, and audio to recognize the states of humans and physical objects. Machine learning presents a rich opportunity to automatically process IoT data at scale, enabling efficient inference for understanding human wellbeing, controlling physical devices, and interconnecting smart cities. To realize this potential, we introduce IoT-LM, an open-source large multisensory language model tailored for the IoT ecosystem. IoT-LM is enabled by two technical contributions: the first is MultiIoT, the most expansive unified IoT dataset to date, encompassing over 1.15 million samples from 12 modalities and 8 tasks prepared for multisensory pre-training and instruction-tuning. The second is a new multisensory multitask adapter layer to condition pre-trained large language models on multisensory IoT data. Not only does IoT-LM yield substantial improvements on 8 supervised IoT classification tasks, but it also demonstrates new interactive question-answering, reasoning, and dialog capabilities conditioned on IoT sensors. We release IoT-LM's data sources and new multisensory language modeling framework.

* arXiv admin note: text overlap with arXiv:2311.06217

Via

Access Paper or Ask Questions

HEMM: Holistic Evaluation of Multimodal Foundation Models

Jul 03, 2024

Paul Pu Liang, Akshay Goindani, Talha Chafekar, Leena Mathur, Haofei Yu, Ruslan Salakhutdinov, Louis-Philippe Morency

Figure 1 for HEMM: Holistic Evaluation of Multimodal Foundation Models

Figure 2 for HEMM: Holistic Evaluation of Multimodal Foundation Models

Figure 3 for HEMM: Holistic Evaluation of Multimodal Foundation Models

Figure 4 for HEMM: Holistic Evaluation of Multimodal Foundation Models

Abstract:Multimodal foundation models that can holistically process text alongside images, video, audio, and other sensory modalities are increasingly used in a variety of real-world applications. However, it is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains. In this paper, we introduce Holistic Evaluation of Multimodal Models (HEMM) to systematically evaluate the capabilities of multimodal foundation models across a set of 3 dimensions: basic skills, information flow, and real-world use cases. Basic multimodal skills are internal abilities required to solve problems, such as learning interactions across modalities, fine-grained alignment, multi-step reasoning, and the ability to handle external knowledge. Information flow studies how multimodal content changes during a task through querying, translation, editing, and fusion. Use cases span domain-specific challenges introduced in real-world multimedia, affective computing, natural sciences, healthcare, and human-computer interaction applications. Through comprehensive experiments across the 30 tasks in HEMM, we (1) identify key dataset dimensions (e.g., basic skills, information flows, and use cases) that pose challenges to today's models, and (2) distill performance trends regarding how different modeling dimensions (e.g., scale, pre-training data, multimodal alignment, pre-training, and instruction tuning objectives) influence performance. Our conclusions regarding challenging multimodal interactions, use cases, and tasks requiring reasoning and external knowledge, the benefits of data and model scale, and the impacts of instruction tuning yield actionable insights for future work in multimodal foundation models.

* Code available at https://github.com/pliang279/HEMM

Via

Access Paper or Ask Questions

Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions

Apr 17, 2024

Leena Mathur, Paul Pu Liang, Louis-Philippe Morency

Abstract:Building socially-intelligent AI agents (Social-AI) is a multidisciplinary, multimodal research goal that involves creating agents that can sense, perceive, reason about, learn from, and respond to affect, behavior, and cognition of other agents (human or artificial). Progress towards Social-AI has accelerated in the past decade across several computing communities, including natural language processing, machine learning, robotics, human-machine interaction, computer vision, and speech. Natural language processing, in particular, has been prominent in Social-AI research, as language plays a key role in constructing the social world. In this position paper, we identify a set of underlying technical challenges and open questions for researchers across computing communities to advance Social-AI. We anchor our discussion in the context of social intelligence concepts and prior progress in Social-AI research.

* Position Paper, Under Review, 19 pages, 2 figures

Via

Access Paper or Ask Questions

Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback

Mar 17, 2024

Dong Won Lee, Hae Won Park, Yoon Kim, Cynthia Breazeal, Louis-Philippe Morency

Figure 1 for Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback

Figure 2 for Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback

Figure 3 for Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback

Figure 4 for Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback

Abstract:We describe an approach for aligning an LLM-based dialogue agent based on global (i.e., dialogue-level) rewards, while also taking into account naturally-occurring multimodal signals. At a high level, our approach (dubbed GELI) learns a local, turn-level reward model by decomposing the human-provided Global Explicit (GE) session-level reward, using Local Implicit (LI} multimodal reward signals to crossmodally shape the reward decomposition step. This decomposed reward model is then used as part of the standard RHLF pipeline improve an LLM-based dialog agent. We run quantitative and qualitative human studies to evaluate the performance of our GELI approach, and find that it shows consistent improvements across various conversational metrics compared to baseline methods.

* 9 pages, 3 figures, 2 tables

Via

Access Paper or Ask Questions

Optimizing Language Models for Human Preferences is a Causal Inference Problem

Feb 22, 2024

Victoria Lin, Eli Ben-Michael, Louis-Philippe Morency

Figure 1 for Optimizing Language Models for Human Preferences is a Causal Inference Problem

Figure 2 for Optimizing Language Models for Human Preferences is a Causal Inference Problem

Figure 3 for Optimizing Language Models for Human Preferences is a Causal Inference Problem

Figure 4 for Optimizing Language Models for Human Preferences is a Causal Inference Problem

Abstract:As large language models (LLMs) see greater use in academic and commercial settings, there is increasing interest in methods that allow language models to generate texts aligned with human preferences. In this paper, we present an initial exploration of language model optimization for human preferences from direct outcome datasets, where each sample consists of a text and an associated numerical outcome measuring the reader's response. We first propose that language model optimization should be viewed as a causal problem to ensure that the model correctly learns the relationship between the text and the outcome. We formalize this causal language optimization problem, and we develop a method--causal preference optimization (CPO)--that solves an unbiased surrogate objective for the problem. We further extend CPO with doubly robust CPO (DR-CPO), which reduces the variance of the surrogate objective while retaining provably strong guarantees on bias. Finally, we empirically demonstrate the effectiveness of (DR-)CPO in optimizing state-of-the-art LLMs for human preferences on direct outcome data, and we validate the robustness of DR-CPO under difficult confounding conditions.

Via

Access Paper or Ask Questions

Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities

Nov 16, 2023

Alex Wilf, Sihyun Shawn Lee, Paul Pu Liang, Louis-Philippe Morency

Figure 1 for Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities

Figure 2 for Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities

Figure 3 for Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities

Figure 4 for Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities

Abstract:Human interactions are deeply rooted in the interplay of thoughts, beliefs, and desires made possible by Theory of Mind (ToM): our cognitive ability to understand the mental states of ourselves and others. Although ToM may come naturally to us, emulating it presents a challenge to even the most advanced Large Language Models (LLMs). Recent improvements to LLMs' reasoning capabilities from simple yet effective prompting techniques such as Chain-of-Thought have seen limited applicability to ToM. In this paper, we turn to the prominent cognitive science theory "Simulation Theory" to bridge this gap. We introduce SimToM, a novel two-stage prompting framework inspired by Simulation Theory's notion of perspective-taking. To implement this idea on current ToM benchmarks, SimToM first filters context based on what the character in question knows before answering a question about their mental state. Our approach, which requires no additional training and minimal prompt-tuning, shows substantial improvement over existing methods, and our analysis reveals the importance of perspective-taking to Theory-of-Mind capabilities. Our findings suggest perspective-taking as a promising direction for future research into improving LLMs' ToM capabilities.

Via

Access Paper or Ask Questions

MMOE: Mixture of Multimodal Interaction Experts

Nov 16, 2023

Haofei Yu, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

Figure 1 for MMOE: Mixture of Multimodal Interaction Experts

Figure 2 for MMOE: Mixture of Multimodal Interaction Experts

Figure 3 for MMOE: Mixture of Multimodal Interaction Experts

Figure 4 for MMOE: Mixture of Multimodal Interaction Experts

Abstract:Multimodal machine learning, which studies the information and interactions across various input modalities, has made significant advancements in understanding the relationship between images and descriptive text. However, this is just a portion of the potential multimodal interactions seen in the real world and does not include new interactions between conflicting utterances and gestures in predicting sarcasm, for example. Notably, the current methods for capturing shared information often do not extend well to these more nuanced interactions, sometimes performing as low as 50% in binary classification. In this paper, we address this problem via a new approach called MMOE, which stands for a mixture of multimodal interaction experts. Our method automatically classifies data points from unlabeled multimodal datasets by their interaction type and employs specialized models for each specific interaction. Based on our experiments, this approach improves performance on these challenging interactions by more than 10%, leading to an overall increase of 2% for tasks like sarcasm prediction. As a result, interaction quantification provides new insights for dataset analysis and yields simple approaches that obtain state-of-the-art performance.

Via

Access Paper or Ask Questions

MultiIoT: Towards Large-scale Multisensory Learning for the Internet of Things

Nov 10, 2023

Shentong Mo, Paul Pu Liang, Russ Salakhutdinov, Louis-Philippe Morency

Figure 1 for MultiIoT: Towards Large-scale Multisensory Learning for the Internet of Things

Figure 2 for MultiIoT: Towards Large-scale Multisensory Learning for the Internet of Things

Figure 3 for MultiIoT: Towards Large-scale Multisensory Learning for the Internet of Things

Figure 4 for MultiIoT: Towards Large-scale Multisensory Learning for the Internet of Things

Abstract:The Internet of Things (IoT), the network integrating billions of smart physical devices embedded with sensors, software, and communication technologies for the purpose of connecting and exchanging data with other devices and systems, is a critical and rapidly expanding component of our modern world. The IoT ecosystem provides a rich source of real-world modalities such as motion, thermal, geolocation, imaging, depth, sensors, video, and audio for prediction tasks involving the pose, gaze, activities, and gestures of humans as well as the touch, contact, pose, 3D of physical objects. Machine learning presents a rich opportunity to automatically process IoT data at scale, enabling efficient inference for impact in understanding human wellbeing, controlling physical devices, and interconnecting smart cities. To develop machine learning technologies for IoT, this paper proposes MultiIoT, the most expansive IoT benchmark to date, encompassing over 1.15 million samples from 12 modalities and 8 tasks. MultiIoT introduces unique challenges involving (1) learning from many sensory modalities, (2) fine-grained interactions across long temporal ranges, and (3) extreme heterogeneity due to unique structure and noise topologies in real-world sensors. We also release a set of strong modeling baselines, spanning modality and task-specific methods to multisensory and multitask models to encourage future research in multisensory representation learning for IoT.

Via

Access Paper or Ask Questions