Abstract:Large language models (LLMs) increasingly participate in emotionally sensitive social conversations, where responses may shift from balanced support toward excessive validation or escalatory alignment. Existing sycophancy research primarily focuses on factual agreement and instruction-following settings, leaving culturally grounded conversational sycophancy underexplored. We introduce BenSyc, the first benchmark for studying conversational sycophancy in Bengali social contexts. Starting from 11,840 Reddit posts and 170k comments collected from communities across Bangladesh and West Bengal, we construct a human-validated benchmark with binary labels and a fine-grained five-level taxonomy spanning Invalidation, Neutral, Support, Validation, and Escalation. We evaluate more than 15 open and proprietary LLMs on conversational alignment classification and response generation tasks. Results show that distinguishing empathetic support from reinforcement-oriented validation remains challenging even for frontier instruction-tuned models: the best system achieves only 61.8 Macro-F1 on binary detection and 61.7 Macro-F1 on five-class classification. In generation settings, several models frequently produce strongly validating or escalatory responses in emotionally charged situations. Our findings highlight substantial variation across model families and conversational behaviors, underscoring the importance of culturally grounded multilingual benchmarks for evaluating socially aligned conversational AI systems.




Abstract:Parkinson's disease (PD) diagnosis remains challenging due to lacking a reliable biomarker and limited access to clinical care. In this study, we present an analysis of the largest video dataset containing micro-expressions to screen for PD. We collected 3,871 videos from 1,059 unique participants, including 256 self-reported PD patients. The recordings are from diverse sources encompassing participants' homes across multiple countries, a clinic, and a PD care facility in the US. Leveraging facial landmarks and action units, we extracted features relevant to Hypomimia, a prominent symptom of PD characterized by reduced facial expressions. An ensemble of AI models trained on these features achieved an accuracy of 89.7% and an Area Under the Receiver Operating Characteristic (AUROC) of 89.3% while being free from detectable bias across population subgroups based on sex and ethnicity on held-out data. Further analysis reveals that features from the smiling videos alone lead to comparable performance, even on two external test sets the model has never seen during training, suggesting the potential for PD risk assessment from smiling selfie videos.




Abstract:Seismic intensity prediction in a geographical area from early or initial seismic waves received by a few seismic stations is a critical component of an effective Earthquake Early Warning (EEW) system. State-of-the-art deep learning-based techniques for this task suffer from limited accuracy in the prediction and, more importantly, require input waveforms of a large time window from a handful number of seismic stations, which is not practical for EEW systems. To overcome the above limitations, in this paper, we propose a novel deep learning approach, Seismic Contrastive Graph Neural Network (SC-GNN) for highly accurate seismic intensity prediction using a small portion of initial seismic waveforms received by a few seismic stations. The SC-GNN comprises two key components: (i) a graph neural network (GNN) to propagate spatiotemporal information through the nodes of a graph-like structure of seismic station distribution and wave propagation, and (ii) a self-supervised contrastive learning component to train the model with larger time windows and make predictions using shorter initial waveforms. The efficacy of our proposed model is thoroughly evaluated through experiments on three real-world seismic datasets, showing superior performance over existing state-of-the-art techniques. In particular, the SC-GNN model demonstrates a substantial reduction in mean squared error (MSE) and the lowest standard deviation of the error, indicating its robustness, reliability, and a strong positive relationship between predicted and actual values. More importantly, the model maintains superior performance even with 5s input waveforms, making it particularly efficient for EEW systems.