The global landscape of art-technology institutions, including festivals, biennials, research labs, conferences, and hybrid organizations, has grown increasingly diverse, yet systematic frameworks for analyzing their multidimensional characteristics remain scarce. This paper proposes ARTLAS, a computational methodology combining an eight-axis conceptual framework (Curatorial Philosophy, Territorial Relation, Knowledge Production Mode, Institutional Genealogy, Temporal Orientation, Ecosystem Function, Audience Relation, and Disciplinary Positioning) with a text-embedding and clustering pipeline to map 78 cultural-technology institutions into a unified analytical space. Each institution is characterized through qualitative descriptions along the eight axes, encoded via E5-large-v2 sentence embeddings and quantized through a word-level codebook into TF-IDF feature vectors. Dimensionality reduction using UMAP, followed by agglomerative clustering (Average linkage, k=10), yields a composite score of 0.825, a silhouette coefficient of 0.803, and a Calinski-Harabasz index of 11,196. Non-negative matrix factorization extracts ten latent topics, and a neighbor-cluster entropy measure identifies boundary institutions bridging multiple thematic communities. An interactive web-based visualization tool built with React enables stakeholders to explore institutional similarities, thematic profiles, and cross-disciplinary connections. The results reveal coherent groupings such as an art-science hub cluster anchored by ZKM and ArtScience Museum, an innovation and industry cluster including Ars Electronica, transmediale, and Sonar, an ACM academic community cluster comprising TEI, DIS, and NIME, and an electronic music and media cluster including CTM Festival, MUTEK, and Sonic Acts. This work contributes a replicable, data-driven approach to institutional ecology in the cultural-technology sector.
Modular autonomous driving systems must coordinate global progress objectives with local safety-driven reactions under imperfect sensing and strict real-time constraints. This paper presents a ROS2-native arbitration module that continuously fuses the outputs of two unchanged and interpretable controllers: a global reference-tracking controller based on Pure Pursuit and a reactive LiDAR-based Gap Follow controller. At each control step, both controllers propose Ackermann commands, and a PPO-trained policy predicts a continuous gate from a compact feature observation to produce a single fused drive command, augmented with practical safety checks. For comparison under identical ROS topic inputs and control rate, we implement a lightweight sampling-based predictive baseline. Robustness is evaluated using a ROS2 impairment protocol that injects LiDAR noise, delay, and dropout, and additionally sweeps forward-cone false short-range outliers. In a repeatable close-proximity passing scenario, we report safe success and failure rates together with per-step end-to-end controller runtime as sensing stress increases. The study is intended as a command-level robustness evaluation in a modular ROS2 setting, not as a replacement for planning-level interaction reasoning.
Climate change is a major socio-scientific issue shapes public decision-making and policy discussions. As large language models (LLMs) increasingly serve as an interface for accessing climate knowledge, whether existing benchmarks reflect user needs is critical for evaluating LLM in real-world settings. We propose a Proactive Knowledge Behaviors Framework that captures the different human-human and human-AI knowledge seeking and provision behaviors. We further develop a Topic-Intent-Form taxonomy and apply it to analyze climate-related data representing different knowledge behaviors. Our results reveal a substantial mismatch between current benchmarks and real-world user needs, while knowledge interaction patterns between humans and LLMs closely resemble those in human-human interactions. These findings provide actionable guidance for benchmark design, RAG system development, and LLM training. Code is available at https://github.com/OuchengLiu/LLM-Misalign-Climate-Change.
While Late Interaction models exhibit strong retrieval performance, many of their underlying dynamics remain understudied, potentially hiding performance bottlenecks. In this work, we focus on two topics in Late Interaction retrieval: a length bias that arises when using multi-vector scoring, and the similarity distribution beyond the best scores pooled by the MaxSim operator. We analyze these behaviors for state-of-the-art models on the NanoBEIR benchmark. Results show that while the theoretical length bias of causal Late Interaction models holds in practice, bi-directional models can also suffer from it in extreme cases. We also note that no significant similarity trend lies beyond the top-1 document token, validating that the MaxSim operator efficiently exploits the token-level similarity scores.
We present FormalProofBench, a private benchmark designed to evaluate whether AI models can produce formally verified mathematical proofs at the graduate level. Each task pairs a natural-language problem with a Lean~4 formal statement, and a model must output a Lean proof accepted by the Lean 4 checker. FormalProofBench targets advanced undergraduate and graduate mathematics, with problems drawn from qualifying exams and standard textbooks across topics including analysis, algebra, probability, and logic. We evaluate a range of frontier models with an agentic harness, and find that the best-performing foundation model achieves 33.5% accuracy, with performance dropping rapidly after that. In addition to the accuracy numbers, we also provide empirical analysis of tool-use, failure modes, cost and latency, thereby providing a thorough evaluation of the formal-theorem proving abilities of frontier models.
Multimodal Emotion Recognition in Conversations (MERC) aims to predict speakers' emotional states in multi-turn dialogues through text, audio, and visual cues. In real-world settings, conversation scenarios differ significantly in speakers, topics, styles, and noise levels. Existing MERC methods generally neglect these cross-scenario variations, limiting their ability to transfer models trained on a source domain to unseen target domains. To address this issue, we propose a Dual-branch Graph Domain Adaptation framework (DGDA) for multimodal emotion recognition under cross-scenario conditions. We first construct an emotion interaction graph to characterize complex emotional dependencies among utterances. A dual-branch encoder, consisting of a hypergraph neural network (HGNN) and a path neural network (PathNN), is then designed to explicitly model multivariate relationships and implicitly capture global dependencies. To enable out-of-domain generalization, a domain adversarial discriminator is introduced to learn invariant representations across domains. Furthermore, a regularization loss is incorporated to suppress the negative influence of noisy labels. To the best of our knowledge, DGDA is the first MERC framework that jointly addresses domain shift and label noise. Theoretical analysis provides tighter generalization bounds, and extensive experiments on IEMOCAP and MELD demonstrate that DGDA consistently outperforms strong baselines and better adapts to cross-scenario conversations. Our code is available at https://github.com/Xudmm1239439/DGDA-Net.
Determining whether a piece of text is relevant to a given topic is a fundamental task in natural language processing, yet it remains largely unexplored for Bahasa Indonesia. Unlike sentiment analysis or named entity recognition, relevancy classification requires the model to reason about the relationship between two inputs simultaneously: a topical context and a candidate text. We introduce IndoBERT-Relevancy, a context-conditioned relevancy classifier built on IndoBERT Large (335M parameters) and trained on a novel dataset of 31,360 labeled pairs spanning 188 topics. Through an iterative, failure-driven data construction process, we demonstrate that no single data source is sufficient for robust relevancy classification, and that targeted synthetic data can effectively address specific model weaknesses. Our final model achieves an F1 score of 0.948 and an accuracy of 96.5%, handling both formal and informal Indonesian text. The model is publicly available at HuggingFace.
Large vision-language models have endowed GUI agents with strong general capabilities for interface understanding and interaction. However, due to insufficient exposure to domain-specific software operation data during training, these agents exhibit significant domain bias - they lack familiarity with the specific operation workflows (planning) and UI element layouts (grounding) of particular applications, limiting their real-world task performance. In this paper, we present GUIDE (GUI Unbiasing via Instructional-Video Driven Expertise), a training-free, plug-and-play framework that resolves GUI agent domain bias by autonomously acquiring domain-specific expertise from web tutorial videos through a retrieval-augmented automated annotation pipeline. GUIDE introduces two key innovations. First, a subtitle-driven Video-RAG pipeline unlocks video semantics through subtitle analysis, performing progressive three-stage retrieval - domain classification, topic extraction, and relevance matching - to identify task-relevant tutorial videos. Second, a fully automated annotation pipeline built on an inverse dynamics paradigm feeds consecutive keyframes enhanced with UI element detection into VLMs, inferring the required planning and grounding knowledge that are injected into the agent's corresponding modules to address both manifestations of domain bias. Extensive experiments on OSWorld demonstrate GUIDE's generality as a plug-and-play component for both multi-agent systems and single-model agents. It consistently yields over 5% improvements and reduces execution steps - without modifying any model parameters or architecture - validating GUIDE as an architecture-agnostic enhancement to bridge GUI agent domain bias.
This study presents a computational analysis of the Slovene historical newspapers \textit{Slovenec} and \textit{Slovenski narod} from the sPeriodika corpus, combining topic modelling, large language model (LLM)-based aspect-level sentiment analysis, entity-graph visualisation, and qualitative discourse analysis to examine how collective identities, political orientations, and national belonging were represented in public discourse at the turn of the twentieth century. Using BERTopic, we identify major thematic patterns and show both shared concerns and clear ideological differences between the two newspapers, reflecting their conservative-Catholic and liberal-progressive orientations. We further evaluate four instruction-following LLMs for targeted sentiment classification in OCR-degraded historical Slovene and select the Slovene-adapted GaMS3-12B-Instruct model as the most suitable for large-scale application, while also documenting important limitations, particularly its stronger performance on neutral sentiment than on positive or negative sentiment. Applied at dataset scale, the model reveals meaningful variation in the portrayal of collective identities, with some groups appearing predominantly in neutral descriptive contexts and others more often in evaluative or conflict-related discourse. We then create NER graphs to explore the relationships between collective identities and places. We apply a mixed methods approach to analyse the named entity graphs, combining quantitative network analysis with critical discourse analysis. The investigation focuses on the emergence and development of intertwined historical political and socionomic identities. Overall, the study demonstrates the value of combining scalable computational methods with critical interpretation to support digital humanities research on noisy historical newspaper data.
Reinforcement learning (RL), owing to its adaptability to various dynamic systems in many real-world scenarios and the capability of maximizing long-term outcomes under different constraints, has been used in infectious disease control to optimize the intervention strategies for controlling infectious disease spread and responding to outbreaks in recent years. The potential of RL for assisting public health sectors in preventing and controlling infectious diseases is gradually emerging and being explored by rapidly increasing publications relevant to COVID-19 and other infectious diseases. However, few surveys exclusively discuss this topic, that is, the development and application of RL approaches for optimizing strategies of non-pharmaceutical and pharmaceutical interventions of public health. Therefore, this paper aims to provide a concise review and discussion of the latest literature on how RL approaches have been used to assist in controlling the spread and outbreaks of infectious diseases, covering several critical topics addressing public health demands: resource allocation, balancing between lives and livelihoods, mixed policy of multiple interventions, and inter-regional coordinated control. Finally, we conclude the paper with a discussion of several potential directions for future research.