Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Topic:Topic Modeling

What is Topic Modeling? Topic modeling is a type of statistical modeling for discovering the abstract topics that occur in a collection of documents.

Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning

Jul 15, 2025

Fan Shi, Bin Li, Xiangyang Xue

Figure 1 for Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning

Figure 2 for Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning

Figure 3 for Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning

Figure 4 for Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning

Abstract:Abstract visual reasoning (AVR) enables humans to quickly discover and generalize abstract rules to new scenarios. Designing intelligent systems with human-like AVR abilities has been a long-standing topic in the artificial intelligence community. Deep AVR solvers have recently achieved remarkable success in various AVR tasks. However, they usually use task-specific designs or parameters in different tasks. In such a paradigm, solving new tasks often means retraining the model, and sometimes retuning the model architectures, which increases the cost of solving AVR problems. In contrast to task-specific approaches, this paper proposes a novel Unified Conditional Generative Solver (UCGS), aiming to address multiple AVR tasks in a unified framework. First, we prove that some well-known AVR tasks can be reformulated as the problem of estimating the predictability of target images in problem panels. Then, we illustrate that, under the proposed framework, training one conditional generative model can solve various AVR tasks. The experiments show that with a single round of multi-task training, UCGS demonstrates abstract reasoning ability across various AVR tasks. Especially, UCGS exhibits the ability of zero-shot reasoning, enabling it to perform abstract reasoning on problems from unseen AVR tasks in the testing phase.

Via

Access Paper or Ask Questions

MapIQ: Benchmarking Multimodal Large Language Models for Map Question Answering

Jul 15, 2025

Varun Srivastava, Fan Lei, Srija Mukhopadhyay, Vivek Gupta, Ross Maciejewski

Abstract:Recent advancements in multimodal large language models (MLLMs) have driven researchers to explore how well these models read data visualizations, e.g., bar charts, scatter plots. More recently, attention has shifted to visual question answering with maps (Map-VQA). However, Map-VQA research has primarily focused on choropleth maps, which cover only a limited range of thematic categories and visual analytical tasks. To address these gaps, we introduce MapIQ, a benchmark dataset comprising 14,706 question-answer pairs across three map types: choropleth maps, cartograms, and proportional symbol maps spanning topics from six distinct themes (e.g., housing, crime). We evaluate multiple MLLMs using six visual analytical tasks, comparing their performance against one another and a human baseline. An additional experiment examining the impact of map design changes (e.g., altered color schemes, modified legend designs, and removal of map elements) provides insights into the robustness and sensitivity of MLLMs, their reliance on internal geographic knowledge, and potential avenues for improving Map-VQA performance.

* Published as a conference paper at COLM 2025

Via

Access Paper or Ask Questions

Collaborative Prediction: To Join or To Disjoin Datasets

Jun 12, 2025

Kyung Rok Kim, Yansong Wang, Xiaocheng Li, Guanting Chen

Abstract:With the recent rise of generative Artificial Intelligence (AI), the need of selecting high-quality dataset to improve machine learning models has garnered increasing attention. However, some part of this topic remains underexplored, even for simple prediction models. In this work, we study the problem of developing practical algorithms that select appropriate dataset to minimize population loss of our prediction model with high probability. Broadly speaking, we investigate when datasets from different sources can be effectively merged to enhance the predictive model's performance, and propose a practical algorithm with theoretical guarantees. By leveraging an oracle inequality and data-driven estimators, the algorithm reduces population loss with high probability. Numerical experiments demonstrate its effectiveness in both standard linear regression and broader machine learning applications. Code is available at https://github.com/kkrokii/collaborative_prediction.

* To be published in the 41st Conference on Uncertainty in Artificial Intelligence (UAI 2025)

Via

Access Paper or Ask Questions

Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models

Jul 10, 2025

Varin Sikka, Vishal Sikka

Abstract:With widespread adoption of transformer-based language models in AI, there is significant interest in the limits of LLMs capabilities, specifically so-called hallucinations, occurrences in which LLMs provide spurious, factually incorrect or nonsensical information when prompted on certain subjects. Furthermore, there is growing interest in agentic uses of LLMs - that is, using LLMs to create agents that act autonomously or semi-autonomously to carry out various tasks, including tasks with applications in the real world. This makes it important to understand the types of tasks LLMs can and cannot perform. We explore this topic from the perspective of the computational complexity of LLM inference. We show that LLMs are incapable of carrying out computational and agentic tasks beyond a certain complexity, and further that LLMs are incapable of verifying the accuracy of tasks beyond a certain complexity. We present examples of both, then discuss some consequences of this work.

* 6 pages; to be submitted to AAAI-26 after reviews

Via

Access Paper or Ask Questions

Distribution-free inference for LightGBM and GLM with Tweedie loss

Jul 09, 2025

Alokesh Manna, Aditya Vikram Sett, Dipak K. Dey, Yuwen Gu, Elizabeth D. Schifano, Jichao He

Abstract:Prediction uncertainty quantification is a key research topic in recent years scientific and business problems. In insurance industries (\cite{parodi2023pricing}), assessing the range of possible claim costs for individual drivers improves premium pricing accuracy. It also enables insurers to manage risk more effectively by accounting for uncertainty in accident likelihood and severity. In the presence of covariates, a variety of regression-type models are often used for modeling insurance claims, ranging from relatively simple generalized linear models (GLMs) to regularized GLMs to gradient boosting models (GBMs). Conformal predictive inference has arisen as a popular distribution-free approach for quantifying predictive uncertainty under relatively weak assumptions of exchangeability, and has been well studied under the classic linear regression setting. In this work, we propose new non-conformity measures for GLMs and GBMs with GLM-type loss. Using regularized Tweedie GLM regression and LightGBM with Tweedie loss, we demonstrate conformal prediction performance with these non-conformity measures in insurance claims data. Our simulation results favor the use of locally weighted Pearson residuals for LightGBM over other methods considered, as the resulting intervals maintained the nominal coverage with the smallest average width.

Via

Access Paper or Ask Questions

Exploring Cultural Variations in Moral Judgments with Large Language Models

Jun 14, 2025

Hadi Mohammadi, Efthymia Papadopoulou, Yasmeen F. S. S. Meijer, Ayoub Bagheri

Abstract:Large Language Models (LLMs) have shown strong performance across many tasks, but their ability to capture culturally diverse moral values remains unclear. In this paper, we examine whether LLMs can mirror variations in moral attitudes reported by two major cross-cultural surveys: the World Values Survey and the PEW Research Center's Global Attitudes Survey. We compare smaller, monolingual, and multilingual models (GPT-2, OPT, BLOOMZ, and Qwen) with more recent instruction-tuned models (GPT-4o, GPT-4o-mini, Gemma-2-9b-it, and Llama-3.3-70B-Instruct). Using log-probability-based moral justifiability scores, we correlate each model's outputs with survey data covering a broad set of ethical topics. Our results show that many earlier or smaller models often produce near-zero or negative correlations with human judgments. In contrast, advanced instruction-tuned models (including GPT-4o and GPT-4o-mini) achieve substantially higher positive correlations, suggesting they better reflect real-world moral attitudes. While scaling up model size and using instruction tuning can improve alignment with cross-cultural moral norms, challenges remain for certain topics and regions. We discuss these findings in relation to bias analysis, training data diversity, and strategies for improving the cultural sensitivity of LLMs.

Via

Access Paper or Ask Questions

A Neural Operator based on Dynamic Mode Decomposition

Jul 01, 2025

Nikita Sakovich, Dmitry Aksenov, Ekaterina Pleshakova, Sergey Gataullin

Abstract:The scientific computation methods development in conjunction with artificial intelligence technologies remains a hot research topic. Finding a balance between lightweight and accurate computations is a solid foundation for this direction. The study presents a neural operator based on the dynamic mode decomposition algorithm (DMD), mapping functional spaces, which combines DMD and deep learning (DL) for spatiotemporal processes efficient modeling. Solving PDEs for various initial and boundary conditions requires significant computational resources. The method suggested automatically extracts key modes and system dynamics using them to construct predictions, reducing computational costs compared to traditional numerical methods. The approach has demonstrated its efficiency through comparative analysis of performance with closest analogues DeepONet and FNO in the heat equation, Laplaces equation, and Burgers equation solutions approximation, where it achieves high reconstruction accuracy.

* 30 pages, 10 figures

Via

Access Paper or Ask Questions

WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts

Jun 18, 2025

Negar Foroutan, Angelika Romanou, Matin Ansaripour, Julian Martin Eisenschlos, Karl Aberer, Rémi Lebret

Abstract:Documents are fundamental to preserving and disseminating information, often incorporating complex layouts, tables, and charts that pose significant challenges for automatic document understanding (DU). While vision-language large models (VLLMs) have demonstrated improvements across various tasks, their effectiveness in processing long-context vision inputs remains unclear. This paper introduces WikiMixQA, a benchmark comprising 1,000 multiple-choice questions (MCQs) designed to evaluate cross-modal reasoning over tables and charts extracted from 4,000 Wikipedia pages spanning seven distinct topics. Unlike existing benchmarks, WikiMixQA emphasizes complex reasoning by requiring models to synthesize information from multiple modalities. We evaluate 12 state-of-the-art vision-language models, revealing that while proprietary models achieve ~70% accuracy when provided with direct context, their performance deteriorates significantly when retrieval from long documents is required. Among these, GPT-4-o is the only model exceeding 50% accuracy in this setting, whereas open-source models perform considerably worse, with a maximum accuracy of 27%. These findings underscore the challenges of long-context, multi-modal reasoning and establish WikiMixQA as a crucial benchmark for advancing document understanding research.

* ACL 2025 (Findings)

Via

Access Paper or Ask Questions

AIn't Nothing But a Survey? Using Large Language Models for Coding German Open-Ended Survey Responses on Survey Motivation

Jun 18, 2025

Leah von der Heyde, Anna-Carolina Haensch, Bernd Weiß, Jessica Daikeler

Abstract:The recent development and wider accessibility of LLMs have spurred discussions about how they can be used in survey research, including classifying open-ended survey responses. Due to their linguistic capacities, it is possible that LLMs are an efficient alternative to time-consuming manual coding and the pre-training of supervised machine learning models. As most existing research on this topic has focused on English-language responses relating to non-complex topics or on single LLMs, it is unclear whether its findings generalize and how the quality of these classifications compares to established methods. In this study, we investigate to what extent different LLMs can be used to code open-ended survey responses in other contexts, using German data on reasons for survey participation as an example. We compare several state-of-the-art LLMs and several prompting approaches, and evaluate the LLMs' performance by using human expert codings. Overall performance differs greatly between LLMs, and only a fine-tuned LLM achieves satisfactory levels of predictive performance. Performance differences between prompting approaches are conditional on the LLM used. Finally, LLMs' unequal classification performance across different categories of reasons for survey participation results in different categorical distributions when not using fine-tuning. We discuss the implications of these findings, both for methodological research on coding open-ended responses and for their substantive analysis, and for practitioners processing or substantively analyzing such data. Finally, we highlight the many trade-offs researchers need to consider when choosing automated methods for open-ended response classification in the age of LLMs. In doing so, our study contributes to the growing body of research about the conditions under which LLMs can be efficiently, accurately, and reliably leveraged in survey research.

* to appear in Survey Research Methods

Via

Access Paper or Ask Questions

Running-time Analysis of ($μ+λ$) Evolutionary Combinatorial Optimization Based on Multiple-gain Estimation

Jul 03, 2025

Min Huang, Pengxiang Chen, Han Huang, Tongli He, Yushan Zhang, Zhifeng Hao

Abstract:The running-time analysis of evolutionary combinatorial optimization is a fundamental topic in evolutionary computation. However, theoretical results regarding the $(\mu+\lambda)$ evolutionary algorithm (EA) for combinatorial optimization problems remain relatively scarce compared to those for simple pseudo-Boolean problems. This paper proposes a multiple-gain model to analyze the running time of EAs for combinatorial optimization problems. The proposed model is an improved version of the average gain model, which is a fitness-difference drift approach under the sigma-algebra condition to estimate the running time of evolutionary numerical optimization. The improvement yields a framework for estimating the expected first hitting time of a stochastic process in both average-case and worst-case scenarios. It also introduces novel running-time results of evolutionary combinatorial optimization, including two tighter time complexity upper bounds than the known results in the case of ($\mu+\lambda$) EA for the knapsack problem with favorably correlated weights, a closed-form expression of time complexity upper bound in the case of ($\mu+\lambda$) EA for general $k$-MAX-SAT problems and a tighter time complexity upper bounds than the known results in the case of ($\mu+\lambda$) EA for the traveling salesperson problem. Experimental results indicate that the practical running time aligns with the theoretical results, verifying that the multiple-gain model is an effective tool for running-time analysis of ($\mu+\lambda$) EA for combinatorial optimization problems.

Via

Access Paper or Ask Questions

Topic:Topic Modeling

Papers and Code