Bin Han

Can Large Language Models Integrate Spatial Data? Empirical Insights into Reasoning Strengths and Computational Weaknesses

Aug 07, 2025

Bin Han, Robert Wolfe, Anat Caspi, Bill Howe

Abstract:We explore the application of large language models (LLMs) to empower domain experts in integrating large, heterogeneous, and noisy urban spatial datasets. Traditional rule-based integration methods cannot cover all edge cases and therefore require manual verification and repair. Machine learning approaches require collecting and labeling large numbers of task-specific samples. In this study, we investigate the potential of LLMs for spatial data integration. Our analysis first considers how LLMs reason about environmental spatial relationships mediated by human experience, such as those between roads and sidewalks. We show that while LLMs exhibit spatial reasoning capabilities, they struggle to connect the macro-scale environment with the relevant computational geometry tasks, often producing logically incoherent responses. However, when provided with relevant features, which reduces their dependence on spatial reasoning, LLMs are able to generate high-performing results. We then adapt a review-and-refine method, which proves remarkably effective in correcting erroneous initial responses while preserving accurate ones. We discuss practical implications of employing LLMs for spatial data integration in real-world contexts and outline future research directions, including post-training, multi-modal integration methods, and support for diverse data formats. Our findings position LLMs as a promising and flexible alternative to traditional rule-based heuristics, advancing the capabilities of adaptive spatial data integration.

Lightweight Node Selection in Hexagonal Grid Topology for TDoA-Based UAV Localization

Jun 17, 2025

Zexin Fang, Bin Han, Wenwen Chen, Hans D. Schotten

Abstract:This paper investigates the optimization problem for time-difference-of-arrival (TDoA) based UAV localization in low-altitude urban environments with hexagonal grid node deployment. We derive a lightweight, optimized node selection strategy based only on received signal strength indicator (RSSI) measurements to pre-select optimal nodes, avoiding extensive TDoA measurements in energy-constrained UAV scenarios. Theoretical and simulation results demonstrate that dynamically selecting the number of reference nodes improves localization performance while minimizing resource overhead.

* Submitted to GLOBECOM 2025 WKSHPS
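As a rough illustration of RSSI-only pre-selection (not the strategy derived in the paper), the sketch below inverts a generic log-distance path-loss model to estimate each node's distance, keeps the nodes that fall within a range threshold, and falls back to the four strongest nodes so that a 3D TDoA solve still has three independent range differences. The reference power `p0_dbm`, the path-loss exponent `n`, the range threshold, and both function names are hypothetical assumptions.

```python
from typing import Dict, List


def rssi_to_distance(rssi_dbm: float, p0_dbm: float = -40.0, n: float = 2.7, d0_m: float = 1.0) -> float:
    """Invert an assumed log-distance path-loss model: RSSI = P0 - 10 * n * log10(d / d0)."""
    return d0_m * 10 ** ((p0_dbm - rssi_dbm) / (10 * n))


def preselect_nodes(rssi: Dict[str, float], max_range_m: float, min_nodes: int = 4) -> List[str]:
    """Hypothetical pre-selection: keep in-range nodes, but never fewer than min_nodes."""
    # Rank node IDs from strongest to weakest RSSI (nearest estimated distance first).
    ranked = sorted(rssi, key=rssi.get, reverse=True)
    # Keep every node whose estimated distance lies within max_range_m.
    selected = [nid for nid in ranked if rssi_to_distance(rssi[nid]) <= max_range_m]
    # Fall back to the strongest min_nodes (4 nodes give 3 TDoA differences for a 3D fix).
    if len(selected) < min_nodes:
        selected = ranked[:min_nodes]
    return selected


rssi = {"A": -50.0, "B": -60.0, "C": -70.0, "D": -80.0, "E": -90.0}
print(preselect_nodes(rssi, max_range_m=20.0))
```

Widening `max_range_m` admits more nodes, which is one simple way to let the number of reference nodes vary with conditions instead of fixing it in advance.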

Why Not Replace? Sustaining Long-Term Visual Localization via Handcrafted-Learned Feature Collaboration on CPU

May 24, 2025

Yicheng Lin, Yunlong Jiang, Xujia Jiao, Bin Han

Abstract:Robust long-term visual localization in complex industrial environments is critical for mobile robotic systems. Existing approaches face limitations: handcrafted features are illumination-sensitive, learned features are computationally intensive, and semantic- or marker-based methods are environmentally constrained. Handcrafted and learned features share similar representations but differ functionally. Handcrafted features are optimized for continuous tracking, while learned features excel in wide-baseline matching. Their complementarity calls for integration rather than replacement. Building on this, we propose a hierarchical localization framework. It leverages real-time handcrafted feature extraction for relative pose estimation. In parallel, it employs selective learned keypoint detection on optimized keyframes for absolute positioning. This design enables CPU-efficient, long-term visual localization. Experiments progress systematically through three validation phases: first, establishing feature complementarity through comparative analysis; second, profiling computational latency across algorithm stages on CPU platforms; and third, a final evaluation under photometric variations (including seasonal transitions and diurnal cycles), which demonstrates a 47% average error reduction with significantly improved localization consistency. The code implementation is publicly available at https://github.com/linyicheng1/ORB_SLAM3_localization.

* 8 pages, 6 figures

MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation

May 23, 2025

Jihan Yao, Yushi Hu, Yujie Yi, Bin Han, Shangbin Feng, Guang Yang, Bingbing Wen, Ranjay Krishna, Lucy Lu Wang, Yulia Tsvetkov(+2 more)

Abstract:Automatically evaluating multimodal generation presents a significant challenge, as automated metrics often struggle to align reliably with human evaluation, especially for complex tasks that involve multiple modalities. To address this, we present MMMG, a comprehensive and human-aligned benchmark for multimodal generation across 4 modality combinations (image, audio, interleaved text and image, interleaved text and audio), with a focus on tasks that present significant challenges for generation models, while still enabling reliable automatic evaluation through a combination of models and programs. MMMG encompasses 49 tasks (including 29 newly developed ones), each with a carefully designed evaluation pipeline, and 937 instructions to systematically assess reasoning, controllability, and other key capabilities of multimodal generation models. Extensive validation demonstrates that MMMG is highly aligned with human evaluation, achieving an average agreement of 94.3%. Benchmarking results on 24 multimodal generation models reveal that even though the state-of-the-art model, GPT Image, achieves 78.3% accuracy for image generation, it falls short on multimodal reasoning and interleaved generation. Furthermore, results suggest considerable headroom for improvement in audio generation, highlighting an important direction for future research.

Fragments to Facts: Partial-Information Fragment Inference from LLMs

May 20, 2025

Lucas Rosenblatt, Bin Han, Robert Wolfe, Bill Howe

Abstract:Large language models (LLMs) can leak sensitive training data through memorization and membership inference attacks. Prior work has primarily focused on strong adversarial assumptions, including attacker access to entire samples or long, ordered prefixes, leaving open the question of how vulnerable LLMs are when adversaries have only partial, unordered sample information. For example, if an attacker knows a patient has "hypertension," under what conditions can they query a model fine-tuned on patient data to learn the patient also has "osteoarthritis?" In this paper, we introduce a more general threat model under this weaker assumption and show that fine-tuned LLMs are susceptible to these fragment-specific extraction attacks. To systematically investigate these attacks, we propose two data-blind methods: (1) a likelihood ratio attack inspired by methods from membership inference, and (2) a novel approach, PRISM, which regularizes the ratio by leveraging an external prior. Using examples from both medical and legal settings, we show that both methods are competitive with a data-aware baseline classifier that assumes access to labeled in-distribution data, underscoring their robustness.
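The first data-blind method follows the familiar likelihood-ratio pattern from membership inference. A minimal sketch, assuming we can obtain the log-probability that a model assigns to a candidate fragment given a known fragment: candidates that the fine-tuned target model finds much more likely than a reference model are ranked as probable leaks. The toy probability tables, the function names, and the use of a single reference model are illustrative assumptions; PRISM's prior-based regularization is not shown.

```python
import math
from typing import Callable, Dict, List, Tuple

# (known_fragment, candidate_fragment) -> log-probability of the candidate given the known fragment
LogProbFn = Callable[[str, str], float]


def likelihood_ratio_scores(
    known: str, candidates: List[str], target_logp: LogProbFn, reference_logp: LogProbFn
) -> List[Tuple[str, float]]:
    """Rank candidates by log P_target(c | known) - log P_reference(c | known), highest first."""
    scores = [(c, target_logp(known, c) - reference_logp(known, c)) for c in candidates]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def table_model(table: Dict[Tuple[str, str], float]) -> LogProbFn:
    """Hypothetical stand-in for an LLM: look up a toy conditional-probability table."""
    return lambda known, cand: math.log(table[(known, cand)])


# Toy numbers: the fine-tuned target model over-weights the co-occurrence it memorized.
target = table_model({("hypertension", "osteoarthritis"): 0.30, ("hypertension", "asthma"): 0.10})
reference = table_model({("hypertension", "osteoarthritis"): 0.10, ("hypertension", "asthma"): 0.10})

ranking = likelihood_ratio_scores("hypertension", ["asthma", "osteoarthritis"], target, reference)
print(ranking)
```

In practice an attacker would compare the top score against a threshold calibrated on fragments known not to be in the fine-tuning data; how that threshold is chosen is outside what the abstract describes.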

How Cyclic Acoustic Patterns Influence ASMR Perception: A Signal Processing Perspective

Apr 02, 2025

Zexin Fang, Bin Han, Henrik H. Sveen, C. Clark Cao, Hans D. Schotten

Abstract:Autonomous Sensory Meridian Response (ASMR) has become remarkably popular over the past decade. While its effect has been validated through behavioral studies and neurophysiological measurements such as electroencephalography (EEG) and related bio-signal analyses, its development and triggers remain a subject of debate. Previous studies suggest that its triggers are closely linked to cyclic patterns: predictable patterns induce relaxation while variations maintain intrigue. To validate this and further understand the impact of acoustic features on ASMR effects, we designed three distinct cyclic patterns with monophonic and stereophonic variations, while controlling their predictability and randomness, and collected ASMR triggering scores through online surveys. We then extracted cyclic features and carried out regression analysis, seeking an explainable mapping between cyclic features and ASMR triggers. We found that relaxing effects accumulate progressively and are independent of spatial orientation. Cyclic patterns significantly influence psychological and physical effects, which remain invariant over time. Regression analysis revealed that smoothly spread and energy-dense cyclic patterns most effectively trigger ASMR responses.

* Submitted to IEEE Signal Processing Letters

Buyer-Initiated Auction Mechanism for Data Redemption in Machine Unlearning

Apr 01, 2025

Bin Han, Di Feng, Jie Wang, Hans D. Schotten

Abstract:The rapid growth of artificial intelligence (AI) has raised privacy concerns over user data, leading to regulations like the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). With the essential toolbox provided by machine unlearning, AI service providers can now remove user data from their trained models as well as from the training datasets to comply with such regulations. However, extensive data redemption can be costly and can degrade model accuracy. To balance the cost of unlearning against privacy protection, we propose a buyer-initiated auction mechanism for data redemption, enabling the service provider to purchase data from willing users with appropriate compensation. This approach does not require the server to have any a priori knowledge of users' privacy preferences, and it provides an efficient solution for maximizing social welfare in the investigated problem.

* Submitted to IEEE GLOBECOM 2025
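The abstract does not spell out the mechanism itself. For intuition only, here is a generic uniform-price reverse auction, a textbook design used here as an assumption rather than the paper's mechanism: each user states the compensation they would accept for their data, the service provider (with a hypothetical per-item value `value` and an upper limit `k`) buys from the cheapest askers, and every winner is paid the same threshold price, namely the lower of `value` and the first rejected ask. Paying a threshold rather than each winner's own ask is a standard way to remove the incentive for single-item sellers to overstate their asks.

```python
from typing import Dict, List, Tuple


def uniform_price_reverse_auction(asks: Dict[str, float], k: int, value: float) -> Tuple[List[str], float]:
    """Hypothetical sketch: buy at most k items from the lowest asks not exceeding `value`,
    paying every winner one threshold price."""
    ranked = sorted(asks.items(), key=lambda item: item[1])  # cheapest asks first
    # Winners: the cheapest asks, capped at k items and at the buyer's per-item value.
    winners = [uid for uid, ask in ranked[:k] if ask <= value]
    if not winners:
        return [], 0.0
    m = len(winners)
    # First rejected ask (infinite if every user won).
    first_rejected = ranked[m][1] if m < len(ranked) else float("inf")
    # Threshold price: the highest ask a winner could have submitted and still won.
    return winners, min(value, first_rejected)


asks = {"u1": 2.0, "u2": 5.0, "u3": 3.0, "u4": 8.0}
print(uniform_price_reverse_auction(asks, k=2, value=6.0))
```

Here the provider buys from `u1` and `u3` and pays each 5.0, the first rejected ask; raising `k` to 3 admits `u2` and the price rises to the provider's value of 6.0.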

Quantum Machine Learning in Log-based Anomaly Detection: Challenges and Opportunities

Dec 18, 2024

Jiaxing Qi, Chang Zeng, Zhongzhi Luan, Shaohan Huang, Shu Yang, Yao Lu, Bin Han, Hailong Yang, Depei Qian

Abstract:Log-based anomaly detection (LogAD) is the main component of Artificial Intelligence for IT Operations (AIOps); it detects anomalies that occur in a running system on the fly. Existing methods commonly extract log sequence features using classical machine learning techniques to determine whether a new sequence is anomalous. However, these classical approaches often require trade-offs between efficiency and accuracy. The advent of quantum machine learning (QML) offers a promising alternative. By transforming parts of classical machine learning computations into parameterized quantum circuits (PQCs), QML can significantly reduce the number of trainable parameters while maintaining accuracy comparable to classical counterparts. In this work, we introduce a unified framework, \ourframework{}, for evaluating QML models in the context of LogAD. This framework incorporates diverse log data, integrated QML models, and comprehensive evaluation metrics. State-of-the-art methods such as DeepLog, LogAnomaly, and LogRobust, along with their quantum-transformed counterparts, are included in our framework. Beyond standard metrics like F1 score, precision, and recall, our evaluation extends to factors critical to QML performance, such as specificity, the number of circuits, circuit design, and quantum state encoding. Using \ourframework{}, we conduct extensive experiments to assess the performance of these models and their quantum counterparts, uncovering valuable insights and paving the way for future research in QML model selection and design for LogAD.

Privacy Protection Framework against Unauthorized Sensing in the 5.8 GHz ISM Band

Nov 08, 2024

Zexin Fang, Bin Han, Hans D. Schotten

Abstract:Unauthorized sensing activities pose an increasing threat to individual privacy, particularly in the industrial, scientific, and medical (ISM) band, where regulatory frameworks remain limited. This paper presents a novel signal processing methodology to monitor and counter unauthorized sensing activities. Specifically, we model pedestrian trajectories as a random process. We then leverage the Cramér-Rao bound (CRB) to evaluate sensing performance and model it as the sampling error of such a random process. Through simulation, we verify the accuracy of monitoring unauthorized sensing activities in urban scenarios and validate the effectiveness of the corresponding mitigation strategies.

* Submitted to ICC 2025

Towards Zero-Shot Annotation of the Built Environment with Vision-Language Models (Vision Paper)

Aug 01, 2024

Bin Han, Yiwei Yang, Anat Caspi, Bill Howe

Abstract:Equitable urban transportation applications require high-fidelity digital representations of the built environment: not just streets and sidewalks, but bike lanes, marked and unmarked crossings, curb ramps and cuts, obstructions, traffic signals, signage, street markings, potholes, and more. Direct inspections and manual annotations are prohibitively expensive at scale. Conventional machine learning methods require substantial annotated training data for adequate performance. In this paper, we consider vision language models as a mechanism for annotating diverse urban features from satellite images, reducing the dependence on human annotation to produce large training sets. While these models have achieved impressive results in describing common objects in images captured from a human perspective, their training sets are less likely to include strong signals for esoteric features in the built environment, and their performance in these settings is therefore unclear. We demonstrate a proof of concept combining a state-of-the-art vision language model with variants of a prompting strategy that asks the model to consider segmented elements independently of the original image. Experiments on two urban features, stop lines and raised tables, show that while direct zero-shot prompting correctly annotates nearly zero images, the pre-segmentation strategies can annotate images with nearly 40% intersection-over-union accuracy. We describe how these results inform a new research agenda in automatic annotation of the built environment to improve equity, accessibility, and safety at broad scale and in diverse environments.
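Results are reported as intersection-over-union (IoU): the size of the overlap between a predicted annotation and the ground truth, divided by the size of their union. A minimal sketch over pixel sets follows; representing masks as sets of `(row, col)` coordinates and scoring two empty masks as 1.0 are assumptions of this illustration, not details from the paper.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, col) coordinate of a labeled pixel


def iou(pred: Set[Pixel], truth: Set[Pixel]) -> float:
    """Intersection-over-union of two binary masks given as pixel sets."""
    union = pred | truth
    if not union:
        return 1.0  # both masks empty: treated as perfect agreement (a convention, not universal)
    return len(pred & truth) / len(union)


# A 2x2 prediction shifted one column right of a 2x2 ground-truth region:
pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth = {(0, 1), (1, 1), (0, 2), (1, 2)}
print(iou(pred, truth))  # 2 shared pixels out of 6 in the union
```

Note how unforgiving the metric is: a one-pixel shift of a two-pixel-wide feature such as a stop line already halves the overlap, which is useful context when reading the near-40% figure.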
