Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Weihao Li

SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection

Dec 12, 2025

Tianye Qi, Weihao Li, Nick Barnes

Figure 1 for SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection

Figure 2 for SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection

Figure 3 for SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection

Figure 4 for SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection

Abstract:Wildfire smoke is transparent, amorphous, and often visually confounded with clouds, making early-stage detection particularly challenging. In this work, we introduce a benchmark, called SmokeBench, to evaluate the ability of multimodal large language models (MLLMs) to recognize and localize wildfire smoke in images. The benchmark consists of four tasks: (1) smoke classification, (2) tile-based smoke localization, (3) grid-based smoke localization, and (4) smoke detection. We evaluate several MLLMs, including Idefics2, Qwen2.5-VL, InternVL3, Unified-IO 2, Grounding DINO, GPT-4o, and Gemini-2.5 Pro. Our results show that while some models can classify the presence of smoke when it covers a large area, all models struggle with accurate localization, especially in the early stages. Further analysis reveals that smoke volume is strongly correlated with model performance, whereas contrast plays a comparatively minor role. These findings highlight critical limitations of current MLLMs for safety-critical wildfire monitoring and underscore the need for methods that improve early-stage smoke localization.

* Accepted to WACV 2026

Via

Access Paper or Ask Questions

SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models

Nov 07, 2025

Jingxuan Xu, Ken Deng, Weihao Li, Songwei Yu, Huaixi Tang, Haoyang Huang, Zhiyi Lai, Zizheng Zhan, Yanan Wu, Chenchen Zhang(+26 more)

Abstract:Evaluating large language models (LLMs) for software engineering has been limited by narrow task coverage, language bias, and insufficient alignment with real-world developer workflows. Existing benchmarks often focus on algorithmic problems or Python-centric bug fixing, leaving critical dimensions of software engineering underexplored. To address these gaps, we introduce SWE-Compass1, a comprehensive benchmark that unifies heterogeneous code-related evaluations into a structured and production-aligned framework. SWE-Compass spans 8 task types, 8 programming scenarios, and 10 programming languages, with 2000 high-quality instances curated from authentic GitHub pull requests and refined through systematic filtering and validation. We benchmark ten state-of-the-art LLMs under two agentic frameworks, SWE-Agent and Claude Code, revealing a clear hierarchy of difficulty across task types, languages, and scenarios. Moreover, by aligning evaluation with real-world developer practices, SWE-Compass provides a rigorous and reproducible foundation for diagnosing and advancing agentic coding capabilities in large language models.

Via

Access Paper or Ask Questions

NoiseSDF2NoiseSDF: Learning Clean Neural Fields from Noisy Supervision

Jul 18, 2025

Tengkai Wang, Weihao Li, Ruikai Cui, Shi Qiu, Nick Barnes

Abstract:Reconstructing accurate implicit surface representations from point clouds remains a challenging task, particularly when data is captured using low-quality scanning devices. These point clouds often contain substantial noise, leading to inaccurate surface reconstructions. Inspired by the Noise2Noise paradigm for 2D images, we introduce NoiseSDF2NoiseSDF, a novel method designed to extend this concept to 3D neural fields. Our approach enables learning clean neural SDFs directly from noisy point clouds through noisy supervision by minimizing the MSE loss between noisy SDF representations, allowing the network to implicitly denoise and refine surface estimations. We evaluate the effectiveness of NoiseSDF2NoiseSDF on benchmarks, including the ShapeNet, ABC, Famous, and Real datasets. Experimental results demonstrate that our framework significantly improves surface reconstruction quality from noisy inputs.

* 14 pages, 4 figures

Via

Access Paper or Ask Questions

Pushing DSP-Free Coherent Interconnect to the Last Inch by Optically Analog Signal Processing

Mar 14, 2025

Mingming Zhang, Haoze Du, Xuefeng Wang, Junda Chen, Weihao Li, Zihe Hu, Yizhao Chen, Can Zhao, Hao Wu, Jiajun Zhou(+3 more)

Figure 1 for Pushing DSP-Free Coherent Interconnect to the Last Inch by Optically Analog Signal Processing

Figure 2 for Pushing DSP-Free Coherent Interconnect to the Last Inch by Optically Analog Signal Processing

Figure 3 for Pushing DSP-Free Coherent Interconnect to the Last Inch by Optically Analog Signal Processing

Figure 4 for Pushing DSP-Free Coherent Interconnect to the Last Inch by Optically Analog Signal Processing

Abstract:To support the boosting interconnect capacity of the AI-related data centers, novel techniques enabled high-speed and low-cost optics are continuously emerging. When the baud rate approaches 200 GBaud per lane, the bottle-neck of traditional intensity modulation direct detection (IM-DD) architectures becomes increasingly evident. The simplified coherent solutions are widely discussed and considered as one of the most promising candidates. In this paper, a novel coherent architecture based on self-homodyne coherent detection and optically analog signal processing (OASP) is demonstrated. Proved by experiment, the first DSP-free baud-rate sampled 64-GBaud QPSK/16-QAM receptions are achieved, with BERs of 1e-6 and 2e-2, respectively. Even with 1-km fiber link propagation, the BER for QPSK reception remains at 3.6e-6. When an ultra-simple 1-sps SISO filter is utilized, the performance degradation of the proposed scheme is less than 1 dB compared to legacy DSP-based coherent reception. The proposed results pave the way for the ultra-high-speed coherent optical interconnections, offering high power and cost efficiency.

Via

Access Paper or Ask Questions

Knowing Where to Focus: Attention-Guided Alignment for Text-based Person Search

Dec 19, 2024

Lei Tan, Weihao Li, Pingyang Dai, Jie Chen, Liujuan Cao, Rongrong Ji

Figure 1 for Knowing Where to Focus: Attention-Guided Alignment for Text-based Person Search

Figure 2 for Knowing Where to Focus: Attention-Guided Alignment for Text-based Person Search

Figure 3 for Knowing Where to Focus: Attention-Guided Alignment for Text-based Person Search

Figure 4 for Knowing Where to Focus: Attention-Guided Alignment for Text-based Person Search

Abstract:In the realm of Text-Based Person Search (TBPS), mainstream methods aim to explore more efficient interaction frameworks between text descriptions and visual data. However, recent approaches encounter two principal challenges. Firstly, the widely used random-based Masked Language Modeling (MLM) considers all the words in the text equally during training. However, massive semantically vacuous words ('with', 'the', etc.) be masked fail to contribute efficient interaction in the cross-modal MLM and hampers the representation alignment. Secondly, manual descriptions in TBPS datasets are tedious and inevitably contain several inaccuracies. To address these issues, we introduce an Attention-Guided Alignment (AGA) framework featuring two innovative components: Attention-Guided Mask (AGM) Modeling and Text Enrichment Module (TEM). AGM dynamically masks semantically meaningful words by aggregating the attention weight derived from the text encoding process, thereby cross-modal MLM can capture information related to the masked word from text context and images and align their representations. Meanwhile, TEM alleviates low-quality representations caused by repetitive and erroneous text descriptions by replacing those semantically meaningful words with MLM's prediction. It not only enriches text descriptions but also prevents overfitting. Extensive experiments across three challenging benchmarks demonstrate the effectiveness of our AGA, achieving new state-of-the-art results with Rank-1 accuracy reaching 78.36%, 67.31%, and 67.4% on CUHK-PEDES, ICFG-PEDES, and RSTPReid, respectively.

Via

Access Paper or Ask Questions

Automated Assessment of Residual Plots with Computer Vision Models

Nov 01, 2024

Weihao Li, Dianne Cook, Emi Tanaka, Susan VanderPlas, Klaus Ackermann

Abstract:Plotting the residuals is a recommended procedure to diagnose deviations from linear model assumptions, such as non-linearity, heteroscedasticity, and non-normality. The presence of structure in residual plots can be tested using the lineup protocol to do visual inference. There are a variety of conventional residual tests, but the lineup protocol, used as a statistical test, performs better for diagnostic purposes because it is less sensitive and applies more broadly to different types of departures. However, the lineup protocol relies on human judgment which limits its scalability. This work presents a solution by providing a computer vision model to automate the assessment of residual plots. It is trained to predict a distance measure that quantifies the disparity between the residual distribution of a fitted classical normal linear regression model and the reference distribution, based on Kullback-Leibler divergence. From extensive simulation studies, the computer vision model exhibits lower sensitivity than conventional tests but higher sensitivity than human visual tests. It is slightly less effective on non-linearity patterns. Several examples from classical papers and contemporary data illustrate the new procedures, highlighting its usefulness in automating the diagnostic process and supplementing existing methods.

Via

Access Paper or Ask Questions

SDI-Paste: Synthetic Dynamic Instance Copy-Paste for Video Instance Segmentation

Oct 16, 2024

Sahir Shrestha, Weihao Li, Gao Zhu, Nick Barnes

Figure 1 for SDI-Paste: Synthetic Dynamic Instance Copy-Paste for Video Instance Segmentation

Figure 2 for SDI-Paste: Synthetic Dynamic Instance Copy-Paste for Video Instance Segmentation

Figure 3 for SDI-Paste: Synthetic Dynamic Instance Copy-Paste for Video Instance Segmentation

Figure 4 for SDI-Paste: Synthetic Dynamic Instance Copy-Paste for Video Instance Segmentation

Abstract:Data augmentation methods such as Copy-Paste have been studied as effective ways to expand training datasets while incurring minimal costs. While such methods have been extensively implemented for image level tasks, we found no scalable implementation of Copy-Paste built specifically for video tasks. In this paper, we leverage the recent growth in video fidelity of generative models to explore effective ways of incorporating synthetically generated objects into existing video datasets to artificially expand object instance pools. We first procure synthetic video sequences featuring objects that morph dynamically with time. Our carefully devised pipeline automatically segments then copy-pastes these dynamic instances across the frames of any target background video sequence. We name our video data augmentation pipeline Synthetic Dynamic Instance Copy-Paste, and test it on the complex task of Video Instance Segmentation which combines detection, segmentation and tracking of object instances across a video sequence. Extensive experiments on the popular Youtube-VIS 2021 dataset using two separate popular networks as baselines achieve strong gains of +2.9 AP (6.5%) and +2.1 AP (4.9%). We make our code and models publicly available.

Via

Access Paper or Ask Questions

Fermat Number Transform Based Chromatic Dispersion Compensation and Adaptive Equalization Algorithm

May 07, 2024

Siyu Chen, Zheli Liu, Weihao Li, Zihe Hu, Mingming Zhang, Sheng Cui, Ming Tang

Figure 1 for Fermat Number Transform Based Chromatic Dispersion Compensation and Adaptive Equalization Algorithm

Figure 2 for Fermat Number Transform Based Chromatic Dispersion Compensation and Adaptive Equalization Algorithm

Figure 3 for Fermat Number Transform Based Chromatic Dispersion Compensation and Adaptive Equalization Algorithm

Abstract:By introducing the Fermat number transform into chromatic dispersion compensation and adaptive equalization, the computational complexity has been reduced by 68% compared with the con?ventional implementation. Experimental results validate its transmission performance with only 0.8 dB receiver sensitivity penalty in a 75 km-40 GBaud-PDM-16QAM system.

Via

Access Paper or Ask Questions

Prompt Decoupling for Text-to-Image Person Re-identification

Jan 04, 2024

Weihao Li, Lei Tan, Pingyang Dai, Yan Zhang

Abstract:Text-to-image person re-identification (TIReID) aims to retrieve the target person from an image gallery via a textual description query. Recently, pre-trained vision-language models like CLIP have attracted significant attention and have been widely utilized for this task due to their robust capacity for semantic concept learning and rich multi-modal knowledge. However, recent CLIP-based TIReID methods commonly rely on direct fine-tuning of the entire network to adapt the CLIP model for the TIReID task. Although these methods show competitive performance on this topic, they are suboptimal as they necessitate simultaneous domain adaptation and task adaptation. To address this issue, we attempt to decouple these two processes during the training stage. Specifically, we introduce the prompt tuning strategy to enable domain adaptation and propose a two-stage training approach to disentangle domain adaptation from task adaptation. In the first stage, we freeze the two encoders from CLIP and solely focus on optimizing the prompts to alleviate domain gap between the original training data of CLIP and downstream tasks. In the second stage, we maintain the fixed prompts and fine-tune the CLIP model to prioritize capturing fine-grained information, which is more suitable for TIReID task. Finally, we evaluate the effectiveness of our method on three widely used datasets. Compared to the directly fine-tuned approach, our method achieves significant improvements.

Via

Access Paper or Ask Questions

A Clustering Algorithm to Organize Satellite Hotspot Data for the Purpose of Tracking Bushfires Remotely

Aug 21, 2023

Weihao Li, Emily Dodwell, Dianne Cook

Figure 1 for A Clustering Algorithm to Organize Satellite Hotspot Data for the Purpose of Tracking Bushfires Remotely

Figure 2 for A Clustering Algorithm to Organize Satellite Hotspot Data for the Purpose of Tracking Bushfires Remotely

Figure 3 for A Clustering Algorithm to Organize Satellite Hotspot Data for the Purpose of Tracking Bushfires Remotely

Figure 4 for A Clustering Algorithm to Organize Satellite Hotspot Data for the Purpose of Tracking Bushfires Remotely

Abstract:This paper proposes a spatiotemporal clustering algorithm and its implementation in the R package spotoroo. This work is motivated by the catastrophic bushfires in Australia throughout the summer of 2019-2020 and made possible by the availability of satellite hotspot data. The algorithm is inspired by two existing spatiotemporal clustering algorithms but makes enhancements to cluster points spatially in conjunction with their movement across consecutive time periods. It also allows for the adjustment of key parameters, if required, for different locations and satellite data sources. Bushfire data from Victoria, Australia, is used to illustrate the algorithm and its use within the package.

Via

Access Paper or Ask Questions