Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shiqi Zhang

Hybrid Disagreement-Diversity Active Learning for Bioacoustic Sound Event Detection

May 28, 2025

Shiqi Zhang, Tuomas Virtanen

Abstract:Bioacoustic sound event detection (BioSED) is crucial for biodiversity conservation but faces practical challenges during model development and training: limited amounts of annotated data, sparse events, species diversity, and class imbalance. To address these challenges efficiently with a limited labeling budget, we apply the mismatch-first farthest-traversal (MFFT), an active learning method integrating committee voting disagreement and diversity analysis. We also refine an existing BioSED dataset specifically for evaluating active learning algorithms. Experimental results demonstrate that MFFT achieves a mAP of 68% when cold-starting and 71% when warm-starting (which is close to the fully-supervised mAP of 75%) while using only 2.3% of the annotations. Notably, MFFT excels in cold-start scenarios and with rare species, which are critical for monitoring endangered species, demonstrating its practical value.

* 5 pages, 1 figure, accepted by EUSIPCO 2025 v2: add our github repo

Via

Access Paper or Ask Questions

You Are What You Bought: Generating Customer Personas for E-commerce Applications

Apr 24, 2025

Yimin Shi, Yang Fei, Shiqi Zhang, Haixun Wang, Xiaokui Xiao

Abstract:In e-commerce, user representations are essential for various applications. Existing methods often use deep learning techniques to convert customer behaviors into implicit embeddings. However, these embeddings are difficult to understand and integrate with external knowledge, limiting the effectiveness of applications such as customer segmentation, search navigation, and product recommendations. To address this, our paper introduces the concept of the customer persona. Condensed from a customer's numerous purchasing histories, a customer persona provides a multi-faceted and human-readable characterization of specific purchase behaviors and preferences, such as Busy Parents or Bargain Hunters. This work then focuses on representing each customer by multiple personas from a predefined set, achieving readable and informative explicit user representations. To this end, we propose an effective and efficient solution GPLR. To ensure effectiveness, GPLR leverages pre-trained LLMs to infer personas for customers. To reduce overhead, GPLR applies LLM-based labeling to only a fraction of users and utilizes a random walk technique to predict personas for the remaining customers. We further propose RevAff, which provides an absolute error $\epsilon$ guarantee while improving the time complexity of the exact solution by a factor of at least $O(\frac{\epsilon\cdot|E|N}{|E|+N\log N})$, where $N$ represents the number of customers and products, and $E$ represents the interactions between them. We evaluate the performance of our persona-based representation in terms of accuracy and robustness for recommendation and customer segmentation tasks using three real-world e-commerce datasets. Most notably, we find that integrating customer persona representations improves the state-of-the-art graph convolution-based recommendation model by up to 12% in terms of NDCG@K and F1-Score@K.

* SIGIR 2025

Via

Access Paper or Ask Questions

Robo-Troj: Attacking LLM-based Task Planners

Apr 23, 2025

Mohaiminul Al Nahian, Zainab Altaweel, David Reitano, Sabbir Ahmed, Saumitra Lohokare, Shiqi Zhang, Adnan Siraj Rakin

Abstract:Robots need task planning methods to achieve goals that require more than individual actions. Recently, large language models (LLMs) have demonstrated impressive performance in task planning. LLMs can generate a step-by-step solution using a description of actions and the goal. Despite the successes in LLM-based task planning, there is limited research studying the security aspects of those systems. In this paper, we develop Robo-Troj, the first multi-trigger backdoor attack for LLM-based task planners, which is the main contribution of this work. As a multi-trigger attack, Robo-Troj is trained to accommodate the diversity of robot application domains. For instance, one can use unique trigger words, e.g., "herical", to activate a specific malicious behavior, e.g., cutting hand on a kitchen robot. In addition, we develop an optimization method for selecting the trigger words that are most effective. Through demonstrating the vulnerability of LLM-based planners, we aim to promote the development of secured robot systems.

Via

Access Paper or Ask Questions

Fighting Fire with Fire: Channel-Independent RF Fingerprinting via the Ratio of Linear to Logarithmic Differential Spectrum

Mar 28, 2025

Tianshu Chen, Aiqun Hu, Shiqi Zhang

Abstract:Eliminating the influence of temporally varying channel components on the radio frequency fingerprint (RFF) extraction has been an enduring and challenging issue. To overcome this problem, we propose a channel-independent RFF extraction method inspired by the idea of 'fighting fire with fire'. Specifically, we derive the linear differential spectrum and the logarithmic differential spectrum of the channel frequency responses (CFRs) from the received signals at different times, and then calculate the ratio of the two spectrums. It is found that the division operation effectively counteracts the channel effects, while simultaneously preserving the integrity of the RFFs. Our experiments on LTE-V2X, LoRa and Wi-Fi devices show that the proposed method achieves an average identification accuracy exceeding 95% across various environments.

* Submitted to IEEE Wireless Communications Letters

Via

Access Paper or Ask Questions

OG-Gaussian: Occupancy Based Street Gaussians for Autonomous Driving

Feb 20, 2025

Yedong Shen, Xinran Zhang, Yifan Duan, Shiqi Zhang, Heng Li, Yilong Wu, Jianmin Ji, Yanyong Zhang

Abstract:Accurate and realistic 3D scene reconstruction enables the lifelike creation of autonomous driving simulation environments. With advancements in 3D Gaussian Splatting (3DGS), previous studies have applied it to reconstruct complex dynamic driving scenes. These methods typically require expensive LiDAR sensors and pre-annotated datasets of dynamic objects. To address these challenges, we propose OG-Gaussian, a novel approach that replaces LiDAR point clouds with Occupancy Grids (OGs) generated from surround-view camera images using Occupancy Prediction Network (ONet). Our method leverages the semantic information in OGs to separate dynamic vehicles from static street background, converting these grids into two distinct sets of initial point clouds for reconstructing both static and dynamic objects. Additionally, we estimate the trajectories and poses of dynamic objects through a learning-based approach, eliminating the need for complex manual annotations. Experiments on Waymo Open dataset demonstrate that OG-Gaussian is on par with the current state-of-the-art in terms of reconstruction quality and rendering speed, achieving an average PSNR of 35.13 and a rendering speed of 143 FPS, while significantly reducing computational costs and economic overhead.

Via

Access Paper or Ask Questions

Plan-over-Graph: Towards Parallelable LLM Agent Schedule

Feb 20, 2025

Shiqi Zhang, Xinbei Ma, Zouying Cao, Zhuosheng Zhang, Hai Zhao

Figure 1 for Plan-over-Graph: Towards Parallelable LLM Agent Schedule

Figure 2 for Plan-over-Graph: Towards Parallelable LLM Agent Schedule

Figure 3 for Plan-over-Graph: Towards Parallelable LLM Agent Schedule

Figure 4 for Plan-over-Graph: Towards Parallelable LLM Agent Schedule

Abstract:Large Language Models (LLMs) have demonstrated exceptional abilities in reasoning for task planning. However, challenges remain under-explored for parallel schedules. This paper introduces a novel paradigm, plan-over-graph, in which the model first decomposes a real-life textual task into executable subtasks and constructs an abstract task graph. The model then understands this task graph as input and generates a plan for parallel execution. To enhance the planning capability of complex, scalable graphs, we design an automated and controllable pipeline to generate synthetic graphs and propose a two-stage training scheme. Experimental results show that our plan-over-graph method significantly improves task performance on both API-based LLMs and trainable open-sourced LLMs. By normalizing complex tasks as graphs, our method naturally supports parallel execution, demonstrating global efficiency. The code and data are available at https://github.com/zsq259/Plan-over-Graph.

Via

Access Paper or Ask Questions

KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG

Feb 13, 2025

Yiqian Huang, Shiqi Zhang, Xiaokui Xiao

Figure 1 for KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG

Figure 2 for KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG

Figure 3 for KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG

Figure 4 for KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG

Abstract:Graph-RAG constructs a knowledge graph from text chunks to improve retrieval in Large Language Model (LLM)-based question answering. It is particularly useful in domains such as biomedicine, law, and political science, where retrieval often requires multi-hop reasoning over proprietary documents. Some existing Graph-RAG systems construct KNN graphs based on text chunk relevance, but this coarse-grained approach fails to capture entity relationships within texts, leading to sub-par retrieval and generation quality. To address this, recent solutions leverage LLMs to extract entities and relationships from text chunks, constructing triplet-based knowledge graphs. However, this approach incurs significant indexing costs, especially for large document collections. To ensure a good result accuracy while reducing the indexing cost, we propose KET-RAG, a multi-granular indexing framework. KET-RAG first identifies a small set of key text chunks and leverages an LLM to construct a knowledge graph skeleton. It then builds a text-keyword bipartite graph from all text chunks, serving as a lightweight alternative to a full knowledge graph. During retrieval, KET-RAG searches both structures: it follows the local search strategy of existing Graph-RAG systems on the skeleton while mimicking this search on the bipartite graph to improve retrieval quality. We evaluate eight solutions on two real-world datasets, demonstrating that KET-RAG outperforms all competitors in indexing cost, retrieval effectiveness, and generation quality. Notably, it achieves comparable or superior retrieval quality to Microsoft's Graph-RAG while reducing indexing costs by over an order of magnitude. Additionally, it improves the generation quality by up to 32.4% while lowering indexing costs by around 20%.

Via

Access Paper or Ask Questions

Data Quality Enhancement on the Basis of Diversity with Large Language Models for Text Classification: Uncovered, Difficult, and Noisy

Dec 10, 2024

Min Zeng, Caiquan Liu, Shiqi Zhang, Li Xie, Chen Sang, Xiaoxin Chen

Figure 1 for Data Quality Enhancement on the Basis of Diversity with Large Language Models for Text Classification: Uncovered, Difficult, and Noisy

Figure 2 for Data Quality Enhancement on the Basis of Diversity with Large Language Models for Text Classification: Uncovered, Difficult, and Noisy

Figure 3 for Data Quality Enhancement on the Basis of Diversity with Large Language Models for Text Classification: Uncovered, Difficult, and Noisy

Figure 4 for Data Quality Enhancement on the Basis of Diversity with Large Language Models for Text Classification: Uncovered, Difficult, and Noisy

Abstract:In recent years, the use of large language models (LLMs) for text classification has attracted widespread attention. Despite this, the classification accuracy of LLMs has not yet universally surpassed that of smaller models. LLMs can enhance their performance in text classification through fine-tuning. However, existing data quality research based on LLMs is challenging to apply directly to solve text classification problems. To further improve the performance of LLMs in classification tasks, this paper proposes a data quality enhancement (DQE) method for text classification based on LLMs. This method starts by using a greedy algorithm to select data, dividing the dataset into sampled and unsampled subsets, and then performing fine-tuning of the LLMs using the sampled data. Subsequently, this model is used to predict the outcomes for the unsampled data, categorizing incorrectly predicted data into uncovered, difficult, and noisy data. Experimental results demonstrate that our method effectively enhances the performance of LLMs in text classification tasks and significantly improves training efficiency, saving nearly half of the training time. Our method has achieved state-of-the-art performance in several open-source classification tasks.

* Accepted by COLING 2025(main, long paper)

Via

Access Paper or Ask Questions

NeuroFly: A framework for whole-brain single neuron reconstruction

Nov 07, 2024

Rubin Zhao, Yang Liu, Shiqi Zhang, Zijian Yi, Yanyang Xiao, Fang Xu, Yi Yang, Pencheng Zhou

Figure 1 for NeuroFly: A framework for whole-brain single neuron reconstruction

Figure 2 for NeuroFly: A framework for whole-brain single neuron reconstruction

Figure 3 for NeuroFly: A framework for whole-brain single neuron reconstruction

Figure 4 for NeuroFly: A framework for whole-brain single neuron reconstruction

Abstract:Neurons, with their elongated, tree-like dendritic and axonal structures, enable efficient signal integration and long-range communication across brain regions. By reconstructing individual neurons' morphology, we can gain valuable insights into brain connectivity, revealing the structure basis of cognition, movement, and perception. Despite the accumulation of extensive 3D microscopic imaging data, progress has been considerably hindered by the absence of automated tools to streamline this process. Here we introduce NeuroFly, a validated framework for large-scale automatic single neuron reconstruction. This framework breaks down the process into three distinct stages: segmentation, connection, and proofreading. In the segmentation stage, we perform automatic segmentation followed by skeletonization to generate over-segmented neuronal fragments without branches. During the connection stage, we use a 3D image-based path following approach to extend each fragment and connect it with other fragments of the same neuron. Finally, human annotators are required only to proofread the few unresolved positions. The first two stages of our process are clearly defined computer vision problems, and we have trained robust baseline models to solve them. We validated NeuroFly's efficiency using in-house datasets that include a variety of challenging scenarios, such as dense arborizations, weak axons, images with contamination. We will release the datasets along with a suite of visualization and annotation tools for better reproducibility. Our goal is to foster collaboration among researchers to address the neuron reconstruction challenge, ultimately accelerating advancements in neuroscience research. The dataset and code are available at https://github.com/beanli161514/neurofly

Via

Access Paper or Ask Questions

DKPROMPT: Domain Knowledge Prompting Vision-Language Models for Open-World Planning

Jun 25, 2024

Xiaohan Zhang, Zainab Altaweel, Yohei Hayamizu, Yan Ding, Saeid Amiri, Hao Yang, Andy Kaminski, Chad Esselink, Shiqi Zhang

Abstract:Vision-language models (VLMs) have been applied to robot task planning problems, where the robot receives a task in natural language and generates plans based on visual inputs. While current VLMs have demonstrated strong vision-language understanding capabilities, their performance is still far from being satisfactory in planning tasks. At the same time, although classical task planners, such as PDDL-based, are strong in planning for long-horizon tasks, they do not work well in open worlds where unforeseen situations are common. In this paper, we propose a novel task planning and execution framework, called DKPROMPT, which automates VLM prompting using domain knowledge in PDDL for classical planning in open worlds. Results from quantitative experiments show that DKPROMPT outperforms classical planning, pure VLM-based and a few other competitive baselines in task completion rate.

Via

Access Paper or Ask Questions