Jun Zhou

Yilin

CREST: An Efficient Conjointly-trained Spike-driven Framework for Event-based Object Detection Exploiting Spatiotemporal Dynamics

Dec 17, 2024

Ruixin Mao, Aoyu Shen, Lin Tang, Jun Zhou

Abstract:Event-based cameras feature high temporal resolution, wide dynamic range, and low power consumption, which is ideal for high-speed and low-light object detection. Spiking neural networks (SNNs) are promising for event-based object recognition and detection due to their spiking nature but lack efficient training methods, leading to gradient vanishing and high computational complexity, especially in deep SNNs. Additionally, existing SNN frameworks often fail to effectively handle multi-scale spatiotemporal features, leading to increased data redundancy and reduced accuracy. To address these issues, we propose CREST, a novel conjointly-trained spike-driven framework to exploit spatiotemporal dynamics in event-based object detection. We introduce the conjoint learning rule to accelerate SNN learning and alleviate gradient vanishing. It also supports dual operation modes for efficient and flexible implementation on different hardware types. Additionally, CREST features a fully spike-driven framework with a multi-scale spatiotemporal event integrator (MESTOR) and a spatiotemporal-IoU (ST-IoU) loss. Our approach achieves superior object recognition & detection performance and up to 100X energy efficiency compared with state-of-the-art SNN algorithms on three datasets, providing an efficient solution for event-based object detection algorithms suitable for SNN hardware implementation.

* Accepted by AAAI 2025
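
For orientation, the ST-IoU loss named in the abstract builds on the familiar intersection-over-union measure. The abstract does not define the spatiotemporal variant, so the sketch below shows only the standard spatial IoU for two axis-aligned boxes; the function name `box_iou` and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration, not the paper's formulation.

```python
def box_iou(a, b):
    """Standard IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative only: this is the plain spatial IoU, not the paper's ST-IoU.
    """
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

An IoU-based loss is commonly taken as `1 - box_iou(pred, target)`; how the paper extends this to the temporal dimension is not stated in the abstract.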

Robust Noisy Correspondence Learning via Self-Drop and Dual-Weight

Dec 09, 2024

Fan Liu, Chenwei Dong, Chuanyi Zhang, Hualiang Zhou, Jun Zhou

Abstract:Many researchers collect data from the internet through crowd-sourcing or web crawling to alleviate the data-hungry challenge associated with cross-modal matching. Although such practice does not require expensive annotations, it inevitably introduces mismatched pairs and results in a noisy correspondence problem. Current approaches leverage the memorization effect of deep neural networks to distinguish noise and perform re-weighting. However, simply lowering the weight of noisy pairs cannot eliminate the negative impact of noisy correspondence in the training process. In this paper, we propose a novel self-drop and dual-weight approach, which achieves elaborate data processing by qua-partitioning the data. Specifically, our approach partitions all data into four types: clean and significant, clean yet insignificant, vague, and noisy. We analyze the effect of noisy and clean data pairs and find that for vision-language pre-training models, a small number of clean samples is more valuable than a majority of noisy ones. Based on this observation, we employ self-drop to discard noisy samples to effectively mitigate the impact of noise. In addition, we adopt a dual-weight strategy to ensure that the model focuses more on significant samples while appropriately leveraging vague samples. Compared to prior works, our approach is more robust and demonstrates relatively more stable performance on noisy datasets, especially under a high noise ratio. Extensive experiments on three widely used datasets, including Flickr30K, MS-COCO, and Conceptual Captions, validate the effectiveness of our approach. The source code is available at https://github.com/DongChenwei2000/SDD.
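
The four-way partition with self-drop and per-type weighting described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how samples are scored, so the per-sample `clean_prob` and `loss` inputs, every threshold, and every weight value below are hypothetical.

```python
from enum import Enum


class Kind(Enum):
    CLEAN_SIGNIFICANT = "clean and significant"
    CLEAN_INSIGNIFICANT = "clean yet insignificant"
    VAGUE = "vague"
    NOISY = "noisy"


def partition(clean_prob, loss, noisy_below=0.3, clean_above=0.7,
              significant_loss=0.5):
    """Assign one image-text pair to one of the four types.

    clean_prob: assumed estimate that the pair is correctly matched.
    loss: assumed per-sample training loss (higher = more informative).
    All thresholds are hypothetical placeholders.
    """
    if clean_prob < noisy_below:
        return Kind.NOISY
    if clean_prob < clean_above:
        return Kind.VAGUE
    if loss >= significant_loss:
        return Kind.CLEAN_SIGNIFICANT
    return Kind.CLEAN_INSIGNIFICANT


# Hypothetical weights: noisy pairs get 0.0 (self-drop), significant
# pairs get the most emphasis, vague pairs are kept at reduced weight.
WEIGHTS = {
    Kind.CLEAN_SIGNIFICANT: 1.0,
    Kind.CLEAN_INSIGNIFICANT: 0.7,
    Kind.VAGUE: 0.5,
    Kind.NOISY: 0.0,
}


def sample_weights(clean_probs, losses):
    """Per-sample loss weights for a batch."""
    return [WEIGHTS[partition(p, l)] for p, l in zip(clean_probs, losses)]
```

The point of the sketch is the contrast the abstract draws: noisy pairs are removed outright (weight zero) rather than merely down-weighted, while the remaining three types receive graded weights.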

Graph Disentangle Causal Model: Enhancing Causal Inference in Networked Observational Data

Dec 05, 2024

Binbin Hu, Zhicheng An, Zhengwei Wu, Ke Tu, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, Yufei Feng, Jiawei Chen

Abstract:Estimating individual treatment effects (ITE) from observational data is a critical task across various domains. However, many existing works on ITE estimation overlook the influence of hidden confounders, which remain unobserved at the individual unit level. To address this limitation, researchers have utilized graph neural networks to aggregate neighbors' features to capture the hidden confounders and mitigate confounding bias by minimizing the discrepancy of confounder representations between the treated and control groups. Despite the success of these approaches, practical scenarios often treat all features as confounders and involve substantial differences in feature distributions between the treated and control groups. Confusing the adjustment and confounder and enforcing strict balance on the confounder representations could potentially undermine the effectiveness of outcome prediction. To mitigate this issue, we propose a novel framework called the Graph Disentangle Causal model (GDC) to conduct ITE estimation in the network setting. GDC utilizes a causal disentangle module to separate unit features into adjustment and confounder representations. Then we design a graph aggregation module consisting of three distinct graph aggregators to obtain adjustment, confounder, and counterfactual confounder representations. Finally, a causal constraint module is employed to enforce the disentangled representations as true causal factors. The effectiveness of our proposed method is demonstrated by conducting comprehensive experiments on two networked datasets.

* Accepted by WSDM 2025
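
As background for the abstract above, the quantity being estimated can be written in standard potential-outcomes notation. The symbols here ($Y_i(1)$, $Y_i(0)$, $x_i$, $t_i$) are the conventional ones and are an assumption; the abstract itself does not fix a notation.

```latex
% ITE of unit i: the difference between its two potential outcomes
\tau_i = Y_i(1) - Y_i(0)

% Only one potential outcome is ever observed, depending on treatment t_i
y_i = t_i \, Y_i(1) + (1 - t_i) \, Y_i(0), \qquad t_i \in \{0, 1\}

% In practice the ITE is estimated conditionally on the unit's features x_i
\hat{\tau}(x_i) = \mathbb{E}\left[ Y(1) - Y(0) \mid X = x_i \right]
```

Because the unobserved outcome must be inferred, any confounder that influences both $t_i$ and the outcome biases the estimate, which is the problem the abstract says hidden confounders create.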

HSLiNets: Hyperspectral Image and LiDAR Data Fusion Using Efficient Dual Non-Linear Feature Learning Networks

Dec 03, 2024

Judy X Yang, Jing Wang, Chen Hong Sui, Zekun Long, Jun Zhou

Abstract:The integration of hyperspectral imaging (HSI) and LiDAR data within new linear feature spaces offers a promising solution to the challenges posed by the high-dimensionality and redundancy inherent in HSIs. This study introduces a dual linear fused space framework that capitalizes on bidirectional reversed convolutional neural network (CNN) pathways, coupled with a specialized spatial analysis block. This approach combines the computational efficiency of CNNs with the adaptability of attention mechanisms, facilitating the effective fusion of spectral and spatial information. The proposed method not only enhances data processing and classification accuracy, but also mitigates the computational burden typically associated with advanced models such as Transformers. Evaluations on the Houston 2013 dataset demonstrate that our approach surpasses existing state-of-the-art models. This advancement underscores the potential of the framework in resource-constrained environments and its significant contributions to the field of remote sensing.

* 5 pages, 2 figures

Hyperspectral Images Efficient Spatial and Spectral non-Linear Model with Bidirectional Feature Learning

Dec 03, 2024

Judy X Yang, Jing Wang, Zekun Long, Chenhong Sui, Jun Zhou

Abstract:Classifying hyperspectral images (HSIs) is a complex task in remote sensing due to the high-dimensional nature and volume of data involved. To address these challenges, we propose the Spectral-Spatial non-Linear Model, a novel framework that significantly reduces data volume while enhancing classification accuracy. Our model employs a bidirectional reversed convolutional neural network (CNN) to efficiently extract spectral features, complemented by a specialized block for spatial feature analysis. This hybrid approach leverages the operational efficiency of CNNs and incorporates dynamic feature extraction inspired by attention mechanisms, optimizing performance without the high computational demands typically associated with transformer-based models. The SS non-Linear Model is designed to process hyperspectral data bidirectionally, achieving notable classification and efficiency improvements by fusing spectral and spatial features effectively. This approach yields superior classification accuracy compared to existing benchmarks while maintaining computational efficiency, making it suitable for resource-constrained environments. We validate the SS non-Linear Model on three widely recognized datasets, Houston 2013, Indian Pines, and Pavia University, demonstrating its ability to outperform current state-of-the-art models in HSI classification and efficiency. This work highlights the innovative methodology of the SS non-Linear Model and its practical benefits for remote sensing applications, where both data efficiency and classification accuracy are critical. For further details, please refer to our code repository on GitHub: HSILinearModel.

* 17 pages, 4 figures and 10 tables

Constructing accurate machine-learned potentials and performing highly efficient atomistic simulations to predict structural and thermal properties

Nov 16, 2024

Junlan Liu, Qian Yin, Mengshu He, Jun Zhou

Abstract:The $\text{Cu}_7\text{P}\text{S}_6$ compound has garnered significant attention due to its potential in thermoelectric applications. In this study, we introduce a neuroevolution potential (NEP), trained on a dataset generated from ab initio molecular dynamics (AIMD) simulations, using the moment tensor potential (MTP) as a reference. The low root mean square errors (RMSEs) for total energy and atomic forces demonstrate the high accuracy and transferability of both the MTP and NEP. We further calculate the phonon density of states (DOS) and radial distribution function (RDF) using both machine learning potentials, comparing the results to density functional theory (DFT) calculations. While the MTP potential offers slightly higher accuracy, the NEP achieves a remarkable 41-fold increase in computational speed. These findings provide detailed microscopic insights into the dynamics and rapid Cu-ion diffusion, paving the way for future studies on Cu-based solid electrolytes and their applications in energy devices.
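
The accuracy metric mentioned in the abstract, root mean square error against reference energies or force components, is straightforward to compute. The helper below is a generic sketch; the function name `rmse` and the flat-list input format are assumptions, and no values from the paper are reproduced.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between two equal-length sequences.

    For forces, flatten all per-atom components into one sequence first.
    """
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))
```

For example, `rmse(model_energies, dft_energies)` would give the energy error of a machine-learned potential against its reference data, where both argument names are placeholders for the user's own arrays.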

Photon-Counting CT in Cancer Radiotherapy: Technological Advances and Clinical Benefits

Oct 26, 2024

Keyur D. Shah, Jun Zhou, Justin Roper, Anees Dhabaan, Hania Al-Hallaq, Amir Pourmorteza, Xiaofeng Yang

Abstract:Photon-counting computed tomography (PCCT) marks a significant advancement over conventional energy-integrating detector (EID) CT systems. This review highlights PCCT's superior spatial and contrast resolution, reduced radiation dose, and multi-energy imaging capabilities, which address key challenges in radiotherapy, such as accurate tumor delineation, precise dose calculation, and treatment response monitoring. PCCT's improved anatomical clarity enhances tumor targeting while minimizing damage to surrounding healthy tissues. Additionally, metal artifact reduction (MAR) and quantitative imaging capabilities optimize workflows, enabling adaptive radiotherapy and radiomics-driven personalized treatment. Emerging clinical applications in brachytherapy and radiopharmaceutical therapy (RPT) show promising outcomes, although challenges like high costs and limited software integration remain. With advancements in artificial intelligence (AI) and dedicated radiotherapy packages, PCCT is poised to transform precision, safety, and efficacy in cancer radiotherapy, marking it as a pivotal technology for future clinical practice.

LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch

Oct 17, 2024

Caigao Jiang, Xiang Shu, Hong Qian, Xingyu Lu, Jun Zhou, Aimin Zhou, Yang Yu

Abstract:Optimization problems are prevalent across various scenarios. Formulating and then solving optimization problems described in natural language often requires highly specialized human expertise, which could block the widespread application of optimization-based decision making. Leveraging large language models (LLMs) has emerged as a potential way to automate problem formulation and solving. However, this approach suffers from the issue of optimization generalization. Namely, the accuracy of most current LLM-based methods and the generality of optimization problem types that they can model are still limited. In this paper, we propose a unified learning-based framework called LLMOPT to boost optimization generalization. Starting from the natural language descriptions of optimization problems and a pre-trained LLM, LLMOPT constructs the introduced five-element formulation as a universal model for learning to define diverse optimization problem types. Then, LLMOPT employs multi-instruction tuning to enhance both problem formalization and solver code generation accuracy and generality. After that, to prevent hallucinations in LLMs, such as sacrificing solving accuracy to avoid execution errors, model alignment and a self-correction mechanism are adopted in LLMOPT. We evaluate the optimization generalization ability of LLMOPT and compared methods across six real-world datasets covering roughly 20 fields such as health, environment, energy, and manufacturing. Extensive experimental results show that LLMOPT is able to model various optimization problem types such as linear/nonlinear programming, mixed integer programming, and combinatorial optimization, and achieves a notable 11.08% average solving accuracy improvement compared with the state-of-the-art methods. The code is available at https://github.com/caigaojiang/LLMOPT.

From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning

Oct 09, 2024

Yang Bai, Yang Zhou, Jun Zhou, Rick Siow Mong Goh, Daniel Shu Wei Ting, Yong Liu

Abstract:Large vision language models (VLMs) combine large language models with vision encoders, demonstrating promise across various tasks. However, they often underperform in task-specific applications due to domain gaps between pre-training and fine-tuning. We introduce VITask, a novel framework that enhances task-specific adaptability of VLMs by integrating task-specific models (TSMs). VITask employs three key strategies: exemplar prompting (EP), response distribution alignment (RDA), and contrastive response tuning (CRT) to improve the task-specific performance of VLMs by adjusting their response distributions. EP allows TSM features to guide VLMs, while RDA enables VLMs to adapt without TSMs during inference by learning from exemplar-prompted models. CRT further optimizes the ranking of correct image-response pairs, thereby reducing the risk of generating undesired responses. Experiments on 12 medical diagnosis datasets across 9 imaging modalities show that VITask outperforms both vanilla instruction-tuned VLMs and TSMs, showcasing its ability to integrate complementary features from both models effectively. Additionally, VITask offers practical advantages such as flexible TSM integration and robustness to incomplete instructions, making it a versatile and efficient solution for task-specific VLM tuning. Our code is available at https://github.com/baiyang4/VITask.

Prompting DirectSAM for Semantic Contour Extraction in Remote Sensing Images

Oct 08, 2024

Shiyu Miao, Delong Chen, Fan Liu, Chuanyi Zhang, Yanhui Gu, Shengjie Guo, Jun Zhou

Abstract:The Direct Segment Anything Model (DirectSAM) excels in class-agnostic contour extraction. In this paper, we explore its use by applying it to optical remote sensing imagery, where semantic contour extraction (such as identifying buildings, road networks, and coastlines) holds significant practical value. Those applications are currently handled by training specialized small models separately on small datasets in each domain. We introduce a foundation model derived from DirectSAM, termed DirectSAM-RS, which not only inherits the strong segmentation capability acquired from natural images, but also benefits from a large-scale dataset we created for remote sensing semantic contour extraction. This dataset comprises over 34k image-text-contour triplets, making it at least 30 times larger than individual datasets. DirectSAM-RS integrates a prompter module: a text encoder and cross-attention layers attached to the DirectSAM architecture, which allows flexible conditioning on target class labels or referring expressions. We evaluate DirectSAM-RS in both zero-shot and fine-tuning settings, and demonstrate that it achieves state-of-the-art performance across several downstream benchmarks.
