Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yang Li

Shanghai Center for Systems Biomedicine, Key Laboratory of Systems Biomedicine

Vision Mamba Distillation for Low-resolution Fine-grained Image Classification

Nov 27, 2024

Yao Chen, Jiabao Wang, Peichao Wang, Rui Zhang, Yang Li

Figure 1 for Vision Mamba Distillation for Low-resolution Fine-grained Image Classification

Figure 2 for Vision Mamba Distillation for Low-resolution Fine-grained Image Classification

Figure 3 for Vision Mamba Distillation for Low-resolution Fine-grained Image Classification

Figure 4 for Vision Mamba Distillation for Low-resolution Fine-grained Image Classification

Abstract:Low-resolution fine-grained image classification has recently made significant progress, largely thanks to the super-resolution techniques and knowledge distillation methods. However, these approaches lead to an exponential increase in the number of parameters and computational complexity of models. In order to solve this problem, in this letter, we propose a Vision Mamba Distillation (ViMD) approach to enhance the effectiveness and efficiency of low-resolution fine-grained image classification. Concretely, a lightweight super-resolution vision Mamba classification network (SRVM-Net) is proposed to improve its capability for extracting visual features by redesigning the classification sub-network with Mamba modeling. Moreover, we design a novel multi-level Mamba knowledge distillation loss boosting the performance, which can transfer prior knowledge obtained from a High-resolution Vision Mamba classification Network (HRVM-Net) as a teacher into the proposed SRVM-Net as a student. Extensive experiments on seven public fine-grained classification datasets related to benchmarks confirm our ViMD achieves a new state-of-the-art performance. While having higher accuracy, ViMD outperforms similar methods with fewer parameters and FLOPs, which is more suitable for embedded device applications. Code is available at https://github.com/boa2004plaust/ViMD.

Via

Access Paper or Ask Questions

Can LLMs be Good Graph Judger for Knowledge Graph Construction?

Nov 26, 2024

Haoyu Huang, Chong Chen, Conghui He, Yang Li, Jiawei Jiang, Wentao Zhang

Figure 1 for Can LLMs be Good Graph Judger for Knowledge Graph Construction?

Figure 2 for Can LLMs be Good Graph Judger for Knowledge Graph Construction?

Figure 3 for Can LLMs be Good Graph Judger for Knowledge Graph Construction?

Figure 4 for Can LLMs be Good Graph Judger for Knowledge Graph Construction?

Abstract:In real-world scenarios, most of the data obtained from information retrieval (IR) system is unstructured. Converting natural language sentences into structured Knowledge Graphs (KGs) remains a critical challenge. The quality of constructed KGs may also impact the performance of some KG-dependent domains like GraphRAG systems and recommendation systems. Recently, Large Language Models (LLMs) have demonstrated impressive capabilities in addressing a wide range of natural language processing tasks. However, there are still challenges when utilizing LLMs to address the task of generating structured KGs. And we have identified three limitations with respect to existing KG construction methods. (1)There is a large amount of information and excessive noise in real-world documents, which could result in extracting messy information. (2)Native LLMs struggle to effectively extract accuracy knowledge from some domain-specific documents. (3)Hallucinations phenomenon cannot be overlooked when utilizing LLMs directly as an unsupervised method for constructing KGs. In this paper, we propose GraphJudger, a knowledge graph construction framework to address the aforementioned challenges. We introduce three innovative modules in our method, which are entity-centric iterative text denoising, knowledge aware instruction tuning and graph judgement, respectively. We seek to utilize the capacity of LLMs to function as a graph judger, a capability superior to their role only as a predictor for KG construction problems. Experiments conducted on two general text-graph pair datasets and one domain-specific text-graph pair dataset show superior performances compared to baseline methods. The code of our proposed method is available at https://github.com/hhy-huang/GraphJudger.

Via

Access Paper or Ask Questions

One is Plenty: A Polymorphic Feature Interpreter for Immutable Heterogeneous Collaborative Perception

Nov 25, 2024

Yuchen Xia, Quan Yuan, Guiyang Luo, Xiaoyuan Fu, Yang Li, Xuanhan Zhu, Tianyou Luo, Siheng Chen, Jinglin Li

Abstract:Collaborative perception in autonomous driving significantly enhances the perception capabilities of individual agents. Immutable heterogeneity in collaborative perception, where agents have different and fixed perception networks, presents a major challenge due to the semantic gap in their exchanged intermediate features without modifying the perception networks. Most existing methods bridge the semantic gap through interpreters. However, they either require training a new interpreter for each new agent type, limiting extensibility, or rely on a two-stage interpretation via an intermediate standardized semantic space, causing cumulative semantic loss. To achieve both extensibility in immutable heterogeneous scenarios and low-loss feature interpretation, we propose PolyInter, a polymorphic feature interpreter. It contains an extension point through which emerging new agents can seamlessly integrate by overriding only their specific prompts, which are learnable parameters intended to guide the interpretation, while reusing PolyInter's remaining parameters. By leveraging polymorphism, our design ensures that a single interpreter is sufficient to accommodate diverse agents and interpret their features into the ego agent's semantic space. Experiments conducted on the OPV2V dataset demonstrate that PolyInter improves collaborative perception precision by up to 11.1% compared to SOTA interpreters, while comparable results can be achieved by training only 1.4% of PolyInter's parameters when adapting to new agents.

Via

Access Paper or Ask Questions

MolMetaLM: a Physicochemical Knowledge-Guided Molecular Meta Language Model

Nov 23, 2024

Yifan Wu, Min Zeng, Yang Li, Yang Zhang, Min Li

Abstract:Most current molecular language models transfer the masked language model or image-text generation model from natural language processing to molecular field. However, molecules are not solely characterized by atom/bond symbols; they encapsulate important physical/chemical properties. Moreover, normal language models bring grammar rules that are irrelevant for understanding molecules. In this study, we propose a novel physicochemical knowledge-guided molecular meta language framework MolMetaLM. We design a molecule-specialized meta language paradigm, formatted as multiple <S,P,O> (subject, predicate, object) knowledge triples sharing the same S (i.e., molecule) to enhance learning the semantic relationships between physicochemical knowledge and molecules. By introducing different molecular knowledge and noises, the meta language paradigm generates tens of thousands of pretraining tasks. By recovering the token/sequence/order-level noises, MolMetaLM exhibits proficiency in large-scale benchmark evaluations involving property prediction, molecule generation, conformation inference, and molecular optimization. Through MolMetaLM, we offer a new insight for designing language models.

Via

Access Paper or Ask Questions

Paying more attention to local contrast: improving infrared small target detection performance via prior knowledge

Nov 20, 2024

Peichao Wang, Jiabao Wang, Yao Chen, Rui Zhang, Yang Li, Zhuang Miao

Abstract:The data-driven method for infrared small target detection (IRSTD) has achieved promising results. However, due to the small scale of infrared small target datasets and the limited number of pixels occupied by the targets themselves, it is a challenging task for deep learning methods to directly learn from these samples. Utilizing human expert knowledge to assist deep learning methods in better learning is worthy of exploration. To effectively guide the model to focus on targets' spatial features, this paper proposes the Local Contrast Attention Enhanced infrared small target detection Network (LCAE-Net), combining prior knowledge with data-driven deep learning methods. LCAE-Net is a U-shaped neural network model which consists of two developed modules: a Local Contrast Enhancement (LCE) module and a Channel Attention Enhancement (CAE) module. The LCE module takes advantages of prior knowledge, leveraging handcrafted convolution operator to acquire Local Contrast Attention (LCA), which could realize background suppression while enhance the potential target region, thus guiding the neural network to pay more attention to potential infrared small targets' location information. To effectively utilize the response information throughout downsampling progresses, the CAE module is proposed to achieve the information fusion among feature maps' different channels. Experimental results indicate that our LCAE-Net outperforms existing state-of-the-art methods on the three public datasets NUDT-SIRST, NUAA-SIRST, and IRSTD-1K, and its detection speed could reach up to 70 fps. Meanwhile, our model has a parameter count and Floating-Point Operations (FLOPs) of 1.945M and 4.862G respectively, which is suitable for deployment on edge devices.

* 16 pages, 8 figures

Via

Access Paper or Ask Questions

CCIS-Diff: A Generative Model with Stable Diffusion Prior for Controlled Colonoscopy Image Synthesis

Nov 19, 2024

Yifan Xie, Jingge Wang, Tao Feng, Fei Ma, Yang Li

Figure 1 for CCIS-Diff: A Generative Model with Stable Diffusion Prior for Controlled Colonoscopy Image Synthesis

Figure 2 for CCIS-Diff: A Generative Model with Stable Diffusion Prior for Controlled Colonoscopy Image Synthesis

Figure 3 for CCIS-Diff: A Generative Model with Stable Diffusion Prior for Controlled Colonoscopy Image Synthesis

Figure 4 for CCIS-Diff: A Generative Model with Stable Diffusion Prior for Controlled Colonoscopy Image Synthesis

Abstract:Colonoscopy is crucial for identifying adenomatous polyps and preventing colorectal cancer. However, developing robust models for polyp detection is challenging by the limited size and accessibility of existing colonoscopy datasets. While previous efforts have attempted to synthesize colonoscopy images, current methods suffer from instability and insufficient data diversity. Moreover, these approaches lack precise control over the generation process, resulting in images that fail to meet clinical quality standards. To address these challenges, we propose CCIS-DIFF, a Controlled generative model for high-quality Colonoscopy Image Synthesis based on a Diffusion architecture. Our method offers precise control over both the spatial attributes (polyp location and shape) and clinical characteristics of polyps that align with clinical descriptions. Specifically, we introduce a blur mask weighting strategy to seamlessly blend synthesized polyps with the colonic mucosa, and a text-aware attention mechanism to guide the generated images to reflect clinical characteristics. Notably, to achieve this, we construct a new multi-modal colonoscopy dataset that integrates images, mask annotations, and corresponding clinical text descriptions. Experimental results demonstrate that our method generates high-quality, diverse colonoscopy images with fine control over both spatial constraints and clinical consistency, offering valuable support for downstream segmentation and diagnostic tasks.

* 5 pages, 4 figures

Via

Access Paper or Ask Questions

Covariance-Based Device Activity Detection with Massive MIMO for Near-Field Correlated Channels

Nov 08, 2024

Ziyue Wang, Yang Li, Ya-Feng Liu, Junjie Ma

Figure 1 for Covariance-Based Device Activity Detection with Massive MIMO for Near-Field Correlated Channels

Figure 2 for Covariance-Based Device Activity Detection with Massive MIMO for Near-Field Correlated Channels

Figure 3 for Covariance-Based Device Activity Detection with Massive MIMO for Near-Field Correlated Channels

Figure 4 for Covariance-Based Device Activity Detection with Massive MIMO for Near-Field Correlated Channels

Abstract:This paper studies the device activity detection problem in a massive multiple-input multiple-output (MIMO) system for near-field communications (NFC). In this system, active devices transmit their signature sequences to the base station (BS), which detects the active devices based on the received signal. In this paper, we model the near-field channels as correlated Rician fading channels and formulate the device activity detection problem as a maximum likelihood estimation (MLE) problem. Compared to the traditional uncorrelated channel model, the correlation of channels complicates both algorithm design and theoretical analysis of the MLE problem. On the algorithmic side, we propose two computationally efficient algorithms for solving the MLE problem: an exact coordinate descent (CD) algorithm and an inexact CD algorithm. The exact CD algorithm solves the one-dimensional optimization subproblem exactly using matrix eigenvalue decomposition and polynomial root-finding. By approximating the objective function appropriately, the inexact CD algorithm solves the one-dimensional optimization subproblem inexactly with lower complexity and more robust numerical performance. Additionally, we analyze the detection performance of the MLE problem under correlated channels by comparing it with the case of uncorrelated channels. The analysis shows that when the overall number of devices $N$ is large or the signature sequence length $L$ is small, the detection performance of MLE under correlated channels tends to be better than that under uncorrelated channels. Conversely, when $N$ is small or $L$ is large, MLE performs better under uncorrelated channels than under correlated ones. Simulation results demonstrate the computational efficiency of the proposed algorithms and verify the correctness of the analysis.

* 15 pages, 8 figures, submitted for possible publication

Via

Access Paper or Ask Questions

Automating Exploratory Proteomics Research via Language Models

Nov 06, 2024

Ning Ding, Shang Qu, Linhai Xie, Yifei Li, Zaoqu Liu, Kaiyan Zhang, Yibai Xiong, Yuxin Zuo, Zhangren Chen, Ermo Hua(+6 more)

Figure 1 for Automating Exploratory Proteomics Research via Language Models

Figure 2 for Automating Exploratory Proteomics Research via Language Models

Figure 3 for Automating Exploratory Proteomics Research via Language Models

Figure 4 for Automating Exploratory Proteomics Research via Language Models

Abstract:With the development of artificial intelligence, its contribution to science is evolving from simulating a complex problem to automating entire research processes and producing novel discoveries. Achieving this advancement requires both specialized general models grounded in real-world scientific data and iterative, exploratory frameworks that mirror human scientific methodologies. In this paper, we present PROTEUS, a fully automated system for scientific discovery from raw proteomics data. PROTEUS uses large language models (LLMs) to perform hierarchical planning, execute specialized bioinformatics tools, and iteratively refine analysis workflows to generate high-quality scientific hypotheses. The system takes proteomics datasets as input and produces a comprehensive set of research objectives, analysis results, and novel biological hypotheses without human intervention. We evaluated PROTEUS on 12 proteomics datasets collected from various biological samples (e.g. immune cells, tumors) and different sample types (single-cell and bulk), generating 191 scientific hypotheses. These were assessed using both automatic LLM-based scoring on 5 metrics and detailed reviews from human experts. Results demonstrate that PROTEUS consistently produces reliable, logically coherent results that align well with existing literature while also proposing novel, evaluable hypotheses. The system's flexible architecture facilitates seamless integration of diverse analysis tools and adaptation to different proteomics data types. By automating complex proteomics analysis workflows and hypothesis generation, PROTEUS has the potential to considerably accelerate the pace of scientific discovery in proteomics research, enabling researchers to efficiently explore large-scale datasets and uncover biological insights.

Via

Access Paper or Ask Questions

ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model

Nov 04, 2024

Yiming Sun, Fan Yu, Shaoxiang Chen, Yu Zhang, Junwei Huang, Chenhui Li, Yang Li, Changbo Wang

Figure 1 for ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model

Figure 2 for ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model

Figure 3 for ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model

Figure 4 for ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model

Abstract:Visual object tracking aims to locate a targeted object in a video sequence based on an initial bounding box. Recently, Vision-Language~(VL) trackers have proposed to utilize additional natural language descriptions to enhance versatility in various applications. However, VL trackers are still inferior to State-of-The-Art (SoTA) visual trackers in terms of tracking performance. We found that this inferiority primarily results from their heavy reliance on manual textual annotations, which include the frequent provision of ambiguous language descriptions. In this paper, we propose ChatTracker to leverage the wealth of world knowledge in the Multimodal Large Language Model (MLLM) to generate high-quality language descriptions and enhance tracking performance. To this end, we propose a novel reflection-based prompt optimization module to iteratively refine the ambiguous and inaccurate descriptions of the target with tracking feedback. To further utilize semantic information produced by MLLM, a simple yet effective VL tracking framework is proposed and can be easily integrated as a plug-and-play module to boost the performance of both VL and visual trackers. Experimental results show that our proposed ChatTracker achieves a performance comparable to existing methods.

Via

Access Paper or Ask Questions

Adaptive Paradigm Synergy: Can a Cross-Paradigm Objective Enhance Long-Tailed Learning?

Oct 30, 2024

Haowen Xiao, Guanghui Liu, Xinyi Gao, Yang Li, Fengmao Lv, Jielei Chu

Abstract:Self-supervised learning (SSL) has achieved impressive results across several computer vision tasks, even rivaling supervised methods. However, its performance degrades on real-world datasets with long-tailed distributions due to difficulties in capturing inherent class imbalances. Although supervised long-tailed learning offers significant insights, the absence of labels in SSL prevents direct transfer of these strategies.To bridge this gap, we introduce Adaptive Paradigm Synergy (APS), a cross-paradigm objective that seeks to unify the strengths of both paradigms. Our approach reexamines contrastive learning from a spatial structure perspective, dynamically adjusting the uniformity of latent space structure through adaptive temperature tuning. Furthermore, we draw on a re-weighting strategy from supervised learning to compensate for the shortcomings of temperature adjustment in explicit quantity perception.Extensive experiments on commonly used long-tailed datasets demonstrate that APS improves performance effectively and efficiently. Our findings reveal the potential for deeper integration between supervised and self-supervised learning, paving the way for robust models that handle real-world class imbalance.

* 11 pages, 3 figures

Via

Access Paper or Ask Questions