Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lei Zhang

Sid

Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning

Nov 03, 2024

Fei Zhou, Peng Wang, Lei Zhang, Zhenghua Chen, Wei Wei, Chen Ding, Guosheng Lin, Yanning Zhang

Figure 1 for Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning

Figure 2 for Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning

Figure 3 for Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning

Figure 4 for Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning

Abstract:Meta-learning offers a promising avenue for few-shot learning (FSL), enabling models to glean a generalizable feature embedding through episodic training on synthetic FSL tasks in a source domain. Yet, in practical scenarios where the target task diverges from that in the source domain, meta-learning based method is susceptible to over-fitting. To overcome this, we introduce a novel framework, Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning, which is crafted to comprehensively exploit the cross-domain transferable image prior that each image can be decomposed into complementary low-frequency content details and high-frequency robust structural characteristics. Motivated by this insight, we propose to decompose each query image into its high-frequency and low-frequency components, and parallel incorporate them into the feature embedding network to enhance the final category prediction. More importantly, we introduce a feature reconstruction prior and a prediction consistency prior to separately encourage the consistency of the intermediate feature as well as the final category prediction between the original query image and its decomposed frequency components. This allows for collectively guiding the network's meta-learning process with the aim of learning generalizable image feature embeddings, while not introducing any extra computational cost in the inference phase. Our framework establishes new state-of-the-art results on multiple cross-domain few-shot learning benchmarks.

Via

Access Paper or Ask Questions

Automating Quantum Software Maintenance: Flakiness Detection and Root Cause Analysis

Oct 31, 2024

Janakan Sivaloganathan, Ainaz Jamshidi, Andriy Miranskyy, Lei Zhang

Abstract:Flaky tests, which pass or fail inconsistently without code changes, are a major challenge in software engineering in general and in quantum software engineering in particular due to their complexity and probabilistic nature, leading to hidden issues and wasted developer effort. We aim to create an automated framework to detect flaky tests in quantum software and an extended dataset of quantum flaky tests, overcoming the limitations of manual methods. Building on prior manual analysis of 14 quantum software repositories, we expanded the dataset and automated flaky test detection using transformers and cosine similarity. We conducted experiments with Large Language Models (LLMs) from the OpenAI GPT and Meta LLaMA families to assess their ability to detect and classify flaky tests from code and issue descriptions. Embedding transformers proved effective: we identified 25 new flaky tests, expanding the dataset by 54%. Top LLMs achieved an F1-score of 0.8871 for flakiness detection but only 0.5839 for root cause identification. We introduced an automated flaky test detection framework using machine learning, showing promising results but highlighting the need for improved root cause detection and classification in large quantum codebases. Future work will focus on improving detection techniques and developing automatic flaky test fixes.

* 5 pages, 1 figure

Via

Access Paper or Ask Questions

AdaNeg: Adaptive Negative Proxy Guided OOD Detection with Vision-Language Models

Oct 26, 2024

Yabin Zhang, Lei Zhang

Figure 1 for AdaNeg: Adaptive Negative Proxy Guided OOD Detection with Vision-Language Models

Figure 2 for AdaNeg: Adaptive Negative Proxy Guided OOD Detection with Vision-Language Models

Figure 3 for AdaNeg: Adaptive Negative Proxy Guided OOD Detection with Vision-Language Models

Figure 4 for AdaNeg: Adaptive Negative Proxy Guided OOD Detection with Vision-Language Models

Abstract:Recent research has shown that pre-trained vision-language models are effective at identifying out-of-distribution (OOD) samples by using negative labels as guidance. However, employing consistent negative labels across different OOD datasets often results in semantic misalignments, as these text labels may not accurately reflect the actual space of OOD images. To overcome this issue, we introduce \textit{adaptive negative proxies}, which are dynamically generated during testing by exploring actual OOD images, to align more closely with the underlying OOD label space and enhance the efficacy of negative proxy guidance. Specifically, our approach utilizes a feature memory bank to selectively cache discriminative features from test images, representing the targeted OOD distribution. This facilitates the creation of proxies that can better align with specific OOD datasets. While task-adaptive proxies average features to reflect the unique characteristics of each dataset, the sample-adaptive proxies weight features based on their similarity to individual test samples, exploring detailed sample-level nuances. The final score for identifying OOD samples integrates static negative labels with our proposed adaptive proxies, effectively combining textual and visual knowledge for enhanced performance. Our method is training-free and annotation-free, and it maintains fast testing speed. Extensive experiments across various benchmarks demonstrate the effectiveness of our approach, abbreviated as AdaNeg. Notably, on the large-scale ImageNet benchmark, our AdaNeg significantly outperforms existing methods, with a 2.45\% increase in AUROC and a 6.48\% reduction in FPR95. Codes are available at \url{https://github.com/YBZh/OpenOOD-VLM}.

* NIPS 2024 Camera Ready, Codes are available at \url{https://github.com/YBZh/OpenOOD-VLM}

Via

Access Paper or Ask Questions

FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling

Oct 24, 2024

Zhengqiang Zhang, Ruihuang Li, Lei Zhang

Figure 1 for FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling

Figure 2 for FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling

Figure 3 for FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling

Figure 4 for FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling

Abstract:While image generation with diffusion models has achieved a great success, generating images of higher resolution than the training size remains a challenging task due to the high computational cost. Current methods typically perform the entire sampling process at full resolution and process all frequency components simultaneously, contradicting with the inherent coarse-to-fine nature of latent diffusion models and wasting computations on processing premature high-frequency details at early diffusion stages. To address this issue, we introduce an efficient $\textbf{Fre}$quency-aware $\textbf{Ca}$scaded $\textbf{S}$ampling framework, $\textbf{FreCaS}$ in short, for higher-resolution image generation. FreCaS decomposes the sampling process into cascaded stages with gradually increased resolutions, progressively expanding frequency bands and refining the corresponding details. We propose an innovative frequency-aware classifier-free guidance (FA-CFG) strategy to assign different guidance strengths for different frequency components, directing the diffusion model to add new details in the expanded frequency domain of each stage. Additionally, we fuse the cross-attention maps of previous and current stages to avoid synthesizing unfaithful layouts. Experiments demonstrate that FreCaS significantly outperforms state-of-the-art methods in image quality and generation speed. In particular, FreCaS is about 2.86$\times$ and 6.07$\times$ faster than ScaleCrafter and DemoFusion in generating a 2048$\times$2048 image using a pre-trained SDXL model and achieves an FID$_b$ improvement of 11.6 and 3.7, respectively. FreCaS can be easily extended to more complex models such as SD3. The source code of FreCaS can be found at $\href{\text{https://github.com/xtudbxk/FreCaS}}{https://github.com/xtudbxk/FreCaS}$.

Via

Access Paper or Ask Questions

MotionCLR: Motion Generation and Training-free Editing via Understanding Attention Mechanisms

Oct 24, 2024

Ling-Hao Chen, Wenxun Dai, Xuan Ju, Shunlin Lu, Lei Zhang

Abstract:This research delves into the problem of interactive editing of human motion generation. Previous motion diffusion models lack explicit modeling of the word-level text-motion correspondence and good explainability, hence restricting their fine-grained editing ability. To address this issue, we propose an attention-based motion diffusion model, namely MotionCLR, with CLeaR modeling of attention mechanisms. Technically, MotionCLR models the in-modality and cross-modality interactions with self-attention and cross-attention, respectively. More specifically, the self-attention mechanism aims to measure the sequential similarity between frames and impacts the order of motion features. By contrast, the cross-attention mechanism works to find the fine-grained word-sequence correspondence and activate the corresponding timesteps in the motion sequence. Based on these key properties, we develop a versatile set of simple yet effective motion editing methods via manipulating attention maps, such as motion (de-)emphasizing, in-place motion replacement, and example-based motion generation, etc. For further verification of the explainability of the attention mechanism, we additionally explore the potential of action-counting and grounded motion generation ability via attention maps. Our experimental results show that our method enjoys good generation and editing ability with good explainability.

* MotionCLR v1 technical report

Via

Access Paper or Ask Questions

A Deep Learning-Based Method for Metal Artifact-Resistant Syn-MP-RAGE Contrast Synthesis

Oct 22, 2024

Ziyi Zeng, Yuhao Wang, Dianlin Hu, T. Michael O'Shea, Rebecca C. Fry, Jing Cai, Lei Zhang

Figure 1 for A Deep Learning-Based Method for Metal Artifact-Resistant Syn-MP-RAGE Contrast Synthesis

Figure 2 for A Deep Learning-Based Method for Metal Artifact-Resistant Syn-MP-RAGE Contrast Synthesis

Abstract:In certain brain volumetric studies, synthetic T1-weighted magnetization-prepared rapid gradient-echo (MP-RAGE) contrast, derived from quantitative T1 MRI (T1-qMRI), proves highly valuable due to its clear white/gray matter boundaries for brain segmentation. However, generating synthetic MP-RAGE (syn-MP-RAGE) typically requires pairs of high-quality, artifact-free, multi-modality inputs, which can be challenging in retrospective studies, where missing or corrupted data is common. To overcome this limitation, our research explores the feasibility of employing a deep learning-based approach to synthesize syn-MP-RAGE contrast directly from a single channel turbo spin-echo (TSE) input, renowned for its resistance to metal artifacts. We evaluated this deep learning-based synthetic MP-RAGE (DL-Syn-MPR) on 31 non-artifact and 11 metal-artifact subjects. The segmentation results, measured by the Dice Similarity Coefficient (DSC), consistently achieved high agreement (DSC values above 0.83), indicating a strong correlation with reference segmentations, with lower input requirements. Also, no significant difference in segmentation performance was observed between the artifact and non-artifact groups.

* 11 pages, 8 figures, 2 tables

Via

Access Paper or Ask Questions

Efficient Antibody Structure Refinement Using Energy-Guided SE(3) Flow Matching

Oct 22, 2024

Jiying Zhang, Zijing Liu, Shengyuan Bai, He Cao, Yu Li, Lei Zhang

Figure 1 for Efficient Antibody Structure Refinement Using Energy-Guided SE(3) Flow Matching

Figure 2 for Efficient Antibody Structure Refinement Using Energy-Guided SE(3) Flow Matching

Figure 3 for Efficient Antibody Structure Refinement Using Energy-Guided SE(3) Flow Matching

Figure 4 for Efficient Antibody Structure Refinement Using Energy-Guided SE(3) Flow Matching

Abstract:Antibodies are proteins produced by the immune system that recognize and bind to specific antigens, and their 3D structures are crucial for understanding their binding mechanism and designing therapeutic interventions. The specificity of antibody-antigen binding predominantly depends on the complementarity-determining regions (CDR) within antibodies. Despite recent advancements in antibody structure prediction, the quality of predicted CDRs remains suboptimal. In this paper, we develop a novel antibody structure refinement method termed FlowAB based on energy-guided flow matching. FlowAB adopts the powerful deep generative method SE(3) flow matching and simultaneously incorporates important physical prior knowledge into the flow model to guide the generation process. The extensive experiments demonstrate that FlowAB can significantly improve the antibody CDR structures. It achieves new state-of-the-art performance on the antibody structure prediction task when used in conjunction with an appropriate prior model while incurring only marginal computational overhead. This advantage makes FlowAB a practical tool in antibody engineering.

* BIBM 2024 regular paper

Via

Access Paper or Ask Questions

Toward Generalizing Visual Brain Decoding to Unseen Subjects

Oct 21, 2024

Xiangtao Kong, Kexin Huang, Ping Li, Lei Zhang

Abstract:Visual brain decoding aims to decode visual information from human brain activities. Despite the great progress, one critical limitation of current brain decoding research lies in the lack of generalization capability to unseen subjects. Prior works typically focus on decoding brain activity of individuals based on the observation that different subjects exhibit different brain activities, while it remains unclear whether brain decoding can be generalized to unseen subjects. This study aims to answer this question. We first consolidate an image-fMRI dataset consisting of stimulus-image and fMRI-response pairs, involving 177 subjects in the movie-viewing task of the Human Connectome Project (HCP). This dataset allows us to investigate the brain decoding performance with the increase of participants. We then present a learning paradigm that applies uniform processing across all subjects, instead of employing different network heads or tokenizers for individuals as in previous methods, which can accommodate a large number of subjects to explore the generalization capability across different subjects. A series of experiments are conducted and we have the following findings. First, the network exhibits clear generalization capabilities with the increase of training subjects. Second, the generalization capability is common to popular network architectures (MLP, CNN and Transformer). Third, the generalization performance is affected by the similarity between subjects. Our findings reveal the inherent similarities in brain activities across individuals. With the emerging of larger and more comprehensive datasets, it is possible to train a brain decoding foundation model in the future. Codes and models can be found at https://github.com/Xiangtaokong/TGBD.

Via

Access Paper or Ask Questions

Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion

Oct 19, 2024

Chaodong Xiao, Minghan Li, Zhengqiang Zhang, Deyu Meng, Lei Zhang

Figure 1 for Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion

Figure 2 for Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion

Figure 3 for Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion

Figure 4 for Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion

Abstract:Selective state space models (SSMs), such as Mamba, highly excel at capturing long-range dependencies in 1D sequential data, while their applications to 2D vision tasks still face challenges. Current visual SSMs often convert images into 1D sequences and employ various scanning patterns to incorporate local spatial dependencies. However, these methods are limited in effectively capturing the complex image spatial structures and the increased computational cost caused by the lengthened scanning paths. To address these limitations, we propose Spatial-Mamba, a novel approach that establishes neighborhood connectivity directly in the state space. Instead of relying solely on sequential state transitions, we introduce a structure-aware state fusion equation, which leverages dilated convolutions to capture image spatial structural dependencies, significantly enhancing the flow of visual contextual information. Spatial-Mamba proceeds in three stages: initial state computation in a unidirectional scan, spatial context acquisition through structure-aware state fusion, and final state computation using the observation equation. Our theoretical analysis shows that Spatial-Mamba unifies the original Mamba and linear attention under the same matrix multiplication framework, providing a deeper understanding of our method. Experimental results demonstrate that Spatial-Mamba, even with a single scan, attains or surpasses the state-of-the-art SSM-based models in image classification, detection and segmentation. Source codes and trained models can be found at $\href{https://github.com/EdwardChasel/Spatial-Mamba}{\text{this https URL}}$.

* 16 pages, 8 figures, 5 tables

Via

Access Paper or Ask Questions

IntelliMove: Enhancing Robotic Planning with Semantic Mapping

Oct 18, 2024

Fama Ngom, Huaxi Zhang, Lei Zhang, Karen Godary-Dejean, Marianne Huchard

Figure 1 for IntelliMove: Enhancing Robotic Planning with Semantic Mapping

Figure 2 for IntelliMove: Enhancing Robotic Planning with Semantic Mapping

Figure 3 for IntelliMove: Enhancing Robotic Planning with Semantic Mapping

Figure 4 for IntelliMove: Enhancing Robotic Planning with Semantic Mapping

Abstract:Semantic navigation enables robots to understand their environments beyond basic geometry, allowing them to reason about objects, their functions, and their interrelationships. In semantic robotic navigation, creating accurate and semantically enriched maps is fundamental. Planning based on semantic maps not only enhances the robot's planning efficiency and computational speed but also makes the planning more meaningful, supporting a broader range of semantic tasks. In this paper, we introduce two core modules of IntelliMove: IntelliMap, a generic hierarchical semantic topometric map framework developed through an analysis of current technologies strengths and weaknesses, and Semantic Planning, which utilizes the semantic maps from IntelliMap. We showcase use cases that highlight IntelliMove's adaptability and effectiveness. Through experiments in simulated environments, we further demonstrate IntelliMove's capability in semantic navigation.

Via

Access Paper or Ask Questions