Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qian Yu

Marketing and Commercialization Center, JD.com

VectorPainter: A Novel Approach to Stylized Vector Graphics Synthesis with Vectorized Strokes

May 05, 2024

Juncheng Hu, Ximing Xing, Zhengqi Zhang, Jing Zhang, Qian Yu

Figure 1 for VectorPainter: A Novel Approach to Stylized Vector Graphics Synthesis with Vectorized Strokes

Figure 2 for VectorPainter: A Novel Approach to Stylized Vector Graphics Synthesis with Vectorized Strokes

Figure 3 for VectorPainter: A Novel Approach to Stylized Vector Graphics Synthesis with Vectorized Strokes

Figure 4 for VectorPainter: A Novel Approach to Stylized Vector Graphics Synthesis with Vectorized Strokes

Abstract:We propose a novel method, VectorPainter, for the task of stylized vector graphics synthesis. Given a text prompt and a reference style image, VectorPainter generates a vector graphic that aligns in content with the text prompt and remains faithful in style to the reference image. We recognize that the key to this task lies in fully leveraging the intrinsic properties of vector graphics. Innovatively, we conceptualize the stylization process as the rearrangement of vectorized strokes extracted from the reference image. VectorPainter employs an optimization-based pipeline. It begins by extracting vectorized strokes from the reference image, which are then used to initialize the synthesis process. To ensure fidelity to the reference style, a novel style preservation loss is introduced. Extensive experiments have been conducted to demonstrate that our method is capable of aligning with the text description while remaining faithful to the reference image.

Via

Access Paper or Ask Questions

Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation

Apr 13, 2024

Qinghe Ma, Jian Zhang, Lei Qi, Qian Yu, Yinghuan Shi, Yang Gao

Figure 1 for Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation

Figure 2 for Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation

Figure 3 for Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation

Figure 4 for Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation

Abstract:Both limited annotation and domain shift are prevalent challenges in medical image segmentation. Traditional semi-supervised segmentation and unsupervised domain adaptation methods address one of these issues separately. However, the coexistence of limited annotation and domain shift is quite common, which motivates us to introduce a novel and challenging scenario: Mixed Domain Semi-supervised medical image Segmentation (MiDSS). In this scenario, we handle data from multiple medical centers, with limited annotations available for a single domain and a large amount of unlabeled data from multiple domains. We found that the key to solving the problem lies in how to generate reliable pseudo labels for the unlabeled data in the presence of domain shift with labeled data. To tackle this issue, we employ Unified Copy-Paste (UCP) between images to construct intermediate domains, facilitating the knowledge transfer from the domain of labeled data to the domains of unlabeled data. To fully utilize the information within the intermediate domain, we propose a symmetric Guidance training strategy (SymGD), which additionally offers direct guidance to unlabeled data by merging pseudo labels from intermediate samples. Subsequently, we introduce a Training Process aware Random Amplitude MixUp (TP-RAM) to progressively incorporate style-transition components into intermediate samples. Compared with existing state-of-the-art approaches, our method achieves a notable 13.57% improvement in Dice score on Prostate dataset, as demonstrated on three public datasets. Our code is available at https://github.com/MQinghe/MiDSS .

Via

Access Paper or Ask Questions

Multi-view Aggregation Network for Dichotomous Image Segmentation

Apr 11, 2024

Qian Yu, Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu

Abstract:Dichotomous Image Segmentation (DIS) has recently emerged towards high-precision object segmentation from high-resolution natural images. When designing an effective DIS model, the main challenge is how to balance the semantic dispersion of high-resolution targets in the small receptive field and the loss of high-precision details in the large receptive field. Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement. Human visual system captures regions of interest by observing them from multiple views. Inspired by it, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet), which unifies the feature fusion of the distant view and close-up view into a single stream with one encoder-decoder structure. With the help of the proposed multi-view complementary localization and refinement modules, our approach established long-range, profound visual interactions across multiple views, allowing the features of the detailed close-up view to focus on highly slender structures.Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed. The source code and datasets will be publicly available at \href{https://github.com/qianyu-dlut/MVANet}{MVANet}.

* Accepted by CVPR2024 as Highlight

Via

Access Paper or Ask Questions

GenesisTex: Adapting Image Denoising Diffusion to Texture Space

Mar 26, 2024

Chenjian Gao, Boyan Jiang, Xinghui Li, Yingpeng Zhang, Qian Yu

Figure 1 for GenesisTex: Adapting Image Denoising Diffusion to Texture Space

Figure 2 for GenesisTex: Adapting Image Denoising Diffusion to Texture Space

Figure 3 for GenesisTex: Adapting Image Denoising Diffusion to Texture Space

Figure 4 for GenesisTex: Adapting Image Denoising Diffusion to Texture Space

Abstract:We present GenesisTex, a novel method for synthesizing textures for 3D geometries from text descriptions. GenesisTex adapts the pretrained image diffusion model to texture space by texture space sampling. Specifically, we maintain a latent texture map for each viewpoint, which is updated with predicted noise on the rendering of the corresponding viewpoint. The sampled latent texture maps are then decoded into a final texture map. During the sampling process, we focus on both global and local consistency across multiple viewpoints: global consistency is achieved through the integration of style consistency mechanisms within the noise prediction network, and low-level consistency is achieved by dynamically aligning latent textures. Finally, we apply reference-based inpainting and img2img on denser views for texture refinement. Our approach overcomes the limitations of slow optimization in distillation-based methods and instability in inpainting-based methods. Experiments on meshes from various sources demonstrate that our method surpasses the baseline methods quantitatively and qualitatively.

* 12 pages, 10 figures

Via

Access Paper or Ask Questions

Concatenate, Fine-tuning, Re-training: A SAM-enabled Framework for Semi-supervised 3D Medical Image Segmentation

Mar 17, 2024

Shumeng Li, Lei Qi, Qian Yu, Jing Huo, Yinghuan Shi, Yang Gao

Abstract:Segment Anything Model (SAM) fine-tuning has shown remarkable performance in medical image segmentation in a fully supervised manner, but requires precise annotations. To reduce the annotation cost and maintain satisfactory performance, in this work, we leverage the capabilities of SAM for establishing semi-supervised medical image segmentation models. Rethinking the requirements of effectiveness, efficiency, and compatibility, we propose a three-stage framework, i.e., Concatenate, Fine-tuning, and Re-training (CFR). The current fine-tuning approaches mostly involve 2D slice-wise fine-tuning that disregards the contextual information between adjacent slices. Our concatenation strategy mitigates the mismatch between natural and 3D medical images. The concatenated images are then used for fine-tuning SAM, providing robust initialization pseudo-labels. Afterwards, we train a 3D semi-supervised segmentation model while maintaining the same parameter size as the conventional segmenter such as V-Net. Our CFR framework is plug-and-play, and easily compatible with various popular semi-supervised methods. Extensive experiments validate that our CFR achieves significant improvements in both moderate annotation and scarce annotation across four datasets. In particular, CFR framework improves the Dice score of Mean Teacher from 29.68% to 74.40% with only one labeled data of LA dataset.

Via

Access Paper or Ask Questions

Unsupervised Cross-Domain Image Retrieval via Prototypical Optimal Transport

Feb 28, 2024

Bin Li, Ye Shi, Qian Yu, Jingya Wang

Figure 1 for Unsupervised Cross-Domain Image Retrieval via Prototypical Optimal Transport

Figure 2 for Unsupervised Cross-Domain Image Retrieval via Prototypical Optimal Transport

Figure 3 for Unsupervised Cross-Domain Image Retrieval via Prototypical Optimal Transport

Figure 4 for Unsupervised Cross-Domain Image Retrieval via Prototypical Optimal Transport

Abstract:Unsupervised cross-domain image retrieval (UCIR) aims to retrieve images sharing the same category across diverse domains without relying on labeled data. Prior approaches have typically decomposed the UCIR problem into two distinct tasks: intra-domain representation learning and cross-domain feature alignment. However, these segregated strategies overlook the potential synergies between these tasks. This paper introduces ProtoOT, a novel Optimal Transport formulation explicitly tailored for UCIR, which integrates intra-domain feature representation learning and cross-domain alignment into a unified framework. ProtoOT leverages the strengths of the K-means clustering method to effectively manage distribution imbalances inherent in UCIR. By utilizing K-means for generating initial prototypes and approximating class marginal distributions, we modify the constraints in Optimal Transport accordingly, significantly enhancing its performance in UCIR scenarios. Furthermore, we incorporate contrastive learning into the ProtoOT framework to further improve representation learning. This encourages local semantic consistency among features with similar semantics, while also explicitly enforcing separation between features and unmatched prototypes, thereby enhancing global discriminativeness. ProtoOT surpasses existing state-of-the-art methods by a notable margin across benchmark datasets. Notably, on DomainNet, ProtoOT achieves an average P@200 enhancement of 24.44%, and on Office-Home, it demonstrates a P@15 improvement of 12.12%. Code is available at https://github.com/HCVLAB/ProtoOT.

* Accepted by AAAI2024

Via

Access Paper or Ask Questions

Data-Free Generalized Zero-Shot Learning

Jan 28, 2024

Bowen Tang, Long Yan, Jing Zhang, Qian Yu, Lu Sheng, Dong Xu

Figure 1 for Data-Free Generalized Zero-Shot Learning

Figure 2 for Data-Free Generalized Zero-Shot Learning

Figure 3 for Data-Free Generalized Zero-Shot Learning

Figure 4 for Data-Free Generalized Zero-Shot Learning

Abstract:Deep learning models have the ability to extract rich knowledge from large-scale datasets. However, the sharing of data has become increasingly challenging due to concerns regarding data copyright and privacy. Consequently, this hampers the effective transfer of knowledge from existing data to novel downstream tasks and concepts. Zero-shot learning (ZSL) approaches aim to recognize new classes by transferring semantic knowledge learned from base classes. However, traditional generative ZSL methods often require access to real images from base classes and rely on manually annotated attributes, which presents challenges in terms of data restrictions and model scalability. To this end, this paper tackles a challenging and practical problem dubbed as data-free zero-shot learning (DFZSL), where only the CLIP-based base classes data pre-trained classifier is available for zero-shot classification. Specifically, we propose a generic framework for DFZSL, which consists of three main components. Firstly, to recover the virtual features of the base data, we model the CLIP features of base class images as samples from a von Mises-Fisher (vMF) distribution based on the pre-trained classifier. Secondly, we leverage the text features of CLIP as low-cost semantic information and propose a feature-language prompt tuning (FLPT) method to further align the virtual image features and textual features. Thirdly, we train a conditional generative model using the well-aligned virtual image features and corresponding semantic text features, enabling the generation of new classes features and achieve better zero-shot generalization. Our framework has been evaluated on five commonly used benchmarks for generalized ZSL, as well as 11 benchmarks for the base-to-new ZSL. The results demonstrate the superiority and effectiveness of our approach. Our code is available in https://github.com/ylong4/DFZSL

* Accepted by AAAI24

Via

Access Paper or Ask Questions

SVGDreamer: Text Guided SVG Generation with Diffusion Model

Jan 03, 2024

Ximing Xing, Haitao Zhou, Chuang Wang, Jing Zhang, Dong Xu, Qian Yu

Figure 1 for SVGDreamer: Text Guided SVG Generation with Diffusion Model

Figure 2 for SVGDreamer: Text Guided SVG Generation with Diffusion Model

Figure 3 for SVGDreamer: Text Guided SVG Generation with Diffusion Model

Figure 4 for SVGDreamer: Text Guided SVG Generation with Diffusion Model

Abstract:Recently, text-guided scalable vector graphics (SVGs) synthesis has shown promise in domains such as iconography and sketch. However, existing text-to-SVG generation methods lack editability and struggle with visual quality and result diversity. To address these limitations, we propose a novel text-guided vector graphics synthesis method called SVGDreamer. SVGDreamer incorporates a semantic-driven image vectorization (SIVE) process that enables the decomposition of synthesis into foreground objects and background, thereby enhancing editability. Specifically, the SIVE process introduce attention-based primitive control and an attention-mask loss function for effective control and manipulation of individual elements. Additionally, we propose a Vectorized Particle-based Score Distillation (VPSD) approach to tackle the challenges of color over-saturation, vector primitives over-smoothing, and limited result diversity in existing text-to-SVG generation methods. Furthermore, on the basis of VPSD, we introduce Reward Feedback Learning (ReFL) to accelerate VPSD convergence and improve aesthetic appeal. Extensive experiments have been conducted to validate the effectiveness of SVGDreamer, demonstrating its superiority over baseline methods in terms of editability, visual quality, and diversity. The code and demo of SVGDreamer can be found at \href{https://ximinng.github.io/SVGDreamer-project/}{https://ximinng.github.io/SVGDreamer-project/}.

* 19 pages, 15 figures, project link: https://ximinng.github.io/SVGDreamer-project/

Via

Access Paper or Ask Questions

Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation

Dec 29, 2023

Xiawei Li, Qingyuan Xu, Jing Zhang, Tianyi Zhang, Qian Yu, Lu Sheng, Dong Xu

Abstract:3D point cloud semantic segmentation has a wide range of applications. Recently, weakly supervised point cloud segmentation methods have been proposed, aiming to alleviate the expensive and laborious manual annotation process by leveraging scene-level labels. However, these methods have not effectively exploited the rich geometric information (such as shape and scale) and appearance information (such as color and texture) present in RGB-D scans. Furthermore, current approaches fail to fully leverage the point affinity that can be inferred from the feature extraction network, which is crucial for learning from weak scene-level labels. Additionally, previous work overlooks the detrimental effects of the long-tailed distribution of point cloud data in weakly supervised 3D semantic segmentation. To this end, this paper proposes a simple yet effective scene-level weakly supervised point cloud segmentation method with a newly introduced multi-modality point affinity inference module. The point affinity proposed in this paper is characterized by features from multiple modalities (e.g., point cloud and RGB), and is further refined by normalizing the classifier weights to alleviate the detrimental effects of long-tailed distribution without the need of the prior of category distribution. Extensive experiments on the ScanNet and S3DIS benchmarks verify the effectiveness of our proposed method, which outperforms the state-of-the-art by ~4% to ~6% mIoU. Codes are released at https://github.com/Sunny599/AAAI24-3DWSSG-MMA.

Via

Access Paper or Ask Questions

Data Contamination Issues in Brain-to-Text Decoding

Dec 26, 2023

Congchi Yin, Qian Yu, Zhiwei Fang, Jie He, Changping Peng, Zhangang Lin, Jingping Shao, Piji Li

Figure 1 for Data Contamination Issues in Brain-to-Text Decoding

Figure 2 for Data Contamination Issues in Brain-to-Text Decoding

Figure 3 for Data Contamination Issues in Brain-to-Text Decoding

Figure 4 for Data Contamination Issues in Brain-to-Text Decoding

Abstract:Decoding non-invasive cognitive signals to natural language has long been the goal of building practical brain-computer interfaces (BCIs). Recent major milestones have successfully decoded cognitive signals like functional Magnetic Resonance Imaging (fMRI) and electroencephalogram (EEG) into text under open vocabulary setting. However, how to split the datasets for training, validating, and testing in cognitive signal decoding task still remains controversial. In this paper, we conduct systematic analysis on current dataset splitting methods and find the existence of data contamination largely exaggerates model performance. Specifically, first we find the leakage of test subjects' cognitive signals corrupts the training of a robust encoder. Second, we prove the leakage of text stimuli causes the auto-regressive decoder to memorize information in test set. The decoder generates highly accurate text not because it truly understands cognitive signals. To eliminate the influence of data contamination and fairly evaluate different models' generalization ability, we propose a new splitting method for different types of cognitive datasets (e.g. fMRI, EEG). We also test the performance of SOTA Brain-to-Text decoding models under the proposed dataset splitting paradigm as baselines for further research.

* 12 pages, 4 figures

Via

Access Paper or Ask Questions