Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qian Yu

Marketing and Commercialization Center, JD.com

SVGFusion: Scalable Text-to-SVG Generation via Vector Space Diffusion

Dec 11, 2024

Ximing Xing, Juncheng Hu, Jing Zhang, Dong Xu, Qian Yu

Abstract:The generation of Scalable Vector Graphics (SVG) assets from textual data remains a significant challenge, largely due to the scarcity of high-quality vector datasets and the limitations in scalable vector representations required for modeling intricate graphic distributions. This work introduces SVGFusion, a Text-to-SVG model capable of scaling to real-world SVG data without reliance on a text-based discrete language model or prolonged SDS optimization. The essence of SVGFusion is to learn a continuous latent space for vector graphics with a popular Text-to-Image framework. Specifically, SVGFusion consists of two modules: a Vector-Pixel Fusion Variational Autoencoder (VP-VAE) and a Vector Space Diffusion Transformer (VS-DiT). VP-VAE takes both the SVGs and corresponding rasterizations as inputs and learns a continuous latent space, whereas VS-DiT learns to generate a latent code within this space based on the text prompt. Based on VP-VAE, a novel rendering sequence modeling strategy is proposed to enable the latent space to embed the knowledge of construction logics in SVGs. This empowers the model to achieve human-like design capabilities in vector graphics, while systematically preventing occlusion in complex graphic compositions. Moreover, our SVGFusion's ability can be continuously improved by leveraging the scalability of the VS-DiT by adding more VS-DiT blocks. A large-scale SVG dataset is collected to evaluate the effectiveness of our proposed method. Extensive experimentation has confirmed the superiority of our SVGFusion over existing SVG generation methods, achieving enhanced quality and generalizability, thereby establishing a novel framework for SVG content creation. Code, model, and data will be released at: \href{https://ximinng.github.io/SVGFusionProject/}{https://ximinng.github.io/SVGFusionProject/}

* project page: \href{https://ximinng.github.io/SVGFusionProject/}{https://ximinng.github.io/SVGFusionProject/}

Via

Access Paper or Ask Questions

SVGDreamer++: Advancing Editability and Diversity in Text-Guided SVG Generation

Nov 26, 2024

Ximing Xing, Qian Yu, Chuang Wang, Haitao Zhou, Jing Zhang, Dong Xu

Figure 1 for SVGDreamer++: Advancing Editability and Diversity in Text-Guided SVG Generation

Figure 2 for SVGDreamer++: Advancing Editability and Diversity in Text-Guided SVG Generation

Figure 3 for SVGDreamer++: Advancing Editability and Diversity in Text-Guided SVG Generation

Figure 4 for SVGDreamer++: Advancing Editability and Diversity in Text-Guided SVG Generation

Abstract:Recently, text-guided scalable vector graphics (SVG) synthesis has demonstrated significant potential in domains such as iconography and sketching. However, SVGs generated from existing Text-to-SVG methods often lack editability and exhibit deficiencies in visual quality and diversity. In this paper, we propose a novel text-guided vector graphics synthesis method to address these limitations. To improve the diversity of output SVGs, we present a Vectorized Particle-based Score Distillation (VPSD) approach. VPSD addresses over-saturation issues in existing methods and enhances sample diversity. A pre-trained reward model is incorporated to re-weight vector particles, improving aesthetic appeal and enabling faster convergence. Additionally, we design a novel adaptive vector primitives control strategy, which allows for the dynamic adjustment of the number of primitives, thereby enhancing the presentation of graphic details. Extensive experiments validate the effectiveness of the proposed method, demonstrating its superiority over baseline methods in terms of editability, visual quality, and diversity. We also show that our new method supports up to six distinct vector styles, capable of generating high-quality vector assets suitable for stylized vector design and poster design.

* 17 pages, 17 figures. arXiv admin note: substantial text overlap with arXiv:2312.16476

Via

Access Paper or Ask Questions

Dual-Teacher Ensemble Models with Double-Copy-Paste for 3D Semi-Supervised Medical Image Segmentation

Oct 15, 2024

Zhan Fa, Shumeng Li, Jian Zhang, Lei Qi, Qian Yu, Yinghuan Shi

Figure 1 for Dual-Teacher Ensemble Models with Double-Copy-Paste for 3D Semi-Supervised Medical Image Segmentation

Figure 2 for Dual-Teacher Ensemble Models with Double-Copy-Paste for 3D Semi-Supervised Medical Image Segmentation

Figure 3 for Dual-Teacher Ensemble Models with Double-Copy-Paste for 3D Semi-Supervised Medical Image Segmentation

Figure 4 for Dual-Teacher Ensemble Models with Double-Copy-Paste for 3D Semi-Supervised Medical Image Segmentation

Abstract:Semi-supervised learning (SSL) techniques address the high labeling costs in 3D medical image segmentation, with the teacher-student model being a common approach. However, using an exponential moving average (EMA) in single-teacher models may cause coupling issues, where the weights of the student and teacher models become similar, limiting the teacher's ability to provide additional knowledge for the student. Dual-teacher models were introduced to address this problem but often neglected the importance of maintaining teacher model diversity, leading to coupling issues among teachers. To address the coupling issue, we incorporate a double-copy-paste (DCP) technique to enhance the diversity among the teachers. Additionally, we introduce the Staged Selective Ensemble (SSE) module, which selects different ensemble methods based on the characteristics of the samples and enables more accurate segmentation of label boundaries, thereby improving the quality of pseudo-labels. Experimental results demonstrate the effectiveness of our proposed method in 3D medical image segmentation tasks. Here is the code link: https://github.com/Fazhan-cs/DCP.

* 35 pages, 5 figures

Via

Access Paper or Ask Questions

High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity

Oct 14, 2024

Qian Yu, Peng-Tao Jiang, Hao Zhang, Jinwei Chen, Bo Li, Lihe Zhang, Huchuan Lu

Figure 1 for High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity

Figure 2 for High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity

Figure 3 for High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity

Figure 4 for High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity

Abstract:In the realm of high-resolution (HR), fine-grained image segmentation, the primary challenge is balancing broad contextual awareness with the precision required for detailed object delineation, capturing intricate details and the finest edges of objects. Diffusion models, trained on vast datasets comprising billions of image-text pairs, such as SD V2.1, have revolutionized text-to-image synthesis by delivering exceptional quality, fine detail resolution, and strong contextual awareness, making them an attractive solution for high-resolution image segmentation. To this end, we propose DiffDIS, a diffusion-driven segmentation model that taps into the potential of the pre-trained U-Net within diffusion models, specifically designed for high-resolution, fine-grained object segmentation. By leveraging the robust generalization capabilities and rich, versatile image representation prior of the SD models, coupled with a task-specific stable one-step denoising approach, we significantly reduce the inference time while preserving high-fidelity, detailed generation. Additionally, we introduce an auxiliary edge generation task to not only enhance the preservation of fine details of the object boundaries, but reconcile the probabilistic nature of diffusion with the deterministic demands of segmentation. With these refined strategies in place, DiffDIS serves as a rapid object mask generation model, specifically optimized for generating detailed binary maps at high resolutions, while demonstrating impressive accuracy and swift processing. Experiments on the DIS5K dataset demonstrate the superiority of DiffDIS, achieving state-of-the-art results through a streamlined inference process. Our code will be made publicly available.

* 13 pages

Via

Access Paper or Ask Questions

Graph Neural Network Framework for Sentiment Analysis Using Syntactic Feature

Sep 21, 2024

Linxiao Wu, Yuanshuai Luo, Binrong Zhu, Guiran Liu, Rui Wang, Qian Yu

Figure 1 for Graph Neural Network Framework for Sentiment Analysis Using Syntactic Feature

Figure 2 for Graph Neural Network Framework for Sentiment Analysis Using Syntactic Feature

Figure 3 for Graph Neural Network Framework for Sentiment Analysis Using Syntactic Feature

Figure 4 for Graph Neural Network Framework for Sentiment Analysis Using Syntactic Feature

Abstract:Amidst the swift evolution of social media platforms and e-commerce ecosystems, the domain of opinion mining has surged as a pivotal area of exploration within natural language processing. A specialized segment within this field focuses on extracting nuanced evaluations tied to particular elements within textual contexts. This research advances a composite framework that amalgamates the positional cues of topical descriptors. The proposed system converts syntactic structures into a matrix format, leveraging convolutions and attention mechanisms within a graph to distill salient characteristics. Incorporating the positional relevance of descriptors relative to lexical items enhances the sequential integrity of the input. Trials have substantiated that this integrated graph-centric scheme markedly elevates the efficacy of evaluative categorization, showcasing preeminence.

Via

Access Paper or Ask Questions

Contrastive Learning for Knowledge-Based Question Generation in Large Language Models

Sep 21, 2024

Zhenhong Zhang, Jiajing Chen, Weiyan Shi, Lingjie Yi, Chihang Wang, Qian Yu

Abstract:With the rapid development of artificial intelligence technology, especially the increasingly widespread application of question-and-answer systems, high-quality question generation has become a key component in supporting the development of these systems. This article focuses on knowledge-based question generation technology, which aims to enable computers to simulate the human questioning process based on understanding specific texts or knowledge bases. In light of the issues of hallucination and knowledge gaps present in large-scale language models when applied to knowledge-intensive tasks, this paper proposes an enhanced question generation method that incorporates contrastive learning. This method utilizes multiple models to jointly mine domain knowledge and uses contrastive learning to guide the model in reducing noise and hallucinations in generation. Experimental results show that by designing prompts containing contrasting examples, the model's performance in question generation improves considerably, particularly when contrasting instructions and examples are used simultaneously, leading to the highest quality of generated questions and improved accuracy. These results demonstrate that the method proposed in this study, which combines contrasting context and chain-of-thought prompts, can effectively improve both the quality and the practicality of question generation.

* 5 pages, 2 figures

Via

Access Paper or Ask Questions

Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and VisualAnalysis Strategy

Aug 22, 2024

Hong Zhang, Yixuan Lyu, Qian Yu, Hanyang Liu, Huimin Ma, Ding Yuan, Yifan Yang

Figure 1 for Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and VisualAnalysis Strategy

Figure 2 for Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and VisualAnalysis Strategy

Figure 3 for Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and VisualAnalysis Strategy

Figure 4 for Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and VisualAnalysis Strategy

Abstract:In the domain of Camouflaged Object Segmentation (COS), despite continuous improvements in segmentation performance, the underlying mechanisms of effective camouflage remain poorly understood, akin to a black box. To address this gap, we present the first comprehensive study to examine the impact of camouflage attributes on the effectiveness of camouflage patterns, offering a quantitative framework for the evaluation of camouflage designs. To support this analysis, we have compiled the first dataset comprising descriptions of camouflaged objects and their attribute contributions, termed COD-Text And X-attributions (COD-TAX). Moreover, drawing inspiration from the hierarchical process by which humans process information: from high-level textual descriptions of overarching scenarios, through mid-level summaries of local areas, to low-level pixel data for detailed analysis. We have developed a robust framework that combines textual and visual information for the task of COS, named Attribution CUe Modeling with Eye-fixation Network (ACUMEN). ACUMEN demonstrates superior performance, outperforming nine leading methods across three widely-used datasets. We conclude by highlighting key insights derived from the attributes identified in our study. Code: https://github.com/lyu-yx/ACUMEN.

* Accepted by ECCV 2024

Via

Access Paper or Ask Questions

TrackGo: A Flexible and Efficient Method for Controllable Video Generation

Aug 21, 2024

Haitao Zhou, Chuang Wang, Rui Nie, Jinxiao Lin, Dongdong Yu, Qian Yu, Changhu Wang

Figure 1 for TrackGo: A Flexible and Efficient Method for Controllable Video Generation

Figure 2 for TrackGo: A Flexible and Efficient Method for Controllable Video Generation

Figure 3 for TrackGo: A Flexible and Efficient Method for Controllable Video Generation

Figure 4 for TrackGo: A Flexible and Efficient Method for Controllable Video Generation

Abstract:Recent years have seen substantial progress in diffusion-based controllable video generation. However, achieving precise control in complex scenarios, including fine-grained object parts, sophisticated motion trajectories, and coherent background movement, remains a challenge. In this paper, we introduce TrackGo, a novel approach that leverages free-form masks and arrows for conditional video generation. This method offers users with a flexible and precise mechanism for manipulating video content. We also propose the TrackAdapter for control implementation, an efficient and lightweight adapter designed to be seamlessly integrated into the temporal self-attention layers of a pretrained video generation model. This design leverages our observation that the attention map of these layers can accurately activate regions corresponding to motion in videos. Our experimental results demonstrate that our new approach, enhanced by the TrackAdapter, achieves state-of-the-art performance on key metrics such as FVD, FID, and ObjMC scores. The project page of TrackGo can be found at: https://zhtjtcz.github.io/TrackGo-Page/

Via

Access Paper or Ask Questions

AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking

Jul 23, 2024

Wenxuan Li, Chongyu Qu, Xiaoxi Chen, Pedro R. A. S. Bassi, Yijia Shi, Yuxiang Lai, Qian Yu, Huimin Xue, Yixiong Chen, Xiaorui Lin(+11 more)

Figure 1 for AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking

Figure 2 for AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking

Figure 3 for AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking

Figure 4 for AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking

Abstract:We introduce the largest abdominal CT dataset (termed AbdomenAtlas) of 20,460 three-dimensional CT volumes sourced from 112 hospitals across diverse populations, geographies, and facilities. AbdomenAtlas provides 673K high-quality masks of anatomical structures in the abdominal region annotated by a team of 10 radiologists with the help of AI algorithms. We start by having expert radiologists manually annotate 22 anatomical structures in 5,246 CT volumes. Following this, a semi-automatic annotation procedure is performed for the remaining CT volumes, where radiologists revise the annotations predicted by AI, and in turn, AI improves its predictions by learning from revised annotations. Such a large-scale, detailed-annotated, and multi-center dataset is needed for two reasons. Firstly, AbdomenAtlas provides important resources for AI development at scale, branded as large pre-trained models, which can alleviate the annotation workload of expert radiologists to transfer to broader clinical applications. Secondly, AbdomenAtlas establishes a large-scale benchmark for evaluating AI algorithms -- the more data we use to test the algorithms, the better we can guarantee reliable performance in complex clinical scenarios. An ISBI & MICCAI challenge named BodyMaps: Towards 3D Atlas of Human Body was launched using a subset of our AbdomenAtlas, aiming to stimulate AI innovation and to benchmark segmentation accuracy, inference efficiency, and domain generalizability. We hope our AbdomenAtlas can set the stage for larger-scale clinical trials and offer exceptional opportunities to practitioners in the medical imaging community. Codes, models, and datasets are available at https://www.zongweiz.com/dataset

* Published in Medical Image Analysis

Via

Access Paper or Ask Questions

The Devil is in the Statistics: Mitigating and Exploiting Statistics Difference for Generalizable Semi-supervised Medical Image Segmentation

Jul 16, 2024

Muyang Qiu, Jian Zhang, Lei Qi, Qian Yu, Yinghuan Shi, Yang Gao

Figure 1 for The Devil is in the Statistics: Mitigating and Exploiting Statistics Difference for Generalizable Semi-supervised Medical Image Segmentation

Figure 2 for The Devil is in the Statistics: Mitigating and Exploiting Statistics Difference for Generalizable Semi-supervised Medical Image Segmentation

Figure 3 for The Devil is in the Statistics: Mitigating and Exploiting Statistics Difference for Generalizable Semi-supervised Medical Image Segmentation

Figure 4 for The Devil is in the Statistics: Mitigating and Exploiting Statistics Difference for Generalizable Semi-supervised Medical Image Segmentation

Abstract:Despite the recent success of domain generalization in medical image segmentation, voxel-wise annotation for all source domains remains a huge burden. Semi-supervised domain generalization has been proposed very recently to combat this challenge by leveraging limited labeled data along with abundant unlabeled data collected from multiple medical institutions, depending on precisely harnessing unlabeled data while improving generalization simultaneously. In this work, we observe that domain shifts between medical institutions cause disparate feature statistics, which significantly deteriorates pseudo-label quality due to an unexpected normalization process. Nevertheless, this phenomenon could be exploited to facilitate unseen domain generalization. Therefore, we propose 1) multiple statistics-individual branches to mitigate the impact of domain shifts for reliable pseudo-labels and 2) one statistics-aggregated branch for domain-invariant feature learning. Furthermore, to simulate unseen domains with statistics difference, we approach this from two aspects, i.e., a perturbation with histogram matching at image level and a random batch normalization selection strategy at feature level, producing diverse statistics to expand the training distribution. Evaluation results on three medical image datasets demonstrate the effectiveness of our method compared with recent SOTA methods. The code is available at https://github.com/qiumuyang/SIAB.

Via

Access Paper or Ask Questions