Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yiming Chen

Geo-UNet: A Geometrically Constrained Neural Framework for Clinical-Grade Lumen Segmentation in Intravascular Ultrasound

Aug 09, 2024

Yiming Chen, Niharika S. D'Souza, Akshith Mandepally, Patrick Henninger, Satyananda Kashyap, Neerav Karani, Neel Dey, Marcos Zachary, Raed Rizq, Paul Chouinard(+2 more)

Figure 1 for Geo-UNet: A Geometrically Constrained Neural Framework for Clinical-Grade Lumen Segmentation in Intravascular Ultrasound

Figure 2 for Geo-UNet: A Geometrically Constrained Neural Framework for Clinical-Grade Lumen Segmentation in Intravascular Ultrasound

Figure 3 for Geo-UNet: A Geometrically Constrained Neural Framework for Clinical-Grade Lumen Segmentation in Intravascular Ultrasound

Figure 4 for Geo-UNet: A Geometrically Constrained Neural Framework for Clinical-Grade Lumen Segmentation in Intravascular Ultrasound

Abstract:Precisely estimating lumen boundaries in intravascular ultrasound (IVUS) is needed for sizing interventional stents to treat deep vein thrombosis (DVT). Unfortunately, current segmentation networks like the UNet lack the precision needed for clinical adoption in IVUS workflows. This arises due to the difficulty of automatically learning accurate lumen contour from limited training data while accounting for the radial geometry of IVUS imaging. We propose the Geo-UNet framework to address these issues via a design informed by the geometry of the lumen contour segmentation task. We first convert the input data and segmentation targets from Cartesian to polar coordinates. Starting from a convUNet feature extractor, we propose a two-task setup, one for conventional pixel-wise labeling and the other for single boundary lumen-contour localization. We directly combine the two predictions by passing the predicted lumen contour through a new activation (named CDFeLU) to filter out spurious pixel-wise predictions. Our unified loss function carefully balances area-based, distance-based, and contour-based penalties to provide near clinical-grade generalization in unseen patient data. We also introduce a lightweight, inference-time technique to enhance segmentation smoothness. The efficacy of our framework on a venous IVUS dataset is shown against state-of-the-art models.

* Accepted into the 15th workshop on Machine Learning in Medical Imaging at MICCAI 2024. (* indicates equal contribution)

Via

Access Paper or Ask Questions

Multi-Time Scale Service Caching and Pricing in MEC Systems with Dynamic Program Popularity

Jul 04, 2024

Yiming Chen, Xingyuan Hu, Bo Gu, Shimin Gong, Zhou Su

Figure 1 for Multi-Time Scale Service Caching and Pricing in MEC Systems with Dynamic Program Popularity

Figure 2 for Multi-Time Scale Service Caching and Pricing in MEC Systems with Dynamic Program Popularity

Figure 3 for Multi-Time Scale Service Caching and Pricing in MEC Systems with Dynamic Program Popularity

Figure 4 for Multi-Time Scale Service Caching and Pricing in MEC Systems with Dynamic Program Popularity

Abstract:In mobile edge computing systems, base stations (BSs) equipped with edge servers can provide computing services to users to reduce their task execution time. However, there is always a conflict of interest between the BS and users. The BS prices the service programs based on user demand to maximize its own profit, while the users determine their offloading strategies based on the prices to minimize their costs. Moreover, service programs need to be pre-cached to meet immediate computing needs. Due to the limited caching capacity and variations in service program popularity, the BS must dynamically select which service programs to cache. Since service caching and pricing have different needs for adjustment time granularities, we propose a two-time scale framework to jointly optimize service caching, pricing and task offloading. For the large time scale, we propose a game-nested deep reinforcement learning algorithm to dynamically adjust service caching according to the estimated popularity information. For the small time scale, by modeling the interaction between the BS and users as a two-stage game, we prove the existence of the equilibrium under incomplete information and then derive the optimal pricing and offloading strategies. Extensive simulations based on a real-world dataset demonstrate the efficiency of the proposed approach.

Via

Access Paper or Ask Questions

TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations

Jul 02, 2024

Xiaoxue Gao, Yiming Chen, Xianghu Yue, Yu Tsao, Nancy F. Chen

Figure 1 for TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations

Figure 2 for TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations

Figure 3 for TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations

Figure 4 for TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations

Abstract:Text-to-speech (TTS) has been extensively studied for generating high-quality speech with textual inputs, playing a crucial role in various real-time applications. For real-world deployment, ensuring stable and timely generation in TTS models against minor input perturbations is of paramount importance. Therefore, evaluating the robustness of TTS models against such perturbations, commonly known as adversarial attacks, is highly desirable. In this paper, we propose TTSlow, a novel adversarial approach specifically tailored to slow down the speech generation process in TTS systems. To induce long TTS waiting time, we design novel efficiency-oriented adversarial loss to encourage endless generation process. TTSlow encompasses two attack strategies targeting both text inputs and speaker embedding. Specifically, we propose TTSlow-text, which utilizes a combination of homoglyphs-based and swap-based perturbations, along with TTSlow-spk, which employs a gradient optimization attack approach for speaker embedding. TTSlow serves as the first attack approach targeting a wide range of TTS models, including autoregressive and non-autoregressive TTS ones, thereby advancing exploration in audio security. Extensive experiments are conducted to evaluate the inference efficiency of TTS models, and in-depth analysis of generated speech intelligibility is performed using Gemini. The results demonstrate that TTSlow can effectively slow down two TTS models across three publicly available datasets. We are committed to releasing the source code upon acceptance, facilitating further research and benchmarking in this domain.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions

Research on Information Extraction of LCSTS Dataset Based on an Improved BERTSum-LSTM Model

Jun 26, 2024

Yiming Chen, Haobin Chen, Simin Liu, Yunyun Liu, Fanhao Zhou, Bing Wei

Figure 1 for Research on Information Extraction of LCSTS Dataset Based on an Improved BERTSum-LSTM Model

Figure 2 for Research on Information Extraction of LCSTS Dataset Based on an Improved BERTSum-LSTM Model

Figure 3 for Research on Information Extraction of LCSTS Dataset Based on an Improved BERTSum-LSTM Model

Figure 4 for Research on Information Extraction of LCSTS Dataset Based on an Improved BERTSum-LSTM Model

Abstract:With the continuous advancement of artificial intelligence, natural language processing technology has become widely utilized in various fields. At the same time, there are many challenges in creating Chinese news summaries. First of all, the semantics of Chinese news is complex, and the amount of information is enormous. Extracting critical information from Chinese news presents a significant challenge. Second, the news summary should be concise and clear, focusing on the main content and avoiding redundancy. In addition, the particularity of the Chinese language, such as polysemy, word segmentation, etc., makes it challenging to generate Chinese news summaries. Based on the above, this paper studies the information extraction method of the LCSTS dataset based on an improved BERTSum-LSTM model. We improve the BERTSum-LSTM model to make it perform better in generating Chinese news summaries. The experimental results show that the proposed method has a good effect on creating news summaries, which is of great importance to the construction of news summaries.

* submitted to ICMIII 2024

Via

Access Paper or Ask Questions

Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models

May 23, 2024

Yiming Chen, Chen Zhang, Danqing Luo, Luis Fernando D'Haro, Robby T. Tan, Haizhou Li

Figure 1 for Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models

Figure 2 for Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models

Figure 3 for Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models

Figure 4 for Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models

Abstract:The automatic evaluation of natural language generation (NLG) systems presents a long-lasting challenge. Recent studies have highlighted various neural metrics that align well with human evaluations. Yet, the robustness of these evaluators against adversarial perturbations remains largely under-explored due to the unique challenges in obtaining adversarial data for different NLG evaluation tasks. To address the problem, we introduce AdvEval, a novel black-box adversarial framework against NLG evaluators. AdvEval is specially tailored to generate data that yield strong disagreements between human and victim evaluators. Specifically, inspired by the recent success of large language models (LLMs) in text generation and evaluation, we adopt strong LLMs as both the data generator and gold evaluator. Adversarial data are automatically optimized with feedback from the gold and victim evaluator. We conduct experiments on 12 victim evaluators and 11 NLG datasets, spanning tasks including dialogue, summarization, and question evaluation. The results show that AdvEval can lead to significant performance degradation of various victim metrics, thereby validating its efficacy.

* ACL24 Finding

Via

Access Paper or Ask Questions

Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting

Mar 19, 2024

Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu

Abstract:While text-to-3D and image-to-3D generation tasks have received considerable attention, one important but under-explored field between them is controllable text-to-3D generation, which we mainly focus on in this work. To address this task, 1) we introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing pre-trained multi-view diffusion models by integrating additional input conditions, such as edge, depth, normal, and scribble maps. Our innovation lies in the introduction of a conditioning module that controls the base diffusion model using both local and global embeddings, which are computed from the input condition images and camera poses. Once trained, MVControl is able to offer 3D diffusion guidance for optimization-based 3D generation. And, 2) we propose an efficient multi-stage 3D generation pipeline that leverages the benefits of recent large reconstruction models and score distillation algorithm. Building upon our MVControl architecture, we employ a unique hybrid diffusion guidance method to direct the optimization process. In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations. We also pioneer the use of SuGaR, a hybrid representation that binds Gaussians to mesh triangle faces. This approach alleviates the issue of poor geometry in 3D Gaussians and enables the direct sculpting of fine-grained geometry on the mesh. Extensive experiments demonstrate that our method achieves robust generalization and enables the controllable generation of high-quality 3D content.

* Project page: https://lizhiqi49.github.io/MVControl/

Via

Access Paper or Ask Questions

A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators

Dec 24, 2023

Chen Zhang, Luis Fernando D'Haro, Yiming Chen, Malu Zhang, Haizhou Li

Abstract:Automatic evaluation is an integral aspect of dialogue system research. The traditional reference-based NLG metrics are generally found to be unsuitable for dialogue assessment. Consequently, recent studies have suggested various unique, reference-free neural metrics that better align with human evaluations. Notably among them, large language models (LLMs), particularly the instruction-tuned variants like ChatGPT, are shown to be promising substitutes for human judges. Yet, existing works on utilizing LLMs for automatic dialogue evaluation are limited in their scope in terms of the number of meta-evaluation datasets, mode of evaluation, coverage of LLMs, etc. Hence, it remains inconclusive how effective these LLMs are. To this end, we conduct a comprehensive study on the application of LLMs for automatic dialogue evaluation. Specifically, we analyze the multi-dimensional evaluation capability of 30 recently emerged LLMs at both turn and dialogue levels, using a comprehensive set of 12 meta-evaluation datasets. Additionally, we probe the robustness of the LLMs in handling various adversarial perturbations at both turn and dialogue levels. Finally, we explore how model-level and dimension-level ensembles impact the evaluation performance. All resources are available at https://github.com/e0397123/comp-analysis.

* Accepted to AAAI-2024, appendix included, 15 pages

Via

Access Paper or Ask Questions

Progressive Poisoned Data Isolation for Training-time Backdoor Defense

Dec 20, 2023

Yiming Chen, Haiwei Wu, Jiantao Zhou

Figure 1 for Progressive Poisoned Data Isolation for Training-time Backdoor Defense

Figure 2 for Progressive Poisoned Data Isolation for Training-time Backdoor Defense

Figure 3 for Progressive Poisoned Data Isolation for Training-time Backdoor Defense

Figure 4 for Progressive Poisoned Data Isolation for Training-time Backdoor Defense

Abstract:Deep Neural Networks (DNN) are susceptible to backdoor attacks where malicious attackers manipulate the model's predictions via data poisoning. It is hence imperative to develop a strategy for training a clean model using a potentially poisoned dataset. Previous training-time defense mechanisms typically employ an one-time isolation process, often leading to suboptimal isolation outcomes. In this study, we present a novel and efficacious defense method, termed Progressive Isolation of Poisoned Data (PIPD), that progressively isolates poisoned data to enhance the isolation accuracy and mitigate the risk of benign samples being misclassified as poisoned ones. Once the poisoned portion of the dataset has been identified, we introduce a selective training process to train a clean model. Through the implementation of these techniques, we ensure that the trained model manifests a significantly diminished attack success rate against the poisoned data. Extensive experiments on multiple benchmark datasets and DNN models, assessed against nine state-of-the-art backdoor attacks, demonstrate the superior performance of our PIPD method for backdoor defense. For instance, our PIPD achieves an average True Positive Rate (TPR) of 99.95% and an average False Positive Rate (FPR) of 0.06% for diverse attacks over CIFAR-10 dataset, markedly surpassing the performance of state-of-the-art methods.

* Accepted to AAAI2024

Via

Access Paper or Ask Questions

ET3D: Efficient Text-to-3D Generation via Multi-View Distillation

Nov 27, 2023

Yiming Chen, Zhiqi Li, Peidong Liu

Figure 1 for ET3D: Efficient Text-to-3D Generation via Multi-View Distillation

Figure 2 for ET3D: Efficient Text-to-3D Generation via Multi-View Distillation

Figure 3 for ET3D: Efficient Text-to-3D Generation via Multi-View Distillation

Figure 4 for ET3D: Efficient Text-to-3D Generation via Multi-View Distillation

Abstract:Recent breakthroughs in text-to-image generation has shown encouraging results via large generative models. Due to the scarcity of 3D assets, it is hardly to transfer the success of text-to-image generation to that of text-to-3D generation. Existing text-to-3D generation methods usually adopt the paradigm of DreamFusion, which conducts per-asset optimization by distilling a pretrained text-to-image diffusion model. The generation speed usually ranges from several minutes to tens of minutes per 3D asset, which degrades the user experience and also imposes a burden to the service providers due to the high computational budget. In this work, we present an efficient text-to-3D generation method, which requires only around 8 $ms$ to generate a 3D asset given the text prompt on a consumer graphic card. The main insight is that we exploit the images generated by a large pre-trained text-to-image diffusion model, to supervise the training of a text conditioned 3D generative adversarial network. Once the network is trained, we are able to efficiently generate a 3D asset via a single forward pass. Our method requires no 3D training data and provides an alternative approach for efficient text-to-3D generation by distilling pre-trained image diffusion models.

Via

Access Paper or Ask Questions

MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation

Nov 27, 2023

Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu

Abstract:We introduce MVControl, a novel neural network architecture that enhances existing pre-trained multi-view 2D diffusion models by incorporating additional input conditions, e.g. edge maps. Our approach enables the generation of controllable multi-view images and view-consistent 3D content. To achieve controllable multi-view image generation, we leverage MVDream as our base model, and train a new neural network module as additional plugin for end-to-end task-specific condition learning. To precisely control the shapes and views of generated images, we innovatively propose a new conditioning mechanism that predicts an embedding encapsulating the input spatial and view conditions, which is then injected to the network globally. Once MVControl is trained, score-distillation (SDS) loss based optimization can be performed to generate 3D content, in which process we propose to use a hybrid diffusion prior. The hybrid prior relies on a pre-trained Stable-Diffusion network and our trained MVControl for additional guidance. Extensive experiments demonstrate that our method achieves robust generalization and enables the controllable generation of high-quality 3D content. Code available at https://github.com/WU-CVGL/MVControl/.

* Project page: https://lizhiqi49.github.io/MVControl/

Via

Access Paper or Ask Questions