Dong Guo

InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models

Sep 06, 2023
Bing Han, Junyu Dai, Xuchen Song, Weituo Hao, Xinyan He, Dong Guo, Jitong Chen, Yuxuan Wang, Yanmin Qian

Music editing primarily entails modifying instrument tracks or remixing the whole piece, which offers a novel reinterpretation of the original through a series of operations. These music processing methods hold immense potential across various applications but demand substantial expertise. Prior methodologies, although effective for image and audio modifications, falter when directly applied to music. This is attributed to music's distinctive data nature: such methods can inadvertently compromise the intrinsic harmony and coherence of a piece. In this paper, we develop InstructME, an Instruction-guided Music Editing and remixing framework based on latent diffusion models. Our framework fortifies the U-Net with multi-scale aggregation in order to maintain consistency before and after editing. In addition, we introduce a chord progression matrix as conditioning information and incorporate it in the semantic space to improve melodic harmony while editing. To accommodate extended musical pieces, InstructME employs a chunk transformer, enabling it to discern long-term temporal dependencies within music sequences. We tested InstructME on instrument editing, remixing, and multi-round editing. Both subjective and objective evaluations indicate that our proposed method significantly surpasses preceding systems in music quality, text relevance and harmony. Demo samples are available at https://musicedit.github.io/
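
As a rough illustration of the chord-conditioning idea, the sketch below (a hypothetical PyTorch fragment, not the authors' code) projects a binary chord-progression matrix into the same semantic space as the instruction embedding so that the diffusion U-Net's cross-attention can attend to both; the shapes and layer choices are assumptions.

    import torch
    import torch.nn as nn

    class ChordConditioner(nn.Module):
        """Inject a chord-progression matrix into the semantic conditioning space."""

        def __init__(self, num_pitch_classes=12, embed_dim=512):
            super().__init__()
            # Project each time step of the chord matrix into the semantic space.
            self.proj = nn.Linear(num_pitch_classes, embed_dim)

        def forward(self, chord_matrix, text_embed):
            # chord_matrix: (batch, time_steps, num_pitch_classes), e.g. a binary
            # piano-roll-like encoding of the chord progression.
            # text_embed: (batch, seq_len, embed_dim) from the instruction encoder.
            chord_embed = self.proj(chord_matrix)
            # Concatenating along the sequence axis lets the U-Net's
            # cross-attention attend to instruction and chords jointly.
            return torch.cat([text_embed, chord_embed], dim=1)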

* Demo samples are available at https://musicedit.github.io/ 

2020 CATARACTS Semantic Segmentation Challenge

Oct 21, 2021
Imanol Luengo, Maria Grammatikopoulou, Rahim Mohammadi, Chris Walsh, Chinedu Innocent Nwoye, Deepak Alapatt, Nicolas Padoy, Zhen-Liang Ni, Chen-Chen Fan, Gui-Bin Bian, Zeng-Guang Hou, Heonjin Ha, Jiacheng Wang, Haojie Wang, Dong Guo, Lu Wang, Guotai Wang, Mobarakol Islam, Bharat Giddwani, Ren Hongliang, Theodoros Pissas, Claudio Ravasio, Martin Huber, Jeremy Birch, Joan M. Nunez Do Rio, Lyndon da Cruz, Christos Bergeles, Hongyu Chen, Fucang Jia, Nikhil Kumar Tomar, Debesh Jha, Michael A. Riegler, Pal Halvorsen, Sophia Bano, Uddhav Vaghela, Jianyuan Hong, Haili Ye, Feihong Huang, Da-Han Wang, Danail Stoyanov

Surgical scene segmentation is essential for anatomy and instrument localization which can be further used to assess tissue-instrument interactions during a surgical procedure. In 2017, the Challenge on Automatic Tool Annotation for cataRACT Surgery (CATARACTS) released 50 cataract surgery videos accompanied by instrument usage annotations. These annotations included frame-level instrument presence information. In 2020, we released pixel-wise semantic annotations for anatomy and instruments for 4670 images sampled from 25 videos of the CATARACTS training set. The 2020 CATARACTS Semantic Segmentation Challenge, which was a sub-challenge of the 2020 MICCAI Endoscopic Vision (EndoVis) Challenge, presented three sub-tasks to assess participating solutions on anatomical structure and instrument segmentation. Their performance was assessed on a hidden test set of 531 images from 10 videos of the CATARACTS test set.
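
For readers unfamiliar with how such segmentation sub-tasks are typically scored, the sketch below computes a per-class Dice score over integer label maps; it is a generic illustration, and the challenge's official metric definitions may differ in detail.

    import numpy as np

    def per_class_dice(pred, target, num_classes):
        """pred, target: integer label maps of shape (H, W)."""
        scores = []
        for c in range(num_classes):
            p, t = pred == c, target == c
            denom = p.sum() + t.sum()
            if denom == 0:
                continue  # class absent in both images: skip rather than reward
            scores.append(2.0 * np.logical_and(p, t).sum() / denom)
        return float(np.mean(scores))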

Multi-Rate Nyquist-SCM for C-Band 100Gbit/s Signal over 50km Dispersion-Uncompensated Link

Jul 25, 2021
Haide Wang, Ji Zhou, Jinlong Wei, Dong Guo, Yuanhua Feng, Weiping Liu, Changyuan Yu, Dawei Wang, Zhaohui Li

In this paper, to the best of our knowledge, we propose the first multi-rate Nyquist subcarrier modulation (SCM) for C-band 100Gbit/s signal transmission over a 50km dispersion-uncompensated link. Chromatic dispersion (CD) introduces severe spectral nulls in the optical double-sideband signal, which greatly degrades the performance of intensity-modulation and direct-detection systems. In previous works, high-complexity digital signal processing (DSP) was required to counteract the CD-caused spectral nulls. Based on the characteristics of the dispersive channel, we propose Nyquist-SCM with multi-rate subcarriers to flexibly steer clear of the CD-caused spectral nulls. The signal on each subcarrier can be individually recovered by DSP of acceptable complexity, including a feed-forward equalizer with no more than 31 taps, a two-tap post filter, and maximum likelihood sequence estimation with one memory length. Combined with entropy loading based on probabilistic constellation shaping to maximize the capacity-reach, the C-band 100Gbit/s multi-rate Nyquist-SCM signal over the 50km dispersion-uncompensated link can achieve the 7% hard-decision forward error correction limit and an average normalized generalized mutual information of 0.967. In conclusion, multi-rate Nyquist-SCM shows great potential for resolving CD-caused spectral distortions.
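
To make the low-complexity receiver concrete, the sketch below (NumPy, with a made-up post-filter coefficient and symbol alphabet) implements the two-tap post filter and an MLSE with one memory length; the feed-forward equalizer and the paper's actual implementation details are omitted.

    import numpy as np

    def post_filter(x, alpha=0.5):
        # Two-tap post filter y[n] = x[n] + alpha * x[n-1]: suppresses the noise
        # enhanced by the equalizer at the cost of known one-tap ISI.
        return x + alpha * np.concatenate(([0.0], x[:-1]))

    def mlse_memory1(y, symbols, alpha=0.5):
        # Viterbi search with one memory length: the expected sample is
        # s[n] + alpha * s[n-1], matching the ISI introduced by the post filter.
        # (The first step assumes an arbitrary previous symbol; fine for a sketch.)
        n_states = len(symbols)
        cost = np.zeros(n_states)
        back = []
        for r in y:
            new_cost = np.full(n_states, np.inf)
            ptr = np.zeros(n_states, dtype=int)
            for cur in range(n_states):
                for prev in range(n_states):
                    c = cost[prev] + (r - symbols[cur] - alpha * symbols[prev]) ** 2
                    if c < new_cost[cur]:
                        new_cost[cur], ptr[cur] = c, prev
            back.append(ptr)
            cost = new_cost
        state = int(np.argmin(cost))  # best final state, then trace back
        path = [state]
        for ptr in reversed(back):
            state = int(ptr[state])
            path.append(state)
        path = path[:-1]  # drop the fictitious state before the first symbol
        return [symbols[s] for s in reversed(path)]

For instance, mlse_memory1(post_filter(x), [-3.0, -1.0, 1.0, 3.0]) would recover a PAM-4 sequence under these assumptions.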

* Under review at the Journal of Lightwave Technology 

Annotation-Efficient Learning for Medical Image Segmentation based on Noisy Pseudo Labels and Adversarial Learning

Dec 29, 2020
Lu Wang, Dong Guo, Guotai Wang, Shaoting Zhang

Although deep learning has achieved state-of-the-art performance for medical image segmentation, its success relies on a large set of manually annotated training images that are expensive to acquire. In this paper, we propose an annotation-efficient learning framework for segmentation tasks that avoids annotating the training images, where we use an improved Cycle-Consistent Generative Adversarial Network (GAN) to learn from a set of unpaired medical images and auxiliary masks obtained either from a shape model or from public datasets. We first use the GAN to generate pseudo labels for our training images under the implicit high-level shape constraint represented by a Variational Auto-encoder (VAE)-based discriminator, with the help of the auxiliary masks, and build a Discriminator-guided Generator Channel Calibration (DGCC) module that employs the discriminator's feedback to calibrate the generator for better pseudo labels. To learn from the noisy pseudo labels, we further introduce a noise-robust iterative learning method using a noise-weighted Dice loss. We validated our framework in two situations: objects with a simple shape model, like the optic disc in fundus images and the fetal head in ultrasound images, and complex structures, like the lung in X-ray images and the liver in CT images. Experimental results demonstrated that 1) our VAE-based discriminator and DGCC module help to obtain high-quality pseudo labels; 2) our proposed noise-robust learning method can effectively overcome the effect of noisy pseudo labels; and 3) the segmentation performance of our method, without using annotations of the training images, is close or even comparable to that of learning from human annotations.
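
As one plausible reading of a noise-weighted Dice loss (this PyTorch fragment is a hedged sketch, not the paper's exact weighting scheme), pixels with low confidence in their pseudo labels are down-weighted in both the overlap and normalization terms:

    import torch

    def noise_weighted_dice_loss(prob, pseudo_label, weight, eps=1e-6):
        # prob:         predicted foreground probabilities, shape (N, H, W)
        # pseudo_label: noisy binary pseudo labels, shape (N, H, W)
        # weight:       per-pixel confidence in [0, 1]; low-confidence (likely
        #               noisy) pixels contribute less to the loss
        inter = (weight * prob * pseudo_label).sum()
        denom = (weight * (prob + pseudo_label)).sum()
        return 1.0 - (2.0 * inter + eps) / (denom + eps)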

* 13 pages, 15 figures 

Robust Medical Instrument Segmentation Challenge 2019

Mar 23, 2020
Tobias Ross, Annika Reinke, Peter M. Full, Martin Wagner, Hannes Kenngott, Martin Apitz, Hellena Hempe, Diana Mindroc Filimon, Patrick Scholz, Thuy Nuong Tran, Pierangela Bruno, Pablo Arbeláez, Gui-Bin Bian, Sebastian Bodenstedt, Jon Lindström Bolmgren, Laura Bravo-Sánchez, Hua-Bin Chen, Cristina González, Dong Guo, Pål Halvorsen, Pheng-Ann Heng, Enes Hosgor, Zeng-Guang Hou, Fabian Isensee, Debesh Jha, Tingting Jiang, Yueming Jin, Kadir Kirtac, Sabrina Kletz, Stefan Leger, Zhixuan Li, Klaus H. Maier-Hein, Zhen-Liang Ni, Michael A. Riegler, Klaus Schoeffmann, Ruohua Shi, Stefanie Speidel, Michael Stenzel, Isabell Twick, Guotai Wang, Jiacheng Wang, Liansheng Wang, Lu Wang, Yujie Zhang, Yan-Jie Zhou, Lei Zhu, Manuel Wiesenfarth, Annette Kopp-Schneider, Beat P. Müller-Stich, Lena Maier-Hein

Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer- and robot-assisted interventions. While numerous methods for detecting, segmenting and tracking medical instruments based on endoscopic video images have been proposed in the literature, key limitations remain to be addressed: firstly, robustness, that is, the reliable performance of state-of-the-art methods on challenging images (e.g. in the presence of blood, smoke or motion artifacts); secondly, generalization: algorithms trained for a specific intervention in a specific hospital should generalize to other interventions or institutions. In an effort to promote solutions for these limitations, we organized the Robust Medical Instrument Segmentation (ROBUST-MIS) challenge as an international benchmarking competition with a specific focus on the robustness and generalization capabilities of algorithms. For the first time in the field of endoscopic image processing, our challenge included a task on binary segmentation and also addressed multi-instance detection and segmentation. The challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures from three different types of surgery. The validation of the competing methods for the three tasks (binary segmentation, multi-instance detection and multi-instance segmentation) was performed in three different stages with an increasing domain gap between the training and the test data. The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap. While the average detection and segmentation quality of the best-performing algorithms is high, future research should concentrate on the detection and segmentation of small, crossing, moving and transparent instrument(s) (parts).
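
To give a feel for how multi-instance detection is typically scored, the sketch below greedily matches predicted and reference instrument instances by IoU; it is a generic illustration with an assumed threshold, not the challenge's official matching procedure.

    import numpy as np

    def match_instances(pred, ref, iou_thresh=0.3):
        # pred, ref: (H, W) instance-label maps; 0 is background, k > 0 is instance k.
        matches, used = [], set()
        for r in [v for v in np.unique(ref) if v > 0]:
            best, best_iou = None, iou_thresh
            for p in [v for v in np.unique(pred) if v > 0 and v not in used]:
                inter = np.logical_and(pred == p, ref == r).sum()
                union = np.logical_or(pred == p, ref == r).sum()
                iou = inter / union if union else 0.0
                if iou > best_iou:
                    best, best_iou = p, iou
            if best is not None:
                used.add(best)
                matches.append((r, best, best_iou))
        # Unmatched reference instances count as misses; unmatched predictions
        # count as false positives.
        return matches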

* A pre-print 

Large-Scale Unsupervised Deep Representation Learning for Brain Structure

May 02, 2018
Ayush Jaiswal, Dong Guo, Cauligi S. Raghavendra, Paul Thompson

Machine Learning (ML) is increasingly being used for computer-aided diagnosis of brain-related disorders based on structural magnetic resonance imaging (MRI) data. Most such work employs biologically and medically meaningful hand-crafted features calculated from different regions of the brain. The construction of such highly specialized features requires a considerable amount of time, manual oversight and careful quality control to ensure the absence of errors in the computational process. Recent advances in deep representation learning have shown great promise in extracting highly non-linear and information-rich features from data. In this paper, we present a novel large-scale deep unsupervised approach to learning generic feature representations of structural brain MRI scans, which requires no specialized domain knowledge or manual intervention. Our method produces low-dimensional representations of brain structure that can be used to reconstruct brain images with very low error and that exhibit performance comparable to FreeSurfer features on various classification tasks.
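
The sketch below shows the general shape of such an approach: a small 3D convolutional autoencoder (hypothetical layer sizes and input resolution; the paper's actual architecture may differ) whose bottleneck code serves as the learned representation of a brain volume.

    import torch.nn as nn

    class MRIAutoencoder(nn.Module):
        """Unsupervised representation learner for (N, 1, 64, 64, 64) MRI volumes."""

        def __init__(self, latent_dim=256):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv3d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
                nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
                nn.Flatten(),
                nn.Linear(32 * 16 * 16 * 16, latent_dim),  # low-dimensional code
            )
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 32 * 16 * 16 * 16), nn.ReLU(),
                nn.Unflatten(1, (32, 16, 16, 16)),
                nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1),  # reconstruction
            )

        def forward(self, x):
            z = self.encoder(x)  # z is the feature vector used downstream
            return self.decoder(z), z

Training with a plain reconstruction loss (e.g. MSE between input and output volumes) requires no labels; the code z is then fed to downstream classifiers.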

Unifying Local and Global Change Detection in Dynamic Networks

Oct 09, 2017
Wenzhe Li, Dong Guo, Greg Ver Steeg, Aram Galstyan

Many real-world networks are complex dynamical systems, where both local (e.g., changing node attributes) and global (e.g., changing network topology) processes unfold over time. Local dynamics may provoke global changes in the network, and the ability to detect such effects could have profound implications for a number of real-world problems. Most existing techniques focus on either the local or the global aspect of the problem, or treat the two in isolation from each other. In this paper, we propose a novel network model that simultaneously accounts for both local and global dynamics. To the best of our knowledge, this is the first attempt at modeling and detecting local and global change points in dynamic networks via a unified generative framework. Our model is built upon the popular mixed membership stochastic blockmodels (MMSB) with sparse co-evolving patterns. We derive an efficient stochastic gradient Langevin dynamics (SGLD) sampler for our proposed model, which allows it to scale to potentially very large networks. Finally, we validate our model on both synthetic and real-world data and demonstrate its superiority over several baselines.
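
For context, a generic stochastic gradient Langevin dynamics step looks like the sketch below; the paper derives an SGLD sampler specialized to its co-evolving MMSB model, which this generic update does not reproduce.

    import numpy as np

    def sgld_step(theta, grad_log_post, step_size, rng):
        # grad_log_post(theta) is a stochastic estimate of the gradient of the
        # log posterior, computed on a mini-batch (e.g. of nodes or edges) and
        # rescaled to the full data set.
        noise = rng.normal(0.0, np.sqrt(step_size), size=theta.shape)
        return theta + 0.5 * step_size * grad_log_post(theta) + noise

Iterating this update with a decaying step size yields approximate samples from the posterior over model parameters.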

Kernel Approximation Methods for Speech Recognition

Jan 13, 2017
Avner May, Alireza Bagheri Garakani, Zhiyun Lu, Dong Guo, Kuan Liu, Aurélien Bellet, Linxi Fan, Michael Collins, Daniel Hsu, Brian Kingsbury, Michael Picheny, Fei Sha

We study large-scale kernel methods for acoustic modeling in speech recognition and compare their performance to deep neural networks (DNNs). We perform experiments on four speech recognition datasets, including the TIMIT and Broadcast News benchmark tasks, and compare these two types of models on frame-level performance metrics (accuracy, cross-entropy) as well as on recognition metrics (word/character error rate). In order to scale kernel methods to these large datasets, we use the random Fourier feature method of Rahimi and Recht (2007). We propose two novel techniques for improving the performance of kernel acoustic models. First, in order to reduce the number of random features required by kernel models, we propose a simple but effective method for feature selection. The method is able to explore a large number of non-linear features while maintaining a compact model more efficiently than existing approaches. Second, we present a number of frame-level metrics that correlate very strongly with recognition performance when computed on the held-out set; we take advantage of these correlations by monitoring them during training in order to decide when to stop learning. This technique can noticeably improve the recognition performance of both DNN and kernel models while narrowing the gap between them. Additionally, we show that the linear bottleneck method of Sainath et al. (2013) significantly improves the performance of our kernel models, in addition to speeding up training and making the models more compact. Together, these three methods dramatically improve the performance of kernel acoustic models, making their performance comparable to DNNs on the tasks we explored.
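
The random Fourier feature construction of Rahimi and Recht (2007) for a Gaussian kernel is compact enough to sketch in full; the feature-selection and early-stopping techniques proposed in the paper are not shown.

    import numpy as np

    def random_fourier_features(X, num_features, gamma, seed=0):
        # Approximates the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)
        # so that z(x) @ z(y) ~= k(x, y); a linear model trained on z(X) then
        # stands in for the (intractably large) exact kernel model.
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, num_features))
        b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
        return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)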

Scalable Link Prediction in Dynamic Networks via Non-Negative Matrix Factorization

Jul 23, 2016
Linhong Zhu, Dong Guo, Junming Yin, Greg Ver Steeg, Aram Galstyan

We propose a scalable temporal latent space model for link prediction in dynamic social networks, where the goal is to predict links over time based on a sequence of previous graph snapshots. The model assumes that each user lies in an unobserved latent space and interactions are more likely to form between similar users in the latent space representation. In addition, the model allows each user to gradually move its position in the latent space as the network structure evolves over time. We present a global optimization algorithm to effectively infer the temporal latent space, with a quadratic convergence rate. Two alternative optimization algorithms with local and incremental updates are also proposed, allowing the model to scale to larger networks without compromising prediction accuracy. Empirically, we demonstrate that our model, when evaluated on a number of real-world dynamic networks, significantly outperforms existing approaches for temporal link prediction in terms of both scalability and predictive power.
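
A minimal sketch of the underlying idea, assuming a squared-loss objective with a temporal smoothness penalty and plain multiplicative updates, is given below; the paper's optimization algorithms, their local/incremental variants, and the quadratic convergence guarantee are not reproduced here.

    import numpy as np

    def temporal_nmf(adjacency, k, lam=1.0, iters=200, seed=0):
        # adjacency: list of (n, n) symmetric snapshots A_1..A_T; minimizes
        # sum_t ||A_t - Z_t Z_t^T||^2 + lam * sum_t ||Z_t - Z_{t-1}||^2, Z_t >= 0.
        rng = np.random.default_rng(seed)
        n = adjacency[0].shape[0]
        Z = [rng.random((n, k)) for _ in adjacency]  # latent positions per step
        for _ in range(iters):
            for t, A in enumerate(adjacency):
                nbrs = [i for i in (t - 1, t + 1) if 0 <= i < len(Z)]
                smooth = sum(Z[i] for i in nbrs)
                num = 2.0 * A @ Z[t] + lam * smooth
                den = 2.0 * Z[t] @ (Z[t].T @ Z[t]) + lam * len(nbrs) * Z[t] + 1e-9
                Z[t] *= num / den  # multiplicative update keeps Z nonnegative
        return Z  # score a future link (u, v) by Z[-1][u] @ Z[-1][v]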

* Technical report for paper "Scalable Temporal Latent Space Inference for Link Prediction in Dynamic Social Networks" that appears in IEEE Transactions on Knowledge and Data Engineering 2016 