With the prevalence of social media and music creation tools, musical pieces now spread much more quickly and music creation is becoming much easier. The growing number of musical pieces has made the problem of music plagiarism prominent, and there is an urgent need for a tool that can detect music plagiarism automatically. Researchers have proposed various methods that extract low-level and high-level features of music and compute their similarities. However, low-level features such as cepstrum coefficients are only weakly related to the copyright protection of musical pieces, and existing algorithms that consider high-level features fail to detect cases in which two musical pieces are not very similar overall but contain some highly similar regions. This paper proposes a new method named MESMF, which converts the music plagiarism detection problem into a bipartite graph matching task that can be solved via maximum weight matching and an edit distance model. We design several kinds of melody representations and similarity computation methods according to music theory. The proposed method can deal with the shift, swapping, transposition, and tempo variance problems in music plagiarism. It can also effectively pick out locally similar regions from two musical pieces with relatively low global similarity. We collect a new music plagiarism dataset from real, legally judged music plagiarism cases and conduct detailed ablation studies. Experimental results demonstrate the strong performance of the proposed algorithm. The source code and our dataset are available at https://anonymous.4open.science/r/a41b8fb4-64cf-4190-a1e1-09b7499a15f5/
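The matching formulation described in this abstract can be illustrated with a minimal sketch. It is not the MESMF implementation: the interval-based melody representation, the normalised similarity score, and the brute-force matcher are all assumptions chosen for brevity, and the function names are hypothetical. Segments are lists of MIDI pitch numbers; comparing pitch intervals rather than absolute pitches makes the score insensitive to transposition, and the matching step tolerates swapped segments.

```python
from itertools import permutations


def intervals(pitches):
    """Pitch differences between consecutive notes (transposition-invariant)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def edit_distance(a, b):
    """Classic Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(seg_a, seg_b):
    """Assumed score in [0, 1]: 1 minus the normalised edit distance on intervals."""
    ia, ib = intervals(seg_a), intervals(seg_b)
    longest = max(len(ia), len(ib))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(ia, ib) / longest


def max_weight_matching(segs_a, segs_b):
    """Brute-force maximum weight bipartite matching (small inputs only).

    Returns (total_weight, [(index_in_a, index_in_b), ...]).
    """
    if len(segs_a) > len(segs_b):
        # Match from the smaller side, then flip the pairs back.
        score, pairs = max_weight_matching(segs_b, segs_a)
        return score, [(i, j) for j, i in pairs]
    weights = [[similarity(a, b) for b in segs_b] for a in segs_a]
    best_score, best_pairs = -1.0, []
    for perm in permutations(range(len(segs_b)), len(segs_a)):
        score = sum(weights[i][j] for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_pairs = score, list(enumerate(perm))
    return best_score, best_pairs
```

The brute-force search is exponential in the number of segments; for realistic inputs a polynomial-time assignment solver such as the Hungarian algorithm would be the natural substitute.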
Reweighting adversarial data during training has recently been shown to improve adversarial robustness, where data closer to the current decision boundaries are regarded as more critical and given larger weights. However, existing methods for measuring this closeness are not very reliable: they are discrete and can take only a few values, and they are path-dependent, i.e., they may change given the same start and end points with different attack paths. In this paper, we propose three types of probabilistic margin (PM), which are continuous and path-independent, for measuring the aforementioned closeness and reweighting adversarial data. Specifically, a PM is defined as the difference between two estimated class-posterior probabilities, e.g., the probability of the true label minus the probability of the most confusing label given some natural data. Though different PMs capture different geometric properties, all three PMs share a negative correlation with the vulnerability of data: data with larger/smaller PMs are safer/riskier and should have smaller/larger weights. Experiments demonstrate that PMs are reliable measurements and PM-based reweighting methods outperform state-of-the-art methods.
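As a minimal sketch of the example PM given in this abstract (probability of the true label minus the probability of the most confusing label), the following assumes the class-posterior probabilities are estimated by a softmax over model logits. The weighting function `pm_weights` and its `sharpness` parameter are hypothetical: the abstract only requires that weights decrease as the PM grows, and a sigmoid is one simple choice that satisfies this.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def probabilistic_margin(logits, true_label):
    """Probability of the true label minus that of the most confusing other label."""
    probs = softmax(logits)
    rival = max(p for k, p in enumerate(probs) if k != true_label)
    return probs[true_label] - rival


def pm_weights(margins, sharpness=5.0):
    """Hypothetical reweighting: decreasing in the PM, normalised to sum to 1."""
    raw = [1.0 / (1.0 + math.exp(sharpness * pm)) for pm in margins]
    total = sum(raw)
    return [r / total for r in raw]
```

A PM lies in [-1, 1] and is negative when the example is misclassified, so such examples receive the largest weights under this scheme.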
In semantic segmentation, we aim to train a pixel-level classifier to assign category labels to all pixels in an image, where labeled training images and unlabeled test images are from the same distribution and share the same label set. However, in an open world, the unlabeled test images probably contain unknown categories and have different distributions from the labeled images. Hence, in this paper, we consider a new, more realistic, and more challenging problem setting where the pixel-level classifier has to be trained with labeled images and unlabeled open-world images, which we name open world semantic segmentation (OSS). In OSS, the trained classifier is expected to identify unknown-class pixels and classify known-class pixels well. To solve OSS, we first investigate which distribution unknown-class pixels obey. Then, motivated by the goodness-of-fit test, we use statistical measurements to show how well a pixel fits the distribution of an unknown class and select well-fitted pixels to form the unknown region in each image. Finally, we propose an end-to-end learning framework, known-region-aware domain alignment (KRADA), to distinguish unknown classes while aligning distributions of known classes in labeled and unlabeled open-world images. The effectiveness of KRADA has been verified on two synthetic tasks and one COVID-19 segmentation task.
Unlike English letters, Chinese characters have rich and specific meanings. Usually, the meaning of a word can be derived from its constituent characters in some way. Several previous works on syntactic parsing propose to annotate shallow word-internal structures to better utilize character-level information. This work proposes to model the deep internal structures of Chinese words as dependency trees with 11 labels for distinguishing syntactic relationships. First, based on newly compiled annotation guidelines, we manually annotate a word-internal structure treebank (WIST) consisting of over 30K multi-char words from the Chinese Penn Treebank. To guarantee quality, each word is independently annotated by two annotators and inconsistencies are handled by a third senior annotator. Second, we present a detailed analysis of WIST to reveal insights into Chinese word formation. Third, we propose word-internal structure parsing as a new task, and conduct benchmark experiments using a competitive dependency parser. Finally, we present two simple ways to encode word-internal structures, leading to promising gains on the sentence-level syntactic parsing task.
Speech disorders often occur at an early stage of Parkinson's disease (PD). Speech impairments could therefore serve as indicators for early diagnosis while motor symptoms are not yet obvious. In this study, we constructed a new speech corpus of Mandarin Chinese and addressed the classification of patients with PD. We implemented classical machine learning methods with ranking algorithms for feature selection, convolutional and recurrent deep networks, and an end-to-end system. Our classification accuracy significantly surpassed that of state-of-the-art studies. The results suggest that free talk has stronger classification power than standard speech tasks, which could inform the design of future speech tasks for efficient early diagnosis of the disease. Based on existing classification methods and our natural speech study, automatic detection of PD from daily conversation could become accessible to the majority of the clinical population.
Graph representation learning plays a vital role in processing graph-structured data. However, prior work on graph representation learning relies heavily on label information. To overcome this problem, inspired by the recent success of graph contrastive learning and Siamese networks in visual representation learning, we propose a novel self-supervised approach to learn node representations by enhancing Siamese self-distillation with multi-scale contrastive learning. Specifically, we first generate two augmented views from the input graph based on local and global perspectives. Then, we employ two objectives called cross-view and cross-network contrastiveness to maximize the agreement between node representations across different views and networks. To demonstrate the effectiveness of our approach, we perform empirical experiments on five real-world datasets. Our method not only achieves new state-of-the-art results but also surpasses some semi-supervised counterparts by large margins.
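To make the idea of maximizing agreement between node representations across two views concrete, here is a minimal sketch using an InfoNCE-style loss. This is a swapped-in, generic contrastive objective, not necessarily the cross-view or cross-network objective used in the paper; the function names and the `temperature` parameter are assumptions. Row i of each view is taken to be the embedding of node i, so the same node in the other view is the positive and all other nodes are negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Average InfoNCE loss; row i of view_a is paired with row i of view_b."""
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Stable log-sum-exp over all candidates in the other view.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)
```

The loss is low when each node's two embeddings agree more with each other than with the embeddings of other nodes, which is the behaviour a cross-view objective is meant to encourage.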
Machine learning in the context of noise is a challenging but practical setting for many real-world applications. Most previous approaches in this area focus on pairwise relations (causal or correlational relationships) with noise, such as learning with noisy labels. However, group noise, which is parasitic on a coarse-grained accurate relation with fine-grained uncertainty, is also universal and has not been well investigated. The challenge in this setting is how to discover the true pairwise connections concealed by the group relation with its fine-grained noise. To overcome this issue, we propose a novel Max-Matching method for learning with group noise. Specifically, it utilizes a matching mechanism to evaluate the relation confidence of each object w.r.t. the target, while also considering the Non-IID characteristics among objects in the group. Only the most confident object is used to learn the model, so that the fine-grained noise is mostly dropped. The performance on a range of real-world datasets across several learning paradigms demonstrates the effectiveness of Max-Matching.
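The selection step described in this abstract, keeping only the most confident object in each group, can be sketched as follows. The dot-product confidence score is an assumption made for illustration, and the sketch deliberately ignores the Non-IID modelling among objects that the method also performs; `most_confident` is a hypothetical helper, not the paper's API.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def most_confident(group, target):
    """Return (index, score) of the object in `group` that best matches `target`.

    Confidence is an assumed dot-product score between object and target
    embeddings; only the top-scoring object would be used for learning.
    """
    if not group:
        raise ValueError("group must contain at least one object")
    scores = [dot(obj, target) for obj in group]
    best = max(range(len(group)), key=scores.__getitem__)
    return best, scores[best]
```

In training, only the selected object would contribute to the loss for its group, which is how the remaining, likely noisy objects are dropped.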
Anomaly detection on attributed networks attracts considerable research interest due to the wide application of attributed networks in modeling complex systems. Recently, deep learning-based anomaly detection methods have shown promising results over shallow approaches, especially on networks with high-dimensional attributes and complex structures. However, existing approaches, which employ a graph autoencoder as their backbone, do not fully exploit the rich information of the network, resulting in suboptimal performance. Furthermore, these methods do not directly target anomaly detection in their learning objective and fail to scale to large networks due to the full-graph training mechanism. To overcome these limitations, we present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks. Our framework fully exploits the local information in network data by sampling a novel type of contrastive instance pair, which can capture the relationship between each node and its neighboring substructure in an unsupervised way. Meanwhile, a graph neural network-based contrastive learning model is proposed to learn informative embeddings from high-dimensional attributes and local structure and to measure the agreement of each instance pair with its output score. The multi-round predicted scores of the contrastive learning model are further used to evaluate the abnormality of each node with statistical estimation. In this way, the learning model is trained with a specific anomaly detection-aware target. Furthermore, since the input of the graph neural network module is batches of instance pairs instead of the full network, our framework can adapt flexibly to large networks. Experimental results show that our proposed framework outperforms the state-of-the-art baseline methods on all seven benchmark datasets.
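The abstract does not specify the statistical estimation used to turn multi-round scores into an anomaly score, so the following is only one plausible sketch. It assumes that in each round the model outputs an agreement score for a positive pair (a node with its own neighboring substructure) and for a negative pair (the node with an unrelated substructure), and it averages the difference over rounds. The function name and this averaging rule are assumptions.

```python
from statistics import mean


def anomaly_score(positive_scores, negative_scores):
    """Average (negative - positive) agreement over sampling rounds.

    A normal node should agree with its own neighborhood (high positive
    score) and disagree with unrelated ones (low negative score), giving
    a strongly negative value; anomalies drift towards or above zero.
    """
    if not positive_scores or len(positive_scores) != len(negative_scores):
        raise ValueError("need the same non-zero number of rounds for both")
    diffs = [n - p for p, n in zip(positive_scores, negative_scores)]
    return mean(diffs)
```

Averaging over several rounds reduces the variance introduced by sampling a different substructure in each round.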
The drastic increase in data quantity often brings a severe decrease in data quality, such as incorrect label annotations, which poses a great challenge for robustly training Deep Neural Networks (DNNs). Existing methods for learning with label noise either employ ad-hoc heuristics or are restricted to specific noise assumptions. However, more general situations, such as instance-dependent label noise, have not been fully explored, as few studies focus on the label corruption process. By categorizing instances into confusing and unconfusing instances, this paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances. The resultant model can be realized by DNNs, where the training procedure is accomplished by employing an alternating optimization algorithm. Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements in robustness over state-of-the-art counterparts.
This paper proposes an indoor visible light communication (VLC) system with multiple transmitters and receivers. Due to the diffusivity of LED light beams, the photodiodes receive signals from many directions. We use one concave and one convex lens as the optical antenna, and obtain the optimal lens structure by optimizing for the minimum condition number of the channel gain matrix. In this way, the light emitted by different LEDs can be well separated, which minimizes signal interference. However, interference increases when the system deviates from alignment, so we also explore system mobility. Subsequent signal processing is then carried out, including signal combining and successive interference cancellation (SIC). We combine the same signal received by different receivers to improve the signal-to-interference-plus-noise ratio (SINR), and SIC can effectively reconstruct the interference and eliminate its impact. The simulation results show that channel capacity can be increased by more than 5 times and up to 20 times when the receiver and transmitter are aligned. In the case of movement, channel capacity can also be increased by about 4 times on average. Moreover, the mobile range of the system is significantly expanded.
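The lens-design criterion in this abstract, minimizing the condition number of the channel gain matrix, can be illustrated for the simplest case of two LEDs and two photodiodes. The sketch below computes the 2-norm condition number (ratio of largest to smallest singular value) of a real 2x2 matrix in closed form; the function name is hypothetical and any gain values passed to it are made up for illustration, not taken from the paper's simulations.

```python
import math


def condition_number_2x2(h):
    """2-norm condition number of a real 2x2 channel gain matrix.

    Uses the identities s_max^2 + s_min^2 = ||H||_F^2 and
    s_max * s_min = |det(H)| for the two singular values.
    """
    (a, b), (c, d) = h
    frob_sq = a * a + b * b + c * c + d * d
    det = abs(a * d - b * c)
    if det == 0.0:
        return math.inf  # singular: the two signals cannot be separated
    disc = math.sqrt(max(frob_sq * frob_sq - 4.0 * det * det, 0.0))
    s_max = math.sqrt((frob_sq + disc) / 2.0)
    s_min = det / s_max  # avoids cancellation in (frob_sq - disc)
    return s_max / s_min
```

A matrix close to the identity (each photodiode sees mainly one LED) has a condition number near 1, while strong cross-talk between the two paths drives it up, which is why a small condition number corresponds to well-separated signals.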