Speaker verification aims to verify whether an input speech corresponds to the claimed speaker, and conventionally, this kind of system is deployed based on single-stream scenario, wherein the feature extractor operates in full frequency range. In this paper, we hypothesize that machine can learn enough knowledge to do classification task when listening to partial frequency range instead of full frequency range, which is so called frequency selection technique, and further propose a novel framework of multi-stream Convolutional Neural Network (CNN) with this technique for speaker verification tasks. The proposed framework accommodates diverse temporal embeddings generated from multiple streams to enhance the robustness of acoustic modeling. For the diversity of temporal embeddings, we consider feature augmentation with frequency selection, which is to manually segment the full-band of frequency into several sub-bands, and the feature extractor of each stream can select which sub-bands to use as target frequency domain. Different from conventional single-stream solution wherein each utterance would only be processed for one time, in this framework, there are multiple streams processing it in parallel. The input utterance for each stream is pre-processed by a frequency selector within specified frequency range, and post-processed by mean normalization. The normalized temporal embeddings of each stream will flow into a pooling layer to generate fused embeddings. We conduct extensive experiments on VoxCeleb dataset, and the experimental results demonstrate that multi-stream CNN significantly outperforms single-stream baseline with 20.53 % of relative improvement in minimum Decision Cost Function (minDCF).
In this report, we describe the submission of ShaneRun's team to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020. We use ResNet-34 as encoder to extract the speaker embeddings, which is referenced from the open-source voxceleb-trainer. We also provide a simple method to implement optimum fusion using t-SNE normalized distance of testing utterance pairs instead of original negative Euclidean distance from the encoder. The final submitted system got 0.3098 minDCF and 5.076 % ERR for Fixed data track, which outperformed the baseline by 1.3 % minDCF and 2.2 % ERR respectively.
As an approximate nearest neighbor search technique, hashing has been widely applied in large-scale image retrieval due to its excellent efficiency. Most supervised deep hashing methods have similar loss designs with embedding learning, while quantizing the continuous high-dim feature into compact binary space. We argue that the existing deep hashing schemes are defective in two issues that seriously affect the performance, i.e., bit independence and bit balance. The former refers to hash codes of different classes should be independent of each other, while the latter means each bit should have a balanced distribution of +1s and -1s. In this paper, we propose a novel supervised deep hashing method, termed Hadamard Codebook based Deep Hashing (HCDH), which solves the above two problems in a unified formulation. Specifically, we utilize an off-the-shelf algorithm to generate a binary Hadamard codebook to satisfy the requirement of bit independence and bit balance, which subsequently serves as the desired outputs of the hash functions learning. We also introduce a projection matrix to solve the inconsistency between the order of Hadamard matrix and the number of classes. Besides, the proposed HCDH further exploits the supervised labels by constructing a classifier on top of the outputs of hash functions. Extensive experiments demonstrate that HCDH can yield discriminative and balanced binary codes, which well outperforms many state-of-the-arts on three widely-used benchmarks.
Online hashing has attracted extensive research attention when facing streaming data. Most online hashing methods, learning binary codes based on pairwise similarities of training instances, fail to capture the semantic relationship, and suffer from a poor generalization in large-scale applications due to large variations. In this paper, we propose to model the similarity distributions between the input data and the hashing codes, upon which a novel supervised online hashing method, dubbed as Similarity Distribution based Online Hashing (SDOH), is proposed, to keep the intrinsic semantic relationship in the produced Hamming space. Specifically, we first transform the discrete similarity matrix into a probability matrix via a Gaussian-based normalization to address the extremely imbalanced distribution issue. And then, we introduce a scaling Student t-distribution to solve the challenging initialization problem, and efficiently bridge the gap between the known and unknown distributions. Lastly, we align the two distributions via minimizing the Kullback-Leibler divergence (KL-diverence) with stochastic gradient descent (SGD), by which an intuitive similarity constraint is imposed to update hashing model on the new streaming data with a powerful generalizing ability to the past data. Extensive experiments on three widely-used benchmarks validate the superiority of the proposed SDOH over the state-of-the-art methods in the online retrieval task.
Online image hashing has received increasing research attention recently, which receives large-scale data in a streaming manner to update the hash functions on-the-fly. Its key challenge lies in the difficulty in balancing the learning timeliness and model accuracy. To this end, most works exploit a supervised setting, i.e., using class labels to boost the hashing performance, which defects in two aspects: First, large amount of training batches are required to learn up-to-date hash functions, which however largely increase the learning complexity. Second, strong constraints, e.g., orthogonal or similarity preserving, are used, which are however typically relaxed and lead to large accuracy drop. To handle the above challenges, in this paper, a novel supervised online hashing scheme termed Hadamard Matrix Guided Online Hashing (HMOH) is proposed. Our key innovation lies in the construction and usage of Hadamard matrix, which is an orthogonal binary matrix and is built via Sylvester method. To release the need of strong constraints, we regard each column of Hadamard matrix as the target code for each class label, which by nature satisfies several desired properties of hashing codes. To accelerate the online training, the LSH is first adopted to align the length of target code and the to-be-learned binary code. And then, we treat the learning of hash functions as a set of binary classification problems to fit the assigned target code. Finally, we propose to ensemble the learned models in all rounds to maximally preserve the information of past streaming data. The superior accuracy and efficiency of the proposed method are demonstrated through extensive experiments on three widely-used datasets comparing to various state-of-the-art methods.
A new kind of six degree-of-freedom teaching manipulator without actuators is designed, for recording and conveniently setting a trajectory of an industrial robot. The device requires good gravity balance and operating force performance to ensure being controlled easily and fluently. In this paper, we propose a process for modeling the manipulator and then the model is used to formulate a multi-objective optimization problem to optimize the design of the testing manipulator. Three objectives, including total mass of the device, gravity balancing and operating force performance are analyzed and defined. A popular non-dominated sorting genetic algorithm (NSGA-II-CDP) is used to solve the optimization problem. The obtained solutions all outperform the design of a human expert. To extract design knowledge, an innovization study is performed to establish meaningful implicit relationship between the objective space and the decision space, which can be reused by the designer in future design process.