This paper describes the speaker verification (SV) systems submitted by the SpeakIn team to Task 1 and Task 2 of the Far-Field Speaker Verification Challenge 2022 (FFSVC2022). The SV tasks of the challenge focus on fully supervised far-field speaker verification (Task 1) and semi-supervised far-field speaker verification (Task 2). In Task 1, we used the VoxCeleb and FFSVC2020 datasets as training data; in Task 2, we used only the VoxCeleb dataset. ResNet-based and RepVGG-based architectures were developed for this challenge. Global statistics pooling and MQMHA pooling were used to aggregate frame-level features across time into an utterance-level representation. We adopted AM-Softmax and AAM-Softmax to classify the resulting embeddings. We propose a novel staged transfer learning method. In the pre-training stage, we reserve the speaker weights, which receive no positive samples during this stage. In the second stage, we fine-tune these weights with both positive and negative samples. Compared with the conventional transfer learning strategy, this strategy further improves model performance. The Sub-Mean and AS-Norm backend methods were used to address domain mismatch. In the fusion stage, three models were fused in Task 1 and two models in Task 2. On the FFSVC2022 leaderboard, our submission achieves an EER of 3.0049% and a minDCF of 0.2938 in Task 1, and an EER of 6.2060% and a minDCF of 0.5232 in Task 2. Our approach achieves excellent performance and ranks 1st in both challenge tasks.
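As an illustration of the scoring backend, the sketch below computes cosine scores normalized with AS-Norm (adaptive symmetric score normalization) and provides a Sub-Mean helper. It is a minimal pure-Python sketch under stated assumptions, not the submitted implementation: the cohort embeddings, the top-k value, and the data used to estimate the domain mean are hypothetical placeholders. Here AS-Norm normalizes a raw score s by the mean and standard deviation of the top-k cohort scores on each side, s' = 0.5 * ((s - mu_e) / sigma_e + (s - mu_t) / sigma_t).

```python
import math
import statistics

EPS = 1e-8  # guards against division by zero for degenerate vectors or cohorts


def mean_vector(embeddings):
    """Element-wise mean of equal-length embedding vectors."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]


def sub_mean(embeddings, domain_mean):
    """Sub-Mean: subtract a domain mean vector from every embedding."""
    return [[x - m for x, m in zip(e, domain_mean)] for e in embeddings]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / max(na * nb, EPS)


def top_k_stats(emb, cohort, k):
    """Mean and std of the k highest cosine scores of emb against the cohort."""
    scores = sorted((cosine(emb, c) for c in cohort), reverse=True)[:k]
    return statistics.mean(scores), statistics.pstdev(scores)


def as_norm(enroll, test, cohort, k):
    """AS-Norm of one cosine trial score, using top-k cohort statistics."""
    raw = cosine(enroll, test)
    mu_e, sd_e = top_k_stats(enroll, cohort, k)
    mu_t, sd_t = top_k_stats(test, cohort, k)
    return 0.5 * ((raw - mu_e) / max(sd_e, EPS) + (raw - mu_t) / max(sd_t, EPS))


# Toy usage with hypothetical 3-dimensional embeddings and k = 2.
cohort = [[0, 1, 0], [0, 0, 1], [1, 1, 1], [-1, 0, 0]]
enroll = [1, 0.1, 0]
print(as_norm(enroll, [0.9, 0.2, 0], cohort, k=2))  # positive: similar trial
print(as_norm(enroll, [0, 0.1, 1], cohort, k=2))    # negative: dissimilar trial
```

In a full backend, Sub-Mean would typically be applied to the enrollment, test, and cohort embeddings (with a mean estimated on in-domain data, an assumption here) before calling `as_norm`.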
An intelligent drug delivery trolley is an advanced piece of automated drug delivery equipment. Compared with traditional manual drug delivery, it offers higher delivery efficiency and a lower error rate. In this project, we design and build an intelligent drug delivery trolley that recognizes the road route and the room number of the target ward through visual recognition. The trolley selects the route corresponding to the identified room number, transports the drugs accurately to the target ward, and returns to the pharmacy after delivery. The trolley is powered by a DC supply, and a motor drive module controls two DC motors, which overcomes the problem of excessive deviation in the turning angle. The line-following function uses closed-loop control to improve tracking accuracy and the controllability of the trolley's speed. Ward numbers are identified by a camera module with a microcontroller, which supports adaptive adjustment to ambient brightness, distortion correction, automatic calibration, and other functions. Two cooperating trolleys communicate through a Bluetooth module, enabling efficient and accurate interaction. Experiments show that the trolley accurately identifies room numbers and plans routes to deliver drugs to the far, middle, and near wards, with fast speed and accurate judgment. In addition, two trolleys can cooperate to deliver drugs to the same ward efficiently and with close coordination.
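The closed-loop line-following control described above can be illustrated with a discrete PID controller that steers the two DC motors differentially. This is a hedged sketch: the PID law, the sign convention for the line offset, the normalized speed range, and the gains are assumptions for illustration, since the specific controller and its parameters are not given here.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Discrete PID controller; the gains used below are hypothetical."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        """Return the control output for the current line-tracking error."""
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def clamp(x: float, lo: float, hi: float) -> float:
    """Limit x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def motor_speeds(pid: PID, line_offset: float, base_speed: float,
                 dt: float, max_speed: float = 1.0):
    """Differential steering of the two DC motors from the measured line offset.

    line_offset > 0 means the line lies to the right of the trolley's centre,
    so the left wheel speeds up and the right wheel slows down to turn right.
    """
    correction = pid.step(line_offset, dt)
    left = clamp(base_speed + correction, 0.0, max_speed)
    right = clamp(base_speed - correction, 0.0, max_speed)
    return left, right


pid = PID(kp=0.5, ki=0.0, kd=0.0)
print(motor_speeds(pid, line_offset=0.2, base_speed=0.5, dt=0.01))  # left > right
```

The integral term removes steady-state drift on long straight segments and the derivative term damps oscillation around the line; clamping keeps both motor commands inside the driver's valid range.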
Node classification is a central task in graph data analysis. The scarcity, or even complete absence, of labeled data for emerging classes poses a major challenge for existing methods. A natural question arises: can we classify nodes from classes that have never been seen? In this paper, we study this zero-shot node classification (ZNC) problem, which has a two-stage nature: (1) acquiring high-quality class semantic descriptions (CSDs) for knowledge transfer, and (2) designing a well-generalized graph-based learning model. For the first stage, we give a novel quantitative CSD evaluation strategy based on estimating the real class relationships, so as to obtain the "best" CSDs in a completely automatic way. For the second stage, we propose a novel Decomposed Graph Prototype Network (DGPN) method that follows the principles of locality and compositionality for zero-shot model generalization. Finally, we conduct extensive experiments to demonstrate the effectiveness of our solutions.
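To make the two-stage ZNC setting concrete, the sketch below shows a generic prototype-based zero-shot classifier; it is not DGPN. CSDs of unseen classes are mapped to class prototypes, node features are averaged over one-hop neighborhoods (a simple nod to locality), and each node takes the label of its most similar prototype. The projection matrices `W_node` and `W_class` are hypothetical stand-ins for parameters that would be learned on seen classes; identity matrices are used in the usage example only for illustration.

```python
import math


def aggregate(features, adjacency):
    """Locality: each node averages its own features with its one-hop neighbors."""
    out = []
    for i, x in enumerate(features):
        group = [x] + [features[j] for j in adjacency[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def cosine(a, b):
    """Cosine similarity, guarded against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(denom, 1e-12)


def classify_unseen(features, adjacency, W_node, W_class, csds):
    """Label each node with the unseen class whose CSD prototype is most similar."""
    # Hypothetical projections: in practice these would be learned on seen classes.
    node_emb = [matvec(W_node, h) for h in aggregate(features, adjacency)]
    prototypes = {c: matvec(W_class, s) for c, s in csds.items()}
    return [max(prototypes, key=lambda c: cosine(z, prototypes[c])) for z in node_emb]


identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.2], [0.8, 0.0], [0.0, 1.0], [0.1, 0.9]]
adjacency = {0: [1], 1: [0], 2: [3], 3: [2]}
csds = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
print(classify_unseen(features, adjacency, identity, identity, csds))
# ['A', 'A', 'B', 'B']
```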
Leveraging line features to improve the localization accuracy of point-based visual-inertial SLAM (VINS) is gaining interest, as they provide additional constraints on scene structure. However, real-time performance when incorporating line features in VINS has not been addressed. This paper presents PL-VINS, a real-time optimization-based monocular VINS method with point and line features, developed based on the state-of-the-art point-based VINS-Mono \cite{vins}. We observe that current works use the LSD \cite{lsd} algorithm to extract line features; however, LSD is designed for scene shape representation rather than the pose estimation problem, and its high computational cost becomes the bottleneck for real-time performance. In this paper, we present a modified LSD algorithm obtained by studying hidden parameter tuning and a length rejection strategy. The modified LSD runs at least three times as fast as LSD. Further, by representing space lines with Pl\"{u}cker coordinates, the residual error in line estimation is modeled in terms of the point-to-line distance, which is then minimized by iteratively updating the minimal four-parameter orthonormal representation of the Pl\"{u}cker coordinates. Experiments on a public benchmark dataset show that the localization error of our method is 12-16\% lower than that of VINS-Mono at the same pose update frequency. For the benefit of the community, the source code of our method is available at: https://github.com/cnqiangfu/PL-VINS.
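The point-to-line residual on a Pl\"{u}cker line can be sketched as follows. A 3D line is stored as (n, d), with moment n = p1 x p2 and direction d = p2 - p1; it is moved to the camera frame via d_c = R d and n_c = R n + t x (R d), and, assuming normalized image coordinates (identity intrinsics), its image is the homogeneous line l = n_c. The residual is then the signed distance from each observed segment endpoint to l. The function names and the use of segment endpoints are illustrative assumptions, not the PL-VINS source.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def matvec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(r * x for r, x in zip(row, v)) for row in R]


def plucker_from_points(p1, p2):
    """Plücker coordinates (n, d) of the 3D line through world points p1 and p2."""
    d = [b - a for a, b in zip(p1, p2)]  # direction vector
    n = cross(p1, p2)                     # moment (normal) vector, equal to p1 x d
    return n, d


def transform_line(n, d, R, t):
    """Express a world-frame line in the camera frame, where X_c = R X_w + t."""
    d_c = matvec(R, d)
    n_c = [a + b for a, b in zip(matvec(R, n), cross(t, d_c))]
    return n_c, d_c


def point_to_line_residuals(n_c, endpoints):
    """Signed distances from observed endpoints (u, v) to the projected line.

    Assuming normalized image coordinates, the projected image line is l = n_c,
    i.e. l1*u + l2*v + l3 = 0.
    """
    l1, l2, l3 = n_c
    scale = math.hypot(l1, l2)
    return [(u * l1 + v * l2 + l3) / scale for u, v in endpoints]


R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
n, d = plucker_from_points([-1.0, 0.0, 5.0], [1.0, 0.0, 5.0])
n_c, _ = transform_line(n, d, R, [0.0, -0.5, 0.0])
# First endpoint lies on the projected line; the second is about 0.05 away.
print(point_to_line_residuals(n_c, [(0.3, -0.1), (-0.2, -0.05)]))
```

A least-squares optimizer would minimize the squared residuals over all line observations, alongside the point reprojection and IMU terms.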
Leveraging line features to improve the localization accuracy of point-based visual-inertial SLAM (VINS) is gaining importance, as they provide additional constraints from scene structure regularity; however, real-time performance has received little attention. This paper presents PL-VINS, a real-time optimization-based monocular VINS method with point and line features, developed based on the state-of-the-art point-based VINS-Mono \cite{vins}. We observe that current works use the LSD \cite{lsd} algorithm to extract lines; however, LSD is designed for scene shape representation rather than the specific pose estimation problem, and its high computational cost becomes the bottleneck for real-time performance. In this work, a modified LSD algorithm is presented by studying hidden parameter tuning and a length rejection strategy. The modified LSD runs at least three times as fast as LSD. Further, by representing a line landmark with Pl\"{u}cker coordinates, the line reprojection residual is modeled as the midpoint-to-line distance and then minimized by iteratively updating the minimal four-parameter orthonormal representation of the Pl\"{u}cker coordinates. Experiments on the public EuRoC benchmark dataset show that the localization error of our method is 12-16\% lower than that of VINS-Mono at the same working frequency on a low-power 1.1 GHz CPU without GPU parallelization. For the benefit of the community, we make the source code public: \textit{https://github.com/cnqiangfu/PL-VINS}
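The minimal four-parameter orthonormal representation referred to above can be sketched as follows: a Pl\"{u}cker line (n, d) is stored as U in SO(3) (three degrees of freedom) and W in SO(2) (one degree of freedom, kept here as an angle theta), so an optimizer updates each line with a 4-vector instead of the six over-parameterized Pl\"{u}cker entries. This minimal pure-Python illustration is not PL-VINS code; the function names, the column storage of U, and the right-multiplicative update U <- U Exp(psi) are assumptions made for illustration.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def scale(v, s):
    return [s * x for x in v]


def to_orthonormal(n, d):
    """Plücker (n, d) -> (U, theta), with U stored as its columns [u1, u2, u3].

    W in SO(2) is represented by theta: W = [[cos, -sin], [sin, cos]].
    """
    u1 = scale(n, 1.0 / norm(n))
    u2 = scale(d, 1.0 / norm(d))
    u3 = cross(u1, u2)  # already unit length because n is orthogonal to d
    theta = math.atan2(norm(d), norm(n))
    return [u1, u2, u3], theta


def from_orthonormal(U, theta):
    """Recover Plücker coordinates up to scale: (cos(theta) u1, sin(theta) u2)."""
    return scale(U[0], math.cos(theta)), scale(U[1], math.sin(theta))


def so3_exp(phi):
    """Rodrigues' formula: rotation matrix (list of rows) for rotation vector phi."""
    angle = norm(phi)
    if angle < 1e-12:
        return [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    k = scale(phi, 1.0 / angle)
    K = [[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]]
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)]
          for i in range(3)]
    s, c = math.sin(angle), 1.0 - math.cos(angle)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + c * K2[i][j]
             for j in range(3)] for i in range(3)]


def update(U, theta, delta):
    """Apply a 4-parameter increment delta = (psi_x, psi_y, psi_z, dtheta).

    U <- U * Exp(psi) and theta <- theta + dtheta.
    """
    E = so3_exp(delta[:3])
    # Column j of U * E is the sum over m of E[m][j] times column m of U.
    cols = [[sum(E[m][j] * U[m][r] for m in range(3)) for r in range(3)]
            for j in range(3)]
    return cols, theta + delta[3]
```

Because U stays a rotation and theta stays an angle after every update, the recovered (n, d) always satisfies the Pl\"{u}cker constraint n . d = 0 without any explicit projection step.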