Speaker protection algorithms leverage the properties of the playback signal to prevent over-excursion while maintaining maximum loudness, especially for mobile phones with tiny loudspeakers. This paper proposes efficient deep learning (DL) solutions to accurately model and predict the nonlinear excursion, which is challenging for conventional solutions. First, we build the experiment and pre-processing pipeline, in which the feedback current and voltage are sampled as input and a laser is employed to measure the excursion as ground truth. Second, an FFTNet model is proposed to explore the dominant low-frequency and other unknown harmonics, and is compared with a baseline ConvNet model. In addition, batch normalization (BN) re-estimation is designed to explore online adaptation, and INT8 quantization based on the AI Model Efficiency Toolkit (AIMET\footnote{AIMET is a product of Qualcomm Innovation Center, Inc.}) is applied to further reduce the complexity. The proposed algorithm is verified on two speakers and three typical deployment scenarios, and $>$99\% of the residual DC is less than 0.1 mm, much better than traditional solutions.
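The BN re-estimation step mentioned above can be illustrated with a minimal sketch. It assumes that re-estimation means recomputing the per-channel mean and variance used by a batch normalization layer from newly collected samples while leaving the learned scale and shift untouched; the abstract does not give the exact procedure, and the names `reestimate_bn_stats`, `batch_norm`, and `new_batch` are hypothetical.

```python
import math

def reestimate_bn_stats(activations):
    """Per-channel mean and population variance over a batch of new samples.

    `activations` is a list of samples, each a list of per-channel values.
    """
    n = len(activations)
    channels = len(activations[0])
    mean = [sum(row[c] for row in activations) / n for c in range(channels)]
    var = [
        sum((row[c] - mean[c]) ** 2 for row in activations) / n
        for c in range(channels)
    ]
    return mean, var

def batch_norm(x, mean, var, gamma, beta, eps=1e-5):
    """Normalize one sample with fixed statistics, then apply scale and shift."""
    return [
        g * (v - m) / math.sqrt(s + eps) + b
        for v, m, s, g, b in zip(x, mean, var, gamma, beta)
    ]

# Hypothetical activations collected on a new speaker (2 samples, 2 channels).
new_batch = [[1.0, 10.0], [3.0, 14.0]]
mean, var = reestimate_bn_stats(new_batch)

# The learned gamma and beta stay as trained; only the statistics change.
normalized = batch_norm(new_batch[0], mean, var, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

Only the normalization statistics are refreshed here, so no gradient step is needed, which is one reason this kind of adaptation is cheap enough to consider online.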
Meta-structures are widely used to define which subset of neighbors to aggregate information from in heterogeneous information networks (HINs). In this work, we investigate existing meta-structures, including meta-paths and meta-graphs, and observe that they are initially designed manually with fixed patterns and hence are insufficient to encode the rich semantic information of diverse HINs. Reflecting on these limitations, we define a new concept called the meta-multigraph as a more expressive and flexible generalization of the meta-graph, and propose a stable differentiable search method to automatically optimize the meta-multigraph for specific HINs and tasks. As the flexibility of meta-multigraphs may propagate redundant messages, we further introduce a complex-to-concise (C2C) meta-multigraph that propagates messages from complex to concise along the depth of the meta-multigraph. Moreover, we observe that differentiable search typically suffers from instability and from a significant gap between the meta-structures found in search and those used in evaluation. To this end, we propose a progressive search algorithm that implicitly narrows the search space to improve search stability and reduce this inconsistency. Extensive experiments are conducted on six medium-scale benchmark datasets and one large-scale benchmark dataset over two representative tasks, i.e., node classification and recommendation. Empirical results demonstrate that our search methods can automatically find expressive meta-multigraphs and C2C meta-multigraphs, enabling our model to outperform state-of-the-art heterogeneous graph neural networks.
The swift and precise detection of vehicles plays a significant role in intelligent transportation systems. Current vehicle detection algorithms suffer from high computational complexity, low detection rates, and limited feasibility on mobile devices. To address these issues, this paper proposes a lightweight vehicle detection algorithm based on YOLOv7-tiny (You Only Look Once version seven), called Ghost-YOLOv7. First, the width of the model is scaled to 0.5 and the standard convolution of the backbone network is replaced with Ghost convolution to obtain a lighter network and improve the detection speed. Second, a self-designed Ghost bi-directional feature pyramid network (Ghost-BiFPN) is embedded into the neck network to enhance the feature extraction capability of the algorithm and enrich semantic information. Third, a Ghost Decoupled Head (GDH) is employed for accurate prediction of vehicle location and category. Finally, a coordinate attention mechanism is introduced into the output layer to suppress environmental interference. The WIoU loss function is employed to further enhance the detection accuracy. Ablation experiments on the PASCAL VOC dataset demonstrate that Ghost-YOLOv7 outperforms the original YOLOv7-tiny model: it achieves a 29.8% reduction in computation, a 37.3% reduction in the number of parameters, a 35.1% reduction in model weights, a 1.1% higher mean average precision (mAP), and a detection speed 27 FPS higher than that of the original algorithm. Ghost-YOLOv7 was also evaluated on the KITTI and BIT-vehicle datasets, and the results show that this algorithm has the best overall performance.
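To make the savings from Ghost convolution concrete, the sketch below counts parameters for a standard convolution and for a Ghost module as formulated in the GhostNet paper: a primary convolution produces `c_out / s` intrinsic feature maps, and cheap depthwise operations with kernel size `d` generate the remaining ones. The layer sizes, the ratio `s = 2`, and the kernel `d = 3` are illustrative assumptions, not settings reported in the abstract, and bias terms are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k

def ghost_conv_params(c_in, c_out, k, s=2, d=3):
    """Weights in a Ghost module (no bias).

    Assumes c_out is divisible by the ratio s.
    """
    intrinsic = c_out // s
    primary = c_in * intrinsic * k * k    # ordinary convolution
    cheap = intrinsic * (s - 1) * d * d   # depthwise "cheap" operations
    return primary + cheap

# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
standard = standard_conv_params(64, 128, 3)
ghost = ghost_conv_params(64, 128, 3)
compression = standard / ghost
```

With these assumed sizes the Ghost module needs 37,440 weights instead of 73,728, a compression of roughly 1.97, close to the ratio `s`.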
White matter (WM) tract segmentation based on diffusion magnetic resonance imaging (dMRI) plays an important role in the analysis of human health and brain diseases. However, the annotation of WM tracts is time-consuming and requires experienced neuroanatomists. In this study, to explore tract segmentation in the challenging setting of minimal annotations, we propose a novel framework that uses only one annotated subject (subject-level one-shot) for tract segmentation. Our method consists of the proposed registration-based peak augmentation (RPA) and uncertainty-based refining (URe) modules. The RPA module synthesizes pseudo subjects and their corresponding labels to improve tract segmentation performance. The URe module alleviates the negative influence of low-confidence voxels on pseudo subjects. Experimental results show that our method outperforms other state-of-the-art methods by a large margin and that the proposed modules are effective. Overall, our method achieves accurate whole-brain tract segmentation with only one annotated subject. Our code is available at https://github.com/HaoXu0507/ISBI2023-One-Shot-WM-Tract-Segmentation.
Speech-driven 3D face animation aims to generate realistic facial expressions that match the speech content and emotion. However, existing methods often neglect emotional facial expressions or fail to disentangle them from speech content. To address this issue, this paper proposes an end-to-end neural network that disentangles different emotions in speech so as to generate rich 3D facial expressions. Specifically, we introduce the emotion disentangling encoder (EDE) to disentangle the emotion and content in the speech by cross-reconstructing speech signals with different emotion labels. Then an emotion-guided feature fusion decoder is employed to generate a 3D talking face with enhanced emotion. The decoder is driven by the disentangled identity, emotion, and content embeddings so as to generate controllable personal and emotional styles. Finally, considering the scarcity of 3D emotional talking face data, we resort to the supervision of facial blendshapes, which enables the reconstruction of plausible 3D faces from 2D emotional data, and contribute a large-scale 3D emotional talking face dataset (3D-ETF) to train the network. Our experiments and user studies demonstrate that our approach outperforms state-of-the-art methods and exhibits more diverse facial movements. We recommend watching the supplementary video: https://ziqiaopeng.github.io/emotalk
This work proposes a novel joint design for multiuser multiple-input multiple-output wiretap channels. The base station uses a switching network to connect a subset of its antennas to the available radio frequency chains. The switching network and transmit beamformers are jointly designed to maximize the weighted secrecy sum-rate for this setting. The principal design problem reduces to an NP-hard mixed-integer nonlinear programming problem. We invoke the fractional programming technique and the penalty dual decomposition method to develop a tractable iterative algorithm that effectively approximates the optimal design. Our numerical investigations validate the effectiveness of the proposed algorithm and its superior performance compared with the benchmark.
In today's multilingual lexical databases, the majority of the world's languages are under-represented. Beyond a mere issue of resource incompleteness, we show that existing lexical databases have structural limitations that result in a reduced expressivity on culturally-specific words and in mapping them across languages. In particular, the lexical meaning space of dominant languages, such as English, is represented more accurately while linguistically or culturally diverse languages are mapped in an approximate manner. Our paper assesses state-of-the-art multilingual lexical databases and evaluates their strengths and limitations with respect to their expressivity on lexical phenomena of linguistic diversity.
Mitosis detection is one of the challenging problems in computational pathology, and the mitotic count is an important index of cancer grading for pathologists. However, mitotic counting currently relies on pathologists counting mitotic nuclei in hot spots under a microscope, which is subjective and time-consuming. In this paper, we propose a two-stage cascaded network, named FoCasNet, for mitosis detection. In the first stage, a detection network named M_det is proposed to detect as many mitoses as possible. In the second stage, a classification network named M_class is proposed to refine the results of the first stage. In addition, an attention mechanism, a normalization method, and a hybrid anchor branch classification subnet are introduced to improve the overall detection performance. Our method achieves the current highest F1-score of 0.888 on the public ICPR 2012 dataset. We also evaluated our method on the GZMH dataset, released for the first time by our research team, and reached the highest F1-score of 0.563, which is better than that of multiple classic detection networks in wide use at present. These results confirm the effectiveness and generalization ability of our method. The code will be available at: https://github.com/antifen/mitosis-nuclei-detection.
Semi-supervised learning frameworks usually adopt mutual learning approaches with multiple submodels to learn from different perspectives. To avoid transferring erroneous pseudo labels between these submodels, a high threshold is usually used to filter out a large number of low-confidence predictions for unlabeled data. However, such filtering cannot fully exploit unlabeled data with low prediction confidence. To overcome this problem, we propose a mutual learning framework based on pseudo-negative labels. Negative labels are those to which a corresponding data item does not belong. In each iteration, one submodel generates pseudo-negative labels for each data item, and the other submodel learns from these labels. The roles of the two submodels are swapped after each iteration until convergence. By reducing the prediction probability on pseudo-negative labels, the dual model can improve its prediction ability. We also propose a mechanism to select a few pseudo-negative labels to feed into the submodels. In the experiments, our framework achieves state-of-the-art results on several main benchmarks. Specifically, with our framework, the error rates of the 13-layer CNN model are 9.35% and 7.94% for CIFAR-10 with 1000 and 4000 labels, respectively. In addition, for the non-augmented MNIST with only 20 labels, the error rate achieved by our framework is 0.81%, which is much smaller than that of other approaches. Our approach also demonstrates a significant performance improvement in domain adaptation.
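The idea of learning from pseudo-negative labels can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: one submodel is assumed to pick as pseudo-negatives the `k` classes it considers least likely, and the other submodel is assumed to be penalized with the mean of `-log(1 - p_c)` over those classes, which pushes their predicted probabilities down. The abstract specifies neither the selection mechanism nor the loss, and the function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pick_pseudo_negatives(probs, k):
    """Indices of the k classes with the lowest predicted probability."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    return order[:k]

def negative_label_loss(probs, negatives, eps=1e-12):
    """Mean of -log(1 - p_c) over the pseudo-negative classes c."""
    terms = [math.log(max(1.0 - probs[c], eps)) for c in negatives]
    return -sum(terms) / len(negatives)

# Submodel A labels an unlabeled item with two pseudo-negative classes.
probs_a = softmax([2.0, 0.5, -1.0, 0.0])
negatives = pick_pseudo_negatives(probs_a, k=2)

# Submodel B is penalized for probability mass placed on those classes.
loss_agree = negative_label_loss([0.7, 0.1, 0.1, 0.1], negatives)
loss_disagree = negative_label_loss([0.1, 0.1, 0.4, 0.4], negatives)
```

In a full training loop the two submodels would swap these roles after each iteration, as described in the abstract; the loss is smaller when submodel B already assigns little probability to the classes submodel A ruled out.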