Recently, synthetic data-based instance segmentation has become a highly favored optimization paradigm because it leverages simulation rendering and physics to generate high-quality image-annotation pairs. In this paper, we propose a Parallel Pre-trained Transformers (PPT) framework for the synthetic data-based instance segmentation task. Specifically, we leverage off-the-shelf pre-trained vision Transformers to narrow the gap between natural and synthetic data, which helps the models generalize well to the downstream synthetic data scene with few samples. Swin-B-based CBNet V2, Swin-L-based CBNet V2, and Swin-L-based Uniformer are employed for parallel feature learning, and the results of these three models are fused by a pixel-level Non-maximum Suppression (NMS) algorithm to obtain more robust results. The experimental results show that PPT ranks first in the CVPR2022 AVA Accessibility Vision and Autonomy Challenge, with a 65.155% mAP.
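The abstract does not spell out how the pixel-level NMS fuses the three models' outputs. As a minimal sketch, assuming a common greedy formulation in which instance masks pooled from all models are ranked by confidence and a mask is dropped when its pixel-wise IoU with an already kept mask exceeds a threshold, the fusion step might look as follows. The function names and the 0.5 threshold are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mask_iou(a, b):
    """Pixel-wise intersection over union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def mask_nms(masks, scores, iou_threshold=0.5):
    """Greedy NMS over binary masks; returns indices of the kept masks.

    masks:  sequence of boolean arrays with identical shape (pooled from all models)
    scores: one confidence score per mask
    iou_threshold: hypothetical value, chosen only for illustration
    """
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    for idx in order:
        # Keep a mask only if it does not overlap too much with any kept mask.
        if all(mask_iou(masks[idx], masks[k]) <= iou_threshold for k in keep):
            keep.append(int(idx))
    return keep

# Toy example: two overlapping detections of one object and one separate object.
m1 = np.zeros((4, 4), dtype=bool); m1[0:2, 0:2] = True
m2 = np.zeros((4, 4), dtype=bool); m2[0:2, 0:3] = True   # IoU with m1 is 4/6
m3 = np.zeros((4, 4), dtype=bool); m3[2:4, 2:4] = True
kept = mask_nms([m1, m2, m3], [0.9, 0.8, 0.7])            # kept == [0, 2]
```

In this toy case the lower-scoring duplicate is suppressed while the non-overlapping instance survives, which is the behavior one would expect from any ensemble fusion of this kind.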
An automatic speaker verification system aims to verify the speaker identity of a speech signal. However, a voice conversion system manipulates the source speaker's speech signal to make it sound like the target speaker's voice and thereby deceive the speaker verification system. Most countermeasures for voice conversion-based spoofing attacks are designed to discriminate bona fide speech from spoofed speech for speaker verification systems. In this paper, we investigate the problem of source speaker identification, that is, inferring the identity of the source speaker given the voice-converted speech. To perform source speaker identification, we simply add voice-converted speech data, labeled with the source speaker identity, to the genuine speech dataset during speaker embedding network training. Experimental results show the feasibility of source speaker identification when training and testing with converted speech from the same voice conversion model(s). When testing on converted speech from an unseen voice conversion algorithm, the performance of source speaker identification improves as more voice conversion models are used during training.
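The labeling scheme described above can be illustrated with a small sketch. It assumes a hypothetical in-memory representation of utterances; the class names, field names, and file paths are invented for illustration. The only point it makes is that converted speech enters the training set under the source speaker's label, not the target speaker's.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Utterance:
    path: str      # hypothetical audio file path
    speaker: str   # true speaker identity

@dataclass(frozen=True)
class ConvertedUtterance:
    path: str
    source_speaker: str  # who actually spoke
    target_speaker: str  # whose voice the output imitates
    vc_model: str        # which voice conversion model produced it

def build_training_set(genuine, converted):
    """Return (path, label) pairs for speaker embedding training.

    Genuine speech keeps its own speaker label; converted speech is
    labeled with the SOURCE speaker, as the abstract describes.
    """
    examples = [(u.path, u.speaker) for u in genuine]
    examples += [(c.path, c.source_speaker) for c in converted]
    return examples

genuine = [Utterance("spk_a_001.wav", "A"), Utterance("spk_b_001.wav", "B")]
converted = [ConvertedUtterance("a_to_b_001.wav", "A", "B", "vc_model_1")]
train_pairs = build_training_set(genuine, converted)
```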
Recently, target speech separation and extraction techniques for the meeting scenario have become an active research topic. We propose a speaker diarization aware multiple target speech separation system (SD-MTSS) that simultaneously extracts the voice of each speaker from the mixed speech, rather than requiring a succession of independent processes as in previous solutions. SD-MTSS consists of a speaker diarization (SD) module and a multiple target speech separation (MTSS) module. The former infers the target speaker voice activity detection (TSVAD) states of the mixture and also obtains each speaker's single-talker audio segments as the reference speech. The latter takes both the mixed audio and the reference speech as inputs and generates an estimated mask. By exploiting the TSVAD decision and the estimated mask, our SD-MTSS model can extract the speech of each speaker concurrently in a conversation recording without requiring additional enrollment audio in advance. Experimental results show that our MTSS model outperforms our baselines by a large margin, achieving improvements of 1.38 dB in SDR, 1.34 dB in SI-SNR, and 0.13 in PESQ over the state-of-the-art SpEx+ baseline on the WSJ0-2mix-extr dataset. The SD-MTSS system also significantly improves over the baseline on the Alimeeting dataset.
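The abstract does not specify how the TSVAD decision and the estimated mask are combined. One plausible reading, sketched below purely as an assumption, is that each speaker's mask is applied to the mixture spectrogram and then gated frame by frame with that speaker's binary activity decision. The array shapes and the function name are illustrative.

```python
import numpy as np

def extract_speakers(mixture_spec, masks, vad):
    """Apply per-speaker masks to a mixture and gate them by voice activity.

    mixture_spec: (F, T) complex spectrogram of the mixed speech
    masks:        (S, F, T) estimated masks in [0, 1], one per speaker
    vad:          (S, T) binary TSVAD decisions (1 = speaker active in frame)
    returns:      (S, F, T) estimated spectrogram for each speaker
    """
    # Broadcasting: vad gains a frequency axis, the mixture gains a speaker axis.
    return masks * vad[:, None, :] * mixture_spec[None, :, :]

rng = np.random.default_rng(0)
mixture = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
masks = rng.uniform(size=(2, 3, 4))
vad = np.array([[1, 1, 0, 0],
                [0, 1, 1, 1]])
estimates = extract_speakers(mixture, masks, vad)  # shape (2, 3, 4)
```

Frames in which a speaker is marked inactive come out as exact zeros, so all speakers are extracted in one pass without per-speaker enrollment audio.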
Integrated sensing and communication (ISAC) is emerging as a key enabler to address the growing spectrum congestion problem and satisfy increasing demands for ubiquitous sensing and communication. By sharing various resources and information, ISAC achieves much higher spectral, energy, hardware, and economic efficiencies. Concurrently, reconfigurable intelligent surface (RIS) technology has been deemed a promising approach due to its capability of intelligently manipulating the wireless propagation environment in an energy- and hardware-efficient manner. In this article, we analyze the potential of deploying RIS to improve communication and sensing performance in ISAC systems. We first describe the fundamentals of RIS and its applications in traditional communication and sensing systems, then introduce the principles of ISAC and review existing work on RIS-assisted ISAC, followed by a case study to verify the advantages of deploying RIS in ISAC systems. Finally, open challenges and research directions are discussed to stimulate this line of research and pave the way for practical applications.
To alleviate the challenges of building Knowledge Graphs (KG) from scratch, a more general task is to enrich a KG using triples from an open corpus, where the obtained triples contain noisy entities and relations. It is challenging to enrich a KG with newly harvested triples while maintaining the quality of the knowledge representation. This paper proposes a system to refine a KG using information harvested from an additional corpus. To this end, we formulate our task as two coupled sub-tasks, namely joint event extraction (JEE) and knowledge graph fusion (KGF). We then propose a Collaborative Knowledge Graph Fusion Framework that allows the two sub-tasks to assist one another in an alternating manner. More concretely, the explorer carries out the JEE, supervised by both the ground-truth annotation and an existing KG provided by the supervisor. The supervisor then evaluates the triples extracted by the explorer and enriches the KG with those that are highly ranked. To implement this evaluation, we further propose a Translated Relation Alignment Scoring Mechanism to align and translate the extracted triples to the prior KG. Experiments verify that this collaboration improves the performance of both the JEE and the KGF.
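The abstract does not define the Translated Relation Alignment Scoring Mechanism. To illustrate the general idea of a supervisor that ranks extracted triples and keeps the highly ranked ones, the sketch below swaps in the well-known TransE score (a triple is plausible when head + relation lies close to tail in embedding space). The embeddings, entity names, and function names are all hypothetical; this is a stand-in for illustration, not the paper's mechanism.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility: higher (closer to 0) means h + r is nearer to t."""
    return -float(np.linalg.norm(h + r - t))

def top_k_triples(triples, ent_emb, rel_emb, k):
    """Rank (head, relation, tail) triples by score and return the best k."""
    ranked = sorted(
        triples,
        key=lambda tr: transe_score(ent_emb[tr[0]], rel_emb[tr[1]], ent_emb[tr[2]]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical 2-D embeddings chosen so the first triple fits exactly.
ent_emb = {
    "Paris": np.array([1.0, 0.0]),
    "Berlin": np.array([0.0, 0.0]),
    "France": np.array([1.0, 1.0]),
}
rel_emb = {"capital_of": np.array([0.0, 1.0])}
candidates = [
    ("Berlin", "capital_of", "France"),  # noisy extraction
    ("Paris", "capital_of", "France"),
]
accepted = top_k_triples(candidates, ent_emb, rel_emb, k=1)
```

Only the accepted triples would then be merged into the KG, which mirrors the supervisor's role of filtering noisy extractions.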
Learning efficient graph representations is the key to favorably addressing downstream tasks on graphs, such as node or graph property prediction. Given the non-Euclidean structural property of graphs, preserving the similarity relationships of the original graph data in the embedded space requires specific tools and a similarity metric. This paper develops a new graph representation learning scheme, namely EGG, which embeds approximated second-order graph characteristics into a Grassmann manifold. The proposed strategy leverages graph convolutions to learn hidden representations of the corresponding subspace of the graph, which is then mapped to a Grassmann point of a low-dimensional manifold through truncated singular value decomposition (SVD). The established graph embedding approximates the denoised correlation of node attributes and is implemented in the form of a symmetric matrix space for Euclidean calculation. The effectiveness of EGG is demonstrated using both clustering and classification tasks at the node level and graph level. It outperforms baseline models on various benchmarks.
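The mapping from hidden representations to a Grassmann point can be sketched with NumPy. The example assumes that the hidden representation is a plain matrix H (nodes by features), that the Grassmann point is the orthonormal basis of its top-p left singular subspace, and that the symmetric matrix form is the standard projection embedding U Uᵀ. These choices follow the usual Grassmann manifold conventions and are assumptions about details the abstract leaves open.

```python
import numpy as np

def grassmann_point(H, p):
    """Orthonormal basis (n x p) of the top-p left singular subspace of H."""
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    return U[:, :p]  # truncated SVD: keep the p leading left singular vectors

def projection_embedding(U):
    """Symmetric n x n matrix U U^T, which identifies the subspace uniquely."""
    return U @ U.T

def projection_distance(U1, U2):
    """Euclidean (Frobenius) distance between projection embeddings, scaled by 1/sqrt(2)."""
    diff = projection_embedding(U1) - projection_embedding(U2)
    return float(np.linalg.norm(diff, ord="fro") / np.sqrt(2))

rng = np.random.default_rng(0)
H = rng.standard_normal((6, 4))   # hypothetical hidden representation
U = grassmann_point(H, p=2)
P = projection_embedding(U)       # symmetric and idempotent
```

Working with U Uᵀ rather than U itself removes the ambiguity of the basis: any rotation of the columns of U gives the same symmetric matrix, so ordinary Euclidean operations on P respect the subspace structure.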
Bundle recommendation systems aim to recommend a bundle of items for a user to consider as a whole. They have become the norm in modern life and have been applied to many real-world settings, such as product bundle recommendation, music playlist recommendation, and travel package recommendation. However, compared to studies of bundle recommendation approaches in areas such as online shopping and digital music services, research on meal recommendation for restaurants in the hospitality industry has made limited progress, due largely to the lack of high-quality benchmark datasets. A publicly available dataset dedicated to meal recommendation research is urgently needed. In this paper, we introduce a meal recommendation dataset (MealRec) that aims to facilitate future research. MealRec is constructed from the user review records of Allrecipes.com, covering 1,500+ users, 7,200+ recipes, and 3,800+ meals. Each recipe is described with rich information, such as ingredients, instructions, pictures, category, and tags; each meal is three-course, consisting of an appetizer, a main dish, and a dessert. Furthermore, we propose a category-constrained meal recommendation model that is evaluated through comparative experiments with several state-of-the-art bundle recommendation methods on MealRec. Experimental results confirm the superiority of our model and demonstrate that MealRec is a promising testbed for meal recommendation research. The MealRec dataset and the source code of our proposed model are available at https://github.com/WUT-IDEA/MealRec for access and reproducibility.
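The category constraint on a three-course meal can be illustrated with a toy sketch. It assumes that every recipe already carries a user-specific preference score from some upstream model, and it simply picks the best-scoring recipe per course. The class, the scores, and the greedy selection are hypothetical and are not the model proposed in the paper.

```python
from dataclasses import dataclass

COURSES = ("appetizer", "main dish", "dessert")

@dataclass(frozen=True)
class Recipe:
    name: str
    category: str   # one of COURSES
    score: float    # hypothetical preference score for a given user

def compose_meal(recipes):
    """Pick the highest-scoring recipe for each course.

    Raises ValueError if some course has no candidate, because a meal
    must contain exactly one appetizer, one main dish, and one dessert.
    """
    meal = {}
    for course in COURSES:
        candidates = [r for r in recipes if r.category == course]
        if not candidates:
            raise ValueError(f"no candidate recipe for course: {course}")
        meal[course] = max(candidates, key=lambda r: r.score)
    return meal

recipes = [
    Recipe("bruschetta", "appetizer", 0.6),
    Recipe("spring rolls", "appetizer", 0.8),
    Recipe("lasagna", "main dish", 0.9),
    Recipe("tiramisu", "dessert", 0.7),
    Recipe("brownie", "dessert", 0.5),
]
meal = compose_meal(recipes)
```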
Symbol-level precoding (SLP), which converts harmful multi-user interference (MUI) into beneficial signals, can significantly improve symbol-error-rate (SER) performance in multi-user communication systems. While it enjoys this symbol-level gain, however, the complicated non-linear symbol-by-symbol precoder design suffers from a computational complexity that grows exponentially with the number of users, which is unaffordable in realistic systems. In this paper, we propose a novel low-complexity grouped SLP (G-SLP) approach and develop efficient design algorithms for typical max-min fairness and power minimization problems. In particular, after dividing all users into several groups, the precoders for each group are separately designed on a symbol-by-symbol basis using only the symbol information of the users in that group, such that the intra-group MUI is exploited using the concept of constructive interference (CI) while the inter-group MUI is effectively suppressed. To further reduce the computational complexity, we utilize the Lagrangian dual, Karush-Kuhn-Tucker (KKT) conditions, and the majorization-minimization (MM) method to transform the resulting problems into more tractable forms, and develop efficient algorithms that yield closed-form solutions. Extensive simulation results illustrate that the proposed G-SLP strategy and design algorithms dramatically reduce the computational complexity without causing significant performance loss compared with traditional SLP schemes.
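The abstract gives no formulas for the constructive interference concept. As a sketch under the assumption of M-PSK signaling, the formulation commonly used in the SLP literature rotates each user's noise-free received signal by the conjugate of its intended symbol, z_k = s_k* h_kᵀ x, and requires Re(z_k) − |Im(z_k)| / tan(π/M) ≥ γ_k for some quality-of-service threshold γ_k. The function below only evaluates that margin; the variable names and the toy channel are illustrative, and the grouping and optimization steps of G-SLP are not reproduced.

```python
import numpy as np

def ci_margin(H, x, s, M):
    """Per-user constructive-interference margin for M-PSK symbols.

    H: (K, N) channel matrix whose k-th row is h_k^T
    x: (N,) transmitted (precoded) signal for the current symbol slot
    s: (K,) unit-modulus M-PSK symbols intended for the K users
    Returns Re(z_k) - |Im(z_k)| / tan(pi / M) with z_k = conj(s_k) * h_k^T x.
    A user meets the CI constraint when its margin is at least its
    threshold gamma_k (a hypothetical QoS parameter, not computed here).
    """
    z = np.conj(s) * (H @ x)  # rotate so the intended symbol lies on the real axis
    return z.real - np.abs(z.imag) / np.tan(np.pi / M)

M = 4                                                  # QPSK
s = np.exp(1j * (np.pi / 4 + np.arange(2) * np.pi / 2))
H = np.eye(2, dtype=complex)                           # toy interference-free channel
good = ci_margin(H, 2.0 * s, s, M)                     # pushed deeper into the decision region
bad = ci_margin(H, s * np.exp(1j * np.pi / 3), s, M)   # rotated past the decision boundary
```

A positive margin means the received point lies inside the decision region of the intended symbol with room to spare, so interference that increases the margin is helpful instead of harmful.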
Intelligent reflecting surface (IRS) has been regarded as a promising and revolutionary technology for future wireless communication systems owing to its capability of tailoring the signal propagation environment in an energy/spectrum/hardware-efficient manner. However, most existing studies on IRS optimization are based on a simple and ideal reflection model that is impractical for hardware implementation, which leads to severe performance loss in realistic wideband/multi-band systems. To deal with this problem, in this paper we first propose a more practical and more tractable IRS reflection model that describes the differences in reflection responses for signals at different frequencies. Then, we investigate the joint transmit beamforming and IRS reflection beamforming design for an IRS-assisted multi-cell multi-band system. Both power minimization and sum-rate maximization problems are solved by exploiting the popular second-order cone programming (SOCP), Riemannian manifold, majorization-minimization (MM), weighted minimum mean square error (WMMSE), and block coordinate descent (BCD) methods. Simulation results illustrate the significant performance improvement, in terms of power saving and rate enhancement, of our proposed joint transmit beamforming and reflection design algorithms based on the practical reflection model.
In this paper, we investigate the potential of employing reconfigurable intelligent surface (RIS) in integrated sensing and communication (ISAC) systems. In particular, we consider an RIS-assisted ISAC system in which a multi-antenna base station (BS) simultaneously performs multi-user multi-input single-output (MU-MISO) communication and target detection. We aim to jointly design the transmit beamforming and receive filter of the BS and the reflection coefficients of the RIS to maximize the sum-rate of the communication users, while satisfying a worst-case radar output signal-to-noise ratio (SNR) constraint, the transmit power constraint, and the unit-modulus property of the reflection coefficients. An efficient iterative algorithm based on fractional programming (FP), majorization-minimization (MM), and the alternating direction method of multipliers (ADMM) is developed to solve the complicated non-convex problem. Simulation results verify the advantage of the proposed RIS-assisted ISAC scheme and the effectiveness of the developed algorithm.
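The communication objective can be made concrete with a small sketch. It assumes the standard narrowband RIS model in which user k sees the effective channel h_d,k + h_r,k diag(φ) G, with direct channel h_d,k, RIS-to-user channel h_r,k, BS-to-RIS channel G, and unit-modulus reflection coefficients φ, and it treats other users' beams as interference. The function and variable names are illustrative; the radar SNR constraint and the FP/MM/ADMM solver are not reproduced.

```python
import numpy as np

def sum_rate(Hd, Hr, G, phi, W, noise_power):
    """Sum-rate (bits/s/Hz) of K single-antenna users served through a RIS.

    Hd:  (K, M) direct BS-to-user channels, one row per user
    Hr:  (K, N) RIS-to-user channels, one row per user
    G:   (N, M) BS-to-RIS channel
    phi: (N,)   RIS reflection coefficients, each with unit modulus
    W:   (M, K) transmit beamformers, one column per user
    """
    if not np.allclose(np.abs(phi), 1.0):
        raise ValueError("reflection coefficients must have unit modulus")
    H_eff = Hd + Hr @ (phi[:, None] * G)   # (K, M) effective channels
    gains = np.abs(H_eff @ W) ** 2         # gains[k, j]: power of beam j at user k
    signal = np.diag(gains)
    interference = gains.sum(axis=1) - signal
    sinr = signal / (interference + noise_power)
    return float(np.sum(np.log2(1.0 + sinr)))

# Single-user, single-antenna, single-element toy case.
one = np.array([[1.0 + 0.0j]])
aligned = sum_rate(one, one, one, np.array([1.0 + 0.0j]), one, noise_power=1.0)
opposed = sum_rate(one, one, one, np.array([-1.0 + 0.0j]), one, noise_power=1.0)
```

In the toy case the reflected path either adds coherently to the direct path (rate log2(5)) or cancels it (rate 0), which shows why the phase design of the RIS matters for the sum-rate objective.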