Abstract:Purpose: This work proposes a novel self-supervised noise-adaptive image denoising framework, called Repetition to Repetition (Rep2Rep) learning, for low-field (<1T) MRI applications. Methods: Rep2Rep learning extends the Noise2Noise framework by training a neural network on two repeated MRI acquisitions, using one repetition as input and another as target, without requiring ground-truth data. It incorporates noise-adaptive training, enabling denoising generalization across varying noise levels and flexible inference with any number of repetitions. Performance was evaluated on both synthetic noisy brain MRI and 0.55T prostate MRI data, and compared against supervised learning and Monte Carlo Stein's Unbiased Risk Estimator (MC-SURE). Results: Rep2Rep learning outperforms MC-SURE on both synthetic and 0.55T MRI datasets. On synthetic brain data, it achieved denoising quality comparable to supervised learning and surpassed MC-SURE, particularly in preserving structural details and reducing residual noise. On the 0.55T prostate MRI dataset, a reader study showed radiologists preferred Rep2Rep-denoised 2-average images over 8-average noisy images. Rep2Rep demonstrated robustness to noise-level discrepancies between training and inference, supporting its practical implementation. Conclusion: Rep2Rep learning offers an effective self-supervised denoising for low-field MRI by leveraging routinely acquired multi-repetition data. Its noise-adaptivity enables generalization to different SNR regimes without clean reference images. This makes Rep2Rep learning a promising tool for improving image quality and scan efficiency in low-field MRI.
Abstract:Magnetic resonance imaging (MRI) reconstruction has largely been dominated by deep neural networks (DNN); however, many state-of-the-art architectures use black-box structures, which hinder interpretability and improvement. Here, we propose an interpretable DNN architecture for self-supervised MRI reconstruction and denoising by directly parameterizing and learning the classical primal-dual splitting, dubbed LPDSNet. This splitting algorithm allows us to decouple the observation model from the signal prior. Experimentally, we show other interpretable architectures without this decoupling property exhibit failure in the self-supervised learning regime. We report state-of-the-art self-supervised joint MRI reconstruction and denoising performance and novel noise-level generalization capabilities, where in contrast black-box networks fail to generalize.
Abstract:In the pursuit of Artificial General Intelligence (AGI), automating the generation and evaluation of novel research ideas is a key challenge in AI-driven scientific discovery. This paper presents Relative Neighbor Density (RND), a domain-agnostic algorithm for novelty assessment in research ideas that overcomes the limitations of existing approaches by analyzing the distribution patterns of semantic neighbors rather than simple distances. We first developed a scalable methodology to create validation datasets without expert labeling, addressing a fundamental challenge in novelty assessment. Using these datasets, we demonstrate that our RND algorithm achieves state-of-the-art (SOTA) performance in computer science (AUROC=0.808) and biomedical research (AUROC=0.757) domains. Most significantly, while SOTA models like Sonnet-3.7 and existing metrics show domain-specific performance degradation, RND maintains consistent effectiveness across domains, outperforming all benchmarks by a substantial margin (0.782 v.s. 0.597) on cross-domain evaluation. These results validate RND as a generalizable solution for automated novelty assessment in scientific research.
Abstract:Users and creators are two crucial components of recommender systems. Typical recommender systems focus on the user side, providing the most suitable items based on each user's request. In such scenarios, a few items receive a majority of exposures, while many items receive very few. This imbalance leads to poorer experiences and decreased activity among the creators receiving less feedback, harming the recommender system in the long term. To this end, we develop a creator-side recommender system, called DualRec, to answer the following question: how to find the most suitable users for each item to enhance the creators' experience? We show that typical user-side recommendation algorithms, such as retrieval and ranking algorithms, can be adapted into the creator-side versions with just a few modifications. This greatly simplifies algorithm design in DualRec. Moreover, we discuss a unique challenge in DualRec: the user availability issue, which is not present in user-side recommender systems. To tackle this issue, we incorporate a user availability calculation (UAC) module to effectively enhance DualRec's performance. DualRec has already been implemented in Kwai, a short video recommendation system with over 100 millions user and over 10 million creators, significantly improving the experience for creators.
Abstract:We consider the problem of matrix completion with graphs as side information depicting the interrelations between variables. The key challenge lies in leveraging the similarity structure of the graph to enhance matrix recovery. Existing approaches, primarily based on graph Laplacian regularization, suffer from several limitations: (1) they focus only on the similarity between neighboring variables, while overlooking long-range correlations; (2) they are highly sensitive to false edges in the graphs and (3) they lack theoretical guarantees regarding statistical and computational complexities. To address these issues, we propose in this paper a novel graph regularized matrix completion algorithm called GSGD, based on preconditioned projected gradient descent approach. We demonstrate that GSGD effectively captures the higher-order correlation information behind the graphs, and achieves superior robustness and stability against the false edges. Theoretically, we prove that GSGD achieves linear convergence to the global optimum with near-optimal sample complexity, providing the first theoretical guarantees for both recovery accuracy and efficacy in the perspective of nonconvex optimization. Our numerical experiments on both synthetic and real-world data further validate that GSGD achieves superior recovery accuracy and scalability compared with several popular alternatives.
Abstract:Radio maps enrich radio propagation and spectrum occupancy information, which provides fundamental support for the operation and optimization of wireless communication systems. Traditional radio maps are mainly achieved by extensive manual channel measurements, which is time-consuming and inefficient. To reduce the complexity of channel measurements, radio map estimation (RME) through novel artificial intelligence techniques has emerged to attain higher resolution radio maps from sparse measurements or few observations. However, black box problems and strong dependency on training data make learning-based methods less explainable, while model-based methods offer strong theoretical grounding but perform inferior to the learning-based methods. In this paper, we develop a deep unrolled low-rank tensor completion network (DULRTC-RME) for radio map estimation, which integrates theoretical interpretability and learning ability by unrolling the tedious low-rank tensor completion optimization into a deep network. It is the first time that algorithm unrolling technology has been used in the RME field. Experimental results demonstrate that DULRTC-RME outperforms existing RME methods.
Abstract:Modern decision-making scenarios often involve data that is both high-dimensional and rich in higher-order contextual information, where existing bandits algorithms fail to generate effective policies. In response, we propose in this paper a generalized linear tensor bandits algorithm designed to tackle these challenges by incorporating low-dimensional tensor structures, and further derive a unified analytical framework of the proposed algorithm. Specifically, our framework introduces a convex optimization approach with the weakly decomposable regularizers, enabling it to not only achieve better results based on the tensor low-rankness structure assumption but also extend to cases involving other low-dimensional structures such as slice sparsity and low-rankness. The theoretical analysis shows that, compared to existing low-rankness tensor result, our framework not only provides better bounds but also has a broader applicability. Notably, in the special case of degenerating to low-rank matrices, our bounds still offer advantages in certain scenarios.
Abstract:Top-$K$ recommendation involves inferring latent user preferences and generating personalized recommendations accordingly, which is now ubiquitous in various decision systems. Nonetheless, recommender systems usually suffer from severe \textit{popularity bias}, leading to the over-recommendation of popular items. Such a bias deviates from the central aim of reflecting user preference faithfully, compromising both customer satisfaction and retailer profits. Despite the prevalence, existing methods tackling popularity bias still have limitations due to the considerable accuracy-debias tradeoff and the sensitivity to extensive parameter selection, further exacerbated by the extreme sparsity in positive user-item interactions. In this paper, we present a \textbf{Pop}ularity-aware top-$K$ recommendation algorithm integrating multi-behavior \textbf{S}ide \textbf{I}nformation (PopSI), aiming to enhance recommendation accuracy and debias performance simultaneously. Specifically, by leveraging multiple user feedback that mirrors similar user preferences and formulating it as a three-dimensional tensor, PopSI can utilize all slices to capture the desiring user preferences effectively. Subsequently, we introduced a novel orthogonality constraint to refine the estimated item feature space, enforcing it to be invariant to item popularity features thereby addressing our model's sensitivity to popularity bias. Comprehensive experiments on real-world e-commerce datasets demonstrate the general improvements of PopSI over state-of-the-art debias methods with a marginal accuracy-debias tradeoff and scalability to practical applications. The source code for our algorithm and experiments is available at \url{https://github.com/Eason-sys/PopSI}.
Abstract:Point cloud video (PCV) is a versatile 3D representation of dynamic scenes with many emerging applications. This paper introduces U-Motion, a learning-based compression scheme for both PCV geometry and attributes. We propose a U-Structured multiscale inter-frame prediction framework, U-Inter, which performs layer-wise explicit motion estimation and compensation (ME/MC) at different scales with varying levels of detail. It integrates both higher and lower-scale motion features, in addition to the information of current and previous frames, to enable accurate motion estimation at the current scale. In addition, we design a cascaded spatial predictive coding module to capture the inter-scale spatial redundancy remaining after U-Inter prediction. We further propose an effective context detach and restore scheme to reduce spatial-temporal redundancy in the motion and latent bit-streams and improve compression performance. We conduct experiments following the MPEG Common Test Condition and demonstrate that U-Motion can achieve significant gains over MPEG G-PCC-GesTM v3.0 and recently published learning-based methods for both geometry and attribute compression.
Abstract:This paper presents the NPU-HWC system submitted to the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC). Our system consists of two modules: a speech generator for Track 1 and a background audio generator for Track 2. In Track 1, we employ Single-Codec to tokenize the speech into discrete tokens and use a language-model-based approach to achieve zero-shot speaking style cloning. The Single-Codec effectively decouples timbre and speaking style at the token level, reducing the acoustic modeling burden on the autoregressive language model. Additionally, we use DSPGAN to upsample 16 kHz mel-spectrograms to high-fidelity 48 kHz waveforms. In Track 2, we propose a background audio generator based on large language models (LLMs). This system produces scene-appropriate accompaniment descriptions, synthesizes background audio with Tango 2, and integrates it with the speech generated by our Track 1 system. Our submission achieves the second place and the first place in Track 1 and Track 2 respectively.