Abstract:Real-world scientific applications frequently encounter incomplete observational data due to sensor limitations, geographic constraints, or measurement costs. Although neural operators significantly advanced PDE solving in terms of computational efficiency and accuracy, their underlying assumption of fully-observed spatial inputs severely restricts applicability in real-world applications. We introduce the first systematic framework for learning neural operators from partial observation. We identify and formalize two fundamental obstacles: (i) the supervision gap in unobserved regions that prevents effective learning of physical correlations, and (ii) the dynamic spatial mismatch between incomplete inputs and complete solution fields. Specifically, our proposed Latent Autoregressive Neural Operator(LANO) introduces two novel components designed explicitly to address the core difficulties of partial observations: (i) a mask-to-predict training strategy that creates artificial supervision by strategically masking observed regions, and (ii) a Physics-Aware Latent Propagator that reconstructs solutions through boundary-first autoregressive generation in latent space. Additionally, we develop POBench-PDE, a dedicated and comprehensive benchmark designed specifically for evaluating neural operators under partial observation conditions across three PDE-governed tasks. LANO achieves state-of-the-art performance with 18--69$\%$ relative L2 error reduction across all benchmarks under patch-wise missingness with less than 50$\%$ missing rate, including real-world climate prediction. Our approach effectively addresses practical scenarios involving up to 75$\%$ missing rate, to some extent bridging the existing gap between idealized research settings and the complexities of real-world scientific computing.
Abstract:Posts in software Q\&A sites often consist of three main parts: title, description and code, which are interconnected and jointly describe the question. Existing tag recommendation methods often treat different modalities as a whole or inadequately consider the interaction between different modalities. Additionally, they focus on extracting information directly from the post itself, neglecting the information from external knowledge sources. Therefore, we propose a Retrieval Augmented Cross-Modal (RACM) Tag Recommendation Model in Software Q\&A Sites. Specifically, we first use the input post as a query and enhance the representation of different modalities by retrieving information from external knowledge sources. For the retrieval-augmented representations, we employ a cross-modal context-aware attention to leverage the main modality description for targeted feature extraction across the submodalities title and code. In the fusion process, a gate mechanism is employed to achieve fine-grained feature selection, controlling the amount of information extracted from the submodalities. Finally, the fused information is used for tag recommendation. Experimental results on three real-world datasets demonstrate that our model outperforms the state-of-the-art counterparts.
Abstract:With the development of Internet technology and the expansion of social networks, online platforms have become an important way for people to obtain information. The introduction of tags facilitates information categorization and retrieval. Meanwhile, the development of tag recommendation systems not only enables users to input tags more efficiently, but also improves the quality of tags. However, current tag recommendation methods only consider the content of the current post and do not take into account the influence of user preferences. Since the main body of tag recommendation is the user, it is very necessary to obtain the user's tagging habits. Therefore, this paper proposes a tag recommendation algorithm (MLP4STR) based on the dynamic preference of user's behavioral sequence, which models the user's historical post information and historical tag information to obtain the user's dynamic interest changes. A pure MLP structure across feature dimensions is used in sequence modeling to model the interaction between tag content and post content to fully extract the user's interests. Finally tag recommendation is performed.




Abstract:Multi-label text classification (MLTC) is one of the key tasks in natural language processing. It aims to assign multiple target labels to one document. Due to the uneven popularity of labels, the number of documents per label follows a long-tailed distribution in most cases. It is much more challenging to learn classifiers for data-scarce tail labels than for data-rich head labels. The main reason is that head labels usually have sufficient information, e.g., a large intra-class diversity, while tail labels do not. In response, we propose a Pairwise Instance Relation Augmentation Network (PIRAN) to augment tailed-label documents for balancing tail labels and head labels. PIRAN consists of a relation collector and an instance generator. The former aims to extract the document pairwise relations from head labels. Taking these relations as perturbations, the latter tries to generate new document instances in high-level feature space around the limited given tailed-label instances. Meanwhile, two regularizers (diversity and consistency) are designed to constrain the generation process. The consistency-regularizer encourages the variance of tail labels to be close to head labels and further balances the whole datasets. And diversity-regularizer makes sure the generated instances have diversity and avoids generating redundant instances. Extensive experimental results on three benchmark datasets demonstrate that PIRAN consistently outperforms the SOTA methods, and dramatically improves the performance of tail labels.