Jin Shi

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Mar 26, 2026

Yicheng Zou, Dongsheng Zhu, Lin Zhu, Tong Zhu, Yunhua Zhou, Peiheng Zhou, Xinyu Zhou, Dongzhan Zhou, Zhiwang Zhou, Yuhao Zhou(+164 more)

Abstract: We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaling to this unprecedented size, the model delivers a comprehensive enhancement across both general and scientific domains. Beyond stronger reasoning and image-text understanding capabilities, its intelligence is augmented with advanced agent capabilities. Simultaneously, its scientific expertise has been vastly expanded to master over 100 specialized tasks across critical science fields, including chemistry, materials, life sciences, and earth sciences. Achieving this massive scale is made possible by the robust infrastructure support of XTuner and LMDeploy, which facilitates highly efficient Reinforcement Learning (RL) training at the 1-trillion-parameter level while ensuring strict precision consistency between training and inference. By seamlessly integrating these advancements, Intern-S1-Pro further fortifies the fusion of general and specialized intelligence, working as a Specializable Generalist, demonstrating its position in the top tier of open-source models for general capabilities, while outperforming proprietary models in the depth of specialized scientific tasks.


OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Dec 10, 2024

Linke Ouyang, Yuan Qu, Hongbin Zhou, Jiawei Zhu, Rui Zhang, Qunshu Lin, Bin Wang, Zhiyuan Zhao, Man Jiang, Xiaomeng Zhao(+10 more)

Abstract: Document content extraction is crucial in computer vision, especially for meeting the high-quality data needs of large language models (LLMs) and retrieval-augmented generation (RAG) technologies. However, current document parsing methods suffer from significant limitations in terms of diversity and comprehensive evaluation. To address these challenges, we introduce OmniDocBench, a novel multi-source benchmark designed to advance automated document content extraction. OmniDocBench includes a meticulously curated and annotated high-quality evaluation dataset comprising nine diverse document types, such as academic papers, textbooks, and slides, among others. Our benchmark provides a flexible and comprehensive evaluation framework with 19 layout category labels and 14 attribute labels, enabling multi-level assessments across entire datasets, individual modules, or specific data types. Using OmniDocBench, we perform an exhaustive comparative analysis of existing modular pipelines and multimodal end-to-end methods, highlighting their limitations in handling document diversity and ensuring fair evaluation. OmniDocBench establishes a robust, diverse, and fair evaluation standard for the document content extraction field, offering crucial insights for future advancements and fostering the development of document parsing technologies. The code and dataset are available at https://github.com/opendatalab/OmniDocBench.


Continuous Geometry-Aware Graph Diffusion via Hyperbolic Neural PDE

Jun 03, 2024

Jiaxu Liu, Xinping Yi, Sihao Wu, Xiangyu Yin, Tianle Zhang, Xiaowei Huang, Jin Shi

Abstract: While the Hyperbolic Graph Neural Network (HGNN) has recently emerged as a powerful tool for dealing with hierarchical graph data, its limitations in scalability and efficiency hinder it from generalizing to deep models. In this paper, by envisioning depth as a continuous-time embedding evolution, we decouple the HGNN and reframe the information propagation as a partial differential equation, letting node-wise attention undertake the role of diffusivity within the Hyperbolic Neural PDE (HPDE). By introducing theoretical principles (e.g., field and flow, gradient, divergence, and diffusivity) on a non-Euclidean manifold for HPDE integration, we discuss both implicit and explicit discretization schemes to formulate numerical HPDE solvers. Further, we propose the Hyperbolic Graph Diffusion Equation (HGDE), a flexible vector flow function that can be integrated to obtain expressive hyperbolic node embeddings. By analyzing the potential energy decay of embeddings, we demonstrate that HGDE is capable of modeling both low- and high-order proximity with the benefit of local-global diffusivity functions. Experiments on node classification, link prediction, and image-text classification tasks verify the superiority of the proposed method, which consistently outperforms various competitive models by a significant margin.

* The short version of this work will appear in the Proceedings of the 2024 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2024)
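
To make the idea of an explicit discretization scheme concrete, here is a minimal sketch of one forward-Euler step of graph diffusion. It is a deliberate simplification and not the paper's HGDE: features are scalar and Euclidean rather than hyperbolic, the edge weights are fixed numbers standing in for learned node-wise attention, and the names `explicit_euler_step`, `dirichlet_energy`, and `diffuse` are hypothetical.

```python
def explicit_euler_step(x, weights, tau):
    """One forward-Euler step of dx_i/dt = sum_j w_ij * (x_j - x_i).

    x       : list of scalar node features (Euclidean, an assumption)
    weights : dense matrix of edge weights, standing in for diffusivity
    tau     : step size of the time discretization
    """
    n = len(x)
    return [
        x[i] + tau * sum(weights[i][j] * (x[j] - x[i]) for j in range(n))
        for i in range(n)
    ]


def dirichlet_energy(x, weights):
    """Energy that measures how much features differ across edges."""
    n = len(x)
    return 0.5 * sum(
        weights[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n)
    )


def diffuse(x, weights, tau, steps):
    """Integrate the diffusion by repeating the explicit step."""
    for _ in range(steps):
        x = explicit_euler_step(x, weights, tau)
    return x


# Path graph 0 - 1 - 2 with symmetric unit weights (toy data).
PATH_WEIGHTS = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
X0 = [3.0, 0.0, 0.0]

if __name__ == "__main__":
    x1 = explicit_euler_step(X0, PATH_WEIGHTS, tau=0.1)
    print(x1, dirichlet_energy(X0, PATH_WEIGHTS), dirichlet_energy(x1, PATH_WEIGHTS))
```

With symmetric weights and a small enough step size, each step lowers the energy and keeps the feature sum unchanged, which is the Euclidean analogue of the energy-decay behaviour the abstract refers to. An implicit scheme would instead solve a linear system at each step, trading extra computation for stability at larger step sizes.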


WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset

Mar 12, 2024

Jiantao Qiu, Haijun Lv, Zhenjiang Jin, Rui Wang, Wenchang Ning, Jia Yu, ChaoBin Zhang, Zhenxiang Li, Pei Chu, Yuan Qu(+14 more)

Abstract: This paper presents WanJuan-CC, a safe and high-quality open-sourced English webtext dataset derived from Common Crawl data. The study addresses the challenges of constructing large-scale pre-training datasets for language models, which require vast amounts of high-quality data. A comprehensive process was designed to handle Common Crawl data, including extraction, heuristic rule filtering, fuzzy deduplication, content safety filtering, and data quality filtering. From approximately 68 billion original English documents, we obtained 2.22T tokens of safe data and selected 1.0T tokens of high-quality data as part of WanJuan-CC. We have open-sourced 100B tokens from this dataset. The paper also provides statistical information related to data quality, enabling users to select appropriate data according to their needs. To evaluate the quality and utility of the dataset, we trained 1B-parameter and 3B-parameter models using WanJuan-CC and another dataset, RefinedWeb. Results show that WanJuan-CC performs better on validation datasets and downstream tasks.
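
As a rough illustration of what a fuzzy deduplication stage can look like, the sketch below drops documents whose character-shingle Jaccard similarity to an already kept document exceeds a threshold. The abstract does not say which algorithm WanJuan-CC uses, so the method, the threshold of 0.8, the shingle length of 5, and the function names are all assumptions made for this example. Production pipelines usually replace the pairwise comparison with MinHash and locality-sensitive hashing to scale.

```python
def shingles(text, k=5):
    """Set of overlapping k-character substrings of the normalized text."""
    norm = " ".join(text.lower().split())  # lowercase, collapse whitespace
    if len(norm) <= k:
        return {norm}
    return {norm[i:i + k] for i in range(len(norm) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fuzzy_dedup(docs, threshold=0.8, k=5):
    """Keep a document only if it is not a near-duplicate of a kept one.

    Quadratic in the number of kept documents, so suitable only for
    small illustrative inputs.
    """
    kept, kept_shingles = [], []
    for doc in docs:
        s = shingles(doc, k)
        if all(jaccard(s, t) < threshold for t in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept


# Toy input: the second document differs from the first by one character.
DOCS = [
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown fox jumps over the lazy dog!",
    "Completely different sentence about wireless channels.",
]

if __name__ == "__main__":
    for doc in fuzzy_dedup(DOCS):
        print(doc)
```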


Deep Learning for MIMO Channel Estimation: Interpretation, Performance, and Comparison

Nov 05, 2019

Hu Qiang, Gao Feifei, Zhang Hao, Jin Shi, Li Geoffrey Ye

Abstract: Deep learning (DL) has emerged as an effective tool for channel estimation in wireless communication systems, especially in imperfect environments. However, even with such unprecedented success, DL methods still serve as black boxes, and the lack of explanations of their internal mechanisms severely limits further improvement and extension. In this paper, we present a preliminary theoretical analysis of DL-based channel estimation for multiple-antenna systems to understand and interpret its internal mechanism. A deep neural network (DNN) with the rectified linear unit (ReLU) activation function is mathematically equivalent to a set of local linear functions corresponding to different input regions. Hence, the DL estimator built on it can achieve universal approximation to a large family of functions by making efficient use of piecewise linearity. We demonstrate that DL-based channel estimation is not restricted to any specific signal model and approaches the minimum mean-squared error (MMSE) estimation in various scenarios without requiring any prior knowledge of channel statistics. Therefore, DL-based channel estimation outperforms or is comparable to traditional channel estimation. Simulation results confirm the accuracy of the proposed interpretation and demonstrate the effectiveness of DL-based channel estimation under both linear and nonlinear signal models.

* An interpretation of DL-based channel estimation
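
The claim that a ReLU network equals a set of local linear functions can be checked on a toy example. The sketch below builds a one-hidden-layer ReLU network with made-up weights (not taken from the paper) and extracts the affine map `A x + c` that the network coincides with on the input region sharing a given activation pattern. All names and values are hypothetical.

```python
def matvec(W, x):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def pre_activations(x, W1, b1):
    """Hidden-layer inputs before the ReLU is applied."""
    return [z + b for z, b in zip(matvec(W1, x), b1)]


def relu_net(x, W1, b1, W2, b2):
    """One-hidden-layer network: W2 * relu(W1 * x + b1) + b2."""
    hidden = [max(0.0, p) for p in pre_activations(x, W1, b1)]
    return [z + b for z, b in zip(matvec(W2, hidden), b2)]


def local_affine_map(x, W1, b1, W2, b2):
    """Return (A, c) such that the network equals A x' + c for every x'
    with the same ReLU on/off pattern as x."""
    mask = [1.0 if p > 0 else 0.0 for p in pre_activations(x, W1, b1)]
    hidden_units = range(len(mask))
    # A = W2 * diag(mask) * W1 and c = W2 * diag(mask) * b1 + b2
    A = [
        [sum(W2[i][k] * mask[k] * W1[k][j] for k in hidden_units)
         for j in range(len(x))]
        for i in range(len(W2))
    ]
    c = [
        sum(W2[i][k] * mask[k] * b1[k] for k in hidden_units) + b2[i]
        for i in range(len(W2))
    ]
    return A, c


# Toy weights: 2 inputs, 3 hidden units, 1 output (illustrative values only).
W1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
B1 = [0.0, 0.0, 0.5]
W2 = [[1.0, 2.0, 3.0]]
B2 = [0.1]

if __name__ == "__main__":
    A, c = local_affine_map([1.0, 2.0], W1, B1, W2, B2)
    print(A, c, relu_net([1.0, 2.0], W1, B1, W2, B2))
```

Inputs that switch different hidden units on land in a different region and get a different `(A, c)` pair, which is the piecewise linearity the abstract relies on.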
