Wei Yang

Huazhong University of Science and Technology, Wuhan, China

LLM4SR: A Survey on Large Language Models for Scientific Research

Jan 08, 2025

Ziming Luo, Zonglin Yang, Zexin Xu, Wei Yang, Xinya Du

Abstract: In recent years, the rapid advancement of Large Language Models (LLMs) has transformed the landscape of scientific research, offering unprecedented support across various stages of the research cycle. This paper presents the first systematic survey dedicated to exploring how LLMs are revolutionizing the scientific research process. We analyze the unique roles LLMs play across four critical stages of research: hypothesis discovery, experiment planning and implementation, scientific writing, and peer reviewing. Our review comprehensively showcases the task-specific methodologies and evaluation benchmarks. By identifying current challenges and proposing future research directions, this survey not only highlights the transformative potential of LLMs, but also aims to inspire and guide researchers and practitioners in leveraging LLMs to advance scientific inquiry. Resources are available at the following repository: https://github.com/du-nlp-lab/LLM4SR

Cosmos World Foundation Model Platform for Physical AI

Jan 07, 2025

NVIDIA: Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen (+69 more)

Abstract: Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications. Our platform covers a video curation pipeline, pre-trained world foundation models, examples of post-training of pre-trained world foundation models, and video tokenizers. To help Physical AI builders solve the most critical problems of our society, we make our platform open-source and our models open-weight with permissive licenses available via https://github.com/NVIDIA/Cosmos.

A Large-dimensional Analysis of ESPRIT DoA Estimation: Inconsistency and a Correction via RMT

Jan 06, 2025

Zhengyu Wang, Wei Yang, Xiaoyi Mai, Zenan Ling, Zhenyu Liao, Robert C. Qiu

Abstract: In this paper, we perform asymptotic analyses of the widely used ESPRIT direction-of-arrival (DoA) estimator for large arrays, where the array size $N$ and the number of snapshots $T$ grow to infinity at the same pace. In this large-dimensional regime, the sample covariance matrix (SCM) is known to be a poor eigenspectral estimator of the population covariance. We show that the classical ESPRIT algorithm, which relies on the SCM, produces inconsistent DoA estimates as $N,T \to \infty$ with $N/T \to c \in (0,\infty)$, for both widely- and closely-spaced DoAs, as a consequence of the large-dimensional inconsistency of the SCM. Leveraging tools from random matrix theory (RMT), we propose an improved G-ESPRIT method and prove its consistency in the same large-dimensional setting. From a technical perspective, we derive a novel bound on the eigenvalue differences between two potentially non-Hermitian random matrices, which may be of independent interest. Numerical simulations are provided to corroborate our theoretical findings.

* 25 pages, 8 figures. Part of this work was presented at the IEEE 32nd European Signal Processing Conference (EUSIPCO 2024), Lyon, France, under the title "Inconsistency of ESPRIT DoA Estimation for Large Arrays and a Correction via RMT."

Video Anomaly Detection with Motion and Appearance Guided Patch Diffusion Model

Dec 12, 2024

Hang Zhou, Jiale Cai, Yuteng Ye, Yonghui Feng, Chenxing Gao, Junqing Yu, Zikai Song, Wei Yang

Abstract: A recent endeavor in one class of video anomaly detection is to leverage diffusion models and posit the task as a generation problem, where the diffusion model is trained to recover normal patterns exclusively, thus reporting abnormal patterns as outliers. Yet, existing attempts neglect the various formations of anomaly and predict normal samples at the feature level, even though abnormal objects in surveillance videos are often relatively small. To address this, a novel patch-based diffusion model is proposed, specifically engineered to capture fine-grained local information. We further observe that anomalies in videos manifest themselves as deviations in both appearance and motion. Therefore, we argue that a comprehensive solution must consider both of these aspects simultaneously to achieve accurate frame prediction. To this end, we introduce innovative motion and appearance conditions that are seamlessly integrated into our patch diffusion model. These conditions are designed to guide the model in generating coherent and contextually appropriate predictions for both semantic content and motion relations. Experimental results on four challenging video anomaly detection datasets empirically substantiate the efficacy of our proposed approach, demonstrating that it consistently outperforms most existing methods in detecting abnormal behaviors.

* Accepted by AAAI 2025

ProGDF: Progressive Gaussian Differential Field for Controllable and Flexible 3D Editing

Dec 11, 2024

Yian Zhao, Wanshi Xu, Yang Wu, Weiheng Huang, Zhongqian Sun, Wei Yang

Abstract: 3D editing plays a crucial role in editing and reusing existing 3D assets, thereby enhancing productivity. Recently, 3DGS-based methods have gained increasing attention due to their efficient rendering and flexibility. However, achieving desired 3D editing results often requires multiple adjustments in an iterative loop, costing tens of minutes of training time per attempt and imposing a cumbersome trial-and-error cycle on users. This in-the-loop training paradigm results in a poor user experience. To address this issue, we introduce the concept of process-oriented modelling for 3D editing and propose the Progressive Gaussian Differential Field (ProGDF), an out-of-loop training approach that requires only a single training session to provide users with controllable editing capability and variable editing results through a user-friendly interface in real time. ProGDF consists of two key components: Progressive Gaussian Splatting (PGS) and Gaussian Differential Field (GDF). PGS introduces the progressive constraint to extract the diverse intermediate results of the editing process and employs rendering quality regularization to improve the quality of these results. Based on these intermediate results, GDF leverages a lightweight neural network to model the editing process. Extensive results on two novel applications, namely controllable 3D editing and flexible fine-grained 3D manipulation, demonstrate the effectiveness, practicality and flexibility of the proposed ProGDF.

Playable Game Generation

Dec 01, 2024

Mingyu Yang, Junyou Li, Zhongbin Fang, Sheng Chen, Yangbin Yu, Qiang Fu, Wei Yang, Deheng Ye

Abstract: In recent years, Artificial Intelligence Generated Content (AIGC) has advanced from text-to-image generation to text-to-video and multimodal video synthesis. However, generating playable games presents significant challenges due to the stringent requirements for real-time interaction, high visual quality, and accurate simulation of game mechanics. Existing approaches often fall short, either lacking real-time capabilities or failing to accurately simulate interactive mechanics. To tackle the playability issue, we propose a novel method called PlayGen, which encompasses game data generation, an autoregressive DiT-based diffusion model, and a comprehensive playability-based evaluation framework. Validated on well-known 2D and 3D games, PlayGen achieves real-time interaction, ensures sufficient visual quality, and provides accurate interactive mechanics simulation. Notably, these results are sustained even after over 1000 frames of gameplay on an NVIDIA RTX 2060 GPU. Our code is publicly available: https://github.com/GreatX3/Playable-Game-Generation. Our playable demo generated by AI is: http://124.156.151.207.

Ref-GS: Directional Factorization for 2D Gaussian Splatting

Dec 01, 2024

Youjia Zhang, Anpei Chen, Yumin Wan, Zikai Song, Junqing Yu, Yawei Luo, Wei Yang

Abstract: In this paper, we introduce Ref-GS, a novel approach for directional light factorization in 2D Gaussian splatting, which enables photorealistic view-dependent appearance rendering and precise geometry recovery. Ref-GS builds upon the deferred rendering of Gaussian splatting and applies directional encoding to the deferred-rendered surface, effectively reducing the ambiguity between orientation and viewing angle. Next, we introduce a spherical Mip-grid to capture varying levels of surface roughness, enabling roughness-aware Gaussian shading. Additionally, we propose a simple yet efficient geometry-lighting factorization that connects geometry and lighting via the vector outer product, significantly reducing renderer overhead when integrating volumetric attributes. Our method achieves superior photorealistic rendering for a range of open-world scenes while also accurately recovering geometry.

* Project page: https://ref-gs.github.io/

StdGEN: Semantic-Decomposed 3D Character Generation from Single Images

Nov 08, 2024

Yuze He, Yanning Zhou, Wang Zhao, Zhongkai Wu, Kaiwen Xiao, Wei Yang, Yong-Jin Liu, Xiao Han

Abstract: We present StdGEN, an innovative pipeline for generating semantically decomposed high-quality 3D characters from single images, enabling broad applications in virtual reality, gaming, and filmmaking. Unlike previous methods, which struggle with limited decomposability, unsatisfactory quality, and long optimization times, StdGEN features decomposability, effectiveness and efficiency; i.e., it generates intricately detailed 3D characters with separated semantic components such as the body, clothes, and hair, in three minutes. At the core of StdGEN is our proposed Semantic-aware Large Reconstruction Model (S-LRM), a transformer-based generalizable model that jointly reconstructs geometry, color and semantics from multi-view images in a feed-forward manner. A differentiable multi-layer semantic surface extraction scheme is introduced to acquire meshes from hybrid implicit fields reconstructed by our S-LRM. Additionally, a specialized efficient multi-view diffusion model and an iterative multi-layer surface refinement module are integrated into the pipeline to facilitate high-quality, decomposable 3D character generation. Extensive experiments demonstrate our state-of-the-art performance in 3D anime character generation, surpassing existing baselines by a significant margin in geometry, texture and decomposability. StdGEN offers ready-to-use semantic-decomposed 3D characters and enables flexible customization for a wide range of applications. Project page: https://stdgen.github.io

* 13 pages, 10 figures

IP-MOT: Instance Prompt Learning for Cross-Domain Multi-Object Tracking

Oct 30, 2024

Run Luo, Zikai Song, Longze Chen, Yunshui Li, Min Yang, Wei Yang

Abstract: Multi-Object Tracking (MOT) aims to associate multiple objects across video frames and is a challenging vision task due to inherent complexities in the tracking environment. Most existing approaches train and track within a single domain, resulting in a lack of cross-domain generalizability to data from other domains. While several works have introduced natural language representation to bridge the domain gap in visual tracking, these textual descriptions often provide too high-level a view and fail to distinguish various instances within the same class. In this paper, we address this limitation by developing IP-MOT, an end-to-end transformer model for MOT that operates without concrete textual descriptions. Our approach is underpinned by two key innovations: first, leveraging a pre-trained vision-language model, we obtain instance-level pseudo textual descriptions via prompt-tuning, which are invariant across different tracking scenes; second, we introduce a query-balanced strategy, augmented by knowledge distillation, to further boost the generalization capabilities of our model. Extensive experiments conducted on three widely used MOT benchmarks, including MOT17, MOT20, and DanceTrack, demonstrate that our approach not only achieves competitive performance on same-domain data compared to state-of-the-art models but also significantly improves the performance of query-based trackers by large margins for cross-domain inputs.

Beyond Forecasting: Compositional Time Series Reasoning for End-to-End Task Execution

Oct 08, 2024

Wen Ye, Yizhou Zhang, Wei Yang, Lumingyuan Tang, Defu Cao, Jie Cai, Yan Liu

Abstract: In recent decades, there have been substantial advances in time series models and benchmarks across various individual tasks, such as time series forecasting, classification, and anomaly detection. Meanwhile, compositional reasoning in time series is prevalent in real-world applications (e.g., decision-making and compositional question answering) and is in great demand. Unlike simple tasks that primarily focus on predictive accuracy, compositional reasoning emphasizes the synthesis of diverse information from both time series data and various domain knowledge, making it distinct and considerably more challenging. In this paper, we introduce Compositional Time Series Reasoning, a new task of handling intricate multistep reasoning tasks from time series data. Specifically, this new task focuses on various question instances requiring structural and compositional reasoning abilities on time series data, such as decision-making and compositional question answering. As an initial attempt to tackle this novel task, we developed TS-Reasoner, a program-aided approach that utilizes a large language model (LLM) to decompose a complex task into steps of programs that leverage existing time series models and numerical subroutines. Unlike existing reasoning work, which only calls off-the-shelf modules, TS-Reasoner allows for the creation of custom modules and provides greater flexibility to incorporate domain knowledge as well as user-specified constraints. We demonstrate the effectiveness of our method through a comprehensive set of experiments. These promising results indicate potential opportunities in the new task of time series reasoning and highlight the need for further research.
