Xin Chen

Univ. California, Santa Barbara

SciDFM: A Large Language Model with Mixture-of-Experts for Science

Sep 27, 2024

Liangtai Sun, Danyu Luo, Da Ma, Zihan Zhao, Baocai Chen, Zhennan Shen, Su Zhu, Lu Chen, Xin Chen, Kai Yu

Abstract:Recently, there has been a significant upsurge of interest in leveraging large language models (LLMs) to assist scientific discovery. However, most LLMs only focus on general science, while they lack domain-specific knowledge, such as chemical molecules and amino acid sequences. To bridge these gaps, we introduce SciDFM, a mixture-of-experts LLM, which is trained from scratch and is able to conduct college-level scientific reasoning and understand molecules and amino acid sequences. We collect a large-scale training corpus containing numerous scientific papers and books from different disciplines as well as data from domain-specific databases. We further fine-tune the pre-trained model on lots of instruction data to improve performances on downstream benchmarks. From experiment results, we show that SciDFM achieves strong performance on general scientific benchmarks such as SciEval and SciQ, and it reaches a SOTA performance on domain-specific benchmarks among models of similar size. We further analyze the expert layers and show that the results of expert selection vary with data from different disciplines. To benefit the broader research community, we open-source SciDFM at https://huggingface.co/OpenDFM/SciDFM-MoE-A5.6B-v1.0.

* 12 pages, 1 figure, 9 tables. Technical Report, Under Review
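
The abstract does not say how SciDFM routes tokens to its experts. As a rough illustration of what "expert selection" means in a mixture-of-experts layer, here is a minimal sketch of softmax top-k routing, a common scheme assumed purely for illustration. The names `route_top_k` and `moe_layer`, the scalar "token", and the toy experts are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of router scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so that they sum to 1 over the selected experts.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(token, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs for one scalar token."""
    return sum(w * experts[i](token) for i, w in route_top_k(router_logits, k))


# Toy experts (assumption): each one just scales its input.
experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
selected = route_top_k([0.0, 2.0, 1.0, -1.0], k=2)
output = moe_layer(1.0, experts, [0.0, 2.0, 1.0, -1.0], k=2)
```

In a sketch like this, inputs from different disciplines would yield different router scores and therefore different selected experts, which is the kind of variation the abstract reports analyzing.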

Landscape of Policy Optimization for Finite Horizon MDPs with General State and Action

Sep 25, 2024

Xin Chen, Yifan Hu, Minda Zhao

Abstract:Policy gradient methods are widely used in reinforcement learning. Yet, the nonconvexity of policy optimization imposes significant challenges in understanding the global convergence of policy gradient methods. For a class of finite-horizon Markov Decision Processes (MDPs) with general state and action spaces, we develop a framework that provides a set of easily verifiable assumptions to ensure the Kurdyka-Lojasiewicz (KL) condition of the policy optimization. Leveraging the KL condition, policy gradient methods converge to the globally optimal policy with a non-asymptotic rate despite nonconvexity. Our results find applications in various control and operations models, including entropy-regularized tabular MDPs, Linear Quadratic Regulator (LQR) problems, stochastic inventory models, and stochastic cash balance problems, for which we show an $\epsilon$-optimal policy can be obtained using a sample size in $\tilde{\mathcal{O}}(\epsilon^{-1})$ and polynomial in terms of the planning horizon by stochastic policy gradient methods. Our result establishes the first sample complexity for multi-period inventory systems with Markov-modulated demands and stochastic cash balance problems in the literature.

Efficient Polarization Demosaicking via Low-cost Edge-aware and Inter-channel Correlation

Aug 30, 2024

Guangsen Liu, Peng Rao, Xin Chen, Yao Li, Haixin Jiang

Abstract:Efficient and high-fidelity polarization demosaicking is critical for industrial applications of the division of focal plane (DoFP) polarization imaging systems. However, existing methods have an unsatisfactory balance of speed, accuracy, and complexity. This study introduces a novel polarization demosaicking algorithm that interpolates within a three-stage basic demosaicking framework to obtain DoFP images. Our method incorporates a DoFP low-cost edge-aware technique (DLE) to guide the interpolation process. Furthermore, the inter-channel correlation is used to calibrate the initial estimate in the polarization difference domain. The proposed algorithm is available in both a lightweight and a full version, tailored to different application requirements. Experiments on simulated and real DoFP images demonstrate that our two methods have the highest interpolation accuracy and speed, respectively, and significantly enhance the visuals. Both versions efficiently process a 1024*1024 image on an AMD Ryzen 5600X CPU in 0.1402s and 0.2693s, respectively. Additionally, since our methods only involve computational processes within a 5*5 window, the potential for parallel acceleration on GPUs or FPGAs is highly feasible.

* 15 pages, 9 figures

Multi-level Monte-Carlo Gradient Methods for Stochastic Optimization with Biased Oracles

Aug 20, 2024

Yifan Hu, Jie Wang, Xin Chen, Niao He

Abstract:We consider stochastic optimization when one only has access to biased stochastic oracles of the objective and the gradient, and obtaining stochastic gradients with low biases comes at high costs. This setting captures various optimization paradigms, such as conditional stochastic optimization, distributionally robust optimization, shortfall risk optimization, and machine learning paradigms, such as contrastive learning. We examine a family of multi-level Monte Carlo (MLMC) gradient methods that exploit a delicate tradeoff among bias, variance, and oracle cost. We systematically study their total sample and computational complexities for strongly convex, convex, and nonconvex objectives and demonstrate their superiority over the widely used biased stochastic gradient method. When combined with the variance reduction techniques like SPIDER, these MLMC gradient methods can further reduce the complexity in the nonconvex regime. Our results imply that a series of stochastic optimization problems with biased oracles, previously considered to be more challenging, is fundamentally no harder than the classical stochastic optimization with unbiased oracles. We also delineate the boundary conditions under which these problems become more difficult. Moreover, MLMC gradient methods significantly improve the best-known complexities in the literature for conditional stochastic optimization and shortfall risk optimization. Our extensive numerical experiments on distributionally robust optimization, pricing and staffing scheduling problems, and contrastive learning demonstrate the superior performance of MLMC gradient methods.

* A preliminary version of this manuscript has appeared in a conference proceeding. Please refer to Yifan Hu, Xin Chen, and Niao He. On the bias-variance-cost tradeoff of stochastic optimization. Advances in Neural Information Processing Systems, 2021
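
To make the bias, variance, and oracle-cost tradeoff in the abstract concrete, here is a minimal sketch of a randomized multi-level Monte Carlo estimator on a toy nested expectation. It is an illustration under stated assumptions, not the paper's gradient methods: the target is the scalar f(E[eta]) with f(y) = y^2, a plug-in estimate from 2^l inner samples is biased, and the names `mlmc_estimate`, `sample_inner`, and `level_probabilities` are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def level_probabilities(max_level):
    """Sampling probabilities p_l proportional to 2**(-l) for l = 1..max_level."""
    weights = [2.0 ** -level for level in range(1, max_level + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mlmc_estimate(f, sample_inner, max_level, rng):
    """One randomized MLMC draw.

    Its expectation equals E[f(mean of 2**max_level inner samples)], i.e. it
    carries the small bias of the finest level, while a draw at level l uses
    only 1 + 2**l inner samples.
    """
    base = f(sample_inner(rng))  # level 0: a single inner sample
    probs = level_probabilities(max_level)
    levels = list(range(1, max_level + 1))
    level = rng.choices(levels, weights=probs)[0]
    samples = [sample_inner(rng) for _ in range(2 ** level)]
    half = len(samples) // 2
    fine = f(mean(samples))
    # Coarse term reuses the same samples, split into two halves.
    coarse = 0.5 * (f(mean(samples[:half])) + f(mean(samples[half:])))
    # Importance-weight the level difference so the sum telescopes in expectation.
    return base + (fine - coarse) / probs[level - 1]


rng = random.Random(0)
draws = [
    mlmc_estimate(lambda y: y * y, lambda r: r.gauss(1.0, 1.0), 5, rng)
    for _ in range(20000)
]
average = mean(draws)  # close to 1 + 2**-5, the expected value at the finest level
```

With inner samples drawn from a unit-variance Gaussian with mean 1, the plug-in estimate at level l has expectation 1 + 2^(-l), so the average above targets 1 + 2^(-5) while most draws use far fewer than 32 inner samples.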

Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAE

Aug 10, 2024

Yiying Yang, Fukun Yin, Jiayuan Fan, Xin Chen, Wanzhang Li, Gang Yu

Abstract:As Artificial Intelligence Generated Content (AIGC) advances, a variety of methods have been developed to generate text, images, videos, and 3D objects from single or multimodal inputs, contributing efforts to emulate human-like cognitive content creation. However, generating realistic large-scale scenes from a single input presents a challenge due to the complexities involved in ensuring consistency across extrapolated views generated by models. Benefiting from recent video generation models and implicit neural representations, we propose Scene123, a 3D scene generation model that not only ensures realism and diversity through the video generation framework but also uses implicit neural fields combined with Masked Autoencoders (MAE) to effectively ensure the consistency of unseen areas across views. Specifically, we initially warp the input image (or an image generated from text) to simulate adjacent views, filling the invisible areas with the MAE model. However, these filled images usually fail to maintain view consistency, so we utilize the produced views to optimize a neural radiance field, enhancing geometric consistency. Moreover, to further enhance the details and texture fidelity of generated views, we employ a GAN-based loss against images derived from the input image through the video generation model. Extensive experiments demonstrate that our method can generate realistic and consistent scenes from a single prompt. Both qualitative and quantitative results indicate that our approach surpasses existing state-of-the-art methods. We show encouraging video examples at https://yiyingyang12.github.io/Scene123.github.io/.

* arXiv admin note: text overlap with arXiv:2305.11588 by other authors

AppAgent v2: Advanced Agent for Flexible Mobile Interactions

Aug 05, 2024

Yanda Li, Chi Zhang, Wanqi Yang, Bin Fu, Pei Cheng, Xin Chen, Ling Chen, Yunchao Wei

Abstract:With the advancement of Multimodal Large Language Models (MLLM), LLM-driven visual agents are increasingly impacting software interfaces, particularly those with graphical user interfaces. This work introduces a novel LLM-based multimodal agent framework for mobile devices. This framework, capable of navigating mobile devices, emulates human-like interactions. Our agent constructs a flexible action space that enhances adaptability across various applications including parser, text and vision descriptions. The agent operates through two main phases: exploration and deployment. During the exploration phase, functionalities of user interface elements are documented either through agent-driven or manual explorations into a customized structured knowledge base. In the deployment phase, RAG technology enables efficient retrieval and update from this knowledge base, thereby empowering the agent to perform tasks effectively and accurately. This includes performing complex, multi-step operations across various applications, thereby demonstrating the framework's adaptability and precision in handling customized task workflows. Our experimental results across various benchmarks demonstrate the framework's superior performance, confirming its effectiveness in real-world scenarios. Our code will be open source soon.

DuA: Dual Attentive Transformer in Long-Term Continuous EEG Emotion Analysis

Jul 30, 2024

Yue Pan, Qile Liu, Qing Liu, Li Zhang, Gan Huang, Xin Chen, Fali Li, Peng Xu, Zhen Liang

Abstract:Affective brain-computer interfaces (aBCIs) are increasingly recognized for their potential in monitoring and interpreting emotional states through electroencephalography (EEG) signals. Current EEG-based emotion recognition methods perform well with short segments of EEG data. However, these methods encounter significant challenges in real-life scenarios where emotional states evolve over extended periods. To address this issue, we propose a Dual Attentive (DuA) transformer framework for long-term continuous EEG emotion analysis. Unlike segment-based approaches, the DuA transformer processes an entire EEG trial as a whole, identifying emotions at the trial level, referred to as trial-based emotion analysis. This framework is designed to adapt to varying signal lengths, providing a substantial advantage over traditional methods. The DuA transformer incorporates three key modules: the spatial-spectral network module, the temporal network module, and the transfer learning module. The spatial-spectral network module simultaneously captures spatial and spectral information from EEG signals, while the temporal network module detects temporal dependencies within long-term EEG data. The transfer learning module enhances the model's adaptability across different subjects and conditions. We extensively evaluate the DuA transformer using a self-constructed long-term EEG emotion database, along with two benchmark EEG emotion databases. On the basis of the trial-based leave-one-subject-out cross-subject cross-validation protocol, our experimental results demonstrate that the proposed DuA transformer significantly outperforms existing methods in long-term continuous EEG emotion analysis, with an average enhancement of 5.28%.

* 11 pages, 3 figures

Exploring and Addressing Reward Confusion in Offline Preference Learning

Jul 22, 2024

Xin Chen, Sam Toyer, Florian Shkurti

Abstract:Spurious correlations in a reward model's training data can prevent Reinforcement Learning from Human Feedback (RLHF) from identifying the desired goal and induce unwanted behaviors. This paper shows that offline RLHF is susceptible to reward confusion, especially in the presence of spurious correlations in offline data. We create a benchmark to study this problem and propose a method that can significantly reduce reward confusion by leveraging transitivity of preferences while building a global preference chain with active learning.

GraphScale: A Framework to Enable Machine Learning over Billion-node Graphs

Jul 22, 2024

Vipul Gupta, Xin Chen, Ruoyun Huang, Fanlong Meng, Jianjun Chen, Yujun Yan

Abstract:Graph Neural Networks (GNNs) have emerged as powerful tools for supervised machine learning over graph-structured data, while sampling-based node representation learning is widely utilized in unsupervised learning. However, scalability remains a major challenge in both supervised and unsupervised learning for large graphs (e.g., those with over 1 billion nodes). The scalability bottleneck largely stems from the mini-batch sampling phase in GNNs and the random walk sampling phase in unsupervised methods. These processes often require storing features or embeddings in memory. In the context of distributed training, they require frequent, inefficient random access to data stored across different workers. Such repeated inter-worker communication for each mini-batch leads to high communication overhead and computational inefficiency. We propose GraphScale, a unified framework for both supervised and unsupervised learning to store and process large graph data distributedly. The key insight in our design is the separation of workers who store data and those who perform the training. This separation allows us to decouple computing and storage in graph training, thus effectively building a pipeline where data fetching and data computation can overlap asynchronously. Our experiments show that GraphScale outperforms state-of-the-art methods for distributed training of both GNNs and node embeddings. We evaluate GraphScale both on public and proprietary graph datasets and observe a reduction of at least 40% in end-to-end training times compared to popular distributed frameworks, without any loss in performance. While most existing methods don't support billion-node graphs for training node embeddings, GraphScale is currently deployed in production at TikTok enabling efficient learning over such large graphs.

* Published in the Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024), October 21-25, 2024, Boise, ID, USA. 8 pages, 12 figures

Blind Beamforming for Coverage Enhancement with Intelligent Reflecting Surface

Jul 17, 2024

Fan Xu, Jiawei Yao, Wenhai Lai, Kaiming Shen, Xin Li, Xin Chen, Zhi-Quan Luo

Abstract:Conventional policy for configuring an intelligent reflecting surface (IRS) typically requires channel state information (CSI), thus incurring substantial overhead costs and facing incompatibility with the current network protocols. This paper proposes a blind beamforming strategy in the absence of CSI, aiming to boost the minimum signal-to-noise ratio (SNR) among all the receiver positions, namely the coverage enhancement. Although some existing works already consider the IRS-assisted coverage enhancement without CSI, they assume certain position-channel models through which the channels can be recovered from the geographic locations. In contrast, our approach solely relies on the received signal power data, not assuming any position-channel model. We examine the achievability and converse of the proposed blind beamforming method. If the IRS has $N$ reflective elements and there are $U$ receiver positions, then our method guarantees the minimum SNR of $\Omega(N^2/U)$ -- which is fairly close to the upper bound $O(N+N^2\sqrt{\ln (NU)}/\sqrt[4]{U})$. Aside from the simulation results, we justify the practical use of blind beamforming in a field test at 2.6 GHz. According to the real-world experiment, the proposed blind beamforming method boosts the minimum SNR across seven random positions in a conference room by 18.22 dB, while the position-based method yields a boost of 12.08 dB.

* 17 pages
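
The field-test figures in the abstract are quoted in decibels. As a small worked conversion, the sketch below turns them into linear power ratios. Only the two dB values come from the abstract; the helper name `db_to_linear` is hypothetical.

```python
def db_to_linear(gain_db):
    """Convert a power gain in decibels to a linear power ratio."""
    return 10.0 ** (gain_db / 10.0)


blind_gain = db_to_linear(18.22)      # blind beamforming boost, about 66x
position_gain = db_to_linear(12.08)   # position-based boost, about 16x
# The 6.14 dB difference between the two methods, as a ratio (about 4.1x).
relative_gain = db_to_linear(18.22 - 12.08)
```

Read this way, the reported minimum-SNR boost of the blind method is roughly four times that of the position-based method in linear terms.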
