Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yi Shen

Hypercube Policy Regularization Framework for Offline Reinforcement Learning

Nov 07, 2024

Yi Shen, Hanyan Huang

Abstract:Offline reinforcement learning has received extensive attention from scholars because it avoids the interaction between the agent and the environment by learning a policy through a static dataset. However, general reinforcement learning methods cannot get satisfactory results in offline reinforcement learning due to the out-of-distribution state actions that the dataset cannot cover during training. To solve this problem, the policy regularization method that tries to directly clone policies used in static datasets has received numerous studies due to its simplicity and effectiveness. However, policy constraint methods make the agent choose the corresponding actions in the static dataset. This type of constraint is usually over-conservative, which results in suboptimal policies, especially in low-quality static datasets. In this paper, a hypercube policy regularization framework is proposed, this method alleviates the constraints of policy constraint methods by allowing the agent to explore the actions corresponding to similar states in the static dataset, which increases the effectiveness of algorithms in low-quality datasets. It was also theoretically demonstrated that the hypercube policy regularization framework can effectively improve the performance of original algorithms. In addition, the hypercube policy regularization framework is combined with TD3-BC and Diffusion-QL for experiments on D4RL datasets which are called TD3-BC-C and Diffusion-QL-C. The experimental results of the score demonstrate that TD3-BC-C and Diffusion-QL-C perform better than state-of-the-art algorithms like IQL, CQL, TD3-BC and Diffusion-QL in most D4RL environments in approximate time.

Via

Access Paper or Ask Questions

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

Nov 05, 2024

Xingwu Sun, Yanfeng Chen, Yiqing Huang, Ruobing Xie, Jiaqi Zhu, Kai Zhang, Shuaipeng Li, Zhen Yang, Jonny Han, Xiaobo Shu(+97 more)

Figure 1 for Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

Figure 2 for Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

Figure 3 for Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

Figure 4 for Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

Abstract:In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture of experts model, with a total of 389 billion parameters and 52 billion activation parameters, capable of handling up to 256K tokens. We conduct a thorough evaluation of Hunyuan-Large's superior performance across various benchmarks including language understanding and generation, logical reasoning, mathematical problem-solving, coding, long-context, and aggregated tasks, where it outperforms LLama3.1-70B and exhibits comparable performance when compared to the significantly larger LLama3.1-405B model. Key practice of Hunyuan-Large include large-scale synthetic data that is orders larger than in previous literature, a mixed expert routing strategy, a key-value cache compression technique, and an expert-specific learning rate strategy. Additionally, we also investigate the scaling laws and learning rate schedule of mixture of experts models, providing valuable insights and guidances for future model development and optimization. The code and checkpoints of Hunyuan-Large are released to facilitate future innovations and applications. Codes: https://github.com/Tencent/Hunyuan-Large Models: https://huggingface.co/tencent/Tencent-Hunyuan-Large

* 17 pages, 4 Figures

Via

Access Paper or Ask Questions

Inverse Reinforcement Learning from Non-Stationary Learning Agents

Oct 18, 2024

Kavinayan P. Sivakumar, Yi Shen, Zachary Bell, Scott Nivison, Boyuan Chen, Michael M. Zavlanos

Figure 1 for Inverse Reinforcement Learning from Non-Stationary Learning Agents

Figure 2 for Inverse Reinforcement Learning from Non-Stationary Learning Agents

Figure 3 for Inverse Reinforcement Learning from Non-Stationary Learning Agents

Figure 4 for Inverse Reinforcement Learning from Non-Stationary Learning Agents

Abstract:In this paper, we study an inverse reinforcement learning problem that involves learning the reward function of a learning agent using trajectory data collected while this agent is learning its optimal policy. To address this problem, we propose an inverse reinforcement learning method that allows us to estimate the policy parameters of the learning agent which can then be used to estimate its reward function. Our method relies on a new variant of the behavior cloning algorithm, which we call bundle behavior cloning, and uses a small number of trajectories generated by the learning agent's policy at different points in time to learn a set of policies that match the distribution of actions observed in the sampled trajectories. We then use the cloned policies to train a neural network model that estimates the reward function of the learning agent. We provide a theoretical analysis to show a complexity result on bound guarantees for our method that beats standard behavior cloning as well as numerical experiments for a reinforcement learning problem that validate the proposed method.

Via

Access Paper or Ask Questions

Distributionally Robust Clustered Federated Learning: A Case Study in Healthcare

Oct 09, 2024

Xenia Konti, Hans Riess, Manos Giannopoulos, Yi Shen, Michael J. Pencina, Nicoleta J. Economou-Zavlanos, Michael M. Zavlanos

Figure 1 for Distributionally Robust Clustered Federated Learning: A Case Study in Healthcare

Figure 2 for Distributionally Robust Clustered Federated Learning: A Case Study in Healthcare

Figure 3 for Distributionally Robust Clustered Federated Learning: A Case Study in Healthcare

Abstract:In this paper, we address the challenge of heterogeneous data distributions in cross-silo federated learning by introducing a novel algorithm, which we term Cross-silo Robust Clustered Federated Learning (CS-RCFL). Our approach leverages the Wasserstein distance to construct ambiguity sets around each client's empirical distribution that capture possible distribution shifts in the local data, enabling evaluation of worst-case model performance. We then propose a model-agnostic integer fractional program to determine the optimal distributionally robust clustering of clients into coalitions so that possible biases in the local models caused by statistically heterogeneous client datasets are avoided, and analyze our method for linear and logistic regression models. Finally, we discuss a federated learning protocol that ensures the privacy of client distributions, a critical consideration, for instance, when clients are healthcare institutions. We evaluate our algorithm on synthetic and real-world healthcare data.

* 8 pages, 3 figures, Accepted to IEEE CDC 2024

Via

Access Paper or Ask Questions

**Towards Efficient Moion Planning for UAVs: Lazy A* Search with Motion Primitives**

Oct 02, 2024

Wentao Wang, Yi Shen, Kaiyang Chen, Kaifan Lu

Figure 1 for Towards Efficient Moion Planning for UAVs: Lazy A* Search with Motion Primitives

Figure 2 for Towards Efficient Moion Planning for UAVs: Lazy A* Search with Motion Primitives

Figure 3 for Towards Efficient Moion Planning for UAVs: Lazy A* Search with Motion Primitives

Figure 4 for Towards Efficient Moion Planning for UAVs: Lazy A* Search with Motion Primitives

Abstract:Search-based motion planning algorithms have been widely utilized for unmanned aerial vehicles (UAVs). However, deploying these algorithms on real UAVs faces challenges due to limited onboard computational resources. The algorithms struggle to find solutions in high-dimensional search spaces and require considerable time to ensure that the trajectories are dynamically feasible. This paper incorporates the lazy search concept into search-based planning algorithms to address the critical issue of real-time planning for collision-free and dynamically feasible trajectories on UAVs. We demonstrate that the lazy search motion planning algorithm can efficiently find optimal trajectories and significantly improve computational efficiency.

Via

Access Paper or Ask Questions

Geometric Analysis of Unconstrained Feature Models with $d=K$

Jul 15, 2024

Shao Gu, Yi Shen

Abstract:Recently, interesting empirical phenomena known as Neural Collapse have been observed during the final phase of training deep neural networks for classification tasks. We examine this issue when the feature dimension d is equal to the number of classes K. We demonstrate that two popular unconstrained feature models are strict saddle functions, with every critical point being either a global minimum or a strict saddle point that can be exited using negative curvatures. The primary findings conclusively confirm the conjecture on the unconstrained feature models in previous articles.

Via

Access Paper or Ask Questions

Real-Time Summarization of Twitter

Jul 11, 2024

Yixin Jin, Meiqi Wang, Meng Li, Wenjing Zhou, Yi Shen, Hao Liu

Figure 1 for Real-Time Summarization of Twitter

Abstract:In this paper, we describe our approaches to TREC Real-Time Summarization of Twitter. We focus on real time push notification scenario, which requires a system monitors the stream of sampled tweets and returns the tweets relevant and novel to given interest profiles. Dirichlet score with and with very little smoothing (baseline) are employed to classify whether a tweet is relevant to a given interest profile. Using metrics including Mean Average Precision (MAP, cumulative gain (CG) and discount cumulative gain (DCG), the experiment indicates that our approach has a good performance. It is also desired to remove the redundant tweets from the pushing queue. Due to the precision limit, we only describe the algorithm in this paper.

* This paper was accepted to International Conference on Artificial Intelligence and Electromechanical Automation 2024

Via

Access Paper or Ask Questions

Communication and Control Co-Design in 6G: Sequential Decision-Making with LLMs

Jul 06, 2024

Xianfu Chen, Celimuge Wu, Yi Shen, Yusheng Ji, Tsutomu Yoshinaga, Qiang Ni, Charilaos C. Zarakovitis, Honggang Zhang

Figure 1 for Communication and Control Co-Design in 6G: Sequential Decision-Making with LLMs

Figure 2 for Communication and Control Co-Design in 6G: Sequential Decision-Making with LLMs

Figure 3 for Communication and Control Co-Design in 6G: Sequential Decision-Making with LLMs

Figure 4 for Communication and Control Co-Design in 6G: Sequential Decision-Making with LLMs

Abstract:This article investigates a control system within the context of six-generation wireless networks. The control performance optimization confronts the technical challenges that arise from the intricate interactions between communication and control sub-systems, asking for a co-design. Accounting for the system dynamics, we formulate the sequential co-design decision-makings of communication and control over the discrete time horizon as a Markov decision process, for which a practical offline learning framework is proposed. Our proposed framework integrates large language models into the elements of reinforcement learning. We present a case study on the age of semantics-aware communication and control co-design to showcase the potentials from our proposed learning framework. Furthermore, we discuss the open issues remaining to make our proposed offline learning framework feasible for real-world implementations, and highlight the research directions for future explorations.

Via

Access Paper or Ask Questions

Deep Learning Powered Estimate of The Extrinsic Parameters on Unmanned Surface Vehicles

Jun 07, 2024

Yi Shen, Hao Liu, Chang Zhou, Wentao Wang, Zijun Gao, Qi Wang

Figure 1 for Deep Learning Powered Estimate of The Extrinsic Parameters on Unmanned Surface Vehicles

Figure 2 for Deep Learning Powered Estimate of The Extrinsic Parameters on Unmanned Surface Vehicles

Figure 3 for Deep Learning Powered Estimate of The Extrinsic Parameters on Unmanned Surface Vehicles

Figure 4 for Deep Learning Powered Estimate of The Extrinsic Parameters on Unmanned Surface Vehicles

Abstract:Unmanned Surface Vehicles (USVs) are pivotal in marine exploration, but their sensors' accuracy is compromised by the dynamic marine environment. Traditional calibration methods fall short in these conditions. This paper introduces a deep learning architecture that predicts changes in the USV's dynamic metacenter and refines sensors' extrinsic parameters in real time using a Time-Sequence General Regression Neural Network (GRNN) with Euler angles as input. Simulation data from Unity3D ensures robust training and testing. Experimental results show that the Time-Sequence GRNN achieves the lowest mean squared error (MSE) loss, outperforming traditional neural networks. This method significantly enhances sensor calibration for USVs, promising improved data accuracy in challenging maritime conditions. Future work will refine the network and validate results with real-world data.

* Accepted by The 9th Asia-Pacific Conference on Intelligent Robot Systems (ACIRS 2024)

Via

Access Paper or Ask Questions

TD3 Based Collision Free Motion Planning for Robot Navigation

May 24, 2024

Hao Liu, Yi Shen, Chang Zhou, Yuelin Zou, Zijun Gao, Qi Wang

Figure 1 for TD3 Based Collision Free Motion Planning for Robot Navigation

Abstract:This paper addresses the challenge of collision-free motion planning in automated navigation within complex environments. Utilizing advancements in Deep Reinforcement Learning (DRL) and sensor technologies like LiDAR, we propose the TD3-DWA algorithm, an innovative fusion of the traditional Dynamic Window Approach (DWA) with the Twin Delayed Deep Deterministic Policy Gradient (TD3). This hybrid algorithm enhances the efficiency of robotic path planning by optimizing the sampling interval parameters of DWA to effectively navigate around both static and dynamic obstacles. The performance of the TD3-DWA algorithm is validated through various simulation experiments, demonstrating its potential to significantly improve the reliability and safety of autonomous navigation systems.

Via

Access Paper or Ask Questions