Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shotaro Miwa

WINFlowNets: Warm-up Integrated Networks Training of Generative Flow Networks for Robotics and Machine Fault Adaptation

Mar 18, 2026

Zahin Sufiyan, Shadan Golestan, Yoshihiro Mitsuka, Shotaro Miwa, Osmar Zaiane

Abstract:Generative Flow Networks for continuous scenarios (CFlowNets) have shown promise in solving sequential decision-making tasks by learning stochastic policies using a flow and a retrieval network. Despite their demonstrated efficiency compared to state-of-the-art Reinforcement Learning (RL) algorithms, their practical application in robotic control tasks is constrained by the reliance on pre-training the retrieval network. This dependency poses challenges in dynamic robotic environments, where pre-training data may not be readily available or representative of the current environment. This paper introduces WINFlowNets, a novel CFlowNets framework that enables the co-training of flow and retrieval networks. WINFlowNets begins with a warm-up phase for the retrieval network to bootstrap its policy, followed by a shared training architecture and a shared replay buffer for co-training both networks. Experiments in simulated robotic environments demonstrate that WINFlowNets surpasses CFlowNets and state-of-the-art RL algorithms in terms of average reward and training stability. Furthermore, WINFlowNets exhibits strong adaptive capability in fault environments, making it suitable for tasks that demand quick adaptation with limited sample data. These findings highlight WINFlowNets' potential for deployment in dynamic and malfunction-prone robotic systems, where traditional pre-training or sample inefficient data collection may be impractical.

Via

Access Paper or Ask Questions

ViSA: Visited-State Augmentation for Generalized Goal-Space Contrastive Reinforcement Learning

Mar 16, 2026

Issa Nakamura, Tomoya Yamanokuchi, Yuki Kadokawa, Jia Qu, Shun Otsub, Ken Miyamoto, Shotaro Miwa, Takamitsu Matsubara

Abstract:Goal-Conditioned Reinforcement Learning (GCRL) is a framework for learning a policy that can reach arbitrarily given goals. In particular, Contrastive Reinforcement Learning (CRL) provides a framework for policy updates using an approximation of the value function estimated via contrastive learning, achieving higher sample efficiency compared to conventional methods. However, since CRL treats the visited state as a pseudo-goal during learning, it can accurately estimate the value function only for limited goals. To address this issue, we propose a novel data augmentation approach for CRL called ViSA (Visited-State Augmentation). ViSA consists of two components: 1) generating augmented state samples, with the aim of augmenting hard-to-visit state samples during on-policy exploration, and 2) learning consistent embedding space, which uses an augmented state as auxiliary information to regularize the embedding space by reformulating the objective function of the embedding space based on mutual information. We evaluate ViSA in simulation and real-world robotic tasks and show improved goal-space generalization, which permits accurate value estimation for hard-to-visit goals. Further details can be found on the project page: \href{https://issa-n.github.io/projectPage_ViSA/}{\texttt{https://issa-n.github.io/projectPage\_ViSA/}}

* 8 pages, 7 figures, under Review

Via

Access Paper or Ask Questions

TLXML: Task-Level Explanation of Meta-Learning via Influence Functions

Jan 24, 2025

Yoshihiro Mitsuka, Shadan Golestan, Zahin Sufiyan, Sheila Schoepp, Shotaro Miwa, Osmar R. Zaïane

Figure 1 for TLXML: Task-Level Explanation of Meta-Learning via Influence Functions

Figure 2 for TLXML: Task-Level Explanation of Meta-Learning via Influence Functions

Figure 3 for TLXML: Task-Level Explanation of Meta-Learning via Influence Functions

Figure 4 for TLXML: Task-Level Explanation of Meta-Learning via Influence Functions

Abstract:The scheme of adaptation via meta-learning is seen as an ingredient for solving the problem of data shortage or distribution shift in real-world applications, but it also brings the new risk of inappropriate updates of the model in the user environment, which increases the demand for explainability. Among the various types of XAI methods, establishing a method of explanation based on past experience in meta-learning requires special consideration due to its bi-level structure of training, which has been left unexplored. In this work, we propose influence functions for explaining meta-learning that measure the sensitivities of training tasks to adaptation and inference. We also argue that the approximation of the Hessian using the Gauss-Newton matrix resolves computational barriers peculiar to meta-learning. We demonstrate the adequacy of the method through experiments on task distinction and task distribution distinction using image classification tasks with MAML and Prototypical Network.

* 22 pages

Via

Access Paper or Ask Questions

A Study of the Efficacy of Generative Flow Networks for Robotics and Machine Fault-Adaptation

Jan 06, 2025

Zahin Sufiyan, Shadan Golestan, Shotaro Miwa, Yoshihiro Mitsuka, Osmar Zaiane

Figure 1 for A Study of the Efficacy of Generative Flow Networks for Robotics and Machine Fault-Adaptation

Figure 2 for A Study of the Efficacy of Generative Flow Networks for Robotics and Machine Fault-Adaptation

Figure 3 for A Study of the Efficacy of Generative Flow Networks for Robotics and Machine Fault-Adaptation

Figure 4 for A Study of the Efficacy of Generative Flow Networks for Robotics and Machine Fault-Adaptation

Abstract:Advancements in robotics have opened possibilities to automate tasks in various fields such as manufacturing, emergency response and healthcare. However, a significant challenge that prevents robots from operating in real-world environments effectively is out-of-distribution (OOD) situations, wherein robots encounter unforseen situations. One major OOD situations is when robots encounter faults, making fault adaptation essential for real-world operation for robots. Current state-of-the-art reinforcement learning algorithms show promising results but suffer from sample inefficiency, leading to low adaptation speed due to their limited ability to generalize to OOD situations. Our research is a step towards adding hardware fault tolerance and fast fault adaptability to machines. In this research, our primary focus is to investigate the efficacy of generative flow networks in robotic environments, particularly in the domain of machine fault adaptation. We simulated a robotic environment called Reacher in our experiments. We modify this environment to introduce four distinct fault environments that replicate real-world machines/robot malfunctions. The empirical evaluation of this research indicates that continuous generative flow networks (CFlowNets) indeed have the capability to add adaptive behaviors in machines under adversarial conditions. Furthermore, the comparative analysis of CFlowNets with reinforcement learning algorithms also provides some key insights into the performance in terms of adaptation speed and sample efficiency. Additionally, a separate study investigates the implications of transferring knowledge from pre-fault task to post-fault environments. Our experiments confirm that CFlowNets has the potential to be deployed in a real-world machine and it can demonstrate adaptability in case of malfunctions to maintain functionality.

Via

Access Paper or Ask Questions

Enhancing Hardware Fault Tolerance in Machines with Reinforcement Learning Policy Gradient Algorithms

Jul 21, 2024

Sheila Schoepp, Mehran Taghian, Shotaro Miwa, Yoshihiro Mitsuka, Shadan Golestan, Osmar Zaïane

Figure 1 for Enhancing Hardware Fault Tolerance in Machines with Reinforcement Learning Policy Gradient Algorithms

Figure 2 for Enhancing Hardware Fault Tolerance in Machines with Reinforcement Learning Policy Gradient Algorithms

Figure 3 for Enhancing Hardware Fault Tolerance in Machines with Reinforcement Learning Policy Gradient Algorithms

Figure 4 for Enhancing Hardware Fault Tolerance in Machines with Reinforcement Learning Policy Gradient Algorithms

Abstract:Industry is rapidly moving towards fully autonomous and interconnected systems that can detect and adapt to changing conditions, including machine hardware faults. Traditional methods for adding hardware fault tolerance to machines involve duplicating components and algorithmically reconfiguring a machine's processes when a fault occurs. However, the growing interest in reinforcement learning-based robotic control offers a new perspective on achieving hardware fault tolerance. However, limited research has explored the potential of these approaches for hardware fault tolerance in machines. This paper investigates the potential of two state-of-the-art reinforcement learning algorithms, Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), to enhance hardware fault tolerance into machines. We assess the performance of these algorithms in two OpenAI Gym simulated environments, Ant-v2 and FetchReach-v1. Robot models in these environments are subjected to six simulated hardware faults. Additionally, we conduct an ablation study to determine the optimal method for transferring an agent's knowledge, acquired through learning in a normal (pre-fault) environment, to a (post-)fault environment in a continual learning setting. Our results demonstrate that reinforcement learning-based approaches can enhance hardware fault tolerance in simulated machines, with adaptation occurring within minutes. Specifically, PPO exhibits the fastest adaptation when retaining the knowledge within its models, while SAC performs best when discarding all acquired knowledge. Overall, this study highlights the potential of reinforcement learning-based approaches, such as PPO and SAC, for hardware fault tolerance in machines. These findings pave the way for the development of robust and adaptive machines capable of effectively operating in real-world scenarios.

Via

Access Paper or Ask Questions

Unsupervised Work Behavior Pattern Extraction Based on Hierarchical Probabilistic Model

May 16, 2024

Issei Saito, Tomoaki Nakamura, Toshiyuki Hatta, Wataru Fujita, Shintaro Watanabe, Shotaro Miwa

Figure 1 for Unsupervised Work Behavior Pattern Extraction Based on Hierarchical Probabilistic Model

Figure 2 for Unsupervised Work Behavior Pattern Extraction Based on Hierarchical Probabilistic Model

Figure 3 for Unsupervised Work Behavior Pattern Extraction Based on Hierarchical Probabilistic Model

Figure 4 for Unsupervised Work Behavior Pattern Extraction Based on Hierarchical Probabilistic Model

Abstract:Evolving consumer demands and market trends have led to businesses increasingly embracing a production approach that prioritizes flexibility and customization. Consequently, factory workers must engage in tasks that are more complex than before. Thus, productivity depends on each worker's skills in assembling products. Therefore, analyzing the behavior of a worker is crucial for work improvement. However, manual analysis is time consuming and does not provide quick and accurate feedback. Machine learning have been attempted to automate the analyses; however, most of these methods need several labels for training. To this end, we extend the Gaussian process hidden semi-Markov model (GP-HSMM), to enable the rapid and automated analysis of worker behavior without pre-training. The model does not require labeled data and can automatically and accurately segment continuous motions into motion classes. The proposed model is a probabilistic model that hierarchically connects GP-HSMM and HSMM, enabling the extraction of behavioral patterns with different granularities. Furthermore, it mutually infers the parameters between the GP-HSMM and HSMM, resulting in accurate motion pattern extraction. We applied the proposed method to motion data in which workers assembled products at an actual production site. The accuracy of behavior pattern extraction was evaluated using normalized Levenshtein distance (NLD). The smaller the value of NLD, the more accurate is the pattern extraction. The NLD of motion patterns captured by GP-HSMM and HSMM layers in our proposed method was 0.50 and 0.33, respectively, which are the smallest compared to that of the baseline methods.

Via

Access Paper or Ask Questions