Evangelos Chatzaroulas

Meta-World+: An Improved, Standardized, RL Benchmark

May 16, 2025

Reginald McLean, Evangelos Chatzaroulas, Luc McCutcheon, Frank Röder, Tianhe Yu, Zhanpeng He, K. R. Zentner, Ryan Julian, J K Terry, Isaac Woungang (+2 more)

Abstract: Meta-World is widely used for evaluating multi-task and meta-reinforcement learning agents, which are challenged to master diverse skills simultaneously. Since its introduction, however, there have been numerous undocumented changes that inhibit a fair comparison of algorithms. This work strives to disambiguate these results from the literature, while also leveraging the past versions of Meta-World to provide insights into multi-task and meta-reinforcement learning benchmark design. Through this process, we release a new open-source version of Meta-World (https://github.com/Farama-Foundation/Metaworld/) that offers full reproducibility of past results, is more technically ergonomic, and gives users more control over the tasks that are included in a task set.

In-Context Ensemble Improves Video-Language Models for Low-Level Workflow Understanding from Human Demonstrations

Sep 26, 2024

Moucheng Xu, Evangelos Chatzaroulas, Luc McCutcheon, Abdul Ahad, Hamzah Azeem, Janusz Marecki, Ammar Anwar

Abstract: A Standard Operating Procedure (SOP) defines a low-level, step-by-step written guide for a business software workflow based on a video demonstration. SOPs are a crucial step toward automating end-to-end software workflows. Manually creating SOPs can be time-consuming. Recent advancements in large video-language models offer the potential for automating SOP generation by analyzing recordings of human demonstrations. However, current large video-language models face challenges with zero-shot SOP generation. We explore in-context learning with video-language models for SOP generation. We report that in-context learning sometimes helps video-language models with SOP generation. We then propose in-context ensemble learning to further enhance the capabilities of the models in SOP generation.

* multimodal in-context ensemble learning, video-language models, SOP generation, pseudo-labels, in-context learning, prompt engineering
